
Suddenly unable to bypass CloudFlare challenge (Ubuntu Server) #2842

Closed
Jobine23 opened this issue Jun 7, 2024 · 73 comments
Labels
  • feature or fix already exists: Upgrade to the latest version as needed
  • Fun: Something big happened / (maybe some sarcasm)
  • UC Mode: Undetected Chromedriver Mode (--uc)
  • workaround exists: You can reach your destination if you do this...

Comments

@Jobine23

Jobine23 commented Jun 7, 2024

Hello, overnight my SeleniumBase instances became unable to bypass the Cloudflare challenge (which uses Cloudflare Turnstile).

I was on an older version of SB, so I updated to the latest (4.27.4), and it still does not pass the challenge.

[screenshot: cloudflare_chal]

I am using your demo code for clicking on the CloudFlare turnstile captcha:

from seleniumbase import SB

def open_the_turnstile_page(sb):
    url = "https://wildbet.gg/"
    sb.driver.uc_open_with_reconnect(url, reconnect_time=5)

def click_turnstile_and_verify(sb):
    sb.switch_to_frame("iframe")
    sb.driver.uc_click("span")
    sb.assert_element("img#captcha-success", timeout=3)

with SB(uc=True, test=True) as sb:
    open_the_turnstile_page(sb)
    try:
        click_turnstile_and_verify(sb)
    except Exception:
        open_the_turnstile_page(sb)
        click_turnstile_and_verify(sb)
    sb.set_messenger_theme(location="top_left")
    sb.post_message("SeleniumBase wasn't detected", duration=3)

If I instead use:
sb.driver.uc_open_with_reconnect(url, reconnect_time=9999)

and click manually, it works. Does this mean they are detecting something?

I also tried adding reconnect_time=5 to uc_click, and it did not help.

I'm a big fan of your project and I've been using it for some time :)

mdmintz added the labels "can't reproduce" (We tried to see what you saw, but didn't) and "UC Mode" (Undetected Chromedriver Mode (--uc)) on Jun 7, 2024
@mdmintz
Member

mdmintz commented Jun 7, 2024

Just tested on seleniumbase 4.27.5, and the following script is working for me:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://wildbet.gg/"
    sb.driver.uc_open_with_reconnect(url, 10)

If you're still getting blocked, it might be that you exceeded a rate limit for your IP Address.

mdmintz closed this as completed on Jun 7, 2024
@Jobine23
Author

Jobine23 commented Jun 7, 2024

Thank you for your reply. I'm not sure I understand. Why would this be related to rate limiting if I can edit the code to disconnect permanently and then click myself so that it goes through? The only difference between failure and success is that I click manually.

@mdmintz
Member

mdmintz commented Jun 7, 2024

You can try that if it helps. When I ran the UC Mode script for that page, the script never had to click anything to bypass the CAPTCHA.

@JimKarvo

JimKarvo commented Jun 7, 2024

We have the same problem on headless systems (Ubuntu Server): 2 different servers, and the IPs are rotated and whitelisted.

@vmolostvov

vmolostvov commented Jun 8, 2024

@mdmintz Same problem here on a Linux VDS (Ubuntu without a GPU): SeleniumBase became unable to bypass the Cloudflare challenge. I'm using the latest SB version. On local macOS and Windows, it keeps working without any problem.
[screenshot: 2567-06-08 at 16 35 40]

@vmolostvov

vmolostvov commented Jun 8, 2024

> We have the same problem on headless systems (Ubuntu Server): 2 different servers, and the IPs are rotated and whitelisted.

Did you find any solution, sir?

@JimKarvo

JimKarvo commented Jun 8, 2024

> > We have the same problem on headless systems (Ubuntu Server): 2 different servers, and the IPs are rotated and whitelisted.
>
> Did you find any solution, sir?

Not yet. Waiting to see if @mdmintz has any suggestions.

mdmintz reopened this on Jun 8, 2024
mdmintz added the label "more info needed" (not enough info / more info needed) and removed the label "can't reproduce" (We tried to see what you saw, but didn't) on Jun 8, 2024
@mdmintz
Member

mdmintz commented Jun 8, 2024

OK, it sounds like something has changed, because a few different people are showing up here. It also sounds like this is specific to Linux, because everything is working normally on macOS and Windows for me.

I'm still going to need more details to assist:

  • What version of Linux are you using?
  • What version of SeleniumBase are you using?
  • Does it still work on an earlier version of SeleniumBase, and if so, what's the highest version that it still works on?
  • Which Chrome/Version is installed on your machine? (Note that google-chrome and chromium are different!)
  • Does it work if you use xvfb=True (virtual display) with headed=True (override default headless mode on Linux)?

This will help me figure out what's going on. Everything is working normally for me on macOS and Windows.
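
The details requested above can be gathered in one pass. The following is a minimal sketch and is not part of SeleniumBase: the helper names (`installed_version`, `chrome_version`, `collect_report`) and the list of candidate browser binaries are assumptions made for illustration, and the script only reads version information.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def chrome_version(candidates=("google-chrome", "chromium", "chromium-browser")):
    """Return (binary_name, version_output) for the first browser found on PATH."""
    for name in candidates:  # candidate names are an assumption
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run(
                [path, "--version"], capture_output=True, text=True, timeout=10
            )
            return name, result.stdout.strip()
        except (OSError, subprocess.SubprocessError):
            return name, None
    return None, None


def collect_report():
    """Collect the OS, Python, SeleniumBase, and browser versions in one dict."""
    browser, browser_version = chrome_version()
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "seleniumbase": installed_version("seleniumbase"),
        "browser_binary": browser,
        "browser_version": browser_version,
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Reporting the binary name alongside the version matters here because, as noted above, google-chrome and chromium are different browsers.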

@EnmeiRyuuDev

Hello @mdmintz,
I also had this issue yesterday on all my VMs: not only Linux (Ubuntu 22.04, Debian 11), but also Windows Server 2022.
It works perfectly only on my personal laptop running Windows 11 Home.

On the VMs:

  • It never passes without displaying the CF checkbox challenge.
  • uc_click seems to be detected, and it never recovers.
  • In the same SB browser, clicking manually instead of letting SB click passes quickly after the manual click.
  • I tried with and without a proxy, with the same results.
  • Versions:
    google-chrome version 125.0.6422.142 on Windows Server, google-chrome version 124.0.6367.118 on Ubuntu 22.04, google-chrome version 122.0.6261.111 on Debian 11.
    I also tried SeleniumBase versions as old as v4.23.0 and v4.20.8, but it didn't help (same blocking behavior).

On my personal laptop (Windows 11):

  • Most of the time, it passes without having to click anything.
  • When it has to click, it passes as well.
  • I tried with and without a proxy (the same proxy used for the VMs), and it always works fine.
  • Versions:
    google-chrome version 125.0.6422.142

It looks like Cloudflare can somehow tell that I'm using a VM, and yet it works when clicking manually.
This can be quickly reproduced on any Google Compute Engine VM.

I hope my description is helpful.
And thanks for the great project ^^

@EnmeiRyuuDev

@mdmintz
To finish answering your questions:

  • SeleniumBase version used: the latest, v4.27.5
  • On Linux instances, I tried xvfb=True with headed=True, but it didn't help.

@mdmintz
Member

mdmintz commented Jun 8, 2024

Looks like we've narrowed it down to an environment issue.
Try this script, and let me know which environments it works on and which it fails on:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://visa.vfsglobal.com/fra/en/hrv/login"
    sb.driver.uc_open_with_reconnect(url, 6)
    sb.switch_to_frame("iframe")
    sb.driver.uc_click("span")
    breakpoint()

Worked for me on my Mac, but sounds like it won't work on some Linux server environments.

@SaberTawfiq

2024-06-08_19-28-57.mp4

@JimKarvo

JimKarvo commented Jun 8, 2024

@mdmintz On Ubuntu 20.04.6 LTS (server edition), the script you provided does not even click the span for some reason:

seleniumbase.common.exceptions.NoSuchElementException: Message: 
 Element {span} was not present after 7 seconds!

[screenshot]

@mdmintz
Member

mdmintz commented Jun 8, 2024

OK, getting somewhere. Seeing differences on Mac vs Windows. Two videos below.


On a Mac, script worked as is. (Adjust reconnect_time based on your Internet connection speed.):

Screen.Recording.2024-06-08.at.1.45.15.PM.mov

On Windows, had to add incognito=True in order for the script to work:

IMG_0750.mov

Did adding incognito=True to your SB() call help any of you?

If so, add it to all your scripts.
If not, let me know details, such as "worked locally on Windows, but not on Ubuntu Server", for example.
In order for me to fix this properly, I need to know exactly what conditions & environments are bad vs good.

@OpsecGuy

OpsecGuy commented Jun 8, 2024

"environment": {
    "implementation_name": "cpython",
    "implementation_version": "3.12.1",
    "os_name": "nt",
    "platform_machine": "AMD64",
    "platform_release": "10",
    "platform_system": "Windows",
    "platform_version": "10.0.19045",
    "python_full_version": "3.12.1",
    "platform_python_implementation": "CPython",
    "python_version": "3.12",
    "sys_platform": "win32"
  }
Name: seleniumbase
Version: 4.27.5

Driver object:

def create(self):
    return Driver(
        browser='chrome',
        incognito=True,
        dark_mode=True,
        headless2=self.summary.headless,
        proxy=self.proxy_addr,
        uc=True,
        uc_subprocess=True,
        # extension_dir=self.extension_dir if self.summary.captcha == True else None
    )

Logic:

target = "https://netguard.io/sduyafgewsdfn.php" # Website with cf captcha
self.session = self.create()
self.session.uc_open(target)
self.session.reconnect(2.55 if not self.summary.emulation_force else 6.0)

# In here getting Cloudflare captcha loop
try:
    self.session.switch_to_frame("iframe")
    self.session.uc_click("span")
    self.session.switch_to.default_content()
    self.session.sleep(3.5)
except Exception: pass

It's worth adding that I have tested on my local IP address, on proxyscrape.com dedicated and rotating proxies, and on another provider with IPv6 proxies. The issue is the same with every proxy type.

@JimKarvo

JimKarvo commented Jun 8, 2024

I have this code (also tried with incognito):

from seleniumbase import SB
import time 

with SB(uc=True, test=True) as sb:
    url = "https://visa.vfsglobal.com/fra/en/hrv/login"
    sb.driver.uc_open_with_reconnect(url, 6)
    sb.switch_to_frame("iframe")
    sb.driver.uc_click("span")
    time.sleep(3)
    sb.save_screenshot("bypass.png")

On Windows, the CF challenge is passed.
On Linux, the CF checkbox is not clicked.

Windows 11:
[screenshot: bypass]

Linux (Ubuntu Server 20.04):
[screenshot: bypass]

@mdmintz
Member

mdmintz commented Jun 8, 2024

@JimKarvo Thank you... That is helpful. I see the same thing: Working on both macOS and Windows, but not for Linux (Ubuntu server 20.04). I will look into improvements specifically on the Ubuntu side (since things appear to be working on macOS and Windows, at least with incognito=True). Possibly Cloudflare found a way to see if automation is running on Ubuntu.

I'll be working on improvements to SeleniumBase Ubuntu configuration. In the meantime, people can keep passing on any information that I might find useful. (If anyone is still having macOS or Windows issues, I'll need an example that reproduces your issue, because so far things appear to be normal on the local desktop front, at least with incognito=True.)

mdmintz changed the title from "Suddenly unable to bypass CloudFlare challenge" to "Suddenly unable to bypass CloudFlare challenge (Ubuntu Server)" on Jun 8, 2024
@JimKarvo

JimKarvo commented Jun 8, 2024

@mdmintz I don't know, but maybe changing the user agent will help?

If access to any of my VMs (Ubuntu Server) would help, I can send SSH details.

@mdmintz
Member

mdmintz commented Jun 8, 2024

@JimKarvo Can you tell me the user-agent that appears when you ran on Ubuntu? And the agent you saw when running locally? Maybe there's a clue there.

@JimKarvo

JimKarvo commented Jun 8, 2024

@mdmintz

From Linux I get:

GET / HTTP/1.1
Host: test.requestcatcher.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Sec-Ch-Ua: "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Linux"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.141 Safari/537.36

From Windows I get:

GET /test HTTP/1.1
Host: test.requestcatcher.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: el-GR,el;q=0.9
Connection: keep-alive
Sec-Ch-Ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
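
Comparing the two dumps by eye is error-prone, so here is a small sketch that diffs them programmatically. The function names (`parse_headers`, `diff_headers`) are illustrative assumptions, and the dumps are shortened copies of the headers posted above (only the lines that matter for the comparison are kept).

```python
def parse_headers(raw_request):
    """Parse a raw HTTP request dump into a {lowercase-header-name: value} dict."""
    headers = {}
    for line in raw_request.strip().splitlines()[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


linux_dump = """GET / HTTP/1.1
Host: test.requestcatcher.com
Accept-Language: en-US,en;q=0.9
Sec-Ch-Ua: "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Platform: "Linux"
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.141 Safari/537.36
"""

windows_dump = """GET /test HTTP/1.1
Host: test.requestcatcher.com
Accept-Language: el-GR,el;q=0.9
Sec-Ch-Ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Platform: "Windows"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
"""

differences = diff_headers(parse_headers(linux_dump), parse_headers(windows_dump))
for name, (linux_value, windows_value) in differences.items():
    print(f"{name}\n  linux:   {linux_value}\n  windows: {windows_value}")
```

Besides the platform and the User-Agent, the diff also surfaces that the Linux request advertises only the "Chromium" brand in Sec-Ch-Ua, while the Windows request also lists "Google Chrome".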

@mdmintz
Member

mdmintz commented Jun 8, 2024

If there's a human-controlled web browser with that same user-agent on that system (Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.141 Safari/537.36) I'm wondering if they would experience the same issue. Anyone having any issues if they use a Linux computer with a GUI and a regular web browser? (Possibly Cloudflare is only blocking based on that agent - not because they detected Selenium.)

@OpsecGuy

OpsecGuy commented Jun 8, 2024

> @JimKarvo Thank you... That is helpful. I see the same thing: Working on both macOS and Windows, but not for Linux (Ubuntu server 20.04). I will look into improvements specifically on the Ubuntu side (since things appear to be working on macOS and Windows, at least with incognito=True). Possibly Cloudflare found a way to see if automation is running on Ubuntu.
>
> I'll be working on improvements to SeleniumBase Ubuntu configuration. In the meantime, people can keep passing on any information that I might find useful. (If anyone is still having macOS or Windows issues, I'll need an example that reproduces your issue, because so far things appear to be normal on the local desktop front, at least with incognito=True.)

Not really. As I mentioned above, I'm still struggling to bypass it on Windows. I don't know if there is anything else I could add. If you need more info, tell me exactly what I should be looking for.

@mdmintz
Member

mdmintz commented Jun 8, 2024

@OpsecGuy You seem to be the only one having Windows issues at this time. You also introduced a lot of variables into the equation in your example. Eg. 1. using Driver() instead of SB(). 2. Using Dark Mode. 3. Using any headless mode. 4. Changing proxy settings. 5. Adding an extension. You should stick with the specific examples I tried in this thread (or another UC Mode example from the SeleniumBase/examples folder) for debugging purposes.

@JimKarvo

JimKarvo commented Jun 10, 2024

@mdmintz I tried this one after failing to solve the CAPTCHA.

It looks like disable_features="UserAgentClientHint" is not working at all:

from seleniumbase import SB

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
with SB(uc=True, test=True, disable_features="UserAgentClientHint", agent=ua) as sb:
    print("getting req catcher") 
    url = "https://jimkarvo.requestcatcher.com/test"
    sb.driver.uc_open_with_reconnect(url, 1)
    breakpoint()

The data I received:

GET /test HTTP/1.1
Host: jimkarvo.requestcatcher.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Sec-Ch-Ua: "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Linux"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
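
The capture above shows the overridden User-Agent claiming Windows while Sec-Ch-Ua-Platform still reports "Linux". A rough sketch for spotting that kind of mismatch in a captured request follows. The function names and the substring heuristics are assumptions for illustration only, and the `captured` dict is assumed to use lowercase header names.

```python
def platform_from_user_agent(user_agent):
    """Guess the OS family named in a User-Agent string (rough heuristic)."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Macintosh" in user_agent or "Mac OS X" in user_agent:
        return "macOS"
    if "Linux" in user_agent or "X11" in user_agent:
        return "Linux"
    return None


def find_inconsistencies(headers):
    """Return readable mismatches between the User-Agent and the platform hint."""
    problems = []
    ua_platform = platform_from_user_agent(headers.get("user-agent", ""))
    hint_platform = headers.get("sec-ch-ua-platform", "").strip('"')
    if ua_platform and hint_platform and ua_platform != hint_platform:
        problems.append(
            f"User-Agent says {ua_platform}, Sec-Ch-Ua-Platform says {hint_platform}"
        )
    return problems


# Values copied from the request captured above.
captured = {
    "sec-ch-ua-platform": '"Linux"',
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
    ),
}
print(find_inconsistencies(captured))
```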

@sakarimov

sakarimov commented Jun 10, 2024

@JimKarvo Your reconnect duration seems too small. Make it bigger, like 7 or 8. In my case, I use 20 and it works just fine.

@JimKarvo

> @JimKarvo Your reconnect duration seems too small. Make it bigger, like 7 or 8. In my case, I use 20 and it works just fine.

That code is only for capturing the user-agent and all the other data the browser sends to a server when requesting a page.

@mdmintz
Member

mdmintz commented Jun 10, 2024

@JimKarvo This site is a good one for seeing all the headers: https://browserleaks.com/client-hints
@jens4626 My Windows machine had the same UA, and bypassed without issue.
@sakarimov Seeing similar. Changing the User Agent makes a difference.

So what have we learned? Cloudflare made changes. Previously, they only blocked you if they detected Selenium, but now they are blocking you for other things, such as User Agent.

Three types of User Agents now (in combination with UC Mode):

  • Good: Bypass CAPTCHA immediately.
  • Not that good: You have to click the CAPTCHA in a stealthy way to bypass it.
  • Really bad: Blocked immediately. (Eg. You have HeadlessChrome in your User Agent.)

You may have to change your User Agent on Linux to be "Good".

For the "Not that good", you'll need to use pyautogui to click. (They are currently detecting any type of JS used to click Turnstile checkboxes, even with uc_click.)

This should work every time as long as the machine has a GUI:

import pyautogui
from seleniumbase import SB

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"

with SB(uc=True, test=True, agent=ua, disable_features="UserAgentClientHint", incognito=True) as sb:
    url = "https://www.virtualmanager.com/da/login"
    sb.driver.uc_open_with_reconnect(url, 8)
    if sb.is_element_visible("iframe"):
        sb.switch_to_frame("iframe")
        sb.execute_script('document.querySelector("input").focus()')
        sb.disconnect()
        pyautogui.press(" ")
        sb.driver.reconnect(4)
    breakpoint()

@jens4626

Thanks @mdmintz!
The pyautogui workaround does indeed work. I hope to see a uc_click fix soon!

@jens4626

Sadly, the pyautogui workaround does not always seem to bypass it.
Hope you're working on a uc_click fix :)

@mdmintz
Member

mdmintz commented Jun 13, 2024

@jens4626 Make sure all your pyautogui actions happen after the sb.disconnect(). Then when done, call sb.connect() / sb.reconnect() to use Selenium actions again.

@jens4626

@mdmintz It does work, but there's only about a 50% chance that it works.

I currently see 3 situations:

  1. The space bar presses don't get recognized, so it won't bypass. I tried adding a few more presses hoping that would solve it, but no luck.
  2. The space bar press is recognized as a click, but Cloudflare detects it, so the challenge loops again.
  3. The space bar press is sent and it bypasses Cloudflare.

I used the code you provided and it works, just not always.

So I'm not sure what to do, to be honest.

        sb.switch_to_frame("iframe")
        print("Switched to iframe")

        # Waiting to ensure the iframe is loaded
        time.sleep(2)

        # Focus on the input element
        sb.execute_script('document.querySelector("input").focus()')
        time.sleep(2)

        # Disconnecting the SeleniumBase driver
        print("Disconnecting SB")
        sb.disconnect()
        time.sleep(2)

        # Press the space bar with a short delay in between
        pyautogui.press(" ")
        time.sleep(1)
        pyautogui.press(" ")
        time.sleep(1)
        pyautogui.press(" ")
        time.sleep(1)
        pyautogui.press(" ")
        print("Pressed space four times")

        # Waiting for the actions to complete
        time.sleep(2)

        # Reconnecting the SeleniumBase driver
        print("Reconnecting SB")
        sb.driver.reconnect(4)

@mdmintz
Member

mdmintz commented Jun 13, 2024

@jens4626 The spacebar from pyautogui might not get recognized if your Selenium window is not the active window on top. Try that (making sure the window is on top and active) while I'm still working on improvements...

@ismayilibrahimov

ismayilibrahimov commented Jun 14, 2024

This was also not working for me after disconnecting from a remote Windows 10 machine (Azure VM). So, as @mdmintz mentioned, we have to keep the Chrome window active.

Updated code:

import pyautogui
from seleniumbase import SB

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"

with SB(uc=True, test=True, agent=ua, disable_features="UserAgentClientHint", incognito=True) as sb:
    sb.driver.maximize_window()
    url = "https://www.virtualmanager.com/da/login"
    sb.driver.uc_open_with_reconnect(url, 8)
    if sb.is_element_visible("iframe"):
        sb.switch_to_frame("iframe")
        sb.execute_script('document.querySelector("input").focus()')
        sb.disconnect()
        pyautogui.press(" ")
        sb.driver.reconnect(4)
    breakpoint()

As a note, when I disconnect from Remote Desktop, the Windows GUI is disabled. So, in order to keep your current session active, you can use this instruction.

@SaberTawfiq

I tested https://github.com/sarperavci/CloudflareBypassForScraping (the script uses DrissionPage).
When the browser opens normally, it succeeds automatically without clicking to confirm that you are human.
When "Verify you are human" appears, it clicks the box and the check succeeds. Can you merge this approach, or modify uc_click to work on the same principle in SeleniumBase?

@EnmeiRyuuDev

> I tested https://github.com/sarperavci/CloudflareBypassForScraping (the script uses DrissionPage). When the browser opens normally, it succeeds automatically without clicking to confirm that you are human. When "Verify you are human" appears, it clicks the box and the check succeeds. Can you merge this approach, or modify uc_click to work on the same principle in SeleniumBase?

I can confirm that the DrissionPage solution bypasses the Cloudflare click under Linux.

@EnmeiRyuuDev

EnmeiRyuuDev commented Jun 14, 2024

The pyautogui.press(" ") solution also works consistently under Linux.
You can make it work in headless mode, and in a multi-process environment, by attaching it to a virtual display.
This code worked for me under Ubuntu/Debian (note that headed=True is set, but Selenium will still run headless):

import os
from seleniumbase import SB
import pyautogui
from pyvirtualdisplay.display import Display
disp = Display(visible=True, size=(1366, 768), backend="xvfb", use_xauth=True)
disp.start()

import Xlib.display
pyautogui._pyautogui_x11._display = Xlib.display.Display(os.environ['DISPLAY'])

with SB(uc=True, headed=True) as sb:
     ...

@mdmintz
Member

mdmintz commented Jun 14, 2024

@EnmeiRyuuDev SeleniumBase uses the built-in sbvirtualdisplay like this:

self._xvfb_display = Display(visible=0, size=(width, height))
self._xvfb_display.start()

Will that work with the code you added?

import Xlib.display
pyautogui._pyautogui_x11._display = Xlib.display.Display(os.environ['DISPLAY'])

I assume you installed this: python-xlib?
Does pyautogui need that to succeed on Linux?

@EnmeiRyuuDev

EnmeiRyuuDev commented Jun 14, 2024

@mdmintz In my tests, this piece of code was required:

from pyvirtualdisplay.display import Display
disp = Display(visible=True, size=(1366, 768), backend="xvfb", use_xauth=True)
disp.start()

Otherwise, Selenium will not run headless.

Also, headed=True was required, but SB still runs headless, which is perfect. Otherwise pyautogui will not work.
Also, the code will not work under Windows. It is only valid under Linux. As I recall, I only had to install pyvirtualdisplay.
Also, in the Linux environment, some packages are necessary:
sudo apt-get install python3.10-tk python3-dev tk-dev
And rebuilding Python afterwards:

sudo ./configure --enable-optimizations
sudo make -j 2
sudo make altinstall

This is my complete test code:

import os
from seleniumbase import SB
import time
import sys
import random
import math
import pyautogui

from pyvirtualdisplay.display import Display
disp = Display(visible=True, size=(1366, 768), backend="xvfb", use_xauth=True)
disp.start()

import Xlib.display
pyautogui._pyautogui_x11._display = Xlib.display.Display(os.environ['DISPLAY'])


with SB(uc=True, headed=True, proxy=None) as sb:
    print('Started..')
    url = "https://gitlab.com/users/sign_in"
    sb.driver.uc_open_with_reconnect(url, 10)
    if sb.is_element_visible("iframe"):
        sb.switch_to_frame("iframe")
        sb.execute_script('document.querySelector("input").focus()')
        sb.disconnect()
        print('Click..')
        pyautogui.press(" ")
        sb.driver.reconnect(10)
    random_number = random.randint(1000, 9999)
    filename = f"screenshot_{random_number}.png"
    sb.save_screenshot(filename)
    print('End.')

Interestingly, when running multiple headless instances (20+ chromedriver instances), they all click independently, without the window-overlapping issue.
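
Since this approach depends on an X display and a few extra Python packages, a quick pre-flight check can save debugging time on a fresh server. This is only a sketch: `gui_preflight` is a hypothetical helper, and the module names it checks (`pyautogui`, `Xlib`, `pyvirtualdisplay`) are simply the ones imported in the code above. It looks the modules up without importing them, so it also runs on a machine with no display.

```python
import importlib.util
import os
import shutil


def gui_preflight(modules=("pyautogui", "Xlib", "pyvirtualdisplay")):
    """Report whether the pieces the virtual-display approach relies on are present."""
    report = {
        # DISPLAY is normally set once a (virtual) X display has been started.
        "DISPLAY": os.environ.get("DISPLAY"),
        # Xvfb is the backend named in the Display(...) call above.
        "Xvfb_on_path": shutil.which("Xvfb") is not None,
    }
    for name in modules:
        # find_spec() locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in gui_preflight().items():
        print(f"{key}: {value}")
```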

@jens4626

Thanks for the input, @mdmintz and @ismayilibrahimov.

I think the issue with it not clicking was on my end.
But I still face issues with it not being able to bypass, as you can see:
https://github.com/seleniumbase/SeleniumBase/assets/45258332/ee31069b-8d09-4210-9d70-cac90e5a4b18

It might be caused by a bad IP score combined with now using pyautogui. It was never an issue with uc_click.

I was already using what you mentioned, @ismayilibrahimov, so that's not the problem either.

@mdmintz
Member

mdmintz commented Jun 18, 2024

More details:

Now, if your User-Agent looks untrustworthy, CloudFlare makes you click the CAPTCHA (which has been improved). If they detect either Selenium in the browser or JavaScript involvement in clicking the CAPTCHA, they don't let the click through. That's why pyautogui is now required for clicking the CAPTCHA if the User-Agent isn't trustworthy enough. The default user-agent set on macOS and Windows by SeleniumBase is generally good enough. On Linux, the default User-Agent might not be good enough: You may need to specify a better one to avoid needing to click the CAPTCHA in that scenario. (Or just use the pyautogui workaround for clicking it... Scroll up to see some examples that use it.)

I'm working on an update that can optionally utilize the pyautogui workaround if needed. That will likely need an update to examples because the existing uc_click might not be good enough if the User-Agent isn't trustworthy enough.

This probably means a new UC Mode Video Tutorial (Part 3) is likely to happen soon to explain the changes.

@ismayilibrahimov

I am using Windows 10 (without headless mode) on Azure, and Cloudflare requires a click. I think the user-agent is not the only issue.


@mdmintz
Member

mdmintz commented Jun 20, 2024

@ismayilibrahimov Azure has a known IP-range (just like AWS or GCP). That's why residential proxies have become so popular lately for web-scraping.
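
To illustrate what a "known IP-range" check amounts to, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from the reserved documentation ranges, not real cloud-provider ranges, and `is_datacenter_ip` is a hypothetical name. How any particular service actually classifies addresses is not something this thread establishes.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges (RFC 5737).
# Real provider ranges are published by each cloud vendor and are NOT shown here.
HYPOTHETICAL_DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(address, ranges=HYPOTHETICAL_DATACENTER_RANGES):
    """Return True if the address falls inside any of the listed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ranges)


print(is_datacenter_ip("203.0.113.7"))
print(is_datacenter_ip("192.0.2.1"))
```

A membership test like this is cheap, which is one plausible reason requests from well-known hosting ranges can be treated differently regardless of the browser's fingerprint.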

@enricodvn

Were you guys able to use this alternative with a proxy?

For me, CF started showing the challenge around the same time, and it only happens when I am using a proxy (on servers).

Locally without a proxy, it works fine. But when I use a proxy, even in my local environment, bam, there is the CAPTCHA. They are somehow detecting the proxy.

If I try the pyautogui alternative, it works without a proxy, but with a proxy this is what happens:

[screenshot: asd1]

@mdmintz
Member

mdmintz commented Jun 20, 2024

@enricodvn Which alternative are you using? The one with Xlib.display? As for proxies, I haven't seen any local issues with using them, although maybe it works better when the time zone of the proxy is in the same time zone as your browser.

@enricodvn

Yes, the last one from #2842 (comment).

Hmm, this time zone setting is interesting. Is there any way I can set it through the driver?

I will try tweaking it.

@mdmintz
Member

mdmintz commented Jun 20, 2024

@enricodvn The time zone can be set via execute_cdp_cmd, but CDP changes go away when the driver is disconnected. Would need a way to configure it before the browser is launched. (Also possible that it's unrelated to the time zone difference.)
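
For reference, the call shape looks roughly like the sketch below. `Emulation.setTimezoneOverride` is a Chrome DevTools Protocol command, and `execute_cdp_cmd` is the driver method mentioned above. The `set_timezone` wrapper, the `RecordingDriver` stand-in, and the "Europe/Athens" value are assumptions for illustration. As noted, an override applied this way would not survive the driver being disconnected.

```python
def set_timezone(driver, timezone_id):
    """Ask Chrome (via CDP) to report the given IANA time zone to pages."""
    driver.execute_cdp_cmd(
        "Emulation.setTimezoneOverride", {"timezoneId": timezone_id}
    )


class RecordingDriver:
    """Stand-in used only to show the call shape without launching a browser."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, command, params):
        # Record the command instead of sending it to a real browser.
        self.calls.append((command, params))
        return {}


fake = RecordingDriver()
set_timezone(fake, "Europe/Athens")  # hypothetical zone matching a proxy location
print(fake.calls)
```

With a real session, the same helper would be passed the live driver object instead of the stand-in.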

@ankushkumarpatiyal

import os
import random
from time import sleep

import pyautogui

# `settings` (with MEDIA_ROOT) comes from the poster's own project and is not shown here.


def open_the_turnstile_page(sb, url):
    sb.driver.uc_open_with_reconnect(url, 15)
    print('getting website')
    screen = 'new.jpg'
    sb.save_screenshot(os.path.join(settings.MEDIA_ROOT, screen))
    if sb.is_element_visible("iframe"):
        print('inside if')
        sb.switch_to_frame("iframe")
        sleep(1)
        print('iframe found')
        sb.execute_script('document.querySelector("input").focus()')
        sb.disconnect()
        pyautogui.press(" ")
        print('pressed')
        file = 'new_screenshot.png'
        print('file saved')
        sb.driver.reconnect(3)
        sb.save_screenshot(os.path.join(settings.MEDIA_ROOT, file))
    random_number = random.randint(1000, 9999)
    filename = f"screenshot_{random_number}.png"
    sb.save_screenshot(os.path.join(settings.MEDIA_ROOT, filename))
    print('saved screenshot')
    page_source = sb.get_page_source()
    print(page_source)

Most of the time, after pyautogui presses the key and I see the 'pressed' print statement in my console, my code stops there. It doesn't fail, but it still gets stuck. Why? Can anyone help me out here? I am on Ubuntu Server, and on my local machine everything works just fine.

mdmintz added the label "feature or fix already exists" (Upgrade to the latest version as needed) and removed the label "more info needed" (not enough info / more info needed) on Jun 23, 2024
@mdmintz
Member

mdmintz commented Jun 23, 2024

This was resolved in 4.28.0 - https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.28.0

Read #2865 for all the details. You may need to use the new UC Mode methods in 4.28.0, such as driver.uc_gui_handle_cf(), in order to successfully click through CF CAPTCHA checkboxes on Linux.
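
A quick way to confirm that an environment is on a release that includes the fix is to compare the installed version against 4.28.0. This is a sketch only: `version_tuple` and `has_fix` are hypothetical helpers that handle plain dotted version strings, and the script does not call any of the new UC Mode methods.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '4.28.0' into (4, 28, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_fix(installed, required="4.28.0"):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("seleniumbase")
    except metadata.PackageNotFoundError:
        print("seleniumbase is not installed")
    else:
        status = "includes the fix" if has_fix(installed) else "predates 4.28.0"
        print(f"seleniumbase {installed} {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "4.9.0" as newer than "4.28.0".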

mdmintz closed this as completed on Jun 23, 2024
mdmintz added the label "Fun" (Something big happened / (maybe some sarcasm)) on Jun 25, 2024