Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDP Mode - Patch 1 #3217

Merged
merged 4 commits into from
Oct 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 43 additions & 7 deletions examples/cdp_mode/ReadMe.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

👤 <b translate="no">UC Mode</b> avoids bot-detection by first disconnecting WebDriver from the browser at strategic times, calling special <code>PyAutoGUI</code> methods to bypass CAPTCHAs (as needed), and finally reconnecting the <code>driver</code> afterwards so that WebDriver actions can be performed again. Although this approach works for bypassing simple CAPTCHAs, more flexibility is needed for bypassing bot-detection on websites with advanced protection. (That's where <b translate="no">CDP Mode</b> comes in.)

🐙 <b translate="no">CDP Mode</b> is based on <a href="https://github.com/HyperionGray/python-chrome-devtools-protocol" translate="no">python-cdp</a>, <a href="https://github.com/HyperionGray/trio-chrome-devtools-protocol" translate="no">trio-cdp</a>, and <a href="https://github.com/ultrafunkamsterdam/nodriver" translate="no">nodriver</a>. <code>trio-cdp</code> was an early implementation of <code>python-cdp</code>, whereas <code>nodriver</code> is a modern implementation of <code>python-cdp</code>. (Refactored CDP code is imported from <a href="https://github.com/mdmintz/MyCDP" translate="no">MyCDP</a>.)
🐙 <b translate="no">CDP Mode</b> is based on <a href="https://github.com/HyperionGray/python-chrome-devtools-protocol" translate="no">python-cdp</a>, <a href="https://github.com/HyperionGray/trio-chrome-devtools-protocol" translate="no">trio-cdp</a>, and <a href="https://github.com/ultrafunkamsterdam/nodriver" translate="no">nodriver</a>. <code>trio-cdp</code> is an early implementation of <code>python-cdp</code>, and <code>nodriver</code> is a modern implementation of <code>python-cdp</code>. (Refactored Python-CDP code is imported from <a href="https://github.com/mdmintz/MyCDP" translate="no">MyCDP</a>.)

🐙 <b translate="no">CDP Mode</b> includes multiple updates to the above, such as:

Expand All @@ -19,12 +19,41 @@

--------

### 🐙 <b translate="no">CDP Mode</b> initialization:
### 🐙 <b translate="no">CDP Mode</b> usage:

* `sb.activate_cdp_mode(url)`
* **`sb.activate_cdp_mode(url)`**

> (Call that from a **UC Mode** script)

That disconnects WebDriver from Chrome (which prevents detection), and gives you access to `sb.cdp` methods (which don't trigger anti-bot checks).

### 🐙 Here are some common `sb.cdp` methods:

* `sb.cdp.click(selector)`
* `sb.cdp.click_if_visible(selector)`
* `sb.cdp.type(selector, text)`
* `sb.cdp.press_keys(selector, text)`
* `sb.cdp.select_all(selector)`
* `sb.cdp.get_text(selector)`

When `type()` is too fast, use the slower `press_keys()` to avoid detection. You can also use `sb.sleep(seconds)` to slow things down.

To use WebDriver methods again, call:

* **`sb.reconnect()`** or **`sb.connect()`**

(Note that reconnecting allows anti-bots to detect you, so only reconnect if it is safe to do so.)

To disconnect again, call:

* **`sb.disconnect()`**

While disconnected, if you accidentally call a WebDriver method, then SeleniumBase will attempt to use the CDP Mode version of that method (if available). For example, if you accidentally call `sb.click(selector)` instead of `sb.cdp.click(selector)`, then your WebDriver call will automatically be redirected to the CDP Mode version. Not all WebDriver methods have a matching CDP Mode method. In that scenario, calling a WebDriver method while disconnected could raise an error, or make WebDriver automatically reconnect first.

To find out if WebDriver is connected or disconnected, call:

* **`sb.is_connected()`**

--------

### 🐙 <b translate="no">CDP Mode</b> examples:
Expand All @@ -45,13 +74,15 @@ from seleniumbase import SB
with SB(uc=True, test=True, locale_code="en") as sb:
url = "https://www.pokemon.com/us"
sb.activate_cdp_mode(url)
sb.sleep(1)
sb.sleep(1.5)
sb.cdp.click_if_visible("button#onetrust-reject-all-handler")
sb.sleep(0.5)
sb.cdp.click('a[href="https://www.pokemon.com/us/pokedex/"]')
sb.sleep(1)
sb.cdp.click('b:contains("Show Advanced Search")')
sb.sleep(1)
sb.cdp.click('span[data-type="type"][data-value="electric"]')
sb.sleep(0.5)
sb.cdp.click("a#advSearch")
sb.sleep(1)
sb.cdp.click('img[src*="img/pokedex/detail/025.png"]')
Expand Down Expand Up @@ -99,7 +130,7 @@ from seleniumbase import SB
with SB(uc=True, test=True, locale_code="en") as sb:
url = "https://www.hyatt.com/"
sb.activate_cdp_mode(url)
sb.sleep(1)
sb.sleep(1.5)
sb.cdp.click_if_visible('button[aria-label="Close"]')
sb.sleep(0.5)
sb.cdp.click('span:contains("Explore")')
Expand Down Expand Up @@ -188,10 +219,14 @@ with SB(uc=True, test=True, locale_code="en") as sb:

```python
sb.cdp.get(url)
sb.cdp.reload()
sb.cdp.open(url)
sb.cdp.reload(ignore_cache=True, script_to_evaluate_on_load=None)
sb.cdp.refresh()
sb.cdp.get_event_loop()
sb.cdp.add_handler(event, handler)
sb.cdp.find_element(selector)
sb.cdp.find(selector)
sb.cdp.locator(selector)
sb.cdp.find_all(selector)
sb.cdp.find_elements_by_text(text, tag_name=None)
sb.cdp.select(selector)
Expand All @@ -205,6 +240,7 @@ sb.cdp.load_cookies(*args, **kwargs)
sb.cdp.clear_cookies(*args, **kwargs)
sb.cdp.sleep(seconds)
sb.cdp.bring_active_window_to_front()
sb.cdp.bring_to_front()
sb.cdp.get_active_element()
sb.cdp.get_active_element_css()
sb.cdp.click(selector)
Expand All @@ -231,7 +267,7 @@ sb.cdp.medimize()
sb.cdp.set_window_rect()
sb.cdp.reset_window_size()
sb.cdp.get_window()
sb.cdp.get_text()
sb.cdp.get_text(selector)
sb.cdp.get_title()
sb.cdp.get_current_url()
sb.cdp.get_origin()
Expand Down
4 changes: 2 additions & 2 deletions examples/cdp_mode/raw_async.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


async def main():
driver = await cdp_driver.cdp_util.start()
driver = await cdp_driver.cdp_util.start_async()
page = await driver.get("https://www.priceline.com/")
time.sleep(3)
print(await page.evaluate("document.title"))
Expand All @@ -21,7 +21,7 @@ async def main():
loop.run_until_complete(main())

# Call everything without using async / await
driver = loop.run_until_complete(cdp_driver.cdp_util.start())
driver = cdp_driver.cdp_util.start_sync()
page = loop.run_until_complete(driver.get("https://www.pokemon.com/us"))
time.sleep(3)
print(loop.run_until_complete(page.evaluate("document.title")))
Expand Down
10 changes: 5 additions & 5 deletions examples/cdp_mode/raw_footlocker.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,14 @@
url = "https://www.footlocker.com/"
sb.activate_cdp_mode(url)
sb.sleep(3)
sb.cdp.click_if_visible("button#touAgreeBtn")
sb.sleep(1)
sb.cdp.click_if_visible('button[id*="Agree"]')
sb.sleep(1.5)
sb.cdp.mouse_click('input[aria-label="Search"]')
sb.sleep(1.5)
search = "Nike Shoes"
sb.cdp.click('input[aria-label="Search"]')
sb.sleep(1)
sb.cdp.press_keys('input[aria-label="Search"]', search)
sb.sleep(2)
sb.cdp.click('ul[id*="typeahead"] li div')
sb.cdp.mouse_click('ul[id*="typeahead"] li div')
sb.sleep(2)
elements = sb.cdp.select_all("a.ProductCard-link")
if elements:
Expand Down
2 changes: 1 addition & 1 deletion examples/cdp_mode/raw_hyatt.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
with SB(uc=True, test=True, locale_code="en") as sb:
url = "https://www.hyatt.com/"
sb.activate_cdp_mode(url)
sb.sleep(1)
sb.sleep(1.5)
sb.cdp.click_if_visible('button[aria-label="Close"]')
sb.sleep(0.5)
sb.cdp.click('span:contains("Explore")')
Expand Down
4 changes: 3 additions & 1 deletion examples/cdp_mode/raw_pokemon.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,15 @@
with SB(uc=True, test=True, locale_code="en") as sb:
url = "https://www.pokemon.com/us"
sb.activate_cdp_mode(url)
sb.sleep(1)
sb.sleep(1.5)
sb.cdp.click_if_visible("button#onetrust-reject-all-handler")
sb.sleep(0.5)
sb.cdp.click('a[href="https://www.pokemon.com/us/pokedex/"]')
sb.sleep(1)
sb.cdp.click('b:contains("Show Advanced Search")')
sb.sleep(1)
sb.cdp.click('span[data-type="type"][data-value="electric"]')
sb.sleep(0.5)
sb.cdp.click("a#advSearch")
sb.sleep(1)
sb.cdp.click('img[src*="img/pokedex/detail/025.png"]')
Expand Down
2 changes: 1 addition & 1 deletion examples/cdp_mode/raw_req_async.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ async def request_paused_handler(self, event, tab):
)

async def start_test(self):
driver = await cdp_driver.cdp_util.start(incognito=True)
driver = await cdp_driver.cdp_util.start_async(incognito=True)
tab = await driver.get("about:blank")
tab.add_handler(mycdp.fetch.RequestPaused, self.request_paused_handler)
url = "https://gettyimages.com/photos/firefly-2003-nathan"
Expand Down
2 changes: 1 addition & 1 deletion seleniumbase/__version__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# seleniumbase package
__version__ = "4.32.0"
__version__ = "4.32.1"
4 changes: 1 addition & 3 deletions seleniumbase/core/browser_launcher.py
Original file line number Diff line number Diff line change
Expand Up @@ -525,9 +525,6 @@ def uc_open_with_cdp_mode(driver, url=None):
js_utils.call_me_later(driver, script, 3)
time.sleep(0.012)
driver.close()
driver.clear_cdp_listeners()
driver.delete_all_cookies()
driver.delete_network_conditions()
driver.disconnect()

cdp_details = driver._get_cdp_details()
Expand All @@ -546,6 +543,7 @@ def uc_open_with_cdp_mode(driver, url=None):
cdp_util.start(host=cdp_host, port=cdp_port)
)
page = loop.run_until_complete(driver.cdp_base.get(url))
loop.run_until_complete(page.activate())
if not safe_url:
time.sleep(constants.UC.CDP_MODE_OPEN_WAIT)
cdp = types.SimpleNamespace()
Expand Down
7 changes: 5 additions & 2 deletions seleniumbase/undetected/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -133,8 +133,11 @@ def __init__(
options = ChromeOptions()
try:
if hasattr(options, "_session") and options._session is not None:
# Prevent reuse of options
raise RuntimeError("You cannot reuse the ChromeOptions object")
# Prevent reuse of options.
# (Probably a port overlap. Quit existing driver and continue.)
logger.debug("You cannot reuse the ChromeOptions object")
with suppress(Exception):
options._session.quit()
except AttributeError:
pass
options._session = self
Expand Down
16 changes: 16 additions & 0 deletions seleniumbase/undetected/cdp_driver/cdp_util.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ async def start(
Helper function to launch a browser. It accepts several keyword parameters.
Conveniently, you can just call it bare (no parameters) to quickly launch
an instance with best practice defaults.
Note: Due to a Chrome-130 bug, use start_async or start_sync instead.
(Calling this method directly could lead to an unresponsive browser)
Note: New args are expected: Use kwargs only!
Note: This should be called ``await start()``
:param user_data_dir:
Expand Down Expand Up @@ -88,6 +90,20 @@ async def start(
return await Browser.create(config)


async def start_async(*args, **kwargs) -> Browser:
headless = False
if "headless" in kwargs:
headless = kwargs["headless"]
decoy_args = kwargs
decoy_args["headless"] = True
driver = await start(**decoy_args)
kwargs["headless"] = headless
kwargs["user_data_dir"] = driver.config.user_data_dir
driver.stop() # Due to Chrome-130, must stop & start
time.sleep(0.15)
return await start(*args, **kwargs)


def start_sync(*args, **kwargs) -> Browser:
loop = asyncio.get_event_loop()
headless = False
Expand Down