Commit

docs: remove dead link (#865)
TC-MO authored Feb 22, 2024
1 parent 669bc4e commit fe6357b
Showing 2 changed files with 3 additions and 2 deletions.
@@ -11,7 +11,7 @@ slug: /anti-scraping/mitigation/cloudflare-challenge.md

---

-There are a few strategies that can be employed if you find yourself stuck. One key strategy is to ensure that your browser fingerprint is consistent. In some cases, the default browser fingerprint may actually be more effective than an inconsistently generated fingerprint. Additionally, it may be beneficial to avoid masking a Linux browser to look like a Windows or macOS browser, although this will depend on the specific configuration of the website you are targeting.
+If you find yourself stuck, there are few strategies that you can employ. One key strategy is to ensure that your browser fingerprint is consistent. In some cases, the default browser fingerprint may actually be more effective than an inconsistently generated fingerprint. Additionally, it may be beneficial to avoid masking a Linux browser to look like a Windows or macOS browser, although this will depend on the specific configuration of the website you are targeting.

For those using Crawlee, the library provides out-of-the-box support for generating consistent fingerprints that are able to pass the Cloudflare challenge. However, it's important to note that in some cases, the Cloudflare challenge screen may return a 403 status code even if it is evaluating the fingerprint and the request is not blocked. This can cause the default Crawlee browser crawlers to throw an error and not wait until the challenge is submitted and the page is redirected to the target webpage.

@@ -28,7 +28,7 @@ const crawler = new PlaywrightCrawler({

It's important to note that by removing default blocked status code handling, you should also add custom session retire logic on blocked pages to reduce retries. Additionally, you should add waiting logic to start the automation logic only after the Cloudflare challenge is solved and the page is redirected. This can be accomplished by waiting for a common selector that is available on all pages, such as a header logo.
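The waiting and session-retire logic described above could be sketched roughly as follows. This is a minimal illustration, not code from the changed file: `waitForChallenge` is a hypothetical helper, and it assumes `page` exposes Playwright's `waitForSelector(selector, { timeout })` and `session` exposes a `retire()` method, as the Crawlee browser crawling context is expected to provide. The selector and timeout values are placeholders.

```javascript
// Hypothetical helper (not part of Crawlee): wait for a selector that exists on
// every target page, and retire the session if it never shows up.
async function waitForChallenge(page, session, selector, timeoutMs = 30000) {
    try {
        // Resolves once the selector appears, which is taken to mean the
        // challenge was passed and the browser was redirected to the target page.
        await page.waitForSelector(selector, { timeout: timeoutMs });
        return true;
    } catch (err) {
        // The selector never appeared: treat the page as blocked and retire
        // the session so that a retry does not reuse it.
        session.retire();
        return false;
    }
}
```

Inside a request handler, one might call `await waitForChallenge(page, session, 'header .logo')` before any other automation and throw an error when it returns `false`, so that the request is retried with a fresh session.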

-In some cases, the browser may not pass the check and you may be presented with a captcha, indicating that your IP address has been graylisted. If you are working with a large pool of proxies you can retire the session and use another IP. However if you have small pool of proxies you might want to whitelist the IP. To do this, you'll need to solve the captcha to improve your IP address's reputation. There are various captcha-solving services available, such as [AntiCaptcha](https://anti-captcha.com/) or [AnyCaptcha](https://anycaptcha.com/), that you can use for this purpose. For more info check the section about [Captchas](../techniques/captchas.md).
+In some cases, the browser may not pass the check and you may be presented with a captcha, indicating that your IP address has been graylisted. If you are working with a large pool of proxies you can retire the session and use another IP. However if you have small pool of proxies you might want to whitelist the IP. To do this, you'll need to solve the captcha to improve your IP address's reputation. You can find various captcha-solving services, such as [AntiCaptcha](https://anti-captcha.com/), that you can use for this purpose. For more info check the section about [Captchas](../techniques/captchas.md).

![Cloudflare captcha](https://images.ctfassets.net/slt3lc6tev37/6sN2VXiUaJpjxqVfTbZEJd/9a4e13cbf08ce29797167c133c534e1f/image1.png)

1 change: 1 addition & 0 deletions vale.ini
@@ -20,3 +20,4 @@ Microsoft.Contractions = NO
Microsoft.Foreign = NO
Microsoft.We = NO
Microsoft.Quotes = NO
+Microsoft.ThereIs = NO
