Use case

Prompted by this: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

From that post:

Given that our Community site is only for hosting open source projects, AWS and Cloudflare do give us sponsored plans, but we only have a limited number of credits each year. The additional bandwidth costs AI crawlers are currently causing will likely mean we run out of AWS credits early.

By blocking these crawlers, bandwidth for our downloaded files has decreased by 75% (~800GB/day to ~200GB/day). If all this traffic hit our origin servers, it would cost around $50/day, or $1,500/month, along with the increased load on our servers.

Normal traffic gets cached by our CDN and doesn't cost us anything in bandwidth. But because many of these files are not downloaded often (and they're large), the cache has usually expired and the requests hit our origin servers directly, causing substantial bandwidth charges. Zipped documentation was designed for offline consumption by users, not for crawlers.

We should look into filtering out AI crawler requests.

This also has obvious legal and moral benefits, but let's keep this GitHub issue to the technical ones.

Acceptance criteria

AI requests are filtered out before hitting PlaceCal
Implementation notes & questions
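One option to explore, sketched below purely as an illustration rather than a decision: reject requests whose User-Agent matches known AI crawler names before they reach the application. The crawler names, the BlockAICrawlers class, and the WSGI framing are all placeholders, not PlaceCal code; the same check could equally live in a CDN/WAF rule, a reverse proxy rule, or app middleware.

```python
# Illustrative sketch only, not PlaceCal code: a WSGI middleware that
# rejects requests from known AI crawler user agents before they reach
# the app. The user-agent list below is a small, incomplete sample and
# would need to come from a maintained source in practice.
import re

AI_CRAWLER_UA = re.compile(
    r"GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider|PerplexityBot",
    re.IGNORECASE,
)

class BlockAICrawlers:
    """Return 403 for requests whose User-Agent looks like an AI crawler."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if AI_CRAWLER_UA.search(user_agent):
            # Refuse the request before it reaches the application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI crawlers are not permitted on this site.\n"]
        return self.app(environ, start_response)
```

Whichever layer we pick, filtering as far upstream as possible (ideally at the CDN) keeps the blocked traffic off our origin bandwidth entirely, which is the point the Read the Docs post makes.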
Implementation plan
To be written by the developer