Block AI bots #2542

Open
4 tasks
kimadactyl opened this issue Jul 29, 2024 — with Huly for GitHub · 0 comments
Labels
requires research Is this still an issue? What are viable solutions?

Comments

Member

kimadactyl commented Jul 29, 2024

Use case

Prompted by this post, which the following three paragraphs quote: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

Given that our Community site is only for hosting open source projects, AWS and Cloudflare do give us sponsored plans, but we only have a limited number of credits each year. The additional bandwidth costs AI crawlers are currently causing will likely mean we will run out of AWS credits early.

By blocking these crawlers, bandwidth for our downloaded files has decreased by 75% (~800GB/day to ~200GB/day). If all this traffic hit our origin servers, it would cost around $50/day, or $1,500/month, along with the increased load on our servers.

Normal traffic gets cached by our CDN, and doesn't cost us anything for bandwidth. But because many of these files are not downloaded often (and they're large), the cache is usually expired and the requests hit our origin servers directly, causing substantial bandwidth charges. Zipped documentation was designed for offline consumption by users, not for crawlers.

We should look into filtering out AI requests.

Blocking these crawlers obviously has legal and moral benefits too, but this GitHub issue sticks to the technical ones.

Acceptance criteria

  • AI requests are filtered out before hitting PlaceCal

Implementation notes & questions
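
As a starting point for the research, here is a minimal sketch of the matching logic a filter could use. It is written in Python purely for illustration. The acceptance criterion asks for filtering before requests reach PlaceCal, so the real rule would more likely live in a CDN or reverse proxy configuration. The token list, `is_ai_crawler`, and `status_for` are all hypothetical names and values chosen for this example. The list is not exhaustive, and user agents can be spoofed, so this only catches crawlers that identify themselves.

```python
import re

# Assumption: an illustrative, non-exhaustive set of user-agent tokens that
# some AI crawlers announce. A real deployment would need a maintained list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")

# One case-insensitive pattern that matches any of the tokens above.
_AI_CRAWLER_RE = re.compile(
    "|".join(re.escape(token) for token in AI_CRAWLER_TOKENS),
    re.IGNORECASE,
)


def is_ai_crawler(user_agent):
    """Return True if the User-Agent header contains a known AI crawler token."""
    if not user_agent:
        return False
    return _AI_CRAWLER_RE.search(user_agent) is not None


def status_for(user_agent):
    """Hypothetical helper: the HTTP status a filter might answer with."""
    return 403 if is_ai_crawler(user_agent) else 200
```

For example, `status_for("CCBot/2.0")` gives 403, while an ordinary browser user agent gives 200. Matching on the user agent is the cheapest option to evaluate first. Whether it is sufficient, compared with robots.txt rules or a managed bot-blocking feature from the CDN, is one of the open questions for this issue.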

Implementation plan

To be written by the developer

@kimadactyl kimadactyl added the requires research Is this still an issue? What are viable solutions? label Jul 29, 2024