Use case

Prompted by this: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

From that post:

Given that our Community site is only for hosting open source projects, AWS and Cloudflare do give us sponsored plans, but we only have a limited number of credits each year. The additional bandwidth costs AI crawlers are currently causing will likely mean we run out of AWS credits early.

By blocking these crawlers, bandwidth for our downloaded files has decreased by 75% (~800GB/day to ~200GB/day). If all this traffic hit our origin servers, it would cost around $50/day, or $1,500/month, along with the increased load on our servers.

Normal traffic gets cached by our CDN and doesn't cost us anything in bandwidth. But because many of these files are not downloaded often (and they're large), the cache has usually expired and the requests hit our origin servers directly, causing substantial bandwidth charges. Zipped documentation was designed for offline consumption by users, not for crawlers.

We should look into filtering out AI crawler requests.

This also has obvious legal and moral benefits, but let's keep this GitHub issue to the technical ones.

Acceptance criteria

AI requests are filtered out before hitting PlaceCal
Implementation notes & questions
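One option to explore, sketched below purely as an illustration rather than a decision: reject requests whose User-Agent matches known AI crawler names before they reach the application. The crawler names, the BlockAICrawlers class, and the WSGI framing are all placeholders, not PlaceCal code; the same check could equally live in a CDN/WAF rule, a reverse proxy rule, or app middleware.

```python
# Illustrative sketch only, not PlaceCal code: a WSGI middleware that
# rejects requests from known AI crawler user agents before they reach
# the app. The user-agent list below is a small, incomplete sample and
# would need to come from a maintained source in practice.
import re

AI_CRAWLER_UA = re.compile(
    r"GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider|PerplexityBot",
    re.IGNORECASE,
)

class BlockAICrawlers:
    """Return 403 for requests whose User-Agent looks like an AI crawler."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if AI_CRAWLER_UA.search(user_agent):
            # Refuse the request before it reaches the application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI crawlers are not permitted on this site.\n"]
        return self.app(environ, start_response)
```

Whichever layer we pick, filtering as far upstream as possible (ideally at the CDN) keeps the blocked traffic off our origin bandwidth entirely, which is the point the Read the Docs post makes.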
Implementation plan
To be written by the developer