manton

AI crawling reprise: manton.org

In reply to
johnbeales.com

@manton We have seen enough bot traffic on 4RoadService.com to take down our servers at times, even with a crawl-delay in our robots.txt. It got so bad that we added a middleware that returns HTTP 429 (Too Many Requests) to any client whose user-agent identifies it as a bot. Unfortunately, that only punishes the bots that identify themselves, not the truly deceitful ones.
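A minimal sketch of that kind of middleware, assuming a Python WSGI app (the post doesn't say what stack 4RoadService.com runs on). The class name `BlockSelfIdentifiedBots`, the `bot|crawler|spider|scraper` pattern and the 60-second `Retry-After` value are hypothetical choices for illustration. As noted above, anything sending a browser-like user-agent passes straight through.

```python
import re

# Hypothetical pattern for user-agents that self-identify as automated clients.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)


class BlockSelfIdentifiedBots:
    """Hypothetical WSGI middleware: answer self-identified bots with HTTP 429."""

    def __init__(self, app, retry_after_seconds=60):
        self.app = app
        self.retry_after = str(retry_after_seconds)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BOT_UA_PATTERN.search(user_agent):
            body = b"Too Many Requests\n"
            start_response(
                "429 Too Many Requests",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    # Optional hint telling the client when it may retry.
                    ("Retry-After", self.retry_after),
                ],
            )
            return [body]
        # Browsers, and bots that disguise their user-agent, reach the real app.
        return self.app(environ, start_response)
```

Wrapping an existing app is one line, e.g. `application = BlockSelfIdentifiedBots(application)`. Substring matching is crude: it also catches crawlers you may want to keep (search engines, for instance), so a real deployment would probably need an allowlist.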

Meta seems to follow the crawl-delay directive. I haven't been able to verify the other major LLM players, but plenty of bots and crawlers identify themselves in the user-agent and still ignore the crawl-delay.
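For comparison, this is roughly what honoring `Crawl-delay` looks like from the crawler's side, using Python's standard `urllib.robotparser`. `Crawl-delay` is a non-standard directive (RFC 9309 doesn't define it), so each crawler decides for itself whether to obey it. The robots.txt contents, `example.com` and `ExampleBot/1.0` are made up, and `plan_polite_fetches` is a hypothetical helper that only computes a schedule instead of sleeping.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 10 seconds between requests.
ROBOTS_LINES = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /admin/",
]


def plan_polite_fetches(base_url, paths, user_agent, robots_lines=ROBOTS_LINES):
    """Return (offset_seconds, url) pairs a compliant crawler could follow.

    Disallowed paths are skipped, and consecutive fetches are spaced by the
    Crawl-delay that robots.txt assigns to this user-agent (0 if none).
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent) or 0
    schedule = []
    offset = 0.0
    for path in paths:
        url = base_url + path
        if not parser.can_fetch(user_agent, url):
            continue  # A Disallow rule matches; a polite crawler skips it.
        schedule.append((offset, url))
        offset += delay
    return schedule


for offset, url in plan_polite_fetches(
    "https://example.com", ["/", "/about", "/admin/login", "/contact"], "ExampleBot/1.0"
):
    print(f"t+{offset:>4.0f}s  GET {url}")
```

Nothing enforces this on the server side: a crawler that never calls `crawl_delay()` just keeps requesting, which is why the server-side 429 ends up being necessary.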
