Does anyone know Arc Search's user agent so we can block it?
@cory I think it might be PerplexityBot. I’m not 100% certain though.
https://blog.perplexity.ai/blog/arc-x-perplexity
https://darkvisitors.com/agents/perplexitybot
@knowler 👀 thank you! Going to make sure it’s added on my end and keep an eye out for anything else they’re using.
@austincnunn it’s an AI powered engine that aggregates information for you and provides suspect answers — I really don’t see any benefit to allowing it.
@cory That's fair. Just confused when I saw that and thought I had missed a big hullabaloo.
@austincnunn ah yeah it's yet another misguided attempt to layer AI into absolutely everything
@cory Ok, so not PerplexityBot, this is the user-agent string: ArcMobile2/11 CFNetwork/1492.0.1 Darwin/23.3.0
@knowler got it! So we'd need to block `ArcMobile2`? I haven't dug into this myself but can't imagine the entire string is required. 😅
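A minimal sketch of the substring idea, assuming the `ArcMobile2` product token alone is enough to identify Arc Search requests (the thread doesn't confirm that other versions keep this token). The function name `is_arc_search` is made up for illustration. The pattern is anchored on the token rather than on `Arc`, so a desktop-style string such as `Arc/1.26.2 (...)` is not caught:

```python
import re

# Assumption: "ArcMobile2" is the stable part of the user-agent string;
# the version and CFNetwork/Darwin parts will likely change over time.
ARC_SEARCH_TOKEN = re.compile(r"\bArcMobile2\b")


def is_arc_search(user_agent: str) -> bool:
    """Return True if the user-agent string contains the ArcMobile2 token."""
    return bool(ARC_SEARCH_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    print(is_arc_search("ArcMobile2/11 CFNetwork/1492.0.1 Darwin/23.3.0"))
    print(is_arc_search("Arc/1.26.2 (Mac OS X Version 14.2.1 (Build 23C71))"))
```

How the check gets wired into a host (server config, edge function, and so on) depends on the platform and is left out here.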
I have these in one of my vhost access logs:
"Arc/1.19.1 (Mac OS X Version 14.0 (Build 23A5337a))"
"Arc/1.26.2 (Mac OS X Version 14.2.1 (Build 23C71))"
"Arc/1.25.1 (Mac OS X Version 14.3 (Build 23D5051b))"
"ARC Reader (http://arc.semsol.org/)"
Hope this helps!
@bahua thank you! My site's on Netlify and I don't have any insight into access logs from any visitors.
@andyn I don't like the AI-based approach and have blocked and intend to keep blocking similar crawlers (https://darkvisitors.com is a good reference).
@andyn I suppose that's fair — I'm more concerned with the extractive nature of AI writ large and would rather draw the line and add new crawlers as they arise.
@cory Try to make sure you don’t block Arc in its entirety. Many of us Arc users are not exactly fans of the LLM stuff they’ve added lately (which thankfully is opt-in on the Mac version; I never turned any of it on).
@torb I sure won’t! I’m not interested in blocking visitors or browsers, just robots and scrapers (provided they honor robots.txt). 😄
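For the robots.txt side, here is a rough sketch of how a rule that blocks one crawler while leaving everyone else alone is evaluated, using Python's standard `urllib.robotparser`. The rules, the `may_crawl` helper, and the example.com URL are illustrative assumptions, not anyone's actual config, and this only matters for crawlers that choose to honor robots.txt. The thread doesn't establish whether Arc Search does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow PerplexityBot, allow everything else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("PerplexityBot", "https://example.com/post"))
    print(may_crawl("Mozilla/5.0", "https://example.com/post"))
```

Regular browsers never consult robots.txt, so rules like these don't affect human visitors either way.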
@cory @torb that's another reason why the robots.txt solution for these LLMs is not so good; a meta tag would be better: https://noml.info/