@bluecaravan just to add, some discussion about shielding your site from AI scraping, over on the Help section help.micro.blog/t/exclude...
@adam thanks! I already had the OpenAI bots added but not the Bard and Common Crawl ones, this was helpful 😊
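For anyone following along, here is a minimal sketch of what those entries might look like and how to confirm they parse as intended. The user-agent tokens (`GPTBot` and `ChatGPT-User` for OpenAI, `Google-Extended` for Bard, `CCBot` for Common Crawl) are the publicly documented ones, but the list in the linked Help thread may differ; the `blocked_agents` helper and the `example.com` URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; compare against the list in the Help thread.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True} for each agent the rules disallow from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "Mozilla"]
    for agent, blocked in blocked_agents(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

This only checks that the rules say what you meant. It says nothing about whether a given crawler actually honors them.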
@adam Robots.txt. One of my favorite topics this year. I’m wondering whether there’s any way of knowing whether bots comply.
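One partial way to check compliance, assuming you can get at your server's access logs (a hosted blog may not expose them), is to look for requests from the AI crawlers' user agents to pages other than robots.txt. The sketch below assumes the common "combined" log format; the `ai_bot_hits` helper and the sample line are hypothetical. It can only catch bots that identify themselves honestly, and `Google-Extended` is left out because it is a robots.txt control token rather than a separate request user agent.

```python
import re

# Crawlers that announce themselves in the User-Agent header.
AI_AGENTS = ("GPTBot", "ChatGPT-User", "CCBot")

# Assumed combined log format:
# ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def ai_bot_hits(log_lines):
    """Return (agent, path) pairs for AI-crawler requests other than robots.txt."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path")
        if path == "/robots.txt":
            continue  # fetching robots.txt is expected, not a violation
        agent = match.group("agent").lower()
        for name in AI_AGENTS:
            if name.lower() in agent:
                hits.append((name, path))
                break
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [01/Mar/2024:10:00:00 +0000] "GET /2024/03/01/post.html HTTP/1.1" '
        '200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.0"',
    ]
    print(ai_bot_hits(sample))
```

If a crawler is disallowed everywhere and still shows up in the results, it is ignoring the file. An empty result proves little, since a bot can send any user agent it likes.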
@bluecaravan gah @sod Custom Robots plugin not actually injecting code into my Typewriter Dark Mode by @thedimpause theme - any ideas guys? This feels like it should become default @manton
@warner Could you please elaborate a bit? My plug-in is not designed to inject code anywhere. Its sole purpose is to enable an easy way to edit your blog's robots.txt file, and it looks like that works just fine.
@sod stand down @thedimpause: brainfart thinking it was Meta tags plugin, which I believe I've had a problem in the past injecting into HTML head on this theme. Let's hope AI bots respect .txt more than Google are said to for search instructions! Wonder if Manton will ever add disallows to Micro.blog/robots.txt to embody the laudable data-ownership policy reiterated on Feb 27th, given our data is being showcased on this domain too!!
@warner Phew, thanks, you had me worried there for a bit. 😅 Even if there are efforts from companies to enable some kind of control over how their bots crawl our blogs and websites, I think it's good to always assume traditional crawlers and AI models gobble up all the data you make publicly available on the open web.
The only way to be reasonably sure they won't see your stuff is to keep it in a more private area of the web, behind a login or paywall, for example. And even then, there's always the risk that some of your readers share your writing with the AI tools they use anyway.