bluecaravan
Between Matt Mullenweg going off the rails and Automattic reportedly striking deals to let AI companies scrape data from Tumblr and WordPress blogs, I think those of us who post things on the internet increasingly need to be aware of how our content may be used, and make informed decisions about where we c... chencuifen.com
adam

@bluecaravan just to add, some discussion about shielding your site from AI scraping, over on the Help section help.micro.blog/t/exclude...

bluecaravan

@adam thanks! I already had the OpenAI bots added but not the Bard and Common Crawl ones, this was helpful 😊
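
For anyone who wants to double-check that their rules cover those bots, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `blocked_agents` helper are assumptions for illustration, not what the Help thread recommends. The user-agent tokens are the ones the crawlers publish: `GPTBot` and `ChatGPT-User` for OpenAI, `Google-Extended` for Bard, `CCBot` for Common Crawl.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration; adjust to your own blog.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Hypothetical helper: map each agent token to True if the rules block it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


result = blocked_agents(
    ROBOTS_TXT,
    ["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "Googlebot"],
)
for agent, blocked in result.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

This only confirms what the file asks for. Whether a crawler honours the request is a separate question.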

CarloVSantiago

@adam Robots.txt. One of my favorite topics this year. I’m wondering whether there’s any way of knowing whether bots comply.
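
One partial way to check, assuming you can get raw access logs in Combined Log Format (a hosted blog may not expose them), is to count requests from agents you have disallowed. The sketch below is hypothetical throughout: the `find_violations` helper, the regular expression, and the sample log lines are all made up for illustration. It also only catches bots that identify themselves honestly, since a user-agent string can be spoofed.

```python
import re

# Combined Log Format tail: "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def find_violations(log_lines, disallowed_agents):
    """Count requests per disallowed agent token, ignoring fetches of /robots.txt."""
    hits = {}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") == "/robots.txt":
            continue
        agent = match.group("agent").lower()
        for token in disallowed_agents:
            # Case-insensitive substring match on the user-agent string.
            if token.lower() in agent:
                hits[token] = hits.get(token, 0) + 1
    return hits


# Made-up sample lines (documentation IP ranges, simplified agent strings).
SAMPLE = [
    '203.0.113.5 - - [01/Mar/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "GPTBot/1.0"',
    '203.0.113.5 - - [01/Mar/2024:10:00:05 +0000] "GET /2024/03/01/post.html HTTP/1.1" 200 5120 "-" "GPTBot/1.0"',
    '198.51.100.7 - - [01/Mar/2024:10:01:00 +0000] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

violations = find_violations(SAMPLE, ["GPTBot", "CCBot"])
print(violations)
```

A non-empty result means a bot that names itself kept fetching pages after being asked not to. An empty result proves much less.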

warner

@bluecaravan gah @sod Custom Robots plugin not actually injecting code into my Typewriter Dark Mode by @thedimpause theme - any ideas guys? This feels like it should become default @manton

sod

@warner Could you please elaborate a bit? My plug-in is not designed to inject code anywhere. Its sole purpose is to enable an easy way to edit your blog's robots.txt file, and it looks like that works just fine.

warner

@sod stand down @thedimpause: brainfart thinking it was Meta tags plugin, which I believe I've had a problem in the past injecting into HTML head on this theme. Let's hope AI bots respect .txt more than Google are said to for search instructions! Wonder if Manton will ever add disallows to Micro.blog/robots.txt to embody the laudable data-ownership policy reiterated on Feb 27th, given our data is being showcased on this domain too!!

sod

@warner Phew, thanks, you had me worried there for a bit. 😅 Even if there are efforts from companies to enable some kind of control over how their bots crawl our blogs and websites, I think it's good to always assume traditional crawlers and AI models gobble up all the data you make publicly available on the open web.

The only way to be reasonably sure they won't see your stuff is to keep it in a more private area of the web, behind a login or paywall, for example. And even then, there's always the risk that some of your readers share your writing with the AI tools they use anyway.

warner

@sod may be important for future IP legal challenges to have made every effort
