Added a new experimental feature for bookmarks in Micro.blog Premium where the text of a web page you bookmark will be summarized by OpenAI. You can show the summaries by choosing “Show Summary” from the “…” menu in Bookmarks on the web.
@news So, let's say the site being bookmarked has a robots.txt that tells all bots, and especially AI scrapers, to sod off. This will bypass that, in a way, and OpenAI will gain from content from someone who doesn't want it to. I can't say I'm thrilled by this. Is there any way to check first whether the site has a robots.txt that makes it apparent it doesn't want AI to gain from its content? I really dislike all this AI stuff, and it feels like we're just going to have to cave to it, and I refuse.
We need a way to block all of this from the creator's side, because robots.txt isn't going to do it. And especially when the consumer of the creation can then use AI on the content anyway. Ugh. I am weary of this timeline already and it has only just begun.
@anniegreens I don't disagree, and I'm also extremely concerned about the direction—and how far—AI is going. It should be mentioned, however, that robots.txt unfortunately isn't a guarantee of blocking scrapers:
There's no technical requirement that a bot obey your requests. Currently only Google and OpenAI have announced that this is the way to opt out, so other AI companies may not care about this at all, or may add their own directions for opting out. — eff.org
You make an excellent point, though: this is something that absolutely should be respected per the site owner's request, assuming this process would bypass the robots.txt setting as described above.
@pimoore I didn't mean to insinuate that it was, but since we know the developer in this case, I am asking whether they could check for the presence of a robots.txt and, as a good citizen, adhere to it.
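For what it's worth, a check like the one being asked for is straightforward to build. Here's a minimal sketch in Python using the standard library's `urllib.robotparser`; it assumes OpenAI's documented "GPTBot" user agent is the relevant opt-out token, and the function name `allows_ai_fetch` is purely illustrative, not anything Micro.blog actually implements.

```python
# Sketch: check a site's robots.txt before fetching a page for AI
# summarization. Treats a missing/unreadable robots.txt as "allowed",
# mirroring how crawlers conventionally behave.
from urllib import robotparser
from urllib.parse import urljoin, urlparse


def allows_ai_fetch(page_url: str, bot_agent: str = "GPTBot") -> bool:
    """Return True if robots.txt permits bot_agent to fetch page_url."""
    parts = urlparse(page_url)
    root = f"{parts.scheme}://{parts.netloc}"
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(root, "/robots.txt"))
    try:
        rp.read()  # fetches and parses /robots.txt
    except OSError:
        return True  # robots.txt unreachable; default to allowed
    return rp.can_fetch(bot_agent, page_url)
```

A service could call `allows_ai_fetch(url)` before sending the page text to OpenAI and skip summarization when it returns False. It wouldn't catch every opt-out convention (some sites use meta tags or their own directives), but it would honor the most common signal.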
@anniegreens @pimoore If by “gain” you mean that OpenAI could use the content to train their models, you're right; that's technically possible. But they would be taking a huge risk, as their business terms and privacy policy claim:
We do not train on your business data (data from ChatGPT Team, ChatGPT Enterprise, or our API Platform)
Hopefully, @manton uses a business account (and not his personal account) for Micro.blog's API requests to OpenAI. And if so, you can feel reasonably safe knowing that your content won't be used to train their models via this specific feature. That's not stopping your content from ending up in OpenAI's training data via other channels, though.
And, OpenAI could say one thing and do another, and they wouldn't be the first company in the world to lie. 😊
There's no 100% certain way to be excluded from training data, other than keeping your content away from the public internet. And even then, you can't really control if another human copies and pastes your content into ChatGPT or something similar.
PS. In case it's not clear from my reply above, I see plenty of risks, moral issues, and so on with applied-statistics AI as well. I'm not opposed to AI, but I do think it must be sustainably built, regulated, and rolled out responsibly.
@anniegreens You definitely weren't insinuating that it was, merely bringing up a very valid concern around it. I didn't even consider this possibility when I read about the feature, so your observation is most astute.
I'm not opposed to AI, but I do think it must be sustainably built, regulated, and rolled out responsibly.
So much this. Unfortunately, one need only read about the AI deepfakes that are already happening and ask whether we've already crossed that bridge.