Added a new experimental feature for bookmarks in Micro.blog Premium where the text of a web page you bookmark will be summarized by OpenAI. You can show the summaries by choosing "Show Summary" from the "…" menu in Bookmarks on the web.
@news So, let's say the site being bookmarked has a robots.txt that tells all bots, and especially AI scrapers, to sod off. This will bypass that, in a way, and OpenAI will gain from the content of someone who doesn't want it to. I can't say I'm thrilled by this. Is there any way to check first if the site has a robots.txt that makes it apparent it doesn't want AI to gain from its content? I really dislike all this AI stuff and it feels like we're just going to have to cave to it, and I refuse.
We need a way to block all of this from the creator's side, because robots.txt isn't going to do it. And especially when the consumer of the creation can then use AI on the content anyway. Ugh. I am weary of this timeline already and it has only just begun.
@pimoore I didn't mean to insinuate that it was, but since we know the developer in this case I am asking whether they could check for the presence of that and be a good citizen and adhere to it.
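For anyone curious what such a check might look like, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not how Micro.blog works (nothing in this thread says what it does); the helper name `may_summarize`, the example robots.txt, and the choice of `GPTBot` as the user agent to test are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_summarize(robots_txt: str, page_url: str, user_agent: str = "GPTBot") -> bool:
    """Return True if robots_txt lets user_agent fetch page_url.

    Hypothetical helper: the default user agent is an assumption about
    which crawler token a site owner would have blocked.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Made-up robots.txt that turns away one AI crawler but nobody else.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    url = "https://example.com/some-post"
    print(may_summarize(EXAMPLE_ROBOTS, url))
    print(may_summarize(EXAMPLE_ROBOTS, url, "SomeReader"))
```

A service could run a check like this before sending a bookmarked page off for summarization and skip the summary when it fails. It only honors what the site has declared; it does nothing about content reaching a model through other routes.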
@anniegreens @pimoore If by "gain" you mean that OpenAI could use the content to train their models, you're right; that's technically possible. But they would be taking a huge risk, as their business terms and privacy policy claim:
We do not train on your business data (data from ChatGPT Team, ChatGPT Enterprise, or our API Platform)
Hopefully, @manton uses a business account (and not his personal account) for Micro.blog's API requests to OpenAI. And if so, you can feel reasonably safe knowing that your content won't be used to train their models via this specific feature. That's not stopping your content from ending up in OpenAI's training data via other channels, though.
And, OpenAI could say one thing and do another, and they wouldn't be the first company in the world to lie.
There's no 100% certain way to be excluded from training data, other than keeping your content away from the public internet. And even then, you can't really control if another human copies and pastes your content into ChatGPT or something similar.
PS. In case it's not clear from my reply above, I see plenty of risks, moral issues, and so on with applied statistics AI as well. I'm not opposed to AI, but I do think it must be sustainably built, regulated, and rolled out responsibly.