pratik
The Evolution of Privacy and Ownership in the Blogging World microblog.pratikmhatre.com
Miraz

@pratik Interesting thoughts Pratik and some persuasive arguments.

pratik

@Miraz What are your opinions about publicness and use of your words online? I ask since you too use your full real name, I presume.

jsonbecker

@pratik I think blogs are inherently public, though that’s not the same, necessarily, as being “Creative Commons”. Still, fair use applies, and I can only barely remember being confused by Vasta’s comments that I half followed and now thinking it’s bonkers based on your description.

A lot of people I know who are mad about AI are professional artists or have friends or family trying to make it as artists. For them, I think the objection is purely “putting people out of work”. I just don’t see it as any different than any other time a technology has had the potential to make us more efficient. And I just struggle to jump to the “protect the economy as it exists today if we can’t dump capitalism entirely” mindset.

And I’m not entirely convinced that what they want to preserve, which is almost entirely an aberration on the scale of human culture, is more worth preserving than things that can come later.

Maybe AI will reduce jobs in some areas, maybe it won’t. Hard to say. And even if it does, it’s harder to say that the world on the other side is a worse trade.

For me, I write in public and not for money. I’m happy for folks to use my words, though I would like attribution when they are my words. I’ve already had too much benefit from AI, say, identifying every picture of my dog on my phone, to want to go back.

Miraz

@pratik I guess to some extent I don't care. I don't want to be misrepresented and would be very angry if someone reposted all my stuff under their own name as though it was theirs. Or if they took my posts and somehow twisted them to suggest that views I do not hold are my views. I can't really imagine how those scenarios could come about. My importance in the world is that I'm one person amongst (how many are we now?) 8 billion. I'm simply a speck of sand on a beach.

pratik

@jsonbecker Re: Vasta, I think all he wanted was an acknowledgment from the Micro.blog community that, whatever that blogger-developer did with his blog contents, he could've asked for permission first. But then it turned into an argument about what's public and how he shouldn't be so dramatic. Which, for me, makes it all the more ironic, since a few people from back then have expressed the concerns I mentioned about AI. So I was mostly commenting on the changed nature of what public writing means once our robot overlords arrive 🤗

Also, I attended a webinar at work recently which painted a more hopeful future while also acknowledging the ethics of using AI. TBH, it's an exciting time to be living in, similar to when the personal computer arrived, the Internet went public, and the smartphone era started. I think it's crazy that I have lived, and am living, through all four, and who knows how many more.

pratik

@Miraz True. In the larger scope of things and given the deluge of information, what we personally put out is hardly anything. We can expect basic courtesy, but it's not something I will now pick a fight over unless, as you said, I'm being maliciously misrepresented. If it's not malicious, I consider it a personal failing for not having conveyed my arguments clearly.

jsonbecker

@pratik I vaguely remember this was like almost “permission to block quote”.

Anyway, I think that LLMs/RAG are really promising and we’ll see them as normal as autocomplete searches.

rom

@pratik interesting thoughts. Blogs are public, yes - the very least that can be done is to attribute it to the author if re-used. Something those "AI" bots do not do and will not do. And even worse, "AI" can use those words, style, and what not, (against you in the court of law, oops, wrong context), and hash some random text AND pass it on as if it were done by the author. SO, yes, there is a concern. These two are different contexts, IMHO.

Edit: oh, "AI" here means those LLMs and the companies that hoover-up everything.

pratik

@rom I've found implementations of LLMs that do that. For example, Arc Search lists the websites it used to summarize the results of your search. In a way, that also adds credibility to the search result, i.e., this summary is not based on the random musings of a bloke.

As a theoretical exercise, if I read a bunch of papers, sites, resources, etc. and seemingly understand the gist of a topic and I write an essay on the topic, is that similar to what an AI does on a much larger scale? The usual conditions for not plagiarizing & attribution apply. Is it ok then?

rom

@pratik not if you write it the same way the author(s) did it - same style, etc. - if you do it YOUR WAY, with attributions and citations, it is FAIR USE. If you write it and pretend that it came from the author(s), it is another story. Just my two cents.

Dunk

@pratik If they do not want to share it, they should grab their favourite pen and write it in their journal. The web should be considered public domain: once it is out there, you forgo any control over it... and don't get me started on the IndieWeb nonsense 🤬

stupendousman

@pratik The difference between then and now is that, before, most people wanted to be discovered on their own because they genuinely thought there was a model to make money out of blogging. Now, why do I want some company's AI LLM to index my content for its own and the company's benefit? Sure, you can index all my content, but pay the eff up. Times change and technology changes. Today everything must be opt-in, not opt-out. Sure, the Facebooks and Googles don't want you to think that, but the Internet is a utility. Just because it's on the Internet doesn't mean it was meant to be findable. (Findable is different from being private.)

pratik

@stupendousman

The Internet is a utility. Just because it's on the Internet doesn't mean it was meant to be findable.

I wish this were true, but in its current iteration, it is not. Being on the Internet MEANS it is findable, and you have to opt out. But even the opt-out mechanism is request-based, so we rely on web services to respect that request.

Now, why do I want some company's AI LLM to index my content for its own and the company's benefit?

Why not? If the company is making finding and synthesizing information better just like Google did, why not? As I said, if you are earning from your content, you can and should prevent indexing, but if you are not, then why prevent contributing to methods that may generate value?

stupendousman

@pratik

As I said, if you are earning from your content, you can and should prevent indexing, but if you are not, then why prevent contributing to methods that may generate value?

Sure, let's ask these companies to make their tools free for us to use, then. Why pay 20 bucks a month to OpenAI to use ChatGPT? If they want to freely index my content (whether or not I want to make money out of it), they should definitely give away the output for free too.

z428

@pratik Interesting read, a few very valid arguments, and a few tough choices. I agree with much of the openness towards AI but am extremely unhappy to see it making the already nasty imbalance of power on the web even worse, because... there seems to be no open web at the moment. All relevant technologies are dominated by large players who spend a lot of money on things, and in the AI world, this seems even worse, at least for now, given how much effort is still required to train and run huge models. So far, it just seems to add to moving money from bottom to top, making a few companies that are already dangerously big even bigger and richer. I'd be more than willing to offer my random stuff for training models for the public good, but I'd really be hesitant to do so to make OpenAI or Meta even bigger...

pratik

@stupendousman They can, except the work involved in turning random bits of information on the web into meaningful results via LLMs costs a lot in terms of programming effort/skill and other resources. Hence my distinction about content that was generated to earn revenue; otherwise, as per copyright law, it is fair use. Personally, I think $20 is heavily subsidized. Heck, it's significantly cheaper if you know how to tap into their API (which explains the thousands of startups). On the other hand, they could've gone the Google route by offering it for free and monetizing it some other way ("Ad - You asked ChatGPT to create a two-day itinerary to Portland. Here are the cheapest flights, hotels, etc.")

pratik

@z428 That would be good if we had considered LLMs a public good, or for the overall public benefit, right from the start, like landing on the moon. So basically, we offer our random stuff to the...government, which will then contract it out to the same companies to create training models, because do we want the government to first work out which federal department creating LLMs fits into? Does it have to be the federal government, or should it be done by states? Do we even need the government to focus on this new-fangled technology that may prove to be nothing, like crypto, instead of first focusing on improving food security and homelessness problems? I don't have the answers, but I am asking the questions to help smarter people make an informed decision.

chipotle

@pratik @stupendousman It’s at least worth noting that it hasn’t been established as a matter of law that using copyrighted material for LLM training is automatically fair use. It’s a plausible interpretation, but one of the factors used to determine whether a given reproduction is fair use is “the effect of the use upon the potential market for or value of the copyrighted work”; generative AI can clearly create works that do compete with the copyrighted work they’re trained on.

I publish a lot of things on the web and explicitly specify the Creative Commons “Attribution NonCommercial ShareAlike” license, which means the material cannot be used for commercial purposes without my explicit permission; if my work is being used to train an LLM that the LLM’s owners are charging access for, that seems on its face to violate the terms of the license. “But you made it available on the internet, too bad so sad” is not and has never been a valid defense against copyright infringement, legally speaking. :)

stupendousman

@chipotle @pratik

“But you made it available on the internet, too bad so sad” is not and has never been a valid defense against copyright infringement, legally speaking. :)

This is my overarching point.

pratik

@stupendousman @chipotle I see your point; hence, I'm trying to get to a middle ground that is beneficial to all people. A model where you can opt out may be better. My original post was mostly trying to understand why you would.

In terms of fair use, I have been fascinated by the "Everything is a Remix" four-part documentary. It also talks about what is ideal and what actually happens that gets close to the line but never crosses it. I, too, am looking forward to the legal decisions coz information hoarding is never great, IMO.

z428

@pratik Yes, these are interesting questions... Looking from another angle, it seems we have so far failed to answer much "easier" questions when it comes to making sure digital platforms (Meta, WhatsApp, Amazon, Twitter, ...) obey some basic agreed-upon laws. And in these cases it seems dependencies were relatively easy to come by, nothing to compare with the amount of hardware and money required to train one of the current LLMs, even though this is slowly improving or maybe changing for the better. And at the very end, we know it's hard to impossible to get will and action for regulating players that have reached a certain size. That leaves me cautiously concerned here...
