cleverdevil

Facebook takes your personal data and exploits it for profit in shady ways. GitHub has now done the same, but with your source code. GitHub is a much better company than Facebook is, and they have an opportunity to prove it now. So, prove it.

nitinkhanna

@cleverdevil prove it how? What would you have them do?

cleverdevil

@nitinkhanna there are many things that they could do:

  • Make a public statement acknowledging that they didn't think this through well enough and that they'll be taking steps to make it right.
  • Give people an option within their GitHub repositories to opt in or out of the training process.
  • Automatically disable training, and prevent it from being enabled, on software that is licensed under an incompatible license (GPL, MPL, etc.).
  • Open source their ML model so that the "derivative work" is available under a compatible open source license.
  • Shut down the entire offering until they figure it out.

It's a complicated problem, but they'll need to attack it head-on.

pimoore

@cleverdevil You have a link where I can read about this?

toddgrotenhuis

@cleverdevil This is a good list.

cleverdevil

@pimoore so, Nora Tindall has some great tweets about the issue. But the long and short of it is that GitHub released something called GitHub Copilot, which uses AI / machine learning to predictively pair program along with you.

The issue is that they trained their machine learning model using tons of code on GitHub without anyone's consent. There are a myriad of issues with this, not the least of which is copyright, but software licensing also comes into play.

There are several very popular open source licenses, such as the GPL, which explicitly prohibit "derivative works" that are created from GPL-licensed code, unless those works are also released under the GPL. That puts GitHub immediately in violation of the licenses of hundreds (thousands?) of GPL-licensed projects.

Worse, if you're a user of Copilot, there is a decent chance that when you're writing some code, Copilot predictively spits out code that is very close to, if not lifted verbatim from, a GPL-licensed project. Guess what? Now you are in violation of the GPL unless you open source your work under the same license.

It's a bit of a nightmare, and an absolute self-own from GitHub.

nitinkhanna

@cleverdevil interesting... I take it the whole thing left a bad taste in your mouth. What they've trained the model on is certainly a concern. But as far as abstracted code goes, they could very well have done a good job using well-known open source software to train on. Of course, technical breakdowns will tell us more.

But I don't see how shutting it down and rethinking it would help. Every time they do, someone might come up with a new issue which they'll have to respond to.

You're right about open sourcing the model though - it's in line with what we've come to expect from the open source world, even though GitHub per se hasn't always been a good caretaker of that.

pimoore

@cleverdevil This kind of makes me want to move my stuff to either GitLab/Bitbucket, or a self-hosted Gitea instance.

Thanks for the link and breakdown of what’s happening with this!

cleverdevil

@nitinkhanna well, they trained their model on my open source software without my consent, for their own benefit. That's not only kind of icky, it's also a legal problem for them, as they're in violation of many, many, many projects' licenses. It also puts their users at risk as a result. If they don't shut it down and rethink, they're asking for many lawsuits that they'll very likely lose.

cleverdevil

@pimoore I'm not quite there just yet, as I think that they're genuinely trying (and succeeding) to do something very cool and innovative. They just misfired a bit on the critical thinking side. Many technologists suffer from this :)

I'm hopeful that they pivot.
