Asymmetry

It seems to me that the likely outcome of "copyright doesn't cover ML model output" is that Github (or whoever) will train models on all open source code (of any license) and generated code from that can be used in proprietary software with no source made available. Meanwhile, Github is highly unlikely to train its public model on Github or Microsoft proprietary source code, and no one else can legally access that code to train a model on it.

So the overall ratchet effect is that GPL/MIT/whatever code can now be used in proprietary code bases, without regard to the original license, but proprietary code will largely never be used to train these models, and thus inaccessible to open source developers, or developers at other proprietary software companies.

You could imagine a Github Enterprise product where you can pay for a version of Copilot that is trained on both the open source corpus as well as your company's private proprietary code, accessible only to developers in your company's employ.

(14 comments)