
Thoughts on the GPT3 Decision (2020)


UPDATE 12/15/2020: When this was first published, I raised the concern that training data could be extracted from large language models like GPT-3. A recent paper, the result of a collaboration between several major ML research organizations, demonstrates exactly this.

A few months ago I wrote a piece detailing my experiences with OpenAI's GPT-3 model. A lot of you got excited about that article and went ahead and signed up for the GPT-3 waitlist.

If you haven't already heard, OpenAI is giving Microsoft an exclusive license to the underlying code and trained model behind GPT-3.

What exactly is happening?

Previously, OpenAI's full GPT-3 model was only accessible outside of OpenAI via an API. This was set up so that users could send inputs and receive outputs, but never see any of the details within the model, like the weights or exact architecture. Even with this API, OpenAI only provided access to a small, select group of beta users (both companies and individuals).
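For a rough sense of what that access model looked like in practice, here is a minimal sketch of a call through the beta API using OpenAI's Python client. The engine name, prompt, and parameters are illustrative assumptions, not a record of any specific beta configuration:

    # Minimal sketch of the GPT-3 beta API workflow (illustrative values).
    # You send a prompt and get text back; the weights and architecture
    # never leave OpenAI's servers.
    import openai

    openai.api_key = "YOUR_API_KEY"  # keys were issued only to approved beta users

    response = openai.Completion.create(
        engine="davinci",            # assumed engine name for the full GPT-3 model
        prompt="Once upon a time,",
        max_tokens=50,
        temperature=0.7,
    )

    print(response["choices"][0]["text"])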

Now, OpenAI has finally ended this beta-testing period and is exclusively licensing GPT-3 to Microsoft. This means that Microsoft will be the only company outside of OpenAI with the ability to directly access the code and the model. Other individuals and organizations will still have access to GPT-3 through the limited API (which will be kept alive for at least the immediate future), but it's not clear how long that access will remain open. As you might have guessed, a lot of people (including Elon Musk himself) are unhappy with this news.

Why is this being done?

It’s worth noting that OpenAI transitioned from a “non-profit” organization to a “capped-profit” organization last year. Part of the motivation for building GPT-3 was the idea that OpenAI could sell it (both the model and API access). The plan was that OpenAI could then recover the costs of developing and researching it in the first place.

Speaking of those costs, this particular neural network is a gargantuan mess in that regard. The original training run likely cost somewhere around $5 million. Even if you're only trying to reproduce a small subset of the paper, you'll still rack up a high six-figure cloud compute bill. And unlike most models you might be familiar with, the trained GPT-3 is so large that it's hard to move off the cluster of machines it was trained on (i.e., you're not going to fit it in an .h5 file that you can just store on your Google Drive).
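A quick back-of-the-envelope calculation makes the point. With roughly 175 billion parameters, the weights alone (before you even count optimizer state or activations during training) run to hundreds of gigabytes:

    # Rough storage footprint of GPT-3's ~175 billion parameters.
    params = 175e9

    fp16_gb = params * 2 / 1e9   # 2 bytes per parameter at half precision
    fp32_gb = params * 4 / 1e9   # 4 bytes per parameter at single precision

    print(f"fp16 weights: ~{fp16_gb:.0f} GB")   # ~350 GB
    print(f"fp32 weights: ~{fp32_gb:.0f} GB")   # ~700 GB

For comparison, the kinds of models that comfortably fit in a single .h5 checkpoint usually weigh in at a few hundred megabytes at most.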

There are also a few defects in the model that could pose serious legal liability. There's already been plenty of discussion about the various biases inherent in the model. Partly as a consequence of the model's size, these defects are much harder to fix than they are to spot. Fixing them would likely require another multi-million-dollar cloud compute budget and tens of thousands more person-hours. Even then, there would be no guarantee that all the damaging biases could be found and corrected, or that new detrimental ones wouldn't be introduced.

This of course raises the question: if this model requires millions of dollars to fine-tune, demands gigantic cloud computing resources, and could easily rack up enormous legal costs through its misuse, who could actually use it? That narrows the pool down to multi-billion-dollar tech companies with gigantic cloud compute resources to spare and a prodigious legal department. Cue Microsoft, with its $143 billion in revenue, Azure cloud services (the same service GPT-3 was trained on), 1,500-person legal department, and a PR department that's been hardened by the fiasco that was Microsoft's Neo-Nazi-sympathizing Twitter bot.

What does this mean for future ML models?

When researchers and programmers refer to the "democratization of AI", they're usually referring to one or both of two things: easier access to code and architecture, and/or easier access to ML development resources.

On the one hand, we've seen some pretty remarkable strides in this space. The ML community has seen groups like PapersWithCode and HuggingFace pop up, both of which make it much easier to implement machine learning ideas from research papers with minimal friction. Machine learning has gone from needing a CS PhD for entry to being something that even high schoolers can get involved in. This has also meant that machine learning can be applied to fields ranging from medicine to finance to security to art. As such, this openness is seen as a norm not only by the users of many of these ML models but also by their builders.
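To give a sense of just how little friction is involved nowadays, this is roughly what running a pretrained language model looks like with Hugging Face's transformers library (using the openly released GPT-2 as a stand-in here, since GPT-3's weights were never published this way):

    # Sketch of the low-friction open-source workflow via Hugging Face transformers.
    # GPT-2 is used because its weights are public; GPT-3's never were.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Machine learning has gone from", max_length=40)

    print(result[0]["generated_text"])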

On the other hand, OpenAI's decision to offer Microsoft an exclusive license to GPT-3 marks a pretty extreme departure from this trend. There have been other large models that have similarly been kept closed-source this year (most notably Google's and Facebook's newest chatbot models), and many ML companies do rely on closed-source algorithms to make their business defensible, but OpenAI's Microsoft deal is probably the most high-profile example of this. It's not entirely clear whether such blatant flouting of the openness norm will encourage greater secrecy around large ML models.

Is this the ethical thing to do?

I did mention in my GPT-3 piece that this could lead to a development landscape divided more starkly into haves and have-nots in AI. This echoed many of the reactions from when OpenAI announced its decision to start making a profit on its AI research. For this reason (and perhaps also because of things like OpenAI's seeming overhyping of GPT-2), it may be tempting to think of the people at OpenAI as villainous caricatures from HBO's Silicon Valley (at least that's what Twitter and Reddit might have you believe). However, I still believe that the overwhelming majority of them are very much motivated by making sure AI systems are used safely and by averting worst-case use scenarios (partly because I've personally known many of the people who work at OpenAI).

And after looking on sites like Upwork to see how people were reacting to GPT-3…perhaps it's actually for the best that this is being kept closed-source for now.

[Screenshot: GPT-3-related job postings on Upwork]

Yes, this screenshot above is real.

Even without GPT-3, this kind of scene on sites like Upwork is pretty typical. First off, I do want to stress that all my clients are much more reputable, and far less shady, than these guys. I go to great pains to weed out prospective clients who are trying to use AI to commit fraud, blackmail, and a bunch of other nasty offenses you might not have thought possible with AI in its current state.

A lot of people in AI safety discussions describe hypothetical examples involving some future Artificial General Intelligence. While it's an exciting thought experiment to consider how best to contain the titular AI from William Gibson's "Neuromancer", you don't need AGI to have an AI catastrophe on your hands. We've already seen cases of rampant astroturfing on government message boards, the use of spoofing in algorithmic trading to cause hundreds of millions of dollars in losses, and some pretty scary lines being crossed in the use of AI in weapons systems. In other words, we need ways of implementing more safety around AI now, not in some far-off hypothetical future where AI becomes sentient.

So, despite some questionable business practices, and despite worries about a new social class imbalance in AI development, I support this decision to restrict access to the full GPT-3.


Cited as:

@article{mcateer2020gpt3micro,
    title = "Thoughts on the GPT3 Decision",
    author = "McAteer, Matthew",
    journal = "matthewmcateer.me",
    year = "2020",
    url = "https://matthewmcateer.me/blog/thoughts-on-gpt-3-decision/"
}

If you notice mistakes and errors in this post, don't hesitate to contact me at [contact at matthewmcateer dot me] and I will be very happy to correct them right away! Alternatively, you can follow me on Twitter and reach out to me there.

See you in the next post 😄


