(2023-05-11) Yegge We're Gonna Need A Bigger Moat

Steve Yegge: We’re Gonna Need a Bigger Moat. Everyone making SaaS on LLMs, including coding assistants like Cody and Copilot, was rocked by the AI news events of last week.

None of it is news per se. But last week, two events at Google highlighted something crazy big that's been brewing for about 10 weeks, flying under everyone's radar.

Our story begins with 75-year-old Dr. Geoffrey Hinton, the "Godfather of Deep Learning", who left Google last week and is now on tour, explaining the tech zombie apocalypse that is unfolding around us.

Googler Luke Sernau’s brilliant essay, “We have no moat, and neither does OpenAI”, spells out a radical change that appears to be in progress. ((2023-05-05) Google: We Have No Moat And Neither Does OpenAI)

Google hasn't said anything, but we do know that Sundar Pichai, who's been close to this since its inception, started comparing AI to the invention of fire back in 2018.

Wait. Transformers are just a mathematical construct! How can a math construct be as big and powerful and scary as… the invention of fire itself? Seriously?

Before last week’s news, we already knew that Google’s Transformer architecture, which is the basis for all LLMs, is truly an invention for the ages. It’s a simple mathematical tool, much like the Fourier transform, except instead of picking out frequencies from noise, it picks out meaning from language.
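To make the "simple mathematical tool" claim concrete: the core of the Transformer is scaled dot-product attention, which is just a few matrix multiplies and a softmax. Here's a toy numpy sketch (my own illustration, not Google's code; the shapes and random inputs are made up for demonstration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: rows become probability distributions.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each position produces a weighted
    blend of the value vectors at the positions it 'attends' to."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every token to every token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # mix information across positions

rng = np.random.default_rng(0)
seq_len, d = 4, 8
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

That's essentially the whole trick; the "trillion-parameter" part is just stacking many of these layers and making the matrices enormous.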

When you make Transformers big enough, in so-called "Trillion-parameter space", they begin to develop surprising higher-order capabilities.

The Transformer is a shockingly simple invention in many ways, and yet its emergent properties make it more like The Matrix. (emergence)

This does not sound like fire to me. It sounds like a fancy playground for billionaires.

just last week, the “We have no moat” memo highlighted that they have yet another superpower of which we were unaware. That capability is that Transformers can also learn from each other, via a set of new DLCs dropped by modders. The biggest mod (Sernau highlights many of them in his leaked essay, but this is the doozy) is Low Rank Adaptation (LoRA).

LoRA makes LLMs composable, piecewise, mathematically, so that if there are 10,000 LLMs in the wild, they will all eventually converge on having the same knowledge.
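The mechanics behind that composability claim: LoRA freezes the pretrained weight matrix W and trains only a low-rank update BA, which can be shipped as a tiny adapter and merged back into the base weights by simple addition. A minimal numpy sketch (dimensions and scaling chosen for illustration; real LoRA applies this per attention layer):

```python
import numpy as np

rng = np.random.default_rng(42)
d_out, d_in, r = 64, 64, 4   # r << d is the "low rank" in Low Rank Adaptation

# Frozen base weight (stands in for one pretrained LLM weight matrix).
W = rng.normal(size=(d_out, d_in))

# The adapter: only A and B are trained, 2*r*d parameters instead of d*d.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))     # zero-init, so the adapter starts as a no-op
alpha = 8.0                  # scaling hyperparameter from the LoRA paper

def forward(x, W, A, B, alpha, r):
    # Base layer plus the low-rank update: (W + (alpha/r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

# "Composable, piecewise, mathematically": the adapter merges into the
# base weights with one addition, or can be distributed on its own.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
assert np.allclose(forward(x, W, A, B, alpha, r), W_merged @ x)
```

Because the adapter is a small standalone matrix pair, fine-tuned knowledge travels as a few megabytes that anyone can apply to their own copy of the base model, which is what lets improvements spread across the ecosystem so fast.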

there was some other news too. What was it again?

Oh yeah, it was training costs. Remember when it was roughly $1B to train an LLM like GPT-4? According to the leaked Google memo, world-class competitive LLM training costs just dropped from a billion dollars to… that's right, you guessed it… a hundred dollars.

For this discussion, keep the distinction clear in your mind between GPT (a Transformer that's been trained to be extra good at standardized testing) and ChatGPT (a great big fancy scalable application for a billion users). ChatGPT is LLM-backed SaaS. Keep that in mind! It needs a moat, or anyone can compete.

OpenAI clearly recognized how much money they could make if they had a lock on the LLM market with ChatGPT. So they basically gave everyone the finger and started keeping secrets.

GPT became a moat, for a while, which made ChatGPT really hard to compete with, and only a few companies managed it. For a few months, the whole industry shifted to integrate with these providers.

Before last week, there were, oh, maybe five LLMs in GPT’s class. In the whole world.

Right around ten weeks ago, a chain of events kicked off a roughly seven-orders-of-magnitude reduction in LLM training and serving costs (from ~$1B down to ~$100).

A bit of 3-month-old history: back on February 23rd, Meta’s AI team announced LLaMA, their Bard/GPT competitor.

but unfortunately at the time it was only 68% as smart as GPT on standardized tests.

Meta's AI research team went and open-sourced LLaMA. Why? Because with Meta being in large-Nth place, drifting awkwardly into obsolescence, and Zuck not watching, what did they really have to lose?

Now every tinkerer on earth with a GPU laptop and PyTorch suddenly knew how the ChatGPT sausage was made.

LLMs can fuckin’ copy each other. So their so-called “data advantage” was really only going to be safe for as long as all the big players kept the AIs locked up.

Within 2 weeks, on March 2nd 2023, LLaMA’s secret model weights, guarded by the finest in the Metaverse, had been leaked on Discord. At which instant, every one of a hundred thousand tinkerer data scientists on Earth suddenly had a torrent of an AI that’s roughly competitive with GPT.

Meta, according to Sernau's reasoning, came out the surprise clear winner among the Big Fish: because LLaMA and all its derivative strains are Meta's architecture, Meta is now the company best positioned to scale up OSS LLMs and absorb all the OSS improvements. Everyone's using LLaMA.

Over the past 10 weeks, every single major advancement has quickly been copied by everyone else’s clone.

Within a few weeks of the leak, Vicuna-13B launched: a promising OSS model in the LLaMA architectural family (like Stanford's Alpaca). Vicuna is free of the legal encumbrances associated with LLaMA.

LLaMA may well become the standard architecture. But it sure looks like someone’s going to have to bend the knee. Pluggable platforms have a way of standardizing, and usually on the first mover.

If you’re relying on LLMs for your moat, well… I hope you also have a data moat. You’re going to need it.

For Cody, we'll still be tied to the big players for a while yet. I'm guessing they have about a 6-month head start. We're not seeing the same performance from, e.g., StarCoder out of the box as you'd get from GPT-4 or Claude. But just look at that moat we've got:

It turns out Sourcegraph’s whole platform behind the product is basically a moat-builder.
