(2024-10-10) Zvim Ai85 Ai Wins The Nobel Prize
Zvi Mowshowitz: AI #85: AI Wins the Nobel Prize. Both Geoffrey Hinton and Demis Hassabis were given the Nobel Prize this week, in Physics and Chemistry respectively. Congratulations to both of them along with all the other winners. AI will be central to more and more of scientific progress over time. This felt early, but not as early as you would think.
The two big capability announcements this week were OpenAI’s canvas, their answer to Anthropic’s artifacts to allow you to work on documents or code outside of the chat window in a way that seems very useful, and Meta announcing a new video generation model with various cool features, that they’re wisely not releasing just yet.
Table of Contents
- Introduction.
- Table of Contents.
- Language Models Offer Mundane Utility. Proofs of higher productivity.
- Language Models Don’t Offer Mundane Utility. Why the same lame lists?
- Blank Canvas. A place to edit your writing, and also your code.
- Meta Video. The ten second clips are getting more features.
- Deepfaketown and Botpocalypse Soon. Assume a data breach.
- They Took Our Jobs. Stores without checkouts, or products. Online total victory.
- Get Involved. Princeton, IAPS, xAI, Google DeepMind.
- Introducing. Anthropic gets its version of 50% off message batching.
- AI Wins the Nobel Prize. Congratulations, everyone.
- In Other AI News. Ben Horowitz hedges his (political) bets.
- Quiet Speculations. I continue to believe tradeoffs are a (important) thing.
- The Mask Comes Off. What the heck is going on at OpenAI? Good question.
- The Quest for Sane Regulations. The coming age of the AI agent.
- The Week in Audio. Elon Musk is (going on) All-In until he wins, or loses.
- Rhetorical Innovation. Things happen about 35% of the time.
- The Carbon Question. Calibrate counting cars of carbon compute costs?
- Aligning a Smarter Than Human Intelligence is Difficult. Some paths.
- People Are Trying Not to Die. Timing is everything.
Language Models Offer Mundane Utility
Anthropic claims Claude has made its engineers sufficiently more productive that they’re potentially hiring less going forward. If I was Anthropic, my first reaction would instead be to hire more engineers, instead?
How much mundane utility are public employees getting from Generative AI?
So this seems like very strong evidence for 2%+ productivity growth already from AI, which should similarly raise GDP.
A consistent finding is that AI improves performance more for those with lower baseline ability. A new paper reiterates this, and adds that being well-calibrated on your own abilities also dramatically improves what AI does for you
Aceso Under Glass tests different AI research assistants again for Q3. You.com wins for searching for papers, followed by Elicit and Google Scholar. Elicit, Perplexity and You.com got the key information when requested. You.com and Perplexity had the best UIs. You.com offers additional features. Straight LLMs like o1, GPT-4o and Sonnet failed hard.
Cardiologists find medical AI to be as good or better at diagnosis, triage and management than human cardiologists in most areas
Language Models Don’t Offer Mundane Utility
AI is a huge help for boilerplate code. But setting up a dev environment is still a pain in the ass and AI is no help there.
This type of issue is a huge effective blocker for people with my level of skills. I find myself excited to write actual code that does the things, but the thought of having to set everything up to get to that point fills with dread – I just know that the AI is going to get something stupid wrong, and everything’s going to be screwed up, and it’s going to be hours trying to figure it out and so on, and maybe I’ll just work on something else
Blank Canvas
OpenAI introduces Canvas, their answer to Claude’s Artifacts.
OpenAI: We’re introducing canvas, a new interface for working with ChatGPT on writing and coding projects that go beyond simple chat
The Canvas team was led by Karina Nguyen: "My vision for the ultimate AGI interface is a blank canvas""
They are calling it Meta Movie Gen, ‘the most advanced media foundation models to-date.’ They offer a 30B parameter video generator, and a 13B for the audio.
Deepfaketown and Botpocalypse Soon
So there’s this AI girlfriend site called muah.ai that offers an ‘18+ AI companion’ with zero censorship and ‘absolute privacy.’ If you pay you can get things like real-time phone calls and rather uncensored image generation. The reason it’s mentioned is that there was this tiny little data breach, and by tiny little data breach I mean they have 1.9 million email addresses
Troy Hunt: But per the parent article, the real problem is the huge number of prompts clearly designed to create CSAM images. I easily found people on LinkedIn who had created requests for CSAM images and right now, those people should be shitting themselves.
They Took Our Jobs
On the question of job applications, anton thinks the future is bright:
Anton (showing the famous 1000 job application post): Thinking about what this might look like in e.g. hiring – the 1st order consequence is bulk slop, the 2nd order consequence is a new equilibrium between ai applications and ai screening, the 3rd order effect is a pareto optimal allocation of labor across entire industries.
My prediction is that this would not go the way Anton expects
Get Involved
AI Wins the Nobel Prize
Geoffrey Hinton, along with John Hopfield, wins the Nobel Prize in Physics for his design of the neural networks that are the basis of machine learning and AI
James Campbell: The winner of the nobel prize in physics spending the entire press conference talking worriedly about superintelligence and human extinction while being basically indifferent to the prize or the work that won it feels like something you’d see in the movies right before shit goes down.
Then there’s Demis Hassabis, who along with John Jumper and David Baker won the Nobel Prize in Chemistry for AlphaFold. That seems like an obviously worthy award
In Other AI News
In other news, my NYC mind cannot comprehend that most of SF lacks AC...
Ben Horowitz, after famously backing Donald Trump, backs Kamala Harris with a ‘significant’ donation, saying he has known Harris for more than a decade, although he makes clear this is not an endorsement or a change in firm policy. Marc Andreessen and a16z as a firm continue to back Trump
Quiet Speculations
OpenAI projects highly explosive revenue growth, saying it will nearly triple next year to $11.6 billion, then double again in 2026 to $25.6 billion. Take their revenue projection as seriously or literally as you think is appropriate, but this does not seem crazy to me at all.
Roon: chatbots will do like $10bb in global revenue. Medium stage agents maybe like $100bb. The trillions will be in ASIs smart enough to create and spin off businesses that gain monopolies over new kinds of capital that we don’t even know exist today.
The key problem with the trillions in profits from those ASIs is not getting the trillions into the ASIs. It’s getting the trillions out. It is a superintelligence. You’re not.
Claims about AI safety that make me notice I am confused:
Roon: tradeoffs that do not exist in any meaningful way inside big ai labs:
“product vs research”
“product vs safety”
these are not the real fracture lines along which people fight or quit. it’s always way more subtle
Holly Elmore: So what are the real fracture lines?
Roon: complicated interpersonal dynamics.
I totally buy that complicated interpersonal dynamics and various ordinary boring issues could be causing a large portion of issues. I could totally buy that a bunch of things we think are about prioritization of safety or research versus product are actually 95%+ ordinary personal or political conflicts, indeed this is my central explanation of the Battle of the Board at OpenAI from November 2023.
What I don’t understand is how these trade-offs could fail to be real. The claim literally does not make sense to me.
A thread about what might be going on with tech or AI people doing radical life changes and abandoning their companies after taking ayahuasca. The theory is essentially that we have what Owen here calls ‘super knowing’ which are things we believe strongly enough to effectively become axioms we don’t reconsider. Ayahuasca, in this model, lets one reconsider and override some of what is in the super knowing (jiggling), and that will often destroy foundational things without which you can’t run a tech company, in ways people can’t explain because you don’t think about what’s up there.
The Mask Comes Off
Steven Zeitchik at Hollywood Reporter decides to enter the arena on this one, and asks the excellent question ‘What the Heck is Going on at OpenAI?’
According to Zeitchik, Mira Murati’s exit should concern us.
she’d given up on trying to reform or slow down the company from within.
The Quest for Sane Regulations
CSET asks: How should we prepare for AI agents? Here are their key proposals
And here’s a summary thread.
Helen Toner (an author): We start by describing what AI agents are and why the tech world is so excited about them rn.
Short version: it looks like LLMs might make it possible to build AI agents that, instead of just playing chess or Starcraft, can flexibly use a computer to do all kinds of things.
The legal framework discusses mens rea, state of mind, potential legal personhood for AIs similar to that of corporations, who is the principle versus the agent, future industry standards, liability rules and so on. The obvious thing to do is to treat someone’s AI agent as an extension of the owner of the agent
Scott Alexander gives his perspective on what happened with SB 1047. It’s not a neutral post, and it’s not trying to be one. As for why the bill failed to pass, he centrally endorses the simple explanation that Gavin Newsom is a bad governor who mostly only cares about Newsom, and those who cultivated political power for a long time convinced him to veto the bill.
The Week in Audio
Stanford’s Erik Brynjolfsson predicts that within 5 years, AI will be so advanced that we will think of human intelligence as a narrow kind of intelligence and AGI will transform the economy.
Eric Schmidt says this year three important things are important: AI agents, text-to-code and infinite context windows. We all know all three are coming, the question is how fast agents will be good and reliable enough to use. Eric doesn’t provide a case here for why we should update towards faster agent progress
Rhetorical Innovation
Fei-Fei Li: I come from academic AI and have been educated in the more rigorous and evidence-based methods, so I don’t really know what all these words mean, I frankly don’t even know what AGI means
Which is totally fine, in terms of not thinking about what the words mean. Except that it seems like she’s using this as an argument for ignoring the concepts entirely. Completely ignoring such possibilities without any justification seems to be her plan. Which is deeply concerning, given she has been playing and may continue to play key role in sinking our efforts to address those possibilities, via her political efforts, including her extremely false claims and poor arguments against SB 1047, and her advising Newsom going forward.
Rather than anything that might work, she calls for things similar to car seatbelts – her actual example here. But we can choose not to buckle the safety belt
On the question of releasing open models, I am happy things have calmed down all around. I do think we all agree that so far the effect, while the models in question have been insufficiently capable, has proven to be positive.
The catch is that this is one of those situations in which you keep winning, and then at some point down the line you might lose far bigger than the wins. While the stressors from the models are tolerable it’s good. The problem is that we don’t know when the stressors become intolerable.
The Carbon Question
Jeff Dean: Did you get this number from Strubell et. al? Because that number was a very flawed estimate
If Dean is correct here, then the carbon footprint of training is trivial.
Aligning a Smarter Than Human Intelligence is Difficult (alignment)
Safer AI evaluates various top AI labs for their safety procedures, and notes a pattern that employees of the labs are often far out in front of leadership and the actual safety protocols the labs implement
The more sophisticated AI models get, the more likely they are to lie. The story is exactly what you would expect. Unsophisticated and less capable AIs gave relatively poor false answers
ChatGPT emerged as the most effective liar. The incorrect answers it gave in the science category were qualified as correct by over 19 percent of participants. It managed to fool nearly 32 percent of people in geography
Lying or hallucinating is the cleanest, simplest example of deception brought on by insufficiently accurate feedback
Is ‘build narrow superhuman AIs’ a viable path?
Roon: it’s obvious now that superhuman machine performance in various domains is clearly possible without superhuman “strategic awareness”. it’s moral just and good to build these out and create many works of true genius. (ASI)
Is it physically and theoretically possible to do this, in a way that would preserve human control, choice and agency, and would promote human flourishing? Absolutely.
Is it a natural path? I think no. It is not ‘what the market wants,’ or what the technology wants. The broader strategic awareness, related elements and the construction thereof, is something one would have to intentionally avoid, despite the humongous economic and social and political and military reasons to not avoid it.
Anything involving humans making choices is going to get strategic quickly.
Edited: | Tweet this! | Search Twitter for discussion