(2024-12-19) Zvi Mowshowitz: AI #95: o1 Joins the API

Zvi Mowshowitz: AI #95: o1 Joins the API. A lot happened this week. We’re seeing release after release after upgrade. It’s easy to lose sight of which ones matter, and two matter quite a lot. The first is Gemini Flash 2.0, which I covered earlier this week. The other is that o1 (OpenAI o1), having turned pro, is now also available in the API.
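For reference, a minimal sketch of what calling o1 through the API looks like with the OpenAI Python SDK. The model string and the reasoning_effort parameter match the launch announcement, but treat the exact details as subject to change and check the current docs.

```python
# Minimal sketch of calling o1 via the OpenAI API (Python SDK v1+).
# Assumes OPENAI_API_KEY is set in the environment; the model string
# and reasoning_effort values are as announced at launch and may change.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",                # the o1 model now exposed in the API
    reasoning_effort="high",   # launch-announced knob: low / medium / high
    messages=[
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
)

print(response.choices[0].message.content)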

The other big development is that Anthropic released one of the most important alignment papers, Alignment Faking in Large Language Models. This takes what I discussed in AIs Will Increasingly Attempt Shenanigans, and demonstrates it with a much improved experimental design, finding a more worrisome set of behaviors, in a far more robust fashion. I will cover that paper soon on its own, hopefully next week.

Table of Contents

  • From earlier in the week: Gemini Flash 2.0, AIs Will Increasingly Attempt Shenanigans, The o1 System Card is Not About o1.
  • Language Models Offer Mundane Utility. Devin seems remarkably solid.
  • Clio Knows All. In aggregate, for entertainment purposes only.
  • Language Models Don’t Offer Mundane Utility. Ignorance is bliss.
  • The Case Against Education. Academia continues to choose to be eaten by AI.
  • More o1 Reactions. Looking good.
  • Deepfaketown and Botpocalypse Soon. The rise of phishing and the AI reply guy.
  • Huh, Upgrades. Most of the stuff not yet covered is various ChatGPT features.
  • They Took Our Jobs. Advertising that identifies its target as the They.
  • The Art of the Jailbreak. If at first you don’t succeed, jiggle a bit and try again.
  • Get Involved. A free consultation service for potential lab whistleblowers.
  • Introducing. Meta floods us with a bunch of end-of-year tools, too.
  • In Other AI News. SoftBank to invest another $100 billion.
  • Quiet Speculations. Are we running out of useful data?
  • The Quest for Sane Regulations. Political perils for evals, Pareto frontiers.
  • The Week in Audio. Ilya Sutskever, of course. Also Roon.
  • Rhetorical Innovation. Try not to cause everyone to accelerate automated R&D.
  • Aligning a Smarter Than Human Intelligence is Difficult. Your grades are in.
  • Not Aligning Smarter Than Human Intelligence Kills You. Another econ paper.
  • The Lighter Side. Why yes, I suppose I do.

Language Models Offer Mundane Utility

More o1 Reactions

Dylan Field: Still doing evaluations, but feels like AGI is basically here with o1 Pro mode.

G. Fodor: This became true, I think, with OpenAI o1 Pro, not Claude. Honestly, it just feels like nudging a junior developer along.

Here are some selected responses to the central question. I like Dan Mac’s answer here, based on what I’ve heard: if it’s a metaphorical System 1 task you still want Claude; if it’s a metaphorical System 2 task you want o1 Pro; if it’s a web-based task you probably want Perplexity, Deep Research, or maybe GPT-4o with web search, depending on details. A toy sketch of this as a routing rule follows below.
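Dan Mac's heuristic is essentially a routing rule, and a small sketch makes it concrete. The task labels and model identifiers below are illustrative assumptions of mine, not anything the quoted posters specified.

```python
# Toy sketch of the routing heuristic described above.
# Task labels and model names are illustrative assumptions,
# not an official taxonomy or API.
def pick_model(task_kind: str) -> str:
    """Route a task to a model: fast intuitive work to Claude,
    slow deliberate reasoning to o1 Pro, web lookups to a
    search-capable tool."""
    routes = {
        "system1": "claude-3-5-sonnet",              # fast, intuitive tasks
        "system2": "o1-pro",                         # slow, deliberate reasoning
        "web": "perplexity-or-gpt-4o-with-search",   # web-based tasks
    }
    try:
        return routes[task_kind]
    except KeyError:
        raise ValueError(f"Unknown task kind: {task_kind!r}")


if __name__ == "__main__":
    print(pick_model("system2"))  # -> "o1-pro"
```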

