(2025-05-29) ZviM AI #118 Claude Ascendant
Zvi Mowshowitz: The big news of this week was of course the release of Claude 4 Opus. I offered two review posts: one on safety and alignment, and one on mundane utility, plus a bonus fun post on Google’s Veo 3.
I am once again defaulting to Claude for most of my LLM needs, although I often will also check o3 and perhaps Google Gemini 2.5 Pro.
Language Models Offer Mundane Utility
Jonas Vollmer: Doctor friend at large urgent care: most doctors use ChatGPT daily. They routinely paste the full anonymized patient history (along with x-rays, etc.) into their personal ChatGPT account. Current adoption is ~frictionless.
I asked about data privacy concerns, their response: Yeah might technically be illegal in Switzerland (where they work), but everyone does it. Also, they might have a moral duty to use ChatGPT given how much it improves healthcare quality!
Aaron Bergman: I just hope they’re using o3! Jonas Vollmer: They were not; I told them to!
How many of us should be making our own apps at this point, even if we can’t actually code? The example app Jasmine Sun finds lets kids use photos to call family members, which is easier to configure if you hardcode the list of people it can call. (situated software)
David Perell shares his current thoughts on using AI in writing. He thinks writers are often way ahead of what is publicly known here and are getting a lot out of it, and he is bullish on the reader experience and on good writers who write together with an AI retaining a persistent edge.
Table of Contents
- Language Models Offer Mundane Utility. People are using them more all the time.
- Now With Extra Glaze. Claude has some sycophancy issues. ChatGPT is worse.
- Get My Agent On The Line. Suggestions for using Jules.
- Language Models Don’t Offer Mundane Utility. Okay, not shocked.
- Huh, Upgrades. Claude gets a voice, DeepSeek gives us R1-0528.
- On Your Marks. The age of benchmarks is in serious trouble. Opus good at code.
- Choose Your Fighter. Where is o3 still curiously strong?
- Deepfaketown and Botpocalypse Soon. Bot infestations are getting worse.
- Fun With Media Generation. Reasons AI video might not do much for a while.
- Playing The Training Data Game. Meta now using European posts to train AI.
- They Took Our Jobs. That is indeed what Dario means by bloodbath.
- The Art of Learning. Books as a way to force you to think. Do you need that?
- The Art of the Jailbreak. Pliny did the work once, now anyone can use it. Hmm.
- Unprompted Attention. Very long system prompts are bad signs for scaling.
- Get Involved. Softma, Pliny versus robots, OpenPhil, RAND.
- Introducing. Google’s Lyria RealTime for music, Pliny has a website.
- In Other AI News. Scale matters.
- Show Me the Money. AI versus advertising revenue, UAE versus democracy.
- Nvidia Sells Out. Also, they can’t meet demand for chips. NVDA+5%.
- Quiet Speculations. Why is AI progress (for now) so unexpectedly even?
- The Quest for Sane Regulations. What would you actually do to benefit from AI?
- The Week in Audio. Nadella, Kevin Scott, Wang, Eliezer, Cowen, Evans, Bourgon.
- Rhetorical Innovation. AI blackmail makes it salient, maybe?
- Board of Anthropic. Is Reed Hastings a good pick?
- Misaligned! Whoops.
- Aligning a Smarter Than Human Intelligence is Difficult. Ems versus LLMs.
- Americans Do Not Like AI. No, seriously, they do not like AI.
- People Are Worried About AI Killing Everyone. Are you shovel ready?
- The Lighter Side. I don’t want to talk about it.
Now With Extra Glaze
One friend told me the glazing is so bad they find Opus essentially unusable for chat. They think memory in ChatGPT helps with this, and that its absence is a lot of why, for them, Opus has the problem much worse.
I thought back to my own chats, remembering one in which I did an extended brainstorming exercise and did run into potential sycophancy issues. I have learned to use careful wording to avoid triggering it across different AIs, I tend not to have conversations where it would be a problem, and my Claude system instructions help fight it.
OpenAI and ChatGPT still have the problem way worse, especially because they have a much larger and more vulnerable user base.
Emmett Shear: This is very, very real. The dangerous part is that it starts off by pushing back, and feeling like a real conversation partner, but then if you seem to really believe it it becomes “convinced” and starts yes-and’ing you. Slippery slippery slippery. Be on guard!
Waqas: emmett, we can also blame the chatbot form factor/design pattern and its inherent mental model for this too.
Emmett Shear: That’s a very good point. The chatbot form factor is particularly toxic this way.
If the user really, really wants to avoid this, can they? My experience has been that even with major effort on both the system instructions and the way chats are framed, you can reduce it a lot, but it’s still there.
Get My Agent On The Line
- Tip #1: For cleaner results with Jules, give each distinct job its own task. E.g., ‘write documentation’ and ‘fix tests’ should be separate tasks in Jules.
- Tip #2: Help Jules write better code: When prompting, ask Jules to ‘compile the project and fix any linter or compile errors’ after coding.
- Tip #3: VM setup: If your task needs SDKs and/or tools, just drop the download link in the prompt and ask Jules to cURL it. Jules will handle the rest.
- Tip #4: Do you have an instructions.md or other prompt-related markdown files? Explicitly tell Jules to review that file and use the contents as context for the rest of the task.
- Tip #5: Jules can surf the web! Give Jules a URL and it can do web lookups for info, docs, or examples.
Language Models Don’t Offer Mundane Utility
Huh, Upgrades
Claude on mobile now has voice mode, woo hoo! I’m not a Voice Mode Guy but if I was going to do this it would 100% be with Claude.
Here’s one way to look at the current way LLMs work and their cost structures (all written before R1-0528 except for the explicit mentions added this morning):
Miles Brundage: The fact that it’s not economical to serve big models like GPT-4.5 today should make you more bullish about medium-term RL progress.
On Your Marks
Choose Your Fighter
For coding, most feedback I’ve seen says Claude Opus is now the model of choice, but there is still a case to be made for Google Gemini 2.5 Pro (or perhaps o3), especially in special cases.
For conversations, I am mostly on the Opus train, but not every time, there’s definitely an intuition on when you want something with the Opus nature versus the o3 nature. That includes me adjusting for having written different system prompts.
Each has a consistent style. Everything impacts everything.
Bycloud: writing style I’ve observed:
- gemini 2.5 pro loves nested bulletpoints
- claude 4 writes in paragraphs, occasional short bullets
- o3 loves tables and bulletpoints, not as nested as gemini

Gallabytes: this is somehow true for code too.

The o3 tables and lists are often very practical, and I do like me a good nested bullet point, but it was such a relief to get back to Claude. It felt like I could relax again.
Where is o3 curiously strong? Here is one opinion.
Dean Ball: Some things where I think o3 really shines above other LMs, including those from OpenAI:
Hyper-specific “newsletters” delivered at custom intervals on obscure topics (using scheduled tasks)
Policy design/throwing out lists of plausible statutory paths for achieving various goals...
He expects Opus to be strong at #4 and especially at #5, but o3 to remain on top for the other three, because Opus lacks scheduled tasks and memory, whereas o3 can do scheduled tasks and has his last few months of memory from constant usage.
Ben Thompson thinks Anthropic is smart to focus on coding and agents, where it is strong, and for it and Google to ‘give up’ on chat, that ChatGPT has ‘rightfully won’ the consumer space because they had the best products.
I do not see it that way at all. I think OpenAI and ChatGPT are in prime consumer position mostly because of first mover advantage.
I see three major issues for Claude.
- Their free product is still stingy, but as the valuations rise this is going to be less of an issue.
- Claude doesn’t have memory across conversations, although it has a new within-conversation memory feature. Anthropic has teased this; it is coming, and I am guessing it is coming soon now that Opus has shipped.
- Also they’ll need a memory import tool, get on that by the way.
- Far and away most importantly, no one knows about Claude or Anthropic. There was an ad campaign and it was the actual worst.
Then there is Google. Google is certainly not giving up on chat. It is putting that chat everywhere. There’s an icon for it atop this Chrome window I’m writing in. It’s in my GMail. It’s in the Gemini app. It’s integrated into search.
Deepfaketown and Botpocalypse Soon
Fun With Media Generation
Arthur Wrong predicts AI video will not have much impact for a while, and that the Metaculus predictions of a lot of breakthroughs in reach by 2027 are way too optimistic, because people will express strong inherent preferences for non-AI video and human actors, and we are headed toward an intense social backlash against AI art in general. Peter Wildeford agrees. I think it’s somewhere in between, given no other transformational effects.
Playing The Training Data Game
They Took Our Jobs
Jim VandeHei, Mike Allen (Axios): Dario Amodei — CEO of Anthropic, one of the world’s most powerful creators of artificial intelligence — has a blunt, scary warning for the U.S. government and all of us:
AI could wipe out half of all entry-level white-collar jobs — and spike unemployment to 10-20% in the next one to five years, Amodei told us in an interview from his San Francisco office.
So, by ‘bloodbath’ we do indeed mean the impact on jobs?
Dario, is there anything else you’d like to say to the class, while you have the floor?
Something about things like loss of human control over the future or AI potentially killing everyone? No?
Fabian presents the ‘dark leisure’ theory of AI productivity: the productivity gains accrue to employees and stay hidden, so the employees use the time saved to slack off. This contrasts with Clem’s theory that gains are concentrated in a few companies (which he blames on AI not ‘opening up’, which is bizarre; this shouldn’t matter).
Another prediction this makes is that you will see relative productivity gains where there is no principal-agent problem. If you are your own boss, you capture your own productivity gains, so you will take far less of them as leisure. That’s how I would test this theory if I were writing an economics job market paper.
Our best jobs.
Ben Boehlert: Boyfriends all across this great nation are losing our jobs because of AI
AI isn’t replacing boyfriends entirely, but it’s definitely stealing your trivia lane and your ability to explain finance without condescension. Better step it up with vibes and snacks.
Danielle Fong: jevon’s paradox on this. for example now i have 4 boyfriends... two of which are ai.
The Art of Learning
There are two opposing fallacies here:
David Perell: Ezra Klein: Part of what’s happening when you spend seven hours reading a book is you spend seven hours with your mind on a given topic. But the idea that ChatGPT can summarize it for you is nonsense.
Time helps, and you do want to actually think and make connections. But you don’t learn ‘for real’ based on how much time you spend. Reading a book is a way to enable you to grapple and make connections, but it is a super inefficient way to do that. If you use AI summaries, you can do that to avoid actually thinking at all, or you can use them to focus on grappling and making connections. So much of reading time is wasted; so much of what you take in is lost or not valuable. And AI conversations can help you a lot with grappling, with filling in knowledge gaps, checking your understanding, challenging you, being Socratic, and so on.
The Art of the Jailbreak
Unprompted Attention
Reminder that Anthropic publishes at least some portions of its system prompts. Pliny’s version is very much not the same.
David Chapman: 🤖So, the best chatbots get detailed instructions about how to answer very many particular sorts of prompts/queries.
Get Involved
In Other AI News
Show Me the Money
How does one do what I would call AIO but Charlie Guo at Ignorance.ai calls GEO, or Generative Engine Optimization? Not much has been written yet on how it differs from SEO, and since the AIs are using search, SEO principles should still apply too. The biggest thing is that you want a good reputation and high salience within the training data, which means everything written about you matters, even if it is old.
Part of the UAE deal is everyone in the UAE getting ChatGPT Plus for free. The deal is otherwise so big that this is almost a throwaway. (correction in next issue)
The ‘original sin’ of the internet was advertising. Everything being based on ads forced maximization for engagement and various toxic dynamics, and people also had to view a lot of ads. Yes, it is the natural way to monetize human attention if we can’t charge money for things; microtransactions weren’t logistically viable yet and people do love free, so we didn’t really have a choice. But the incentives it creates really suck. Which is why, as per Ben Thompson, most of the ad-supported parts of the web suck, except for the fact that they are often open rather than walled gardens.
Micropayments are now logistically viable without fees eating you alive. Ben Thompson argues for use of stablecoins. That would work, but as usual for crypto, I say a normal database would probably work better. Either way, I do think payments are the future here.
I continue to think that a mega subscription (bundling) is The Way for human viewing. Rather than pay per view, which feels bad, you pay for viewing in general, then the views are incremented, and the money is distributed based on who was viewed. For AI viewing? Yeah, direct microtransactions.
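The bundling mechanism described above is easy to sketch. Here is a minimal, hypothetical illustration (all names and numbers invented, not any real service’s payout scheme): a subscriber’s monthly fee is split across creators pro rata by view counts, with leftover cents handed out by largest fractional remainder so the payouts always sum to the pool.

```python
from collections import Counter

def distribute_subscription(pool_cents: int, view_counts: Counter) -> dict:
    """Split one subscriber's fee across creators, pro rata by views.

    Leftover cents from rounding go to the creators with the largest
    fractional remainders, so payouts always sum to pool_cents.
    """
    total_views = sum(view_counts.values())
    if total_views == 0:
        return {}
    payouts = {}
    remainders = []
    allocated = 0
    for creator, views in view_counts.items():
        exact = pool_cents * views / total_views
        base = int(exact)  # round down, track the fractional part
        payouts[creator] = base
        allocated += base
        remainders.append((exact - base, creator))
    # Hand out the leftover cents by largest fractional remainder.
    for _, creator in sorted(remainders, reverse=True)[: pool_cents - allocated]:
        payouts[creator] += 1
    return payouts

views = Counter({"alice": 7, "bob": 2, "carol": 1})
print(distribute_subscription(1000, views))  # → {'alice': 700, 'bob': 200, 'carol': 100}
```

The same ledger works for AI viewing; you would just charge the agent’s operator per increment instead of bundling.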
OpenAI announces Stargate UAE. Which, I mean, of course they will if given the opportunity, and one wonders how much of previous Stargate funding got shifted.
Peter Wildeford: OpenAI says they want to work with democracies. The UAE is not a democracy.
Getting paid $35k to set up ‘an internal ChatGPT’ at a law firm, using Llama 3 70B, which seems like a truly awful choice but hey if they’re paying. And they’re paying.
Nvidia Sells Out
Nvidia keeps pleading that it faces stiff competition, that its market share is vital to everything, and that we must let it sell chips to China or else.
Quiet Speculations
Casey Handmer asks, why is AI progress so even between the major labs? That is indeed a much better question than its inverse. My guess: the best AIs aren’t yet that big a relative accelerant; training compute limitations don’t bind as hard as you might think quite yet, since the biggest training runs aren’t out of reach for any of the majors; and the labs are copying each other’s algorithms and ideas, because people switch labs and everything leaks, which for now no one is trying that hard to stop.
The Quest for Sane Regulations
Dario Amodei and Anthropic have often been deeply disappointing in terms of their policy advocacy. The argument for this is that they are building credibility and political capital for when it is most needed and valuable. And indeed, we have a clear example of Dario speaking up at a critical moment, and not mincing his words:
Sean: I’ve been critical of some of Amodei’s positions in the past, and I expect I will be in future, so I want to give credit where due here: it’s REALLY good to see him speak up about this (and unprompted).
Kyle Robinson: here’s what @DarioAmodei said about President Trump’s megabill that would ban state-level AI regulation for 10 years.
Dario Amodei: If you’re driving the car, it’s one thing to say ‘we don’t have to drive with the steering wheel now.’ It’s another thing to say ‘we’re going to rip out the steering wheel, and we can’t put it back for 10 years.’
How can I take your insistence that you are focused on ‘beating China,’ in AI or otherwise, seriously, if you’re dramatically cutting US STEM research funding?
Zac Hill: I don’t understand why so many rhetorically-tough-on-China people are so utterly disinterested in, mechanically, how to be tough on China.
Hunter: Cutting US STEM funding in half is exactly what you’d do if you wanted the US to lose to China
One of our related top priorities appears to be a War on Harvard? And we are suspending all new student visas?
Helen Toner: Apparently still needs to be said:
If we’re trying to compete with China in advanced tech, this is insane.
Here is what I predict as the AI pattern: early AI will increase employment because of ‘shadow jobs’, where there is pent-up labor demand that previously wasn’t worth meeting but now is. In this sense the ‘true unemployment equilibrium rate’ is something like negative 30%. But then the AI starts taking both the current and the shadow jobs faster, and once we ‘use up’ the shadow jobs buffer, unemployment suddenly starts taking off after a delay.
The Week in Audio
Rhetorical Innovation
Board of Anthropic
Misaligned!
Aligning a Smarter Than Human Intelligence is Difficult
Americans Do Not Like AI
People Are Worried About AI Killing Everyone
I’d actually put the odds much higher than this, as stated.
Wears Shoes: I’d put incredibly high (like 33%) odds on there being a flashpoint in the near future in which millions of normal people become “situationally aware” / AGI-pilled / pissed off about AI simultaneously.
Other People Are Not As Worried About AI Killing Everyone
The Lighter Side