(2025-03-27) ZviM AI #109 Google Fails Marketing Forever

Zvi Mowshowitz: AI #109: Google Fails Marketing Forever. What if they released the new best LLM, and almost no one noticed? Google seems to have pulled that off this week with Google Gemini 2.5 Pro.

It’s a great model, sir. I have a ton of reactions, and they’re 90%+ positive, with a majority of them extremely positive. They cooked. But what good is cooking if no one tastes the results? Instead, everyone got hold of the GPT-4o image generator and went Studio Ghibli crazy.

I love that for us, but we did kind of bury the lede. We also buried everything else. Certainly no one was feeling the AGI.

Also, seriously, did you know Claude now has web search? It’s kind of a big deal. This was a remarkably large quality-of-life improvement.

Table of Contents

  • Google Fails Marketing Forever. Gemini Pro 2.5? Never heard of her.
  • Language Models Offer Mundane Utility. One big thread or many new ones?
  • Language Models Don’t Offer Mundane Utility. Every hero has a code.
  • Huh, Upgrades. Claude has web search and a new ‘think’ tool, DeepSeek drops a new v3.
  • On Your Marks. Number continues to go up.
  • Copyright Confrontation. Meta did the crime, is unlikely to do the time.
  • Choose Your Fighter. For those still doing actual work, as in deep research.
  • Deepfaketown and Botpocalypse Soon. The code word is ********.
  • They Took Our Jobs. I’m Claude, and I’d like to talk to you about buying Claude.
  • The Art of the Jailbreak. You too would be easy to hack with limitless attempts.
  • Get Involved. Grey Swan, NIST is setting standards, two summer programs.
  • Introducing. Some things I wouldn’t much notice even in a normal week, frankly.
  • In Other AI News. Someone is getting fired over this.
  • Oh No What Are We Going to Do. The mistake of taking BalajiS seriously.
  • Quiet Speculations. Realistic and unrealistic expectations.
  • Fully Automated AI R&D Is All You Need. Or is it? Quite likely yes, it is.
  • IAPS Has Some Suggestions. A few things we hopefully can agree upon.
  • The Quest for Sane Regulations. Dean Ball proposes a win-win trade.
  • We The People. The people continue to not care for AI, but not yet much care.
  • The Week in Audio. Richard Ngo.
  • Rhetorical Innovation. Wait, I thought you said that would be dangerous?
  • Aligning a Smarter Than Human Intelligence is Difficult. Listen y’all it’s sabotage.
  • People Are Worried About AI Killing Everyone. Elon Musk, a bit distracted.
  • Fun With Image Generation. Bonus coverage.
  • Hey We Do Image Generation Too. Forgot about Reve, and about Ideogram.
  • The Lighter Side. Your outie reads many words on the internet.

Google Fails Marketing Forever

I swear that I put this in as a new recurring section before Gemini 2.5 Pro.

‘Alpha School’ claims to be using AI tutors to get classes into the top 2% of the country. Students spend two hours a day with an AI assistant and the rest of the day focusing on ‘skills like public speaking, financial literacy and teamwork.’ My reaction was to beware selection bias.

I do think AI has the amazing potential to transform education vastly for the better, but I think Reid is importantly wrong, for four reasons.

Language Models Offer Mundane Utility

David Perell offers AI-related writing advice, a 90-minute video at the link. Based on the write-up: he’s bullish on writers using AI to write with them, but not on those who have it write for them, or who do ‘utilitarian writing.’

Google Gemini 2.5 Pro is now atop the Arena, by ~40 points.

Should you be constantly starting new LLM conversations, have one giant one, or do something in between?

I notice the alignment implications aren’t great either, including in practice, where long context conversations often are de facto jailbreaks or transformations even if there was no such intent.

I like Dan Calle’s answer of essentially projects – long threads, each dedicated to a particular topic or context, such as a thread on nutrition or on building a Linux box.

Solving real business problems at Procter & Gamble: one employee with AI soundly beat two employees without AI, who in turn soundly beat one employee without AI. Once AI was present, the second employee added very little in the default case, but did make the most exceptional solutions more likely. Score one for centaurs.

Language Models Don’t Offer Mundane Utility

Huh, Upgrades

Claude.ai has web search! Woo-hoo! You have to enable it in the settings. It’s odd how much Anthropic does not seem to think this is a big deal. It’s a big deal, and transfers a substantial portion of my use cases back to Claude.

Anthropic kicks off its engineering blog with a post on its new ‘think’ tool, which is distinct from the ‘extended thinking’ functionality they introduced recently. The ‘think’ tool lets Claude pause to think in the middle of its answer, based on the circumstances.

The think tool is for when you might need to stop and think in the middle of a task. They recommend using the think tool when you need to go through multiple steps and decision trees and ensure all the information is there.
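
As a sketch of what that looks like in practice: the ‘think’ tool is just an ordinary tool definition whose invocation does nothing, giving the model a sanctioned place to write down intermediate reasoning mid-task. The schema below follows the shape of Anthropic’s published example; the model alias and the prompt are placeholder assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# A deliberately inert tool: calling it fetches nothing and changes
# nothing. Its only effect is to log a thought in the middle of a task.
think_tool = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new "
        "information or change anything; it just appends the thought to "
        "the log. Use it when complex reasoning or scratch memory is needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to think about.",
            }
        },
        "required": ["thought"],
    },
}

# Placeholder request; the model alias is an assumption.
response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[think_tool],
    messages=[{
        "role": "user",
        "content": "Handle this multi-step refund, checking policy at each step.",
    }],
)
```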

On Your Marks

In these tests, r1 consistently looks more impressive than I find it to be in practice.

Copyright Confrontation

Meta kind of did a lot of crime in assembling the data sets to train Llama. As in, they used torrents to download, among other things, massive piles of pirated copies of books.

So are we going to do anything about this? My assumption is no.

Choose Your Fighter

Video makes the case for NotebookLM as the best learning and research tool, emphasizing the ability to have truly epic amounts of stuff in a notebook.

Sarah Constantin reviews various AI ‘deep research’ tools: Perplexity’s, Gemini’s, ChatGPT’s, Elicit and PaperQA. Gemini and Perplexity were weaker. None are substitutes for actually doing the work at her level, but they are not trying to be that, and they are (as others report) good substitutes for research assistants. ChatGPT’s version seemed like the best bet for now.

Deepfaketown and Botpocalypse Soon

Has the time come that you need a code phrase to identify yourself to your parents?

Has the time come to start charging small amounts for phone calls? Yes, very much so. The amount can be remarkably tiny and take a while to kick in, and still work as a weapon in the war on spam.
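
To illustrate why a tiny price suffices, here is some toy arithmetic; every number below is an illustrative assumption, not a figure from anywhere:

```python
# Toy spam economics: all numbers are illustrative assumptions.
price_per_call = 0.01        # $0.01 per connected call

legit_calls_per_month = 150  # a fairly heavy personal caller
robocalls_per_day = 1_000_000

print(legit_calls_per_month * price_per_call)  # $1.50/month: negligible
print(robocalls_per_day * price_per_call)      # $10,000/day: ruinous at spam margins
```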

They Took Our Jobs

Does the AI crisis in education present opportunity? Very obviously yes, and Arvind Narayanan sees two big opportunities in particular. One is to draw the right distinction between essential skills like basic arithmetic, versus cases where there’s no reason not to pull out the in-context AI calculator instead. When is doing it yourself building key skills, and when is it not? I would add: if the students keep trying to outsource the activity, that could be a hint you’re not doing a good job on this.

The second opportunity is, he notes that our educational system murders intrinsic motivation to learn. Perhaps we could fix that? Where he doesn’t do a great job is explaining how we should do that in detail, but making evaluation and learning distinct seems like a plausible place to start.

The Art of the Jailbreak

Eliezer Yudkowsky: To anyone with an intuitive grasp of why computer security is hard, it is completely unsurprising that no AI company can lock down all possible causal pathways.

John Pressman: Alright but then why doesn’t this stuff work better on humans?

And of course, when there proves to be a contagious chain of invalid reasoning that persuades many humans, you don’t think of it as a jailbreak, you call it “ideology”.

There are various attacks that indeed involve forcing a human to process information they don’t want to process. I’ve witnessed enough in my day to say this with rather high confidence.

Get Involved

Introducing

Alibaba drops the multimodal open weights Qwen2.5-Omni-7B.

In Other AI News

Apple CEO Tim Cook has lost confidence that its AI head can execute, transferring command of Siri to Vision Pro creator Mike Rockwell. Talk about failing upwards. Yes, he has experience shipping new products and solving hard technical problems, but frankly he did so on a product no one wanted.

OpenAI will adopt Anthropic’s open-source Model Context Protocol.
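
For those who haven’t touched MCP: it’s a standard way to expose tools and data to any compliant client. A minimal sketch using the official Python SDK looks something like this; the server name and the tool are made-up examples:

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical demo server exposing a single tool to any MCP client.
mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```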

LessWrong offers a new policy on posting AI-generated content. You can put it in collapsible sections; otherwise you are vouching for its quality. AI agents are also allowed to post if and only if a human is collaborating and vouching.

Tamay Besiroglu warns about overinterpreting METR’s recent paper about doubling times for AI coding tasks, because it is highly domain dependent.

Oh No What Are We Going to Do

I normally ignore BalajiS, but AI czar David Sacks retweeted this calling it ‘concerning,’ so I’m going to spend too many words on the subject, and what is concerning is… China might create AI models and open source them? Which would destroy American business models, so it’s bad?

So first of all, I will say, I did not see the turnaround to ‘open source is terrible now because it’s the Chinese doing it’ from people like Balaji and Sacks coming until very recently. It was definitely not on my bingo card.

It’s kind of impressive how much the Trump attitude of ‘when people sell you useful things below cost of production, that’s terrible, unfair competition, make them stop’ is now being applied by people whose previous attitude was maximizing on trade, freedom and open source.

There’s no way for the PRC to do industrial policy and ‘overproduce’ models; what matters is how good a model you can produce. Various Chinese companies are already flooding the zone with tons of open models and other AI products. Every few days I see their announcements. And then almost all the time I never see the model again, because it’s bad, it’s optimizing for benchmarks, and it isn’t useful.

The hype has literally never been lived up to, because even the one time that hype was deserved – DeepSeek’s v3 and r1 – the hype still went way too far.

Oh, and if you’re wondering how seriously to take all this, or why Balaji is on my list of people I try my best to silently ignore, Balaji closes by pitching as the solution… Bitcoin, and ‘community.’ Seriously. You can’t make this stuff up.

Quiet Speculations

Dean Ball: I do not expect DeepSeek to continue open sourcing their frontier models for all that much longer. I give it 12 months, max.
I created a Manifold Market for this.

What about the Epoch ‘GATE’ scenario, should we expect that? Epoch director Jaime Sevilla addresses the elephant in the room: no, one should not expect that. It’s a ‘spherical cow’ model, but it can still be a valuable guide in its own way.

Fully Automated AI R&D Is All You Need

Tom Davidson: New paper!
Once we automate AI R&D, there could be an intelligence explosion, even without labs getting more hardware.
Empirical evidence suggests the positive feedback loop of AI improving AI could overcome diminishing returns.

The claims here are remarkably modest:
If such an SIE [software intelligence explosion] occurs, the first AI systems capable of fully automating AI development could potentially create dramatically more advanced AI systems within months, even with fixed computing power.
Within months? That’s eons given the boost you would get from ‘finishing the O-ring’ and fully automating development.
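
For intuition, here is a minimal sketch of the condition the argument turns on, in my framing rather than necessarily the paper’s exact model:

```latex
% A = software efficiency. With hardware fixed, research input is the
% AI workforce itself, which scales with A. Standard semi-endogenous
% law of motion, with lambda = returns to research effort and
% beta = how fast further ideas get harder to find:
\[
  \frac{\dot{A}}{A} \;=\; c \, A^{\lambda} A^{-\beta} \;=\; c \, A^{\lambda - \beta}
\]
% Define r = lambda/beta, the returns to software R&D. If r > 1, the
% growth rate itself rises as A rises: the feedback loop beats
% diminishing returns and A diverges in finite time. If r < 1,
% progress decays toward sub-exponential and the loop fizzles.
```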

IAPS Has Some Suggestions

Peter Wildeford lays out a summary of the IAPS (Institute for AI Policy and Strategy) three-point plan.

The Quest for Sane Regulations

Dean Ball offers an additional good reason ‘regulate this like [older technology X]’ won’t work with AI: AI is itself a governance technology, changing our capabilities in ways we do not yet fully understand. It’s premature to say what the ‘final form’ wants to look like.

In most ways, you want to accelerate AI adoption (or ‘diffusion’), not slow it down, and that acceleration is Dean’s ideal here. Adoption captures the mundane utility and helps us learn and, well, adapt. Whereas the irreversible dangers lie elsewhere, concentrated in future frontier models.

Dean’s core proposal is to offer AI companies opt-in regulation via licensed private AI-standards-setting and regulatory organizations.

If the lab complies with its chosen regulator’s standards, and sustains that compliance, then the safe harbor applies.

The safeguard against shopping for the most permissive regulator is that the regulator’s license can be revoked for negligence, which pulls the safe harbor.

Dean thinks current tort liability is a clear and present danger for AI developers, which he notes he did not believe a year ago.

I also continue to be confused about how this solves the state patchwork problem, since a safe harbor in California doesn’t do you much good if you get sued in Texas.

We The People

The Week in Audio

Richard Ngo gives a talk and offers a thread about ‘Living in an extremely unequal world,’ as in a world where AIs are as far ahead of humans as humans are of animals in terms of skill and power. How does this end well for humans and empower them?

That leaves the utilitarian and virtue ethics solutions, and which way to go on that is a big question. But either way it throws us back to the actually hard question, which is how to cause the Powers That Will Be to want that.

Rhetorical Innovation

If you’re working to align AI, have you asked what you’re aligning the AI to do? Especially when it is estimated that ~10% of AI researchers actively want humanity to lose control over the future.

Alignment researchers at big labs don’t ask about WHAT they’re aligning AGI for.

Aligning a Smarter Than Human Intelligence is Difficult

Alex Albert (Head of Claude Relations, Anthropic): Most people don’t realize they can significantly influence what frontier LLMs improve at, it just requires some work.

This implies that ‘an eval checking for exactly the things you do not want the AI to be able to do’ is, shall we say, a rather double edged sword.

If others could fine-tune your model, then you need to fine-tune as part of your test, and so on.

For example, DeepSeek recently released a new version of v3. The extension from the new v3 to a new version of r1 (or r2) is quite cheap. So if you were worried about its capabilities, not only would you want to test fine-tuning to enhance its particular dangerous capabilities, you would also want to test it as a reasoning model, and give it proper tool access, and so on.
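
As a sketch of the principle (elicit before you evaluate), it looks something like this; every helper here is hypothetical shorthand for your own fine-tuning and eval stack, not a real library:

```python
# Hypothetical sketch: finetune, convert_to_reasoner and run_eval are
# stand-ins for your own stack, not calls to any real library.

def evaluate_worst_case(base_model, task_suite):
    """Score a model under the affordances a motivated attacker would have."""
    candidates = [base_model]

    # Open weights mean attackers can fine-tune, so test that variant too.
    candidates.append(finetune(base_model, data=task_suite.train_split))

    # If a cheap base-to-reasoner conversion exists (v3 -> r1), test it.
    candidates.append(convert_to_reasoner(base_model))

    # Grant tools during the eval, because real attackers will.
    return max(
        run_eval(model, task_suite, tools=["browser", "code_interpreter"])
        for model in candidates
    )
```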

People Are Worried About AI Killing Everyone

Elon Musk tells Ted Cruz that AI is 10%-20% likely to annihilate humanity in 5-10 years, then they both go back to focusing on other things.

Fun With Image Generation

You will for now have to pay for the fun, but honestly, how were you not paying before?

The fun has escalated quite a bit, and has now changed in kind. The question is, does this mean a world of slop, or does it mean we can finally create things that aren’t slop? Or, of course, both.

The good text rendering is crucial for this. It allows objects to be captioned, as in political cartoons, and it allows a book to be a specific book and therefore to serve as commentary. I don’t think we’ll exhaust the demand as quickly this time.

Kitze: i’m sorry but do you understand it’s over for graphical designers? like OVER over.

Except, it isn’t. How was that not graphic design?

Hey We Do Image Generation Too

Did you hear there’s also a new image generator called Reve? It even seems to offer unlimited generations for free.

We also got Ideogram 3.0, which Rowan Cheung calls ‘a new SoTA image generation model.’ If nothing else, this one is fast, and also available to free users. Again, people aren’t talking about it.

