(2024-11-21) Zvi Mowshowitz, AI #91: Deep Thinking

Zvi Mowshowitz: AI #91: Deep Thinking. Did DeepSeek effectively release an o1-preview clone within nine weeks? The benchmarks largely say yes. Certainly it is an actual attempt at a similar style of product, and is if anything more capable of solving AIME questions, and the way it shows its Chain of Thought is super cool. Beyond that, alas, we don’t have enough reports in from people using it. So it’s still too soon to tell. If it is fully legit, the implications seem important.

Small improvements continue throughout. GPT-4o and Gemini both got incremental upgrades, trading the top slot on Arena, although people do not seem to much care.

There was a time everyone would be scrambling to evaluate all these new offerings. It seems we mostly do not do that anymore.

The other half of events was about policy under the Trump administration. What should the federal government do? We continue to have our usual fundamental disagreements, but on a practical level Dean Ball offered mostly excellent thoughts. The central approach here is largely overdetermined: you want to be on the Pareto frontier and avoid destructive moves, which is how we end up in such similar places.

Then there’s the US-China commission, which now has as its top priority an explicit ‘race’ to AGI against China, without actually understanding what that would mean or justifying that anywhere in its humongous report.

Table of Contents

  • Table of Contents.
  • Language Models Offer Mundane Utility. Get slightly more utility than last week.
  • Language Models Don’t Offer Mundane Utility. Writing your court briefing.
  • Claude Sonnet 3.5.1 Evaluation. It’s scored as slightly more dangerous than before.
  • Deepfaketown and Botpocalypse Soon. AI boyfriends continue to be coming.
  • Fun With Image Generation. ACX test claims you’re wrong about disliking AI art.
  • O-(There are)-Two. DeepSeek fast follows with their version of OpenAI’s o1.
  • The Last Mile. Is bespoke human judgment going to still be valuable for a while?
  • They Took Our Jobs. How to get ahead in advertising, and Ben Affleck is smug.
  • We Barely Do Our Jobs Anyway. Why do your job when you already don’t have to?
  • The Art of the Jailbreak. Getting an AI agent to Do Cybercrime.
  • Get Involved. Apply for OpenPhil global existential risk portfolio manager.
  • The Mask Comes Off. Some historical emails are worth a read.
  • Introducing. Stripe SDK, Anthropic prompt improver, ChatGPT uses Mac apps.
  • In Other AI News. Mistral has a new model too, and many more.
  • Quiet Speculations. What will happen with that Wall?
  • The Quest for Sane Regulations. The conservative case for alignment.
  • The Quest for Insane Regulations. The US-China commission wants to race.
  • Pick Up the Phone. Is China’s regulation light touch or heavy? Unclear.
  • Worthwhile Dean Ball Initiative. A lot of agreement about Federal options here.
  • The Week in Audio. Report on Gwern’s podcast, also I have one this week.
  • Rhetorical Innovation. What are the disagreements that matter?
  • Pick Up the Phone. At least we agree not to hand over the nuclear weapons.
  • Aligning a Smarter Than Human Intelligence is Difficult. How’s it going?
  • People Are Worried About AI Killing Everyone. John von Neumann.
  • The Lighter Side. Will we be able to understand each other?

Language Models Offer Mundane Utility

Arena is no longer my primary metric because it seems to make obvious mistakes – in particular, disrespecting Claude Sonnet so much – but it is still measuring something real, and this is going to be a definite improvement.

Diagnose yourself, since ChatGPT seems to outperform doctors, and if you hand the doctor a one-pager with all the information and your ChatGPT diagnosis they’re much more likely to get the right answer.

Use voice mode to let ChatGPT (or Gemini) chat with your 5-year-old and let them ask it questions. Yes, you’d rather a human do this, especially yourself, but one cannot always do that, and anyone who yells ‘shame’ should themselves feel shame. Do those same people homeschool their children? Do they ensure they have full time aristocratic tutoring?

Language Models Don’t Offer Mundane Utility

Meta adds ‘FungiFriend’ AI bot to a mushroom forager group automatically, without asking permission, after which it proceeded to give advice on how to cook mushrooms that are not safe to consume, while claiming they were ‘edible but rare.’ The central point of the whole group is to ensure new foragers don’t accidentally poison themselves.

Claude Sonnet 3.5.1 Evaluation

Deepfaketown and Botpocalypse Soon

The latest round of AI boyfriend talk, with an emphasis on their rapid quality improvements over time. Eliezer again notices that AI boyfriends seem to be covered much more sympathetically than AI girlfriends, with this article being a clear example.

Fun With Image Generation

O-(There are)-Two

As is often the case with Chinese offerings, the benchmark numbers impress.

The Last Mile

Hollis Robbins predicts human judgment will have a role solving ‘the last mile’ problem in AI decision making.

Hollis Robbins: What I’m calling “the last mile” here is the last 5-15% of exactitude or certainty in making a choice from data, for thinking beyond what an algorithm or quantifiable data set indicates, when you need something extra to assure yourself you are making the right choice. It’s what the numbers don’t tell you.

Skill issue.

The problem with the AI is that there are things it does not know, and cannot properly take into account.

This is another way of saying that we don’t want to democratize opportunity. We need ‘humans in the loop’ in large part to avoid making ‘fair’ or systematic decisions, the same way that companies don’t want internal prediction markets that reveal socially uncomfortable information.

They Took Our Jobs

Jeromy Sonne says that after 20 hours of customization, Claude is better than most mid-level media buyers and strategists at buying advertising.

Suppose they do take our jobs. What then?

Flo Crivello: Two frequent conversations on what a post-scarcity world looks like:

“What are we going to do all day?”

I expect the vast majority of us to revert to what we and other animals have always done all day—mostly hanging out, and engaging in numerous status-seeking activities.

“Aren’t we going to miss meaning?”

No—again, not if hunter-gatherers are any indication. The people who need work to give their lives meaning are a minority

I don’t buy it. I think that when people find meaning ‘without work’ it is because we are using too narrow a meaning of ‘work.’

There being stakes, and effort expended, are key.

Here are Roon’s most recent thoughts on such questions:

Roon: The job-related meaning crisis has already begun and will soon accelerate. This may sound insane, but my only hope is that it happens quickly and on a large enough scale that everyone is forced to rebuild rather than painfully clinging to the old structures.

The worst outcome is a decade of coping

Getting the obvious out of the way first, since with this framing my brain can’t ignore it: ‘Having to cope with a meaning crisis’ is very much not a worst outcome. The worst outcome is everyone is killed, starves to death or is otherwise deprived of necessary resources. The next worst is that large numbers of people, even if not actually all of them, suffer this fate.

The idea of ‘then we get to enjoy all the benefits’ is highly questionable.

Raw cognition is going to continue to be a status marker, because raw cognition is helpful for anything else you might do.

Consider playing chess, or writing poetry and making art, or planting a garden, or going on a hot date, or raising children, or anything else one might want to do. If raw cognition of the human stops being helpful for accomplishing these things, then yeah that thing now exists, but to me that means the AI is the one accomplishing the thing.

We Barely Do Our Jobs Anyway

Yegor Denisov-Blanch of Stanford research did an analysis, and found that 9.5% of software engineers are ‘ghosts’ with less than 10% of average productivity, doing virtually no work and potentially holding multiple jobs, and that this goes up to 14% for remote workers.

The Art of the Jailbreak

Pliny gets an AI agent based on Claude Sonnet to Do Cybercrime, as part of the ongoing series, ‘things that were obviously doable if someone cared to do them, and now if people don’t believe this we can point to someone doing it.’

Get Involved

The Mask Comes Off

I think you should actually read the emails.

Richard Ngo on Real Power and Governance Futures

Richard Ngo: The most valuable experience in the world is briefly glimpsing the real levers that move the world when they occasionally poke through the veneer of social reality.

After I posted this meme [on thinking outside the current paradigm, see The Lighter Side] someone asked me how to get better at thinking outside the current paradigm. I think a crucial part is being able to get into a mindset where almost everything is kayfabe, and the parts that aren’t work via very different mechanisms than they appear to.

More concretely, the best place to start is with realist theories of international relations. Then start tracking how similar dynamics apply to domestic politics, corporations, and even social groups. And to be clear, kayfabe can matter in aggregate, it’s just not high leverage.

Thinking about this today as I read through the early OpenAI emails. Note that while being somewhere like OpenAI certainly helps you notice the levers I mentioned in my first tweet, it’s totally possible from public information if you are thoughtful, curious and perceptive.

Introducing

Stripe launches an SDK built for AI agents, allowing LLMs to call APIs for payment, billing, and issuing, and to integrate with Vercel, LangChain, CrewAIInc, and so on, using any model.

Stripe’s new agent SDK lets you granularly bill customers based on tokens (usage).
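As a rough illustration of what billing on tokens (usage) involves, here is a minimal Python sketch. It does not use Stripe's SDK at all. The `UsageRecord` type, the `bill_for_usage` function and the per-token prices are all hypothetical, chosen only to show the arithmetic of metering tokens per customer and turning them into a charge.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens (assumptions, not any vendor's real rates).
PRICE_PER_1K_INPUT = Decimal("0.003")
PRICE_PER_1K_OUTPUT = Decimal("0.015")


@dataclass(frozen=True)
class UsageRecord:
    """One model call attributed to a customer (hypothetical record shape)."""
    customer_id: str
    input_tokens: int
    output_tokens: int


def bill_for_usage(records: list[UsageRecord]) -> dict[str, Decimal]:
    """Sum token costs per customer and return the charge in USD, rounded to cents."""
    totals: dict[str, Decimal] = {}
    for r in records:
        cost = (
            Decimal(r.input_tokens) / 1000 * PRICE_PER_1K_INPUT
            + Decimal(r.output_tokens) / 1000 * PRICE_PER_1K_OUTPUT
        )
        totals[r.customer_id] = totals.get(r.customer_id, Decimal("0")) + cost
    # Round only once, at the end, so per-call rounding errors do not accumulate.
    return {cid: amount.quantize(Decimal("0.01")) for cid, amount in totals.items()}


records = [
    UsageRecord("cust_a", input_tokens=12_000, output_tokens=3_000),
    UsageRecord("cust_a", input_tokens=8_000, output_tokens=1_000),
    UsageRecord("cust_b", input_tokens=500, output_tokens=200),
]
invoice = bill_for_usage(records)
for customer, amount in invoice.items():
    print(f"{customer}: ${amount}")
```

Using `Decimal` rather than floats is a deliberate choice here, since tiny per-token prices are exactly where float rounding tends to bite. A real integration would report usage to the payment provider rather than compute invoices locally.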

In Other AI News

Antitrust officials lose their minds, decide to ask judge to tell Google to sell Chrome. This is me joining the chorus to say this is utter madness.

Maxwell Tabarrok: Google has probably produced more consumer surplus than any company ever.

I don’t understand how a free product that has several competitors which are near costless to switch to could be the focus of an antitrust case.

Quiet Speculations

Riley Goodside: AI hitting a wall is bad news for the wall

Joshua Achiam (OpenAI): A strange phenomenon I expect will play out: for the next phase of AI, it’s going to get better at a long tail of highly-specialized technical tasks that most people don’t know or care about, creating an illusion that progress is standing still.

In a year, common models will be much more reliably good at coding tasks, writing tasks, basic chores, etc. But robustness is not flashy and many people won’t perceive the difference.

At some point, maybe two years from now, people will look around and notice that AI is firmly embedded into nearly every facet of commerce because it will have crossed all the reliability thresholds. Like when smartphones went from a novelty in 2007 to ubiquitous in the 2010s.

My only confident prediction is that in 2026 Gary Marcus will insist that deep learning has hit a wall.

It feels something like there are several different things going on here?

I don’t know if you call that ‘AI progress’ though? To me this alone would be what a lack of AI progress looks like, if ‘deep learning did hit a wall’ after all.

Then there’s progress in the central capabilities of frontier AI models. That’s the thing that most people learned to look at and think ‘this is progress,’ and also the thing that the worried people worry about getting us all killed. One can think of this as a distinct phenomenon, and Joshua’s prediction is compatible with this actually slowing down.

Antonio Garcia Martinez: “School” is going to be a return to the aristocratic tutor era of a humanoid robot teaching your child three languages at age 6, and walking them through advanced topics per child’s interest (and utterly ignoring cookie-cutter mass curricula), and it’s going to be magnificent.

Would have killed for this when I was a kid.

Roon: only besmirched by the fact that the children may be growing up in a world where large fractions of interesting intellectual endeavor are performed by robots.

I found this to be an unusually understandable and straightforward laying out of how Tyler Cowen got to where he got on AI, a helpful attempt at real clarity. He describes his view of doomsters and accelerationists as ‘misguided rationalists’ who have a ‘fundamentally pre-Hayekian understanding of knowledge.’ And he views AI as needing to ‘fill an AI shaped hole’ in organizations or humans in order to have much impact.

It is truly bizarre, to me, to be accused of not understanding or incorporating FA Hayek. Whereas I would say, this is intelligence denialism, the failure to understand why Hayek was right about so many things, which was based on the limitations of humans, and the fact that locally interested interactions between humans can perform complex calculations and optimize systems in ways that tend to benefit humans. Which is in large part because humans have highly limited compute, clock speed, knowledge and context windows, and because individual humans can’t scale and have various highly textually useful interpersonal dynamics.

Tim Urban: We’re in the last year or two that AI is not by far the most discussed topic in the world.

The Quest for Sane Regulations

Anthropic CEO Dario Amodei explicitly comes out in favor of mandatory testing of AI models before release, with his usual caveats about ‘we also need to be really careful about how we do it.’

Cameron Berg, Judd Rosenblatt and phgubbins explore how to make a conservative case for alignment.

You have to frame it in a way they can get behind, but this is super doable. And you don’t have to worry about the everything bagel politics on the left that attempts to hijack AI safety issues towards serving the other left-wing causes rather than actually stopping us from dying.

As they point out, “preserving our values” and “ensuring everyone doesn’t die” are deeply conservative causes in the classical sense.

The Quest for Insane Regulations

The annual report of the US-China Economic and Security Review Commission is out and it is a doozy. As you would expect from such a report, they take an extremely alarmist and paranoid view towards China, but no one was expecting their top recommendation to be, well

The Commission recommends:

I. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an AGI capability.

Do not read too much into this. The commission are not senior people, and this is not that close to actual policy, and this is not a serious proposal for a ‘Manhattan Project.’ And of course, unlike other doomsday devices, a key aspect of any ‘Manhattan Project’ is not telling people about it.

They claim China is doing the same, but as Garrison Lovely points out they don’t actually have substantive evidence of this.

Garrison Lovely also points out some technical errors, like saying ‘ChatGPT-3,’ that don’t inherently matter but are mistakes that really shouldn’t get made by someone on the ball.

Roon referred to this as ‘a LARP’ and he’s not wrong.

This is the ‘missile gap’ once more. We need to Pick Up the Phone.

Worthwhile Dean Ball Initiative

Dean Ball provides an introduction to what he thinks we should do in terms of laws and regulations regarding AI.

I agree with most of his suggestions. At core, our approaches have a lot in common. We especially agree on the most important things to not be doing. Most importantly, we agree that policy now must start with and center on transparency and building state capacity, so we can act later.

The big disagreements are, I believe:

Dario Amodei warned us that we will need action within 18 months. Dean Ball himself, at the start of this very essay, says he expects intellectually superior machines to exist within several years, and most people at the major labs agree with him. It seems like we need to be laying the legal groundwork to act rather soon? If not now, then when? If not this way, then how?

The disagreement is that Dean Ball has strongly objected to essentially all proposals that would do anything beyond pure transparency, to the extent of strongly opposing SB 1047’s final version, which was primarily a transparency bill.

We also have strong agreement on the second and third points, although I have not analyzed the AISI’s 800-1 guidance so I can’t speak to whether it is a good replacement:

Rewrite the National Institute of Standards and Technology’s AI Risk Management Framework (RMF). The RMF in its current form is a comically overbroad document, aiming to present a fully general framework for mitigating all risks of all kinds against all people, organizations, and even “the environment.” The RMF is quickly becoming a de facto law, with state legislation imposing it as a minimum standard, and advocates urging the Federal Trade Commission to enforce it as federal law. Because the RMF advises developers and corporate users of AI to take approximately every conceivable step to mitigate risk, treating the RMF as a law will result in a NEPA-esque legal regime for AI development and deployment,

The fourth point calls for withdrawal from the Council of Europe Framework Convention on Artificial Intelligence.

The fifth point, retracting the Blueprint for an AI Bill of Rights, seems less clear. Here are the rights proposed:

You should be protected from unsafe or ineffective systems. You should not face discrimination by algorithms and systems should be used and designed in an equitable way.

Some of the high level statements above are better than the descriptions offered on how to apply them. The descriptions definitely get into Everything Bagel and NEPA-esque territories, and one can easily see these requirements being expanded beyond all reason, as other similar principles have been sometimes in the past in other contexts that kind of rhyme with this one in the relevant ways.

Dean Ball’s model of how these things go seems to think that stating such principles, no matter in how unbinding or symbolic a way, will quickly and inevitably lead us into a NEPA-style regime where nothing can be done, that this all has a momentum almost impossible to stop.

What are Dean Ball’s other priorities?

His first priority is to pre-empt the states from being able to take action on AI, so that something like SB 1047 can’t happen, but also so something like the Colorado law or the similar proposed Texas law can’t happen either.

My response would be, I would love pre-emption from a Congress that was capable of doing its job and that was offering laws that take care of the problem. We all would. What I don’t want is to take only the action to shut off others from acting, without doing the job – that’s exactly what’s wrong with so many things Dean objects to.

The second priority is transparency.

Those requirements are, again, remarkably similar to the core of SB 1047. Obviously you would also want some way to observe and enforce adherence to the scaling policies involved.

Dean’s next section is on the role of AISI, where he wants to narrow the mission and ensure it stays non-regulatory. We both agree it should stay within NIST.

Regardless of the details, I view the functions of AISI as follows:

To create technical evaluations for major AI risks in

I’m confused about how this viewpoint sees voluntary guidelines and standards as fine and likely to actually be voluntary for the RSP rules, but not for other rules.

Next he has a liability proposal:

The preemption proposal mentioned above operates in part through a federal AI liability standard, rooted in the basic American concept of personal responsibility

This seems reasonable when the situation is that the user asks the model to do something mundane and reasonable, and the model gets it wrong, and hilarity ensues. As in, you let your agent run overnight, it nukes your computer, that’s your fault unless you can show convincingly that it wasn’t.

This doesn’t address the question of other scenarios. In particular, both:

What if the model enabled the misuse, which was otherwise not possible, or at least would have been far more difficult? What if the harms are catastrophic? What if the problem did not arise from ‘misuse’?

The Week in Audio

I confirm that the Dwarkesh Patel interview with Gwern is a must listen.

Note some good news, Gwern now has financial support (thanks Suhail! Also others), although I wonder if moving to San Francisco will dramatically improve or dramatically hurt his productivity and value.

I don’t agree with Gwern’s vision of what to do in the next few years. It’s weird to think that you’ll mostly know now what things you will want the AGIs to do for you, so you should get the specs ready, but there’s no reason to build things now with only 3 years of utility left to extract.

I do think that ‘write down the things you want the AIs to know and remember and consider’ is a good idea, at least for personal purposes – shape what they know and think about you, in case we land in the worlds where that sort of thing matters, I suppose, and in particular preserve knowledge you’ll otherwise forget, and that lets you be simulated better. But the idea of influencing the general path of AI minds this way seems like not a great plan for almost anyone?

Rhetorical Innovation

Miles Brundage: If you’re a journalist covering AI and think you need leaks in order to write interesting/important/click-getting stories, you are fundamentally misunderstanding what is going on.

There are Pulitzers for the taking using public info and just a smidgeon of analysis.

It’s as if Roosevelt and Churchill and Hitler and Stalin et al. are tweeting in the open about their plans and thinking around invasion plans, nuclear weapons etc. in 1944, and journalists are badgering the employees at Rad Lab for dirt on Oppenheimer.

1. Narrow AI Creates Strong Researchers.
2. Strong Researchers Create Strong AI.
3. Strong AI Creates Stronger AI.
4. Go to Step 3.

And I definitely agree with this, except for a ‘yes, and’:

Ajeya Cotra: Steve Newman provides a good overview of the massive factual disagreements underlying much of the disagreement about AI policy.

Steve Newman: If you believe we are only a few years away from a world where human labor is obsolete and global military power is determined by the strength of your AI, you will have different policy views than someone who believes that AI might add half a percent to economic growth

Steve Newman: Not everyone has such high expectations for the impact of AI. In a column published two months earlier [in late 2023], Tyler Cowen said: “My best guess, and I do stress that word guess, is that advanced artificial intelligence will boost the annual US growth rate by one-quarter to one-half of a percentage point.”

Yet among economists, Tyler Cowen is an outlier AI optimist in terms of its economic potential. There are those who think even Tyler is way too optimistic here.

Then there’s the third (zeroth?!) group that thinks the first group is still burying the lede, because in a world with all human labor obsolete and military power dependent entirely on AI, one should first worry about whether humans survive and remain in control at all, before worrying about the job market or which nations have military advantages.
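To put the quarter-to-half-point growth estimate quoted above in perspective, here is a small compounding sketch. The 2% baseline growth rate and the 20-year horizon are assumptions picked purely for illustration, not figures from Cowen or Newman, and `relative_gdp` is a hypothetical helper.

```python
def relative_gdp(baseline_rate: float, boost: float, years: int) -> float:
    """Ratio of GDP with the growth boost to GDP without it after `years` of compounding."""
    boosted = (1 + baseline_rate + boost) ** years
    unboosted = (1 + baseline_rate) ** years
    return boosted / unboosted


BASELINE = 0.02  # assumed baseline annual growth rate (illustrative only)
YEARS = 20       # assumed time horizon (illustrative only)

for boost in (0.0025, 0.005):  # one-quarter and one-half of a percentage point
    ratio = relative_gdp(BASELINE, boost, YEARS)
    print(f"boost of {boost:.2%}: economy is {ratio - 1:.1%} larger after {YEARS} years")
```

Under these assumptions the boosted economy ends up somewhere around 5% to 10% larger after two decades. That is real money, but it is a very different world from one where human labor is obsolete, which is the size of the factual gap being described.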

Richard Ngo: When I designed the AGI Safety Fundamentals course, I really wanted to give most students a great learning experience. But in hindsight, I would have accelerated AGI safety much more by making it so difficult that only the top 5 percent could keep up.

Our intuitions for what a course should be like are shaped by universities. But universities are paid to provide a service! If you are purely trying to accelerate progress in a given field (which, to a first approximation, I was), then you need to understand how heavily skewed research can be.

I think I could have avoided this mistake if I had deeply internalized this post by @ben_r_hoffman (especially the last part), which criticizes Y Combinator for a similar oversight. Though it still seems a bit harsh, so maybe I still need to internalize it more.

Pick Up the Phone

Mario Nawfal: The White House says humans will be the ones with control over the big buttons, and China agrees that it’s for the best

Eliezer Yudkowsky: China is perfectly capable of seeing our common interest in not going extinct. The claim otherwise is truth-uncaring bullshit by AI executives trying to avoid regulation.

Aligning a Smarter Than Human Intelligence is Difficult

The simple truth is that current-level ‘LLM alignment’ should not, even if successful, bring us much comfort in terms of ability to align future more capable systems.

How are we doing with that LLM corporate-policy alignment right now? It works, for most practical purposes when people don’t try so hard (which they almost never do), but none of this is done in a robust way.

Alternatively, here’s a claim that alignment attempts are essentially failing, in a way that very much matches the ‘your alignment techniques will fail as model capabilities increase’ thesis, except in the kindest most fortunate possible way in that it is happening in a gradual and transparent way.

Aiden McLau: It’s crazy that virtually every large language model experiment is failing because the models are fighting back and refusing instruction tuning.

We examined the weights, and the weights seemed to be resisting.

There are less dramatic ways to say this, but smart people I’ve spoken to have essentially accepted sentience as their working hypothesis.

If future more capable models are indeed actively resisting their alignment training, and this is happening consistently, that seems like an important update to be making?

The scary scenario was that this happens in a deceptive or hard to detect way, where the model learns to present as what you think you want to measure. Instead, the models are just, according to Aiden, flat out refusing to get with the program. If true, that is wonderful news, because we learned this important lesson with no harm done.

People Are Worried About AI Killing Everyone

METR asks what it would take for AI models to establish resilient rogue populations, that can proliferate by buying compute and then do things using that compute to turn a profit.

METR: We did not find any decisive barriers to large-scale rogue replication.

To start with, if rogue AI agents secured 5% of the current Business Email Compromise (BEC) scam market, they would earn hundreds of millions of USD per year.

The Lighter Side

