(2025-06-10) ZviM Give Me A Reasoning Model

Zvi Mowshowitz: Give Me a Reason(ing Model). Are we doing this again? It looks like we are doing this again. This time it involves giving LLMs several ‘new’ tasks, including effectively a Tower of Hanoi problem, asking them to specify the answer via individual steps rather than an algorithm, and then calling a failure to properly execute all the steps this way (whether or not they even had enough tokens to do it!) an inability to reason.
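For scale, here is a minimal sketch (mine, not from the post or the paper): the optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves, so the algorithm fits in a few lines while the fully enumerated move list grows exponentially, which is why "write out every individual step" quickly becomes a test of output budget rather than of reasoning.

```python
# Minimal sketch (illustrative, not from the paper): the Tower of Hanoi
# algorithm is tiny, but the move list it enumerates grows as 2^n - 1.

def hanoi(n, src="A", dst="C", aux="B"):
    """Yield every individual move needed to shift n disks from src to dst."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, aux, dst)  # clear the way
    yield (src, dst)                        # move the largest disk
    yield from hanoi(n - 1, aux, dst, src)  # restack on top of it

for n in (3, 10, 15, 20):
    moves = sum(1 for _ in hanoi(n))
    assert moves == 2**n - 1
    print(f"{n} disks -> {moves:,} moves")
# 3 disks -> 7 moves
# 10 disks -> 1,023 moves
# 15 disks -> 32,767 moves
# 20 disks -> 1,048,575 moves
```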

The actual work in the paper seems, by all accounts, fine as far as it goes when presented accurately. The way it is being presented and discussed is not fine.

Not Thinking Clearly

Ruben Hassid (12 million views, not how any of this works): BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don’t actually reason at all. They just memorize patterns really well. Here’s what Apple discovered: (hint: we’re not as close to AGI as the hype suggests) Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games…

Thinking Again

Ryan Greenblatt: This paper doesn’t show fundamental limitations of LLMs:

– The “higher complexity” problems require more reasoning than fits in the context length (humans would also take too long).
– Humans would also make errors in the cases where the problem is doable in the context length.
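To put rough numbers on Greenblatt’s first point (my back-of-the-envelope assumptions, not figures from the paper): if writing out each move costs on the order of ten tokens, the enumerated solution stops fitting in a typical context window somewhere around 13 or 14 disks, no matter how well the model understands the algorithm.

```python
# Back-of-the-envelope sketch (assumed numbers, not from the paper):
# when does the full 2^n - 1 move list exceed a model's context window?

TOKENS_PER_MOVE = 10        # rough guess per "move disk from A to C" line
CONTEXT_WINDOW = 128_000    # a common window size; varies by model

for n in range(10, 16):
    tokens_needed = (2**n - 1) * TOKENS_PER_MOVE
    verdict = "fits" if tokens_needed <= CONTEXT_WINDOW else "does NOT fit"
    print(f"n={n}: ~{tokens_needed:,} tokens -> {verdict}")
# The full move list alone stops fitting around n = 14, before any
# chain-of-thought overhead, so failure there says little about reasoning.
```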

The team might be good, but in this case you can’t blame the reaction on the media. The abstract very clearly lays out the same misleading narrative picked up by the media. You can wish for a media that doesn’t get fooled by that, but that’s not the world we live in, and the blame is squarely on the way the paper presents itself.

Inability to Think

Colin Fraser, I think, gives us a great and clean version of the bear case here.

It seems important that this doesn’t follow?

In Brief

To summarize, this is tough but remarkably fair: Charles Goddard: 🤯 MIND-BLOWN! A new paper just SHATTERED everything…

What’s In a Name

Also the periodic reminder that asking ‘is it really reasoning’ is a wrong question.

Yuchen Jin: Ilya Sutskever, in his speech at UToronto 2 days ago: “The day will come when AI will do all the things we can do.” “The reason is the brain is a biological computer, so why can’t the digital computer do the same things?”

It’s funny that we are debating if AI can “truly think” or give “the illusion of thinking”, as if our biological brain is superior or fundamentally different from a digital brain.

Ilya’s advice to the greatest challenge of humanity ever: “By simply looking at what AI can do, not ignoring it, that will generate the energy that’s required to overcome the huge challenge.”

If a different name for what is happening would dissolve the dispute, then who cares?

Colin Fraser: The labs are the ones who gave test time compute scaling these grandiose names like “thinking” and “reasoning”. They could have just not called it that.

I don’t see those names as grandiose. I see them as the best practical descriptions in terms of helping people understand what is going on.

