(2024-10-11) Marcus On AI
Gary Marcus on AI (GenAI). ...a new article on LLMs from six AI researchers at Apple: "we found no evidence of formal reasoning in language models .... Their behavior is better explained by sophisticated pattern matching--so fragile, in fact, that changing names can alter results by ~10%!"
One particularly damning result was a new task the Apple team developed, called GSM-NoOp, in which a clause that looks relevant but has no bearing on the answer is added to a grade-school math problem.
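A minimal sketch of the kind of perturbation involved (the problem text and the distractor clause below are invented for illustration, not items from the Apple benchmark, and query_llm is a hypothetical stand-in for the model under test):

    # Toy illustration of a GSM-NoOp-style perturbation.
    baseline = (
        "Sam buys 3 packs of pens with 12 pens in each pack. "
        "How many pens does Sam have?"
    )

    # The added clause reads as if it matters, but changes nothing arithmetically.
    perturbed = (
        "Sam buys 3 packs of pens with 12 pens in each pack. "
        "Two of the packs have blue caps instead of black ones. "
        "How many pens does Sam have?"
    )

    correct_answer = 3 * 12  # identical for both prompts

    for prompt in (baseline, perturbed):
        # reply = query_llm(prompt)                  # hypothetical model call
        # print(prompt, int(reply) == correct_answer)
        print(prompt)

A model that reasons formally answers both prompts identically; the reported result is that accuracy drops sharply on the perturbed versions.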
This kind of flaw, in which reasoning fails in light of distracting material, is not new. Robin Jia and Percy Liang of Stanford ran a similar study, with similar results, back in 2017 (which Ernest Davis and I quoted in Rebooting AI, in 2019...)
There is just no way you can build reliable agents on this foundation, where changing a word or two in irrelevant ways, or adding a few bits of irrelevant info, can give you a different answer.
Another manifestation of the lack of sufficiently abstract, formal reasoning in LLMs is the way in which performance often falls apart as problems are made bigger.
We can see the same thing with integer arithmetic. A fall-off in accuracy on increasingly large multiplication problems has repeatedly been observed, in both older and newer models.
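A hedged sketch of how that fall-off is typically measured: generate random n-digit multiplication problems, ask for exact answers, and track exact-match accuracy as n grows (ask_model is a hypothetical placeholder for whatever LLM interface is being tested):

    import random

    def make_problem(n_digits: int) -> tuple[str, int]:
        """Random n-digit-by-n-digit multiplication problem and its exact answer."""
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        return f"What is {a} * {b}? Answer with the number only.", a * b

    def accuracy(ask_model, n_digits: int, trials: int = 100) -> float:
        """Exact-match accuracy over random problems of a given size."""
        correct = 0
        for _ in range(trials):
            prompt, answer = make_problem(n_digits)
            reply = ask_model(prompt)        # hypothetical LLM call
            try:
                correct += int(reply.strip().replace(",", "")) == answer
            except ValueError:
                pass                         # unparseable reply counts as wrong
        return correct / trials

    # Typical reported pattern: high accuracy at small n_digits, degrading as n_digits grows.
    # for n in range(1, 10):
    #     print(n, accuracy(my_llm, n))

A system that had learned the multiplication algorithm, rather than patterns over familiar operand sizes, would show no such degradation.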
Failure to follow the rules of chess is another continuing failure of formal reasoning.
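One common way to quantify that failure is to replay a model-generated game through a rules engine and flag the first illegal move. A minimal sketch using the python-chess library (the move list below is just a placeholder; the third move is deliberately illegal):

    import chess  # pip install python-chess

    def first_illegal_move(uci_moves: list[str]) -> int | None:
        """Replay a list of UCI moves; return the index of the first illegal one, or None."""
        board = chess.Board()
        for i, uci in enumerate(uci_moves):
            try:
                move = chess.Move.from_uci(uci)
            except ValueError:
                return i                      # not even a well-formed move string
            if move not in board.legal_moves:
                return i                      # violates the rules in this position
            board.push(move)
        return None

    # The knight on g1 cannot reach g3, so index 2 is reported.
    print(first_illegal_move(["e2e4", "e7e5", "g1g3"]))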
The inability of standard neural network architectures to reliably extrapolate -- and reason formally -- has been the central theme of my own work going back to 1998 and 2001, and has been a theme in all of my challenges to deep learning, starting in 2012, and to LLMs, starting in 2019.
What I argued in 2001, in The Algebraic Mind, still holds: symbol manipulation, in which some knowledge is represented truly abstractly in terms of variables and operations over those variables, much as we see in algebra and traditional computer programming, must be part of the mix. Neurosymbolic AI -- combining such machinery with neural networks -- is likely a necessary condition for going forward.
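A toy illustration of the division of labor (a sketch of the general idea, not Marcus's specific proposal): the symbolic half below represents operations over variables explicitly, so it extrapolates to operands of any size; the neural half would only be responsible for mapping text to that structure. parse_with_llm is a hypothetical placeholder for that neural component.

    import operator

    # Symbolic half: operations defined over variables apply to operands of any
    # magnitude -- there is no training distribution to fall off of.
    OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

    def evaluate(op: str, x: int, y: int) -> int:
        """Apply an abstract operation to its variable bindings."""
        return OPS[op](x, y)

    # Neural half (hypothetical): an LLM maps free-form text to that structure, e.g.
    # def parse_with_llm(text: str) -> tuple[str, int, int]:
    #     ...  # "what is 123456 times 789?" -> ("mul", 123456, 789)

    # Once parsed, the answer comes from the symbolic evaluator, not pattern completion:
    print(evaluate("mul", 123456789, 987654321))   # exact, regardless of operand size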