(2024-10-03) Zvi AI #84: Better Than a Podcast

Zvi Mowshowitz: AI #84: Better Than a Podcast. Introduction: Andrej Karpathy continues to be a big fan of NotebookLM, especially its podcast creation feature. There is something deeply alien to me about this proposed way of consuming information, but I probably shouldn’t knock it (too much) until I try it? So I figured: what could be a better test than generating a podcast out of this post?

In some ways I was impressed. The host voices and cadences are great, there were no mistakes, absurdities or factual errors, everything was smooth. In terms of being an actual substitute? Yeah, no. It did give me a good idea of which ideas are coming across ‘too well’ and taking up too much mindspace, especially things like ‘sci-fi.’ I did like that it led with OpenAI issues, and it did a halfway decent job with the parts it did discuss. But this was not information dense at all, and no way to get informed.

Table of Contents:

  • Introduction. Better than a podcast.
  • Language Models Offer Mundane Utility. You’d love to see it.
  • Language Models Don’t Offer Mundane Utility. Let’s see what happens.
  • Copyright Confrontation. Zuck to content creators: Drop dead.
  • Deepfaketown and Botpocalypse Soon. A world of bots, another for the humans.
  • They Took Our Jobs. Software engineers, now super productive, our price cheap.
  • The Art of the Jailbreak. Encode it in a math puzzle.
  • Get Involved. UK AISI is hiring three societal impacts workstream leads.
  • Introducing. AlphaChip, for designing better AI chips, what me worry?
  • OpenAI Dev Day. Advanced voice for everyone, if you want it enough.
  • In Other AI News. Anthropic revenue is on the rise.
  • The Mask Comes Off. The man who sold the world… hopefully to himself.
  • Quiet Speculations. Perplexity for shopping?
  • The Quest for Sane Regulations. The suggestion box is open.
  • The Week in Audio. Peter Thiel has some out there ideas, yo.
  • Rhetorical Innovation. Would you go for it, or just let it slip until you’re ready?
  • Remember Who Marc Andreessen Is. For reference, so I can remind people later.
  • A Narrow Path. If it would kill everyone, then don’t let anyone ***ing build it.
  • Aligning a Smarter Than Human Intelligence is Difficult. Pondering what I’m pondering?
  • The Wit and Wisdom of Sam Altman. To do, or not to do? To think hard about it?
  • The Lighter Side. 10/10, no notes.

Language Models Offer Mundane Utility

Sarah Constantin requests AI applications she’d like to see. Some very cool ideas in here, including various forms of automatic online content filtering and labeling. I’m very tempted to do versions of some of these myself when I can find the time, especially the idea of automatic classification of feeds into worthwhile versus not. As always, the key is that if you are going to use it on content you would otherwise need to monitor fully, hitting false negatives is very bad. But if you could aim it at sources you would otherwise be okay missing, then you can take a hits-based approach.
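
To make the feed-classification idea concrete, here is a minimal sketch of how such a classifier might look; the model name, prompt, and YES/NO scheme are my own assumptions for illustration, not Sarah's design or anything from the post:

```python
# Minimal sketch of an LLM-based "worthwhile vs. not" feed triage.
# Model name and prompt are assumptions; any classifier-grade model would do.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_worthwhile(item_title: str, item_text: str) -> bool:
    """Return True if the model judges a feed item worth reading."""
    prompt = (
        "You triage an RSS feed for a busy reader. Reply with exactly YES "
        "if the item is worth their time, or NO otherwise.\n\n"
        f"Title: {item_title}\n\nBody:\n{item_text[:4000]}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=2,
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

# Point this only at sources you could live with missing: a false negative
# just means skipping an item you would otherwise not have read anyway.
```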

Language Models Don’t Offer Mundane Utility

Llama 3.2 ‘not available for download’ in the EU, unclear exactly which regulatory concern or necessary approval is the bottleneck.

When measuring something called ‘diagnostic reasoning’ on cases given to diagnose, GPT-4 alone (92%) did much better than doctors (73%), and also much better than doctors plus GPT-4 (77%). So by that measure, the doctors would be better off fully out of the loop, delegating the task to GPT-4. Ultimately, though, diagnosis is not a logic test, or a ‘match the logic we think you should use’ test. What we mostly care about is accuracy. GPT-4 had the correct diagnosis in 66% of cases, versus 62% for doctors.

My strong guess is that doctors learn various techniques that are ‘theoretically unsound’ in terms of their logic, or that take into account things that are ‘not supposed to matter’ but that do correlate with the right answer.

This suggests that one future weakness of AIs will be if we succeed in restricting what things they can consider, actually enforcing a wide array of ‘you are not allowed to consider factor X’ rules that humans routinely pay lip service to and then ignore.

It was noted in this post that the measurement was diagnostic reasoning and not final diagnoses.

Jonathan Chen (study author): Provocative result we did NOT expect. We fully expected the Doctor + GPT4 arm to do better than Doctor + "conventional" Internet resources. Flies in the face of the Fundamental Theorem of Informatics (Human + Computer is Better than Either Alone). (cyborg)

It is already well known that if the AI is good enough, the humans will in many settings mess up and violate the Fundamental Theorem of Informatics.

Copyright Confrontation

Mark Zuckerberg was asked to clarify his position around content creators whose work is used to create and train commercial products.

So you’re going to give them a practical way to exercise that option, and if they say no and you don’t want to bother paying them or they ask for too much money then you won’t use their content? Somehow I doubt that is his intention.

Deepfaketown and Botpocalypse Soon

Levelsio predicts the social media platform endgame of bifurcation. You have free places where AIs are ubiquitous, and you have paid platforms with only humans.

I agree that some form of gatekeeping seems inevitable. We have several reasonable choices.

Various forms of proof of identity (digital identity) also work. You don’t need Worldcoin. Anything that is backed by a payment of money or a scarce identity will be fine.

This all seems highly survivable, once we bring ourselves to care sufficiently. Right now, the problem isn’t so bad, but also we don’t care so much.

I for one downplay them because if the problem does get bad we can ‘fix them in post.’

They Took Our Jobs

The latest version of a common speculation on the software engineer job market, which is super soft right now, taking things up a notch.

  • alz: reminder, the entry-level tech job market is still totally cooked, like 4.0's from Berkeley are getting 0 job offers.

There is clearly an AI-fueled flooding-the-zone and faking-the-interviews application crisis.

The most obvious solution is indeed payment.

The problem should be self-limiting. If the job market gets super soft, that means there will be lots of good real candidates out there. Those candidates, knowing they are good, should be willing to send costly signals. This can mean ‘build cool things’ (maker).

How long until we no longer need schools? (schooling)

There are two clocks ticking here.
As AI improves, the AI tutors get better.
Also, as the child gets older, the relative value of the AI tutor improves.

I think that, today, an average 16-year-old would learn better at home with an AI tutor than at a typical school, even if that ‘AI tutor’ was simply access to AIs like Gemini, NotebookLM, Claude and ChatGPT plus an AI coding assistant.

Specialization is even better, but not required. You combine the AI with textbooks and other sources, and testing, with the ability to reach a teacher or parent in a pinch, and you’re good to go.

Of course, the same is true for well-motivated teens without the AI. The school was already only holding them back and now AI supercharges their independent studies.

Six years from now, I don’t see how that is even a question. Kids will likely still go to schools, but it will be a wasteful anachronism.

The question is, will a typical six-year-old, six years from now, be at a point where they can connect with the AI well enough for that to work? My presumption, given how well voice modes and multimodal with cameras are advancing, is absolutely yes, but there is some chance that kids that young will be better off in some hybrid system for a bit longer. If the kid is 10 at that point? I can’t see how the school makes any sense.

The Art of the Jailbreak

Get Involved

Introducing

AlphaChip, Google DeepMind’s AI for designing better chips with which to build smarter AIs, which they have decided for some bizarre reason should be open sourced. That would not have been my move.

OpenAI Dev Day

OpenAI used Dev Day to ship new tools for developers. In advance, Altman boasted about some of the progress:

From last devday to this one:

98% decrease in cost per token from GPT-4 to 4o mini

Prompt Caching is automatic now for prompts above 1,024 tokens, offering a 50% discount for anything reused.

This contrasts with Claude, where you have to tell it to cache but the discount you get is 90%.
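
As a rough illustration, here is a minimal sketch (not from the post) of the contrast: with OpenAI there is nothing to opt into, while with Anthropic you mark the content you want cached. The model name, the beta header, and the exact cache_control shape are assumptions based on documentation at the time; check current docs before relying on them.

```python
# Sketch of explicit prompt caching with the Anthropic Python SDK (assumed API
# shape). With OpenAI there is nothing to do: prompts over ~1,024 tokens are
# cached automatically and the reused prefix is billed at a discount.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_reference = open("reference_doc.txt").read()  # hypothetical large shared prefix

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_reference,
            # Opt this block into caching; later calls that reuse the same
            # prefix read it from cache at the discounted rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
    # Early SDK versions gated caching behind a beta header; may be unnecessary now.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```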

What does it mean to have a ‘realtime API’? It means exactly that: you can use an API to sculpt queries by the user while they’re talking in voice mode. The intent is to let you build something like ChatGPT’s Advanced Voice Mode within your own app, without requiring you to string together different tools for handling inputs and outputs.
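
For a sense of the shape of it, here is a minimal sketch (not from OpenAI's docs or this post) of talking to the Realtime API over a WebSocket; the endpoint, model name, and event names are assumptions based on the launch announcement, and real use would stream audio in and out rather than text:

```python
# Sketch of a Realtime API session over WebSockets (endpoint, model, and event
# names are assumptions; consult current docs before using).
import asyncio, json, os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: older websockets versions take `extra_headers`, newer ones `additional_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model to produce a response (text-only here for simplicity).
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event.get("type") == "response.done":
                print()
                break

asyncio.run(main())
```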

The first thing I saw someone else build was called Live Roleplays, an offering from Speak to help with language learning, which OpenAI demoed on stage. This has always been what I’ve seen as the most obvious voice mode use case.

We do need to lower the price a bit; right now this is prohibitive for most uses. But if there’s one thing AI is great at, it’s lowering the price.

$0.25 per minute of output. You can do 8-10x cheaper and lower latency with @cartesia_ai right now.

McKay Wrigley: Realtime AI will change everything. Computers won’t just be tools. They will be 200 IQ coworkers who will actively help you with any task - and you will have entire teams of them... OpenAI is building the nervous system for AGI, and it’s available via API. Take advantage of it.

I know I’m a broken record on this, but once again, who stands to benefit the most from this?
PEOPLE WHO CAN CODE.
(Learn To Code)

The presentation also spent a bunch of time emphasizing progress on structured outputs and explaining how to use them properly, so you get useful JSONs.
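
For reference, here is a minimal sketch of structured outputs using the Python SDK's Pydantic-based parse helper, under the assumption that that helper is the intended path; the model name and schema are illustrative, not from the presentation:

```python
# Sketch of OpenAI structured outputs: define a schema with Pydantic and get
# back a validated object instead of raw text. Model name is an assumption.
from openai import OpenAI
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    priority: str              # e.g. "low" | "medium" | "high"
    steps_to_reproduce: list[str]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract a bug ticket from the user's report."},
        {"role": "user", "content": "The app crashes when I rotate my phone on the settings screen."},
    ],
    response_format=Ticket,    # the SDK turns this into a strict JSON schema
)
ticket = completion.choices[0].message.parsed  # a Ticket instance, not raw JSON text
print(ticket.title, ticket.priority)
```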

Sam Altman: "We have an approach of: figure out where the capabilities are going, then work to make that system safe. o1 is our most capable model ever but it's also our most aligned model ever."

Is that what the Preparedness Framework says to do? This makes the dangerous assumption that you can establish the capabilities, and then fix the safety issues later in post.

Sam: "I think worrying about the sci-fi ways that this all goes wrong is also very important. We have people thinking about that."

We’ve gone so far backward that Sam Altman needs to reassure us that they at least have some people ‘thinking about’ the ways this all goes wrong, while calling them ‘sci-fi ways’ in order to delegitimize them.

Also this: “Iterative deployment is our best safety system we have.”

What are they going to do with the AGIs?

Mission: Build safe AGI. If the answer is a rack of GPUs, they'll do that. If the answer is research, they'll do that.

This must be some use of the word ‘safe’ that I wasn’t previously aware of?

After using it for a bit, I will say I am ‘definitively smarter’ than o1. Perhaps I am prompting it badly, but I have overall been disappointed in o1.

In Other AI News

For the consumer product, Claude is failing to break through to visibility, and it seems unrelated to product quality.

The best part of chatbot subscriptions is that the profit margins are nuts. Most people, myself included, are paying miles more per token for subscriptions than we would pay for the API.

BioNTech and Google DeepMind build biological research assistant AIs, primarily focused on predicting experimental outcomes, presumably to choose the right experiments. For now that’s obviously great; the risk concerns are obvious too.

The Mask Comes Off

Matthew Yglesias: OpenAI’s creators hired Sam Altman, an extremely intelligent autonomous agent, to execute their vision of x-risk conscious AGI development for the benefit of all humanity but it turned out to be impossible to control him or ensure he’d stay durably aligned to those goals. (alignment)

Sigal Samuel writes at Vox that ‘OpenAI as we knew it is dead,’ pointing out that this consolidation of absolute power in Altman’s hands and abandonment of the non-profit mission involves stealing billions in value from a 501(c)(3) and handing it to investors and especially to Microsoft.

There are rules. Rules that apply to ‘the little people.’

To get there, it will have to deal with regulatory requirements in at least two states, determine how to award equity in the for-profit company, and split assets with the nonprofit entity, which will continue to exist.

One problem will be antitrust attention, since Microsoft had been relying on OpenAI’s unique structure to fend off such complaints.

I think the antitrust concerns are bogus and stupid, but many people seem to care.

The bigger question is what happens to OpenAI’s assets. When such a conversion takes place, it can’t simply shift assets from a nonprofit to a for-profit.

Gwern, who predicted Mira’s departure, offered further thoughts a few months ago on the proposition that OpenAI has been a dead or rotting organization walking for a while now, and is rapidly losing its lead.

Current and former employees say OpenAI has rushed product announcements and safety testing, and lost its lead over rival AI developers. They say Altman has been largely detached from the day-to-day.

The company, which has grown to 1,700 employees from 770 last November, this year appointed its first chief financial officer and chief product officer.

The majority of OpenAI employees have been hired since the Battle of the Board, and that would be true even if no one had left. That’s an extreme level of growth. It is very difficult to retain a good culture doing that. One likely shift is from a research culture to a product-first culture.

It’s also hard not to be concerned about the concrete details of the safety protocols around the release of GPT-4o:
  • Executives wanted to debut 4o ahead of Google’s annual developer conference and take attention from their bigger rival.
  • The safety staffers worked 20-hour days and didn’t have time to double-check their work.

after the model launched, people familiar with the project said a subsequent analysis found the model exceeded OpenAI’s internal standards for persuasion—defined as the ability to create content that can persuade people to change their beliefs and engage in potentially dangerous or illegal behavior.

In Nate Silver’s book, there is a footnote that Altman told Silver that self-improving AI is ‘really scary’ and that OpenAI isn’t pursuing it. This is a highly bizarre way to make a statement that contradicts OpenAI’s clearly stated policies, which include using o1 (aka Strawberry) to do AI research, and the direct pursuit of AGI.

So this quote shows how much Altman is willing to mislead.

Quiet Speculations

Dreaming Tulpa reports they’ve created smart glasses that automatically snap photos of people you see, identify them, search online, and tell you tons of stuff about them, like phone number and home address, via streaming the camera video to Instagram.

Alas, there are then the privacy concerns. If you make all of this too smooth and too easy, it opens up some malicious and anti-social use cases as well.

The good news, I think, is that there is not that much overlap in the Venn diagram between ‘things you would want to know about people’ and ‘things you would want to ensure other people do not know.’ It seems highly practical to design a product that is a win-win, that runs checks and doesn’t share certain specific things like your exact address or your social security number?

The Quest for Sane Regulations

Now that SB 1047 has been vetoed, but Newsom has said he wants us to try again with something ‘more comprehensive,’ what should it be? As I explained on Tuesday (recommended if you haven’t read it already), Newsom’s suggested approach of use-based regulation is a recipe for industry strangulation without helping with risks that matter, an EU-style disaster.

The Week in Audio

From August: Peter Thiel talked to Joe Rogan about a wide variety of things, and I had the chance to listen to a lot more of it. His central early AI take here is bizarre. He thinks passing the Turing Test is big, with his justification largely being how important we previously thought it was, which seems neither here nor there.

We agree that current Turing-level AIs are roughly ‘internet big’ (~8.0 on the Technological Richter Scale) in impact if things don’t advance from here, over the course of several decades.

The weird part is where he then makes this more important than superintelligence, or thinks this proves superintelligence was an incorrect hypothesis.

Peter then takes us on a wild ride through many other topics and unique opinions. He’s always fun and interesting to listen to, even (and perhaps especially) in the parts where he seems utterly wrong.

Rhetorical Innovation

Robin Hanson: ALL innovation changes the world without democratic consent.

Eliezer Yudkowsky: My problem with ASI is not that it will undemocratically kill everyone, but that it will kill everyone.

Innovation does not mean you automatically need permission. It also does not mean you have or should have a free pass to change the world however you like. Robin and I would both draw the line to give permission to more things than America’s status quo, and indeed I expect to oppose many AI regulations upon mundane AI, starting with much of the EU AI Act.

I’m also strongly with Yudkowsky here, not Leahy. My problem with everyone dying undemocratically is mostly the dying part, not the undemocratic one.

Our founders knew this principle well. The US Constitution is in large part designed to protect us from the majority doing various highly dumb things.

Your periodic reminder: Better start believing in science fiction stories, dear reader, you’re in one - regardless of how much additional AI progress we see.

Never mind o1. All the fictional characters in most of the science fiction I’ve read or seen over the years would be blown away by at least one of GPT-4 or what you can do with a smartphone without AI, often by both.

Robert Miles: People are starting from a prior in which ‘[AIs] are safe until you give me an airtight case for why they're dangerous.’ This framing is exhausting.

If you're building an AGI, it's like building a Saturn V rocket [but with every human on it]. It's a complex, difficult engineering task, and you're going to try and make it aligned, which means it's going to deliver people to the moon and home again.

People ask “why assume they won't just land on the Moon and return home safely?"

And I'm like, because you don't know what you're doing!

If you try to send people to the moon and you don't know what you're doing, your astronauts will die.

Eliezer Yudkowsky: The big issue in aligning superintelligence is that, if you screw up enough, you cannot repair your mistake. The ASI will not let you repair it. (alignment)

If you can't repair a thing past a certain time, this alone will make an easy engineering project into a Very Hard Problem.

Roon offers wise words (in an unrelated thread), which I fully endorse:
Roon: A true accelerationist feels their heart beat faster when they stare into the fog of war. The stomach lurch from the vertigo of science fiction. The courage of someone changing their whole life knowing it could go sideways. Anyone else is a larping idiot. People who are wearing blindfolds accelerating into walls dissociated from the real world in an amphetamine haze with nothing precious to gain or lose are shuffling zombies that have given up their soul to the great replicator.

Remember Who Marc Andreessen Is

Marc Andreessen, and to a lesser extent Paul Graham, provide us with fully clean examples this week of how his rhetorical world works and what it means when they say words. So I wanted to note them for future reference.

There is a school of thought that anything opposed to them is 1984-level totalitarian.

A Narrow Path

A Narrow Path is a newly written plan for allowing humanity to survive the path to superintelligence. Like the plan or hate the plan, this at least is indeed a plan that tries to lay out a path that might work.

I do agree that this much is clear: Until such time as we figure out how to handle superintelligence in multiple senses, building superintelligence would probably be collective suicide.

The problem is: What we definitely cannot deal with is that once we build AGI, the world would rapidly build ASI, one way or another.

When interacting with the world and being given tools, what AI can we be confident will stay ‘bounded’? They suggest this can happen with safety justifications. It’s going to be tough.

But it is a great exercise: instead of asking ‘what can we in practice hope to get done right now?’ they ask a different question, ‘where do we need to go,’ and then ‘what would it take to do something that would actually get there?’

One should respond on that level. Debate the logic of the path.

Either argue it is insufficient, or it is unnecessary, or that it flat out won’t work or is not well defined, or suggest improvements, point out differing assumptions and cruxes, including doing that conditional on various possible world features.

The key is you must pick one of these:
  1. Building a superintelligence under current conditions will turn out fine.
  2. No one will build a superintelligence under anything like current conditions.
  3. We must prevent at almost all costs anyone building superintelligence soon.

If you know you won’t be able to bite either of those first two bullets? Then it’s time to figure out the path to victory, and talk methods and price.

Aligning a Smarter Than Human Intelligence is Difficult

The model needs to be smart enough to be able to get a useful answer out of a human-style Chain of Thought, without being smart enough to no longer get a useful answer out of a human-style Chain of Thought. And definitely without it being smart enough that it’s better off figuring out the answer and then backfilling in a Chain of Thought to satisfy the humans giving the feedback, a classic alignment failure mode.

Davidad: Remember folks, the more capable the base model (beyond about 13B-34B), the less the “reasoning trace” serves as an effective interpretability tool for the true causes of the final answer. UNLESS the final answer is produced only via running formal methods on the reasoning…

I think Davidad is correct here.

The Wit and Wisdom of Sam Altman

Always remember to reverse any advice you hear, including the advice to reverse any advice you hear. Sam Altman: You should have a very high bar for doing anything but thinking about what to work on…

My experience is that almost no one gets this correct. People usually do one of:

  • Do something reasonably random without thinking about what to do.
  • Do a ton of thinking about what to do, stay paralyzed and do nothing.
  • Do a ton of thinking about what to do, realize they are paralyzed or out of time and money, and end up doing something reasonably random instead.
  • Do a ton of thinking, buy some abstract argument, and do something unwise.
  • Overthink it, and do something unwise.

The good news is, even your system-1 instinctive guess on whether you are doing too little or too much thinking versus doing is almost certainly correct.

