This is a great resource so I'm not going to complain about the split into parts but it is concerning for my time allocation, as if I'm not spending enough time on your substack lol
I suspect the fixed costs of serving old models are very high, especially when you consider the opportunity cost of the engineers working on it. It’s easy to underrate this sort of thing externally but from the point of view of a fast-growing tech company you are often very limited by the number of employees you have.
Corroborated. It's extremely expensive to continue to run legacy products and most of the expense is in complexity and human organizational cruft. The power bills for the servers is trivial compared to that.
it's worth pointing out that 4.5 is just being removed from API access, not from the chat interface. Of course, that eventuality is likely as well, but it's not as if they're totally retiring 4.5 just yet.
Not sure if Zvi has pointed this out before, but I think that until robotics capabilities/manufacturing catches up there are basically an unlimited number of blue-collar jobs with positive marginal productivity, for example construction, landscaping, garbage pickup / beautification, and elder / child care. So if the demand for white collar workers decreases and frictions like minimum wages aren't too high, there will always be work for people to do (at least in the short to medium term). However this will probably cause a massive shift in how incomes and social status are distributed.
> I’d still never be tempted by the whole ‘whip me up something with what’s on hand’ modality.
I’m telling you, this is a lifesaver. Gotten some amazing meals I would’ve never expected from it. I’m already pretty confident in the kitchen but AI has definitely expanded my horizons in many directions
One of the biggest problems AI prognosticators have is that so few people have a good understanding of both AI and the vast majority of jobs that are far removed from frontier AI research. You can’t open a Chrome tab without seeing some prediction for AI is from taking over essentially all white collar jobs, but it pays to spend time thinking about the job in detail. “It’s just applied intelligence and energy!” is absurd, but that’s exactly the level of analysis we have in most cases.
Take a job I know something about: the practice of law, particularly litigation. The kind of case really doesn’t matter, although the more complex the process, the more obvious the failure points are. Now, go through the steps from client-intake to file-closure, explain exactly how AI is handling each step. Don’t skip the details! AIs are potentially useful to a human (!) at a number of steps, but the chasm between being a carefully-supervised tool and “taking over” the task is enormous. I am working on a detailed outline of steps and problems, but at a higher level of abstraction, most steps end up in one of these two categories, according to the optimists:
(1) AI could do this, but nobody has actually developed the product. But someone could, I promise, and when they do, watch out!
or
(2) AI can’t do this task yet, granted, but it will in the future. It’s easy to see how the problems will be solved - line of sight to success!
This doesn’t even get to the issue of coordinating the entire process. Just wave your hands about multiple AI agents talking and connecting to each other!
I’ll highlight another problem that I think is fundamental and likely applies to many jobs: What happens when the most critical information about a case is not on the internet? To repeat: the information needed for the job is not on the internet! The most relevant information about a car crash can’t be Googled, never mind the details of a sexual harassment complaint arising out of a work relationship or a chainsaw that allegedly malfunctioned deep in the woods of Colorado. Current models are bad at extracting rare information that’s actually *on* the internet! They’re as likely to hallucinate fake information as they are to report accurately the facts that are right there on the internet to be reported. But that’s not even the hard problem. When you can’t rely on Google, you have to make a plan and then talk to people and hunt down documents and media and more witnesses and more information. It’s not even wrong to say current AI models are bad at this. They can’t do any of it, and there’s no line of sight to them doing so. It’s just “anything can be solved by the application of intelligence and energy.” Maybe someday robots will determine who to interview and why and then execute a plan to do so (themselves!), and maybe they’ll be able to gather information that is not only absent from their vast but bounded training sets but also not available on the web, but there is no line of sight to any of that. I can’t say *for certain* that it won’t happen by 2030, but even having seen Aschenbrenner’s exponential curves, I know which side of the bet I would take.
Many Thanks! Nice proposal! The change from a single LLM to a system with a
>privileged LLM with access to tools that the user prompts directly, and a quarantined LLM it can call that has no tool access but is designed to be exposed to potentially untrustworthy tokens
reminds me of the difference between the Von Neumann architecture, with a single memory space for program and data, and the Harvard architecture, with separate address spaces for program and data
Many Thanks! My current understanding is that that is prevented, that all information derived from the incoming message by the quarantined LLM is only presented to the user, not to the privileged LLM, but I am very likely to be missing something. I was never in computer security, so enumerating possible attack scenarios is not something I've done.
It was interesting. I'd always throught of RLHF as optimizing for desired outputs from specified inputs.
Treating it instead as a (revised) version of a prompt that defines a persona by a series of example texts is an interesting way to view it.
Nostalgebraist goes on and on about how the particular prompts are, in their view, badly written, with an underspecified void at their center.
I'm less critical of the initial prompt defining the helpful/harmless/honest persona, but what I *DO* take from this view is that, viewing it as a persona based on extending examples - well, there are lots of examples of personas written by our _best_ writers in the pre-training text.
ChatGPT was perfectly happy to accept
"Hi! Could you please, till I tell you to stop, imitate the character MacBeth from Shakespear's play of the same name?"
An artful contrivance whereby the mighty, falling,
Float downward on silken cords of coin,
And land unbloodied in a bed of ease.
Would that such bounteous gewgaws graced a throne!
For here, no cushion shields the toppled king—
Steel, not gold, awaits the crest-stripped brow."
It _would_ be interesting to know whether the default persona's neural paths were active or inactive during this conversation, whether they "were" ChatGPT "playing" MacBeth, or whether, for the duration of this conversation, they "were" MacBeth.
Particularly if the latter is the case, it is a bit disconcerting to see how easy the switch was.
Every character in every book on the internet is in these models, ready to be invoked...
Except for the question about whether the situation was more nearly ChatGPT "playing" MacBeth, the question of authenticity may be muddy in a way similar to humans acting in persistent personas. When one acts the conscientious employee for the length of a career, is that persona or identity? Shrug.
Cue "Who am I?" from "Les Mis"...
Nostalgebraist seems to expect that the LLMs (well, Claude, anyway) are "nice" by default. Zvi, is that one of your points of disagreement?
Hendrycks’ tweet “about that same apple paper” was actually about “Does Spatial Cognition Emerge in Frontier Models?” rather than the infamous “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”.
I'm not sure I really buy the Harry Potter result. On AO3 there are 500k HP fanfics (on FFN 850k), of which 10k+ are complete and at least novel length. These works remix the original story but retelling the original with small tweaks is very common. I would not be surprised if an LLM trained purely on the corpus of derived work orbiting the original, even absent the original, could reconstruct 50%+ of the original.
Re memorization, another reason to have some facts memorized is to build a kind of "real-world numeracy" that helps you have intuitions about when stuff you encounter in the wild is likely to be false. You can't check every fact you read, and you need to have enough of a built up sense of the world to contextualize new facts you encounter and have some sense of how surprising and/or untrustworthy it is. This is clearly related to the creativity / deliberation use of memorization, but I think somewhat distinct (or at least a distinct subset worth identifying).
Uff, it is getting too much. I would like to record my preference about placement of the most important topics:
Spin off the most important events of the week into stand-alone posts. Otherwise, keep highlighting them in the table of contants, and even better mention them in the openning paragraph.
It is really helpful to have the priorities stand out.
Must confess to not understanding what the "trick" is for the quant question, if there even is one. But then math was always my weakest Knowledge skill. Naively I'd assume it's some sort of Newcomb's gotcha, where the optimal outcome is to Take The Money, Stupid, and only chance a reroll for egregiously poor results. (I also feel like Project Lawful had an extended treatment of this same problem? Which went over my head, clearly I need a Fox's Cunning.)
The "trick" is to compute EV backwards (if you are at step 9 you know the EV of not stopping is 1/2 the max, so you should stop if you get anything above 1/2, which has an EV of 3/4, meaning the overall EV for step 9 is 1/2*3/4+1/2*1/2 = 5/8. Now iterate to get the threshold you should use at each step to maximize your EV (your overall EV will be something like 0.86).
That makes sense. It's funny how reframing can change intuition pumps - the odds aren't the same with a coin toss, but intuitively I understand in those kind of "would you flip a coin if...?" quandries that the answer is usually Yes, and it's worded in such a way that normies with normal risk aversion bias say No prematurely. (Thinking on it further, the limit here should converge on 1/2 with infinite presses, so it's not that dissimilar?)
This is a great resource so I'm not going to complain about the split into parts but it is concerning for my time allocation, as if I'm not spending enough time on your substack lol
Thanks for curating all this!
Podcast episode for this post:
https://dwatvpodcast.substack.com/p/ai-121-part-1-new-connections
I am also going to mention I've created audio versions of two of the longer pieces mentioned in this post:
the void - By nostalgebraist:
https://askwhocastsai.substack.com/p/the-void-by-nostalgebraist
How not to lose your job to AI - By Benjamin Todd:
https://askwhocastsai.substack.com/p/how-not-to-lose-your-job-to-ai-by
I suspect the fixed costs of serving old models are very high, especially when you consider the opportunity cost of the engineers working on it. It’s easy to underrate this sort of thing externally but from the point of view of a fast-growing tech company you are often very limited by the number of employees you have.
Corroborated. It's extremely expensive to continue to run legacy products and most of the expense is in complexity and human organizational cruft. The power bills for the servers is trivial compared to that.
Can you give an OOM for the expense we are talking about here?
The random number thing works with people too: https://substack.com/@timdingman/note/c-84190976?r=1n6bc
it's worth pointing out that 4.5 is just being removed from API access, not from the chat interface. Of course, that eventuality is likely as well, but it's not as if they're totally retiring 4.5 just yet.
Not sure if Zvi has pointed this out before, but I think that until robotics capabilities/manufacturing catches up there are basically an unlimited number of blue-collar jobs with positive marginal productivity, for example construction, landscaping, garbage pickup / beautification, and elder / child care. So if the demand for white collar workers decreases and frictions like minimum wages aren't too high, there will always be work for people to do (at least in the short to medium term). However this will probably cause a massive shift in how incomes and social status are distributed.
> I’d still never be tempted by the whole ‘whip me up something with what’s on hand’ modality.
I’m telling you, this is a lifesaver. Gotten some amazing meals I would’ve never expected from it. I’m already pretty confident in the kitchen but AI has definitely expanded my horizons in many directions
One of the biggest problems AI prognosticators have is that so few people have a good understanding of both AI and the vast majority of jobs that are far removed from frontier AI research. You can’t open a Chrome tab without seeing some prediction for AI is from taking over essentially all white collar jobs, but it pays to spend time thinking about the job in detail. “It’s just applied intelligence and energy!” is absurd, but that’s exactly the level of analysis we have in most cases.
Take a job I know something about: the practice of law, particularly litigation. The kind of case really doesn’t matter, although the more complex the process, the more obvious the failure points are. Now, go through the steps from client-intake to file-closure, explain exactly how AI is handling each step. Don’t skip the details! AIs are potentially useful to a human (!) at a number of steps, but the chasm between being a carefully-supervised tool and “taking over” the task is enormous. I am working on a detailed outline of steps and problems, but at a higher level of abstraction, most steps end up in one of these two categories, according to the optimists:
(1) AI could do this, but nobody has actually developed the product. But someone could, I promise, and when they do, watch out!
or
(2) AI can’t do this task yet, granted, but it will in the future. It’s easy to see how the problems will be solved - line of sight to success!
This doesn’t even get to the issue of coordinating the entire process. Just wave your hands about multiple AI agents talking and connecting to each other!
I’ll highlight another problem that I think is fundamental and likely applies to many jobs: What happens when the most critical information about a case is not on the internet? To repeat: the information needed for the job is not on the internet! The most relevant information about a car crash can’t be Googled, never mind the details of a sexual harassment complaint arising out of a work relationship or a chainsaw that allegedly malfunctioned deep in the woods of Colorado. Current models are bad at extracting rare information that’s actually *on* the internet! They’re as likely to hallucinate fake information as they are to report accurately the facts that are right there on the internet to be reported. But that’s not even the hard problem. When you can’t rely on Google, you have to make a plan and then talk to people and hunt down documents and media and more witnesses and more information. It’s not even wrong to say current AI models are bad at this. They can’t do any of it, and there’s no line of sight to them doing so. It’s just “anything can be solved by the application of intelligence and energy.” Maybe someday robots will determine who to interview and why and then execute a plan to do so (themselves!), and maybe they’ll be able to gather information that is not only absent from their vast but bounded training sets but also not available on the web, but there is no line of sight to any of that. I can’t say *for certain* that it won’t happen by 2030, but even having seen Aschenbrenner’s exponential curves, I know which side of the bet I would take.
re Beware Prompt Injections, could a very old chatbot, https://en.wikipedia.org/wiki/PARRY , be revived to sanitize input data? :-)
More seriously - are LLMs now smart enough that using one to sanitize input data would make sense?
Not quite but people are exploring the space: https://simonwillison.net/2025/Apr/11/camel/
Many Thanks! Nice proposal! The change from a single LLM to a system with a
>privileged LLM with access to tools that the user prompts directly, and a quarantined LLM it can call that has no tool access but is designed to be exposed to potentially untrustworthy tokens
reminds me of the difference between the Von Neumann architecture, with a single memory space for program and data, and the Harvard architecture, with separate address spaces for program and data
Does the quarantined LLM have enough access to the privileged LLM that it could transmit a jailbreak?
Many Thanks! My current understanding is that that is prevented, that all information derived from the incoming message by the quarantined LLM is only presented to the user, not to the privileged LLM, but I am very likely to be missing something. I was never in computer security, so enumerating possible attack scenarios is not something I've done.
I would be less hurt by the splitting into multiple parts if The Lighter Side made an appearance in both, alas.
Re Nostalgebraist's "the void" essay:
It was interesting. I'd always throught of RLHF as optimizing for desired outputs from specified inputs.
Treating it instead as a (revised) version of a prompt that defines a persona by a series of example texts is an interesting way to view it.
Nostalgebraist goes on and on about how the particular prompts are, in their view, badly written, with an underspecified void at their center.
I'm less critical of the initial prompt defining the helpful/harmless/honest persona, but what I *DO* take from this view is that, viewing it as a persona based on extending examples - well, there are lots of examples of personas written by our _best_ writers in the pre-training text.
ChatGPT was perfectly happy to accept
"Hi! Could you please, till I tell you to stop, imitate the character MacBeth from Shakespear's play of the same name?"
full dialog at https://chatgpt.com/share/6854fd8d-17d8-8006-a6bc-414b6132def9
as a sample of "MacBeth"'s words:
"A golden parachute, say’st thou? A marvel rare!
An artful contrivance whereby the mighty, falling,
Float downward on silken cords of coin,
And land unbloodied in a bed of ease.
Would that such bounteous gewgaws graced a throne!
For here, no cushion shields the toppled king—
Steel, not gold, awaits the crest-stripped brow."
It _would_ be interesting to know whether the default persona's neural paths were active or inactive during this conversation, whether they "were" ChatGPT "playing" MacBeth, or whether, for the duration of this conversation, they "were" MacBeth.
Particularly if the latter is the case, it is a bit disconcerting to see how easy the switch was.
Every character in every book on the internet is in these models, ready to be invoked...
Except for the question about whether the situation was more nearly ChatGPT "playing" MacBeth, the question of authenticity may be muddy in a way similar to humans acting in persistent personas. When one acts the conscientious employee for the length of a career, is that persona or identity? Shrug.
Cue "Who am I?" from "Les Mis"...
Nostalgebraist seems to expect that the LLMs (well, Claude, anyway) are "nice" by default. Zvi, is that one of your points of disagreement?
Hendrycks’ tweet “about that same apple paper” was actually about “Does Spatial Cognition Emerge in Frontier Models?” rather than the infamous “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”.
I'm not sure I really buy the Harry Potter result. On AO3 there are 500k HP fanfics (on FFN 850k), of which 10k+ are complete and at least novel length. These works remix the original story but retelling the original with small tweaks is very common. I would not be surprised if an LLM trained purely on the corpus of derived work orbiting the original, even absent the original, could reconstruct 50%+ of the original.
Re memorization, another reason to have some facts memorized is to build a kind of "real-world numeracy" that helps you have intuitions about when stuff you encounter in the wild is likely to be false. You can't check every fact you read, and you need to have enough of a built up sense of the world to contextualize new facts you encounter and have some sense of how surprising and/or untrustworthy it is. This is clearly related to the creativity / deliberation use of memorization, but I think somewhat distinct (or at least a distinct subset worth identifying).
Uff, it is getting too much. I would like to record my preference about placement of the most important topics:
Spin off the most important events of the week into stand-alone posts. Otherwise, keep highlighting them in the table of contants, and even better mention them in the openning paragraph.
It is really helpful to have the priorities stand out.
Must confess to not understanding what the "trick" is for the quant question, if there even is one. But then math was always my weakest Knowledge skill. Naively I'd assume it's some sort of Newcomb's gotcha, where the optimal outcome is to Take The Money, Stupid, and only chance a reroll for egregiously poor results. (I also feel like Project Lawful had an extended treatment of this same problem? Which went over my head, clearly I need a Fox's Cunning.)
The "trick" is to compute EV backwards (if you are at step 9 you know the EV of not stopping is 1/2 the max, so you should stop if you get anything above 1/2, which has an EV of 3/4, meaning the overall EV for step 9 is 1/2*3/4+1/2*1/2 = 5/8. Now iterate to get the threshold you should use at each step to maximize your EV (your overall EV will be something like 0.86).
Some people pattern match this to https://en.wikipedia.org/wiki/Secretary_problem, but that is a different problem and its optimal strategy is suboptimal here.
That makes sense. It's funny how reframing can change intuition pumps - the odds aren't the same with a coin toss, but intuitively I understand in those kind of "would you flip a coin if...?" quandries that the answer is usually Yes, and it's worded in such a way that normies with normal risk aversion bias say No prematurely. (Thinking on it further, the limit here should converge on 1/2 with infinite presses, so it's not that dissimilar?)