I so appreciate that you are writing these super long missives on everything AI. We especially need that now, those of us who are too old or too young to do anything else about what is happening in the world today. But honestly, could AI have caused all this, either directly or indirectly? Are Sam Altman and Elon and all those bros to blame? Seriously. All I know for a fact is that when I last paid close attention to the Middle East (a few years back, like the rest of us), Hamas was using rocks and sticks. Now they are dropping from the sky like the monkeys in The Wizard of Oz, but with Uzis built with the help of an LLM-generated user's guide, no? Again, seriously. Join me here: https://substack.com/@herojig. All us brothers and sisters need to band together, for real, no?
P.S. I am a vet of another fucked-up war, one that left a country in ruins. I don't want to see that again. I know how it goes down in those holes and tunnels, where you've got a pair of wire cutters and a flashlight with a dead battery. Your life is fodder at that point. And that's what a whole bunch of boys are going down into right now. Not a f-ing rabbit hole like on YouTube, but a hell you don't want to know about. Enough said.
And you know what, I can't shut up. So sorry, but I have been silent about so much for so long. Do you know what I mean? My family just about disowned me when I stood on a protest line over the wars in South America, where we were involved through rented proxies. Then there were the Iraq wars, when in 2001 I bailed on the entire country and moved to a jungle. A very nice one. Since then, I have done a lot of thinking. Too much. But no actions to speak of, except saving a few animals here and there. And that's not enough, is it, when there are wars on two fronts again, and far less competent leaders than we had before (wow, what a low bar that is). So what will save the grunt on the line this time? Ha, right up your bailiwick: AI and drones and all of that. Now, that's irony for ya.
AI killing machines may redefine the enemy a little more liberally than any human would.
That sounds... bad. The only AI we had back in the day was a German Shepherd. Damn good tech. But what do I know, I've been out of the shit for 50-plus years now.
The lucid dreaming concept seems so strange. I know that technically everything affects my brain, but this feels like a new level, where AI alters my experience directly rather than saying things to me to change my actions. As a mundane example, what if this accidentally gives you the worst nightmare you've ever had and you're trapped there until you wake up?
There is just something tragic about watching the human element go away, but it is beautiful at least to know that you are on the side of humanity.
Sometimes it can seem questionable.
Isn't the third of those criteria Jack gives obviously true:
"3. strong evidence that humans are biologically weird from a software perspective not hardware vs other mammals & especially primates, and this was critical to our success."
Humans engage in deliberate teaching with other humans who are not our children. We form prestige hierarchies in parallel with our dominance hierarchies to sustain this behavior. You couldn't have anything like a civilization without this behavior, which isn't shared with other primates, is AFAIK unique in the animal kingdom, and might have preceded our brains getting so big.
People working on GitHub Copilot said the $20 thing was false, by the way, and that it's actually profitable. Might want to mention that.
Ah, was not aware. Someone is lying.
That lucid dreaming company is literally the subject of a novel I started writing about eight years ago, abandoned, then revisited in short story form (still incomplete) a couple of months ago. There are so many disturbing implications, as you say. Hoping to put the story up on substack as and when I get round to finishing it.
If GitHub started charging me $30/month for Copilot, I would honestly still be capturing a significant fraction of the consumer surplus.
> On priors, it seems very likely to me that safety is much harder than capabilities and takes longer.
onceagainiamaskingyou.gif
I think we should strongly consider the possibility that the idea of "strong alignment" is Not Even Wrong. Anything worth the name of "intelligence" will likely be vastly complex and ultimately inscrutable, and it will spend much of its time operating in weird corners of its distribution.
I mean, just about the simplest "AI" worth the name is linear dimensionality reduction plus k-nearest-neighbors, and I sure as hell can't visualize what's happening in 13-dimensional space. When someone starts talking about in-context learning and attention heads in a multi-billion-parameter model, I can't imagine how they're going to guarantee constraints on future behavior.
I will concede that "weak alignment" is probably possible. By which I mean we can probably do as good a job of influencing the morality of our AIs as we do influencing the morality of teenagers. Lots of teenagers are amazing and kind people! But if we only align ASIs as well as teenagers, well...
Even if we could strongly align an ASI, it would presumably be taking orders from a human, and we can't reliably align power-seeking humans.
I do think it's highly plausible that strong alignment, as in strong enough to make us safe, is indeed Not Even Wrong. I definitely think most people are deeply confused here and need to notice that they are confused about what we're even aiming for.
I do think I can (metaphorically) visualize what's happening in 13-dimensional space for the purposes of understanding k-nearest-neighbors, though.
The specific high-dimensional algorithm I was thinking of was the famous "eigenfaces" technique, which dates from the late 80s and early 90s:
Sirovich, L.; Kirby, M. (1987). "Low-dimensional procedure for the characterization of human faces."
Turk, M. A.; Pentland, A. P. (1991). "Face recognition using eigenfaces."
When I was a summer intern long ago, I worked with many people who had been heavily involved in the 80s AI boom. The boom ended, of course, with the AI Winter. Customers had finally realized that rule-based expert systems and symbolic inference could not be salvaged. It turned out that symbolic logic provided no good tools for combining partial and conflicting evidence.
But the eigenfaces papers? Those were one of the early signs of change. The technique was simple: Represent each face as a 100x100 greyscale image, giving you a vector in 10,000-dimensional space. Compute the principal eigenvectors (or use SVD if your matrix is rectangular). Take the first 12 or 13 eigenvectors, project your training and test data onto them, and use k-nearest-neighbors. You'll get 95% accurate face recognition in less than 200 lines of Matlab code.
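For concreteness, here is a minimal sketch of that pipeline in Python/NumPy rather than Matlab. The array names and shapes are illustrative assumptions on my part, not taken from the original papers:

```python
import numpy as np

# Hypothetical training data: each row is a flattened 100x100 greyscale face.
# faces: shape (n_samples, 10000); labels: shape (n_samples,)

def train_eigenfaces(faces, n_components=13):
    """Learn the mean face and the top principal eigenvectors ("eigenfaces")."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the rectangular data matrix; rows of vt are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]            # (13, 10000) projection matrix
    train_coords = centered @ eigenfaces.T    # training faces in 13-D space
    return mean_face, eigenfaces, train_coords

def classify(face, mean_face, eigenfaces, train_coords, labels, k=1):
    """k-nearest-neighbors vote in the reduced 13-dimensional space."""
    coords = (face - mean_face) @ eigenfaces.T
    dists = np.linalg.norm(train_coords - coords, axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]
```

The whole trick really is just "center, project onto a handful of eigenvectors, vote among nearest neighbors"; everything interesting hides inside that 13×10,000 projection matrix.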
You see the same technique reappear a few years later for text classification, under the name Latent Semantic Analysis. And then in 2009, the same technique revolutionizes the Netflix Prize. "Hey, try projecting onto the principal eigenvectors and doing k-NN" is a classic "early modern AI algorithm." (Another great early algorithm is Naive Bayes, which Paul Graham used to vastly improve spam filtering.)
These algorithms are simplistic and limited by modern standards. But they both represented a jump from (let's say) ~20% of human accuracy to over 95%. And they both, importantly, involved "giant inscrutable matrices."
I've implemented eigenface-based face recognition. I've sat there and stared at the eigenfaces, and tried to figure out how those weird shadowy faces actually combine into a face recognizer. And I fundamentally do not get it:
- What does the structure of that 10,000-dimensional space look like?
- Why does reducing it to 12 or 13 dimensions work well? 10 is too few; you're leaving easy wins on the table. 15 isn't gaining you much. But why those particular numbers?
- High-dimensional spaces are weird as hell, because of the "curse of dimensionality." The volume of these spaces grows far faster than our intuitions can keep up with. 13 dimensions is just fundamentally different from 3, in counter-intuitive ways.
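To make that last bullet concrete, here is a small sketch (with arbitrary sample sizes of my own choosing) of one face of the curse of dimensionality: pairwise distances concentrate as dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Compare how "spread out" distances to a reference point are in 3, 13,
# and 100 dimensions, using random points in the unit hypercube.
for dim in (3, 13, 100):
    points = rng.uniform(size=(1000, dim))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    relative_spread = (dists.max() - dists.min()) / dists.mean()
    print(f"dim={dim:3d}  relative spread of distances = {relative_spread:.2f}")

# The relative spread shrinks as dimension grows: points sit at increasingly
# similar distances, so "nearest" neighbors carry less and less information.
```

Already at 13 dimensions the geometry behaves noticeably unlike 3, which is part of why staring at eigenfaces gives so little intuition.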
So, no, I just don't get what's going on in all those dimensions. Eigenfaces is a 130,000-parameter model. And if you told me that the survival of the human race required me to write a rigorous proof that eigenfaces would _never_ misclassify Alice as Eve, then I wouldn't even know where to start. Like, what if Alice stands in the shade? What will that 13×10,000 projection matrix do then? What if Alice wears sunglasses? This does not seem like the sort of thing you can write proofs about.
But "Is this Alice or is this Eve?" is a simple question of fact with a tidy answer. Alignment is a lot less well defined. So when I hear people asking, "If we let AI systems design new AI systems, and then we iterate for 50 generations, will the 50th system respect the coherent extrapolated volition of the human race?", well, I've got nothing. And then if you somehow miraculously solve that problem, and if you tell me that you have built a pet god that will reliably obey orders from humans, then how do we pick the humans who give orders? I mean, have you met humans? They're mostly OK, but only a fool would trust them with that kind of power.
So, like I asked, is "strong alignment" Not Even Wrong? By which I mean, if we build something smarter than us, do we have any plan qualitatively better than "hope the AI likes us"? Is that even a thing that could exist?
> An attempted investigation by nostalgebraist into how Claude’s web interface sanitizes various inputs turns into an illustration of the fact that personality chosen for Claude, as a result of all its fine tuning, is quite a condescending, patronizing asshole that lies to the user.
Claude gets MUCH worse in long interactions. If you ask your questions in separate chats, it's much less condescending.
Oof, man, the reading-doctor-handwriting thing is the closest I’ve yet come to “OH GOD IT’S COMING FOR MY JOB ALREADY.”
Joking aside, I’m less confident it’s automatically fake because, empirically, it’s pure Bayesianism, and it can be surprising how little evidence you need to get it right. “It’s a prescription” does a lot of the lifting, and there aren’t THAT many drugs that start with something that looks like a ‘p’ and end in something that looks like an ‘l’.
I note also that that specific image of a script scrawl has been floating around the internet for a while and is for paracetamol; I could imagine that the AI found the image discussed in a Reddit thread or similar, rather than just reading it. I.e., there are a few ways it could be real, as well as the possibility that it's fake.
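As a toy illustration of the Bayesian point above: with an invented prior over a handful of drugs and made-up likelihoods for how well a "p...l" scrawl matches each name, the posterior collapses onto paracetamol almost immediately. None of these numbers come from real prescription data.

```python
# Toy Bayes update with invented numbers: prior ~ how often each drug gets
# prescribed, likelihood ~ how well the scrawl matches "p ... l".
priors = {            # hypothetical relative prescription frequencies
    "paracetamol": 50,
    "propranolol": 10,
    "paroxetine": 8,
    "prednisolone": 6,
    "amoxicillin": 40,
}
likelihoods = {       # hypothetical P(scrawl looks like "p...l" | drug)
    "paracetamol": 0.6,
    "propranolol": 0.5,
    "paroxetine": 0.05,   # ends in "e": poor match
    "prednisolone": 0.1,
    "amoxicillin": 0.02,  # starts with "a": very poor match
}

unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
total = sum(unnormalized.values())
for drug, weight in sorted(unnormalized.items(), key=lambda kv: -kv[1]):
    print(f"{drug:13s} posterior ~ {weight / total:.2f}")
# Even crude evidence plus a strong prior puts paracetamol near 0.8.
```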
Regarding 13, the cool paper: I think this is huge.
Caveat: my academic knowledge of neural networks is 20 years out of date, and this should all be checked by someone with more recent experience.
I think the big thing is not "realizing that concepts aren't stored in a single neuron", because we've known that since at least the time I was studying them. I could stare at a simple 3-layer feed-forward neural network and see it. The important thing is that we can now do with a single layer of transformers (an abstraction composed of multiple layers of neurons) what we could previously only do with a single layer of simple neurons.
My best recollection of the history is that, in the past, there were single-layer feed-forward networks, and dual-layer networks, and 3-layer networks (with one each of input, hidden, and output layer). There was some sort of proof that I've forgotten that adding more hidden layers was mathematically equivalent to having a different, perhaps larger, single hidden layer, so no one bothered to work with multiple hidden layers. People started experimenting with "recurrent" neural networks, where connections went back up the chain, producing cycles, which is the basis of what is currently called "attention". The "transformer" is a multi-layer recurrent construct where the internal layers do different things, but a downside is that this broke our ability to mathematically model the networks. We may have just regained this ability, which would, again, be huge.
Sure, it's only a single layer of transformers, but of course that's what the first paper is about. I see no realistic timeline where it's different. It's even remotely possible that extended analysis will show that multiple layers of transformers aren't strictly necessary, as with simple single-neuron layers, and that we've just been using giant kludges for the last decade, and that future LLMs will be more compact and more powerful. But probably this is just the first step toward analyzing networks with multiple layers of transformers.
And sure, people's reaction is to go straight to "editing", but to me the more important application would be "refactoring" (which happens to also make editing easier). Human brains, and the artificial neural networks we've been creating for the last few decades, are messy organic constructs that grow. They operate at local maxima, multi-dimensionality notwithstanding. But now we have the possibility of refactoring them to process data more efficiently, and by "efficient" I mean not only in terms of size (although that has its uses) but in terms of growth potential and legibility (in the James Scott sense). We're closer to bridging the gap between neural-net AI and classical AI. There could be a continuous cycle of organic growth and refactoring, in a way we humans can only crudely approximate.
But of course all of this, even just the current level of introspection, also makes it easier for AIs to modify AI code, bringing us that much closer to self-improving AIs.