At least two potentially important algorithmic improvements had papers out this week. Both fall under ‘this is a well-known human trick, how about we use that?’ Tree of Thought is an upgrade to Chain of Thought, doing exactly what it metaphorically sounds like it would do. Incorporating world models, learning through interaction via a virtual world, into an LLM’s training is the other. Both claim impressive results. There seems to be this gigantic overhang of rather obvious, easy-to-implement ideas for improving performance and current capabilities, with the only limiting factor being that doing so takes a bit of time.
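For what it's worth, the core mechanical difference can be sketched without an LLM at all: Chain of Thought commits to one greedy path of intermediate steps, while Tree of Thought keeps several candidate paths alive, scores them, and prunes. Here's a minimal toy sketch; the numeric puzzle, the `expand` moves, and the `score` heuristic are all illustrative stand-ins for LLM-generated thoughts and evaluations, not the paper's actual setup:

```python
def expand(state):
    """Candidate next 'thoughts': a few arithmetic moves from a state."""
    return [state + 3, state - 1, state * 2]

def score(state, target):
    """Heuristic value of a partial solution (higher is better)."""
    return -abs(target - state)

def chain_of_thought(start, target, depth):
    """Greedy single path: commit to the best-looking next step each time."""
    state = start
    for _ in range(depth):
        state = max(expand(state), key=lambda s: score(s, target))
    return state

def tree_of_thought(start, target, depth, beam_width=3):
    """Keep several candidate paths alive, pruning to the best beam_width."""
    frontier = [start]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=lambda s: score(s, target))
```

On this toy instance (start at 1, reach 10 in three moves), the greedy chain commits to the locally best step, overshoots, and ends at 11, while the beam keeps the path through 7 alive and lands exactly on the target.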
Please, what does this mean in simple terms?
Future models will comprise a hierarchy of abstract states that transition laterally over time, with direct percepts (of multiple modalities: language, vision, kinesthetic) being optional to the process (necessary for training, but not the only link from one moment to the next). The internal model of reality needs to be free of the limitations of the perceptual realm.
Family Ties is actually a series I've been procrastinating on finishing for a couple of years now. It'd be funny if a passable-quality AI Family Ties came out before I finished the original run. I don't expect that, though, out of existing diffusion-based AIs, at least not without a TON of human postprocessing. Diffusion models lack the profound interframe continuity of items and shapes in a scene that you get for free from any camera.
I'm actually not too concerned about human flourishing/values in the optimistic case where we get a benevolent AI that doesn't kill everyone and can just do all the jobs better than we can.
The reason is that we already have proof that humans can happily flourish in that paradigm: hobbies. I have several hobbies, some of which can be lumped into the big category of "producing food", including cooking, gardening, and beer brewing (among others).
I am not even close to the best at _any_ of these. I'm not even good enough that it is difficult to find other humans who are better, let alone relative to industrial processes. I can, for essentially trivial amounts of money, buy products that are better than anything I can produce, or will ever be able to produce, yet these activities still bring me a very significant amount of joy and meaning.
I enjoy these hobbies because performing them is intrinsically valuable, with no need to compare my skill or ability to anyone else's. I actively avoid steps that would improve the outputs but decrease my involvement, while taking plenty of actions that improve the outcome and maintain or increase my involvement.
It is entirely possible that the entirety of human existential value will come from these kinds of hobbies (family and small-group social interaction is another such example).
In my job, I contribute, in some small way, to furthering humanity's understanding of the physical world. If I were no longer able to productively contribute to that endeavor, that would be worse than the world in which I can continue to do so. But it is most definitely not a world in which I take no joy or find no meaning.
With regard to "intelligence denialism", the funny thing is that I can imagine the _exact reciprocal argument_ to doomers' claims that enough intelligence can do anything. There are not even any arguments being made that can be grappled with _by either side_. Both positions appear to be taken as articles of faith. One side is claiming "sufficient intelligence can do anything, and if it can't do it, it's not sufficient". The other side is saying "there are some things intelligence can't do and adding more won't fix them; some problems are not intelligence-limited".
I don't see evidence for _either_ of these claims. I have gut feelings and priors that push me in one direction or another, but nothing that could even potentially persuade someone on the other side.
One of these two sides is correct, obviously, but neither has yet found (as far as I can tell) a reason why a completely neutral party should believe them over the other side.
Thanks again for these weekly reports, Zvi. My favorite parts are the Mundane Utility section and the list of links of "cool things to try" and early proofs of concept. AI risk & alignment are cool and all but... what has AI done for me lately? 😉
> Now someone - this means you, my friend, yes you - need to get the damn thing written.
What thing are you referring to? AI regulation?
There are already AIs producing new episodes of sitcoms: a short while back there was a passing fad on Twitch of AI-generated episodes of Seinfeld. I haven't seen it myself aside from a few short clips, but they're not full facsimiles (it's not generating full live-action images, and the voice lines are all text-to-speech). I think it might still be going on? I haven't remembered to look into it; the first time I heard about it I couldn't find it because the Twitch channel got banned after the AI accidentally made an insensitive joke, but I believe they're back up and it just hasn't held attention long-term.
But this also kind of lampshades what might be the futility of Hollywood protectionism around actors/IPs/etc. in that even if they won’t be allowed to make the eighth season of Family Ties they will almost certainly be able to make a pretty comparable knock-off using different audio and video. Already looking forward to clickbait “meet the first AI actor!” articles.
It's sort of fun that you have the part about the brain having bugs that AI might find and exploit right next to the part about algorithms knowing you better than you do with respect to engagement. There's an argument to be had that engagement is a bad metric to pursue single-mindedly, specifically because people eventually express unhappiness with it regardless of the revealed preference. It's probably not less vulnerable to the pitfalls of Goodharting just because it's an algorithm optimizing the metric, and not only in ways where you want to be upset that the algo favors what you find politically distasteful. It seems beside the point to say "we think engagement metrics are bad because of the inability of existing systems to adjust" when there's a much more visceral sense that they often rob one of a certain element of choice in where to spend one's time. Getting distracted by things that catch your attention when you should, or want to, be doing something else can very easily be not a good thing. Cue the XKCD comic about someone being wrong on the internet.
Re doing things to try to avoid this: Is there a list (either by Zvi, or by someone else who seems reasonable) of outlines of software projects it might help to do? I've been trying to get some people to work on something like this lately, but I don't have any project outlines that feel hopeful.
Typo: "Paul Cristiano" -> Paul Christiano
The S&P 500 dip probably had absolutely nothing to do with the tweeted image/fake news report. 30 points is pure noise.
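A back-of-the-envelope scale check supports this; the ~4,100 index level is my assumption about roughly where the S&P 500 traded at the time:

```python
# Rough scale check: how big was a 30-point dip in percentage terms?
index_level = 4100   # assumed approximate S&P 500 level at the time
dip_points = 30
pct_move = dip_points / index_level * 100
print(f"{pct_move:.2f}%")  # well under 1%, within routine daily variation
```

Daily swings of a percent or more are unremarkable, so a move a third that size carries essentially no information.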
Typo: the link around "Ladish" is missing, and there's text in the source that's not displayed on the page.
Re Yudkowsky’s “please tell us what you have learned from such interactions” objection, there is a counter-objection, to do with types of knowledge and their relative levels of communicability. Here’s Dominic Cummings summarising Michael Oakeshott:
> Every human activity involves knowledge. There are two types of knowledge, distinguishable but inseparable:
>
> 1. Technical knowledge, or knowledge of technique. In many activities technique is formulated in rules which may be learned, remembered and put into practice. It is susceptible of precise formulation (whether or not it actually has been precisely formulated). Part of the technique of driving is in the Highway Code, the technique of cooking is in cookery books and the technique of discovery in natural science ‘is in their rules of research, of observation and verification.’ It is possible to write it down and it seems possible to be certain about it. It can be learned from a book.
>
> 2. Practical knowledge. It exists only in use and cannot be formulated in rules. Mastering any skill is impossible without it. Its normal expression is in a customary or traditional way of doing things, in practice. It is expressed in taste. It is not possible to write it down and it seems imprecise. It can neither be taught nor learned but only imparted and acquired — the only way to acquire it is by apprenticeship to a master, not because the master can teach it but because ‘it can be acquired only by continuous contact with one who is perpetually practising it.’
This seems intuitively right to me. Is it possible that Eliezer is demanding an explication of Type 1 knowledge, while Altman is describing Type 2 knowledge?
(Separately, these two map closely onto “word smarts” and “embedded-in-the-world/culture/living-tradition expertise”; my hopes for humans over AIs are largely based on the difficulty (impossibility?) of getting AIs to surpass us at the latter, virtual-environment-training or no.)
You're correct that I'm not worried in the "Bobby McFerrin sense", although I would say my actual position is closer to Marcus Aurelius. But you've missed a key point in my argument: *not* creating an AI of a certain capability level (or delaying it) could plausibly lead to an *increase* in x-risk. So your "obvious" intervention of "stop the breakthrough from being found" is, in my opinion, no more likely to mitigate x-risk than any other. That intervention is still just pushing the double pendulum up at t=2; doing something that vaguely feels correct given what we know right now, but with ultimately no hope of meaningfully impacting the eventual outcome.
You've slightly misunderstood my position on Christiano-type research. I think it's good research because it will yield meaningful, predictable benefits to society. But, in keeping with my overall position, I don't think it's effective at mitigating ASI x-risk. (And if I were someone whose sole evaluative criterion was mitigation of ASI x-risk, I would not think it is good research.)
N = 22 is small but the distribution on that first Conjecture poll is really interesting. No one under 10% and more in the 60-80 range than the 80-100?
Accurate probabilities "in the wild" are more likely to be 1% or 99% than 50%, the more so the more detailed a model you build; in the extreme case where you understand all the unknowns it converges to 1 or 0. So what we're seeing here even from experts in the field is not a detailed model but unknown unknowns dominating the calculus. Not something to rest easy about but also not the same thing epistemically as "we're probably gonna die" which would be an easy uninformed reading.
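The convergence claim can be made concrete with a toy model, entirely of my own construction: suppose a binary outcome is determined by three hidden fair-coin factors, and the bad outcome occurs only if all three come up 1. Your conditional probability moves toward 0 or 1 as factors resolve:

```python
from itertools import product

def conditional_prob(known):
    """P(all three factors are 1 | the first len(known) factors observed).

    Enumerate the remaining unknown factors as equally likely worlds
    and count the fraction in which every factor is 1.
    """
    n_unknown = 3 - len(known)
    worlds = list(product([0, 1], repeat=n_unknown))
    hits = sum(1 for w in worlds if all(known + list(w)))
    return hits / len(worlds)
```

With nothing known the estimate sits at 1/8; observing a 1 moves it to 1/4, observing a 0 collapses it to exactly 0, and with all three factors known it is exactly 0 or 1. A 50%-ish headline number is what you report when most of the factors are still unresolved.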
Thinking about this makes me doubt the Yudkowskian model of convergent doom. I can't think of good examples where you get to effectively bias your unknown unknowns toward one outcome without having a real causal model in place. Magnus Carlsen isn't one; chess is extremely robustly understood to be a game of skill with a high skill ceiling (high skill can change the evaluation of positions dramatically in ways that aren't obvious at lower skill levels). We haven't achieved anywhere near that quality of modeling of AGI futures.