Wolfram isn't concerned about existential risk from AI either

You have a stray "If one has to do this with" paragraph.

Apr 13·edited Apr 13

I'd like to offer my thoughts on a point Eliezer raised regarding knowledge practitioners possess that he may not have access to. I think I've found an example. Reinforcement Learning sucks, like a lot. Major accomplishments still depend on Proximal Policy Optimization (PPO) from 2017. The algos are hilariously sample-dependent, require near-perfect simulations of the test environment (video games, Go) and are very brittle.

The field continues to optimize for Atari, which essentially involves training and testing on the same data. Improvements in these benchmarks do not appear to be transferable to downstream tasks, as they primarily reflect better memorization of the environments. Projects like AutoGPT represent naïve implementations of agents, and the next logical step is to apply RL to them and be disappointed. A truly effective RL algorithm would be alarming and lend support to concerns about rapid advancements, but for now, supervised learning remains the dominant approach. Reinforcement Learning from Human Feedback is mainly a gimmick, and similar results have been achieved using simple supervised fine-tuning.

Whenever I come across catastrophic scenarios that involve any form of model feedback incorporating RL, I can't help but think that's what an outsider might be missing—they are unaware that RL actually sucks.

Another week, another roundup, another surge of existential dread

Man, I really really hope those scaling curves start bending, cause I frankly can't see any way out.

I want to push back a bit against the section about being able to predict the end (because all roads lead to Rome) but not being able to predict details about the journey, because I have a pretty different lens that I default to about this than the one that you do.

In my view, basically what we are trying to do is develop models of the world or parts of the world and then develop hypotheses that draw on those models and test them against each other. My model has all sorts of assumptions -- intelligence is messy, there's no such thing as G, species' competitive advantage might come from many places, AI might become smart in some ways while remaining very dumb for a long time in others, etc etc. Your model has some similar assumptions to mine (like AI will become superhuman in some ways soon, AI is very impressive already etc) but many differences.

Most of what I try to do when I read these posts is try to update my model based on contact with your model and thinking through your arguments. But it is very hard to make any strong changes to my model without falsifiable hypotheses to test against each other. Very much to your credit, you do tend to make a lot more falsifiable hypotheses than Yudkowsky. But I think that the argument that Yudkowsky presents where he knows the end point but not the steps is weak to me. I'm not asking him to spell out the exact steps to AGI -- what would be persuasive to me are predictions based on his model that would be true if what he believes about intelligence etc. were true. It doesn't really matter too much if they are the steps to AGI or something entirely unrelated, since I agree that if all his (and your) assumptions about key points hold true then doom does likely follow. But I think we disagree on many key points/cruxes/model elements/whatever.

This to me is very much in line with the strength of economics -- it makes tons of falsifiable predictions, and part of what makes the Hayekian stuff impressive is not just that it makes sense in your head but it happens in the real world. For Yudkowsky I feel like I am getting the part here where it makes sense in my head but I am missing the part where it makes predictions about the real world -- and not just "here are the exact steps where we get to AGI" but predictions of the form like "if X about intelligence/competitive advantage/creativity/scaling is true then Y will likely follow, and if Y doesn't happen I will lower my confidence in X". Because I certainly have those, like if AI starts transforming the economy exponentially within a few years I will change my model a lot. But I get the sense that very little can happen that will change Yudkowsky's model substantially in his mind, and for me that is a huge weakness, and reminds me of a lot of failed ways of pre-enlightenment human attempts to do good reasoning.

That said, do appreciate your concrete predictions in the Auto-GPT post, that is absolutely what is missing from the Yudkowsky stuff to me. But disagree with you/Yudkowsky in the interview section in this one.

Thanks for keeping these up, they're now my main source of AI-related content.

I was wondering semi-jokingly (but also semi-seriously) whether there would be value in alignment-concerned individuals that may not be able to contribute directly to alignment research to infiltrate entities that are working on capability advancement such as OpenAI/Anthropic etc and attempt to exfiltrate information in relation to progress and capabilities as a means of a sort of an early warning system about behind the scenes shenanigans, companies not sticking to their alignment commitments etc. Sort of a network of alignment whistleblowers if you will.

Probably a stupid thought but the desperation bag of ideas is getting pretty empty at this point - not that it was ever very full to start with.

There is independent support for Eliezer's assessment that LLM research currently doesn't involve anything especially deep, but does require overcoming a shedload of technical difficulties with short half lives. For instance, appendix A of the GLM-130B paper from 6 months ago https://arxiv.org/abs/2210.02414 contains an awesome list of unsuccessful things they tried and describes how they were overcome.

Couple comments.

1) The gesture at bar for implicit knowledge makes demonstrating it in a discussion really hard, as identifying 'X is a kind of implicit knowledge' is essentially gesturing at X. Thinking for just a second, there seem to be two ways we could identify the existence of implicit knowledge. First, by looking at the past we could try and identify factors that accounted for divergent performance that were not known or identified at the time but have since been recognized. This is still gesturizable though, just in hindsight (which may actually be what a valid gatekeeping attempt is looking for?). Second, we could try a sort of 'Total Factor Productivity approach' where we attempt to account for differences in performance between individuals after removing the effect of all gesturizable knowledge. This seems quite hard to do in practice.

2) One feature of the AI debate is that there is this gap between those like EY (and I think yourself?) who regard fatal AI as being overdetermined versus others who don't have a belief about the nature of future AI (or some who think benevolent AI is the attractor state). It's interesting to me how much of this factor seems to be determined by beliefs about what humans are. Like the overdetermined view seems strongly implied by the beliefs that humans are only accidentally special, our intellect is some dumb kludge, its physical substrate is kinda meh, and our values and preferences are the accidents of a random process (I mean look how long it took people to develop our present civilization and how dumb that civilization still is! Computers are an obviously superior substrate, as referenced above, much better algorithms than us likely exist, and the only values that aren't accidental are the dangerous instrumental ones). Whereas others either haven't integrated their (formal) views on humans with their thoughts about AIs, they don't think we currently understand human specialness, or they think humans are special in some way that is in tension with a straightforward extrapolation from evolutionary biology.

For example, I think Sarah Constantin's 'Why I'm not an AI Doomer' article can be in part understood as a claim that humans really are special reasoners in a way that is not understood, and that implies that fatal AI is not imminent.

This is definitely not an original thought, but one limitation of the AINotKillEveryoneism prologues I've read is that they generally seem to cover the issue with a quick gloss of the relevant big picture science claims (i.e. everything is physics including you, biology optimizes in a crude way etc.) and don't really engage with doubt that comes from the weakness of present explanations about human specialness.


Just for clarity here are how the other views on specialness seem to cash out in thinking about fatal AI being overdetermined:

*Ignorance about specialness*

The point being that each of these ends up undermining overdetermined death, not that people believe all of them:

Dumbness) Human intelligence seems really strong (we make the internet and nukes!), with few clear limits and unexplained features. Maybe we are already really smart?

Base Learning Capacity) We don't understand the base algorithm (if that is even the right term?) for human intelligence, maybe it is optimal and AIs won't be superior?

Brains) Maybe there is something special about brains that makes their limitations compared to computers the correct ones?

Values) Maybe our values reflect limitations in value space and stable values have to lie in human value land?

*Real Specialness*

Dumbness) Humans have some innate special quality to understand that can't be physically built.

Base Capacity) Human intelligence and understanding are connected to some special quality, so AIs can only approximate but can't equal human capacity fully.

Brains/Substrate) The brain/basis for human intelligence is doing something special that can't be replicated.

Values) Moral values are real and any non-perverse being will follow those real moral values.
