5 Comments

A well-thought-out, lengthy list of depressing questions/scenarios. Regardless of our analysis, conclusions, and potential AI regulation, does anyone think that countries outside the US Empire (China) will do anything other than race ahead at maximum speed toward AGI? I suspect outcomes will be better if the West “wins”.

Some additional questions around interpretability:

- How hard is it to understand current transformer-based LLMs?

- How hard is it to understand whatever architecture replaces transformer-based LLMs?

- Are RNNs easier to understand than transformers?

- Are fully binary-quantized combinational-logic models easier to understand than non-quantized models?

- Can existing transformer-based LLMs be fully quantized to combinational logic? (See the first sketch after this list.)

- How easy is positive interpretability (understanding the main thing that a given component does)?

- How easy is negative interpretability (proving that a given component does not implement any other functionality)? (The second sketch after this list contrasts this with a simple positive-interpretability probe.)

- How much of a GPT-style LLM is a compressed lookup table, vs. a weird machine built on top of it, vs. a simulation of a human mind built on top of that?

- How does the distribution of simulation layers change with compute/parameters/data/architecture?

- How much do different flavors of reinforcement learning change interpretability?

- How well does interpretability scale with model size?

- Does interpretability improve capabilities?

- Are architectures/training methods amenable to interpretability better or worse for capabilities?

- Is good interpretability sufficient for alignment?
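
To make the combinational-logic questions concrete, here is a first, minimal sketch (mine, not the commenter's): a single binarized "neuron" with ±1 weights and 0/1 inputs. The 4-input width, random weights, and sign activation are illustrative assumptions, but they show the sense in which a fully quantized unit is literally a truth table, and also why enumerating that table does not by itself scale.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)

# Toy "neuron": sign(w . x) with weights quantized to +/-1 and inputs in {0, 1}.
# The 4-input width and random weights are illustrative assumptions.
n_inputs = 4
w = np.sign(rng.normal(size=n_inputs))  # binarized weights in {-1, +1}
w[w == 0] = 1.0                         # guard against an exact zero

def neuron(bits):
    """Return the neuron's 0/1 output for a tuple of 0/1 inputs."""
    x = 2 * np.asarray(bits) - 1        # map {0, 1} -> {-1, +1}
    return int(w @ x > 0)

# Because weights and inputs are discrete, the neuron *is* a truth table: its
# behaviour can be enumerated exhaustively, which is one sense in which a fully
# quantized, combinational-logic model might be "easier to understand".
for bits in itertools.product([0, 1], repeat=n_inputs):
    print(bits, "->", neuron(bits))

# The catch: the table has 2**n_inputs rows, so brute-force enumeration stops
# scaling almost immediately; understanding still depends on finding compact
# logical structure, not just on the model being discrete.
```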
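
A second minimal sketch, for the positive/negative interpretability contrast: a linear probe fit on synthetic activations. The synthetic data, the 8-dimensional "component", and the use of scikit-learn's LogisticRegression are assumptions for illustration; a real study would probe activations recorded from an actual model layer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are activations of one component (say, one attention head's
# output) over 1,000 inputs, plus a binary feature we suspect it represents.
# All of this is synthetic, illustrative data.
n_samples, d_component = 1000, 8
feature = rng.integers(0, 2, size=n_samples)             # ground-truth labels
activations = rng.normal(size=(n_samples, d_component))  # baseline noise
activations[:, 0] += 2.0 * feature                       # feature linearly encoded in dim 0

X_train, X_test, y_train, y_test = train_test_split(
    activations, feature, test_size=0.25, random_state=0
)

probe = LogisticRegression().fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")

# High accuracy is (weak) positive evidence that the component carries the
# feature. The converse claim -- that the component implements nothing else --
# is the much harder "negative interpretability" problem, and no single probe
# result establishes it.
```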

I think we could run a pretty interesting survey on these if you want. I've been building some tech that makes it easy to get agreement/disagreement on this kind of stuff from many people.

I wrote a post on LessWrong partly inspired by the Crux List, in a similar style: https://www.lesswrong.com/posts/BvFJnyqsJzBCybDSD/taxonomy-of-ai-risk-counterarguments
