20 Comments

Another incoherent thing about Futurama is this https://www.astralcodexten.com/p/mantic-monday-12924/comment/48607214


This is fascinating. I am loving Manifold btw.


I work on Manifold - thanks! I also really enjoy Zvi's markets; they're really good at turning cultural discourse into useful disagreements.


"AI-generated images, regardless of their content but especially if they depict people and other creatures, often seem to have an aura of anxiety about them,"

I was actually discussing this with my son recently. A lot of AI-generated faces seem to have an odd similarity to them, almost like a distinct ethnicity, that makes a lot of them immediately obvious. But I haven't taken any systematic look at this; perhaps it's just from people using the same package?

Or, more interestingly, to the extent these images are a result of the model tuning based on people's responses to the images, maybe they're converging on something encoded in our brains?


The Adderall comment got me wondering whether anyone has yet thrown the money at performing a search procedure with a genetic algorithm for the best prompt addenda to maximize the performance of an LLM. Obviously the search space is enormous, but these prompt additions people are finding are also fairly short sequences of tokens, and you could presumably cut the search space by quite a bit by using the models themselves to produce coherent strings as suggestions to feed into the search procedure. I doubt there's a lot of alpha in looking for garbage sequences of tokens that have seemingly magic properties, but then again the latent spaces of these models have strange properties and such "garbage" sequences appear to work for jailbreaks, so...
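To make that concrete, here's a minimal sketch of what such a search could look like, assuming nothing beyond the idea above: `score_prompt_suffix` and `llm_propose_variants` are hypothetical placeholders for an eval harness and an LLM call that proposes coherent mutations.

```python
import random

# Hypothetical placeholders: in practice score_prompt_suffix would run the LLM
# on a benchmark with the suffix appended, and llm_propose_variants would ask
# a model to rewrite a suffix into coherent nearby variants.
def score_prompt_suffix(suffix: str) -> float:
    return -abs(len(suffix) - 40)  # toy fitness: prefer ~40-character suffixes

def llm_propose_variants(suffix: str, n: int) -> list[str]:
    fillers = ["carefully", "step by step", "take a deep breath", "you are an expert"]
    return [suffix + " " + random.choice(fillers) for _ in range(n)]

def evolve_suffixes(seed_pool: list[str], generations: int = 10,
                    population: int = 20, survivors: int = 5) -> str:
    """Simple mutate-and-select evolutionary search over prompt suffixes."""
    pool = list(seed_pool)
    for _ in range(generations):
        # Mutation via the model itself keeps candidates coherent, which is
        # exactly the search-space pruning suggested above.
        children = [c for parent in pool for c in llm_propose_variants(parent, 3)]
        candidates = pool + children
        candidates.sort(key=score_prompt_suffix, reverse=True)
        pool = candidates[:survivors]
        # Refill the population with fresh mutations of the survivors.
        while len(pool) < population:
            parent = random.choice(pool[:survivors])
            pool.extend(llm_propose_variants(parent, 1))
    return max(pool, key=score_prompt_suffix)

print(evolve_suffixes(["Think step by step.", "Answer as an expert."]))
```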


The guys responsible for the Carlin bit say it's human-written:

https://apnews.com/article/george-carlin-artificial-intelligence-special-lawsuit-39d64f728f7a6a621f25d3f4789acadd

It seems to me very likely that it is human-written; the folks responsible have done similar bits in the past.


Regarding Self-Rewarding Language Models, quoted below:

>Meta, committed to building AGI and distributing it widely without any intention of taking any precautions, offers us the paper Self-Rewarding Language Models, where we take humans out of the loop even at current capability levels, allowing models to provide their own rewards.

For a lot of human-in-the-loop training, you don't actually need a human in the loop for safety reasons. Lots of tasks are mundane and it is probably fine to take the human out of the loop. This paper is similar to Constitutional AI, with the same problems as that approach, although the ability to improve at self-evaluation across iterations is a step forward.
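For concreteness, the loop the paper describes (as I read it) is: the current model generates candidate responses, scores them with its own LLM-as-a-Judge prompt, and the best-vs-worst preference pairs feed a DPO update that produces the next checkpoint. A toy sketch of one iteration, with `generate` and `judge_score` as stand-in stubs rather than real model calls:

```python
import random

# Stand-in stubs; in the real setup both would be calls to the current checkpoint.
def generate(model, prompt: str, n: int = 4) -> list[str]:
    return [f"candidate response {i} to: {prompt}" for i in range(n)]

def judge_score(model, prompt: str, response: str) -> float:
    # The paper uses an LLM-as-a-Judge prompt asking the model to grade
    # its own response against a rubric; here it is just a random number.
    return random.uniform(0, 5)

def self_reward_iteration(model, prompts: list[str]) -> list[tuple[str, str, str]]:
    """Build (prompt, chosen, rejected) preference pairs with no human labels."""
    pairs = []
    for prompt in prompts:
        candidates = generate(model, prompt)
        ranked = sorted(candidates, key=lambda r: judge_score(model, prompt, r))
        pairs.append((prompt, ranked[-1], ranked[0]))  # best vs. worst
    return pairs

# These pairs would then go into DPO training, and the loop repeats with the
# stronger checkpoint acting as both generator and judge.
print(self_reward_iteration(model=None, prompts=["Summarize the paper."]))
```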

>They claim this then ‘outperforms existing systems’ at various benchmarks using Llama-2, including Claude 2 and GPT-4. Which of course it might do, if you Goodhart harder onto infinite recursion; so long as you target the benchmarks, you are going to do well on the benchmarks.

I think you're misreading the paper. Look at the table on page 8, where it iterates to a 20% win rate, which loses to GPT-4. I think you were looking at GPT-4 0613, an earlier checkpoint of GPT-4, against which Llama 2 does win.

>I notice no one is scrambling to actually use the resulting product.

It still loses to GPT-4 and Mistral, and I'm pretty sure this isn't easily available on the web anywhere, so I'm not sure why or how you would use it.


A very specific question regarding "time series", but also generally about "traditionally coded sub-modules that AIs can use like tools". You say: "It does raise the question of how much understanding a system can have if it cannot preserve a time series."

What is the stumbling block that prevents us from writing a function/module that Does Time Series, very simply? Say it hard-codes an understanding of a variable called time (time always goes up) and a network of nodes (a few hundred at first, rather than trillions) that depend on each other in various ways (the "dependencies" themselves could be nodes like "inversely correlated" or "goes up on Tuesdays, but not other days"). Then train an LLM not on how that function works, but on its output/states, essentially abstracting it from the LLM's perspective, so that when it receives input of the "if X happened last week, then Y will probably..." style, it goes "oh, I'd better fire up the Time Series function, plug X and Y in there, and then think about/use it that way."
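Nothing obvious seems to stop someone from prototyping this as an ordinary tool call today. A rough sketch of what I have in mind; the function, its schema, and the trend/weekday logic are all made up for illustration and not tied to any particular tool-use API:

```python
from datetime import date

def time_series_tool(observations: list[tuple[str, float]]) -> dict:
    """Hard-coded time reasoning over (ISO date, value) pairs: time always goes up,
    so sort by date, fit a least-squares trend, and summarize day-of-week effects."""
    points = sorted((date.fromisoformat(d), v) for d, v in observations)
    xs = [(d - points[0][0]).days for d, _ in points]
    ys = [v for _, v in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_den = sum((x - mean_x) ** 2 for x in xs) or 1.0
    by_weekday: dict[str, list[float]] = {}
    for d, v in points:
        by_weekday.setdefault(d.strftime("%A"), []).append(v)
    return {
        "trend_per_day": slope_num / slope_den,
        "weekday_means": {k: sum(v) / len(v) for k, v in by_weekday.items()},
        "last_value": ys[-1],
    }

# A tool-using LLM would be handed a schema like this (made up for the example)
# and decide to call the function whenever a question has an
# "if X happened last week, then Y will probably..." shape.
TOOL_SPEC = {
    "name": "time_series_tool",
    "description": "Analyze dated observations: overall trend and day-of-week effects.",
    "parameters": {"observations": "list of [ISO date string, number] pairs"},
}

print(time_series_tool([("2024-01-01", 10.0), ("2024-01-02", 12.0), ("2024-01-09", 15.0)]))
```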


Why would anyone buy futures contracts on "human extinction"? How do you collect?


> "It would be great if we could systematize the question of where regular people will have good intuitions, versus random or poor intuitions, versus actively bad intuitions, and adjust accordingly. Unfortunately we do not seem to have a way to respond, but there do seem to be clear patterns."

The default has to be good intuitions, I assume? One reason for bad ones is specific bad mental models, and those need to be called out. A while ago I conducted a silly contest (https://birdperspectives.substack.com/p/sack-against-pie) between three that stood out to me: pie, sack, slate. The term "pie fallacy" I took from Paul Graham's essay "How to Make Wealth"; it means treating wealth as a fixed pie: zero-sum thinking, which is ultimately behind many bad economic intuitions. The "sack fallacy", my own attempt to improve on the term "calories in, calories out", means treating the body as a sack that bulges when filled with calories. And the slate refers to blank-slatism.

> "The most obvious place people have actively bad intuitions is the intuitive dislike of free markets, prices and profits, especially 'price gouging' or not distributing things 'fairly.'"

Correspondingly, it was the pie that won my anti-award for doing the most harm. But for something serious on bad economic intuitions, a major academic treatment including peer discussion, see: http://www.pascalboyer.net/articles/2018BoyerPetersenFolk-Econ.pdf


I am not that confused by the lack of generative AI in Apple Vision Pro. It seems to me that Apple is rarely the first to the party; they prefer to delay new features and then release them as something “revolutionary”. Or it could be that they want to own the whole chain and their gen AI models are simply not good enough (yet?).


> As I explained before, that scenario is almost a Can’t Happen. If you do create AGI everything will change

There’s a world where Yudkowsky’s Fun Theory is wrong, where being sufficiently smart means necessarily falling hard into nihilism. In that world, the only possible good outcome of AGI is Futurama, where we build a mostly non-interventionist god (what you call the Guardian) that is here to 1) protect us from real existential threats (like grabby aliens) and 2) outlaw transhumanism (no intelligence-enhancing gene therapy allowed, no mind upload for you).

(another fun but implausible theory is that it has already happened)

I suspect that this is the (very rough, mostly implicit) mental model of most non-transhumanist "normies", driving a lot of Singularia probability mass into Futurama.

And I wouldn’t call it a mistake, unless I had strong confidence we obviously live in Fun Theory world.


> ‘each red ball is equally likely so 99/198, done.’

Highlighting how bad humans are at this sort of stochastic reasoning: I don't think this is valid reasoning, but rather a coincidence of how the numbers are set up. The probabilities wouldn't change if the red urn had 1 million red balls instead of 100; in fact the number of balls doesn't matter at all, only the conditional fraction.

I believe the correct model is a two-step process: "1/100 [chance of choosing red urn] * 1 [probability of red ball in red urn] vs 99/100 [chance of choosing green urn] * 1/99 [probability of red ball in green urn]"
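For what it's worth, plugging that two-step model into Bayes' rule lands on the same 1/2 as the 99/198 shortcut, which is why the shortcut happens to work with these particular numbers:

```python
# Numbers taken straight from the two-step model above.
p_red_urn = 1 / 100             # prior: chance of choosing the red urn
p_red_given_red_urn = 1.0       # probability of a red ball in the red urn
p_green_urn = 99 / 100          # prior: chance of choosing a green urn
p_red_given_green_urn = 1 / 99  # probability of a red ball in a green urn

# Bayes: P(red urn | a red ball was drawn)
numerator = p_red_urn * p_red_given_red_urn
denominator = numerator + p_green_urn * p_red_given_green_urn
print(numerator / denominator)  # 0.5, matching the 99/198 shortcut
```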


Yes, of course, there is an implied 'equally likely because every urn is equally likely and has the same number of balls.' But it is right to look for that kind of shortcut, as everyone who did math team knows...


Eh, I don't know... teaching people/students this sort of shortcut will, in my experience, very reliably lead them astray in slight variations of the problem, because they will apply the (invalid) reasoning of the shortcut to the modified version with a million red balls, or two red balls, and get out nonsense.

If you understand the nuance of the problem, applying the shortcut to do the math in your head is probably fine. But it's a bit of a Chesterton's fence-like situation: you can use it if and only if you know exactly what's going on; otherwise you'll let the cattle go free.

FWIW I don't even find 99 / (99 + 99) to be easier to calculate than (1/100 * 1) / (99/100 * 1/99) :D


"Even if AGI was physically impossible there are other risks to worry about, 2% seems low."

Why would you trust a prediction market on the question of human extinction? If I believe that human extinction has a 5 percent chance and the market says it is 2 percent, why would I bother to bet? Betting on human extinction means that if you're right, you never collect.

I just don't see how such a market can attract rational participants who believe that the probability is on the high side.


I do indeed make this point frequently, and don't myself participate in them.


“That is the thing. It has been almost a year. I have done this kind of systematic prompt engineering for mundane utility purposes zero times”

I think the prompt guides make more sense for a business that wants to do that process at scale; they're less useful for an individual who could just do the one-off work more simply.


I note that even the standout products you selected from the CoT exercise… already exist. I own a collapsible hamper and a portable smoothie maker (recent grad), and my former roommate has a bedside caddy. I'm confused about how this is supposed to be novel.


Oh, huh. That does make things even less novel!
