15 Comments

> Indeed, it is key to notice that if you can make something smarter than 99% of humans you are very close to making one smarter than 100% of humans.

I... don't think that's how the bell curve works.
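
(A rough back-of-the-envelope on that point, as a toy sketch only: it assumes ability really is a standard Gaussian, a population of about 8 billion, and uses scipy.)

```python
# Toy check: on a Gaussian, "smarter than 99% of humans" is still a long
# way from "smarter than all of them". Assumes ability is literally normal.
from scipy.stats import norm

population = 8_000_000_000

z_99 = norm.ppf(0.99)              # 99th percentile, about 2.33 SD above the mean
z_best = norm.isf(1 / population)  # roughly the best of ~8e9 draws, about 6.3 SD

print(f"99th percentile: {z_99:.2f} SD above the mean")
print(f"best of {population:,}: {z_best:.2f} SD above the mean")
```

On that toy model, the jump from "beats 99%" to "beats everyone" is roughly another four standard deviations.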

The Gaussian distribution has nothing to do with it. The underlying human meatware is slow and error-prone, so humans gifted with better hardware can only be a little better than the median human. What is the range of human reaction times? Claude says 150-300 ms, so the best human is only about twice as good. For other, more complex tasks, even if the best human is 10x as good, it's still just a matter of time.

Best people are reliably-ish at least tenfold better than average people in most cognitive tasks. Fat tails and all that.
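
(Another toy sketch of the fat-tails point, with purely illustrative parameters rather than anything fitted to real performance data: the same percentile sits much further from the median under a fat-tailed model than under a thin-tailed one.)

```python
# Contrast a thin-tailed and a fat-tailed toy model of "ability".
# Parameters are illustrative only, not estimates of anything real.
from scipy.stats import norm, lognorm

# Thin tail: normal with mean 100 and SD 15 (an IQ-style scale).
thin_top = norm.ppf(0.999, loc=100, scale=15)

# Fat tail: lognormal with median 100 and log-scale sigma of 1.
fat_top = lognorm.ppf(0.999, s=1.0, scale=100)

print(f"thin tail, 99.9th percentile / median: {thin_top / 100:.1f}x")  # ~1.5x
print(f"fat tail,  99.9th percentile / median: {fat_top / 100:.1f}x")   # ~22x
```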

Related, from a group chat I'm in, paraphrasing:

"it also seems to completely wave away the “can it be smarter than the data it was trained on” question which is not 100% settled,

like if it cant be then how the fuck can it be smarter than everyone lol"

That one is frankly less promising. After all, the best humans also "train on imperfect data". Internal architecture matters, and so does an explicit feedback loop.

If you make it as good as the single best human at every separate skill, it'll already be vastly superhuman. Note that "efficient skill improvement" is itself a skill, as is "AI design".

Gemma 2 has at most 4096 tokens of context in its sliding window, and is reputed to fail quite hard on long context tasks. I would have left it out of the long context comparison as well.
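
(For intuition on why a bounded sliding window hurts on long-context tasks, here is a generic sliding-window attention mask; this is a sketch of the mechanism in general, not Gemma 2's actual attention implementation.)

```python
# Generic sliding-window attention mask (a sketch of the mechanism, not
# Gemma 2's actual implementation). Each token can only attend to the
# previous `window` positions, so anything older must be relayed
# indirectly across layers -- one intuition for long-context failures.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]     # query positions
    j = np.arange(seq_len)[None, :]     # key positions
    return (j <= i) & (j > i - window)  # causal, and within the window

print(sliding_window_mask(seq_len=8, window=4).astype(int))
```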

In computer security, it's pretty common for threat models to be explicit about how well-resourced an attacker you expect to be able to defend against.

An insurance company's evaluation of burglar alarm systems can be quite explicit about "adequate if you have less than X dollars' worth of stuff you're trying to prevent being stolen."

Going "well, if a major national government wants to spend 100 million dollars stealing this asset, then we admit that we're toast" is fairly conventional.

Contrast: "Look, we're just trying to stop people watching movies without paying. If the attack costs more than the $20 or so it would cost to just pay to watch it legitimately, and there isnt a break once run everywhere attack, then we expect most customers wont bother with attacks"

In the case of AI weights, admitting you're toast may be the responsible thing to do.

Admitting that it's impossible to block the CCP may or may not be correct - certainly the DOD has some things it keeps secret enough that even rival superpower nation-states can't get them - but it's still meaningful to block every random two-bit hacker, or smaller threat actors like terrorist groups. Open-sourcing gives it to *everyone*.

This is currently not possible with any model I know, but I would consider a superhumanly persuasive conspiracy theory to be a significant harm. Like if the AI comes up with some really stupid thing that millions of people end up believing in.

(Counterargument: relatively unsophisticated social media algorithms, along the lines of "you liked X, so you'll probably also like Y", have already given rise to widespread conspiracy theories.)

I would note that this is hard to test ethically. So, you see if the AI can persuade an experimental subject. Suppose the subject is persuaded, and is going to tell the conspiracy theory to other people. What then? Involuntary psychiatric commitment? They might not be sufficiently affected for that to be, strictly speaking, allowed.

Perhaps knowing that the AI just made it up could help that person recover? Or perhaps the same AI could talk them out of it afterwards.

Possibly. We don't know yet whether superpersuasion is possible at all, and weaker forms of it might be relatively harmless.

In psychology experiments where you lie to people, it is customary to debrief them at the end of the experiment, so we could imagine: "So, in this experiment we got the AI to convince you that Donald Trump is actually a lizard from outer space; we would like to reassure you that this is not actually true..." and hope this works.

I can't remember the citation, but there was some experiment where subjects were shown modified versions of newspaper websites. Conclusion: they'll complain if the site is actually down, but you can tell the most outrageous lies without arousing suspicion.
