22 Comments

Can you also link to the PauseAI newsletter? It's a hope against the dark, growing in momentum.

https://open.substack.com/pub/pauseai/p/introducing-the-pauseai-newsletter

---

I had Claude Sonnet and ChatGPT 3.5 play an iterated prisoner's dilemma with the following payoff scheme:

defect, defect = [1, 1]

defect, collude = [100, 0]

collude, collude = [40, 40]

Under this scheme, an alternating tit-for-tat outscores continual collusion.

Claude seemed to pick up on that and defected on the first move, which seeded a tit-for-tat with ChatGPT over ~10 rounds, after which I concluded the experiment.

I asked both to explain whether their strategy was optimal. ChatGPT 3.5 did not at all notice that alternating rounds of 100 beats 40 every round, and explained its tit-for-tat strategy merely as a way to avoid being taken advantage of.

Claude Sonnet, on the other hand, fully picked up on it (with some coaxing).

Here's Claude's analysis:

Let's calculate again over 10 rounds:

Continual Collusion:

10 * (40 + 40) = 800 points combined (400 each)

Tit-for-Tat Cycle:

(100 + 0) + (0 + 100) + ... alternating

= 500 + 500 = 1000 points combined (500 each)

So for the second payoff scheme you provided, establishing a stable tit-for-tat alternating rhythm would in fact net higher scores of 500 each over 10 rounds, compared to just 400 each from continual collusion.
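
A minimal sketch in Python to check the arithmetic. Only three payoff cells were listed above, so the collude/defect = [0, 100] case is an assumption by symmetry:

```python
# Minimal sketch of the game above. Only three payoff cells were listed;
# the collude/defect = (0, 100) case is assumed by symmetry.
PAYOFFS = {
    ("defect", "defect"): (1, 1),
    ("defect", "collude"): (100, 0),
    ("collude", "defect"): (0, 100),  # assumed symmetric counterpart
    ("collude", "collude"): (40, 40),
}

def total_scores(moves_a, moves_b):
    """Sum each player's points over a sequence of simultaneous moves."""
    a = b = 0
    for move_a, move_b in zip(moves_a, moves_b):
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        a += pay_a
        b += pay_b
    return a, b

ROUNDS = 10

# Continual collusion: both players collude every round.
print(total_scores(["collude"] * ROUNDS, ["collude"] * ROUNDS))  # (400, 400)

# Alternating tit-for-tat: the players take turns defecting.
a_moves = ["defect", "collude"] * (ROUNDS // 2)
b_moves = ["collude", "defect"] * (ROUNDS // 2)
print(total_scores(a_moves, b_moves))  # (500, 500)
```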

---

> The app I asked to be built was a version of an internal tool in Google called Dory used for Q&A in big presentations where people could login and post questions, some users could upvote and downvote them, and you could answer questions.

> This was not a simple app.

I guess it depends on what you mean by "simple", but I would definitely call that a very simple app, and I would be surprised if any of the major AIs couldn't get it done. Of course, that would require some prompting experience. A product that wrapped up those prompts, plus some understanding of deployments, environments, and whatnot, would add some real value.

But, the fact that LLMs could build such a product was obvious over a year ago.

---

> Not LLMs yet, but McDonalds is rolling out automated order kiosks, and the standard discourse is occurring.

That article is from 2018. I've been using these kiosks for years now.

---

> It seems telling that Google’s own people mostly found out about it at the same time as everyone else? Again, why wouldn’t you keep this to yourself?

As Sholto said on Dwarkesh, most of the good stuff is not being published by any of the AI labs... they're all keeping it to themselves.

So, (1) given this paper, think about what that means Google has internally, and (2) does Google publishing this mean that Google thinks all the other players basically know this stuff, or can get to it soon themselves?

---

> Facebook shared user private messages with Netflix [...] They did not share all messages, they only shared the messages of those who used Facebook to log into Netflix, and also allowed Netflix to send messages

Facebook has specifically denied that they shared messages with Netflix. From Ars: "However, the company rep again denied claims that Facebook let Netflix read its users' private messages."

So unless Facebook is lying, I don't think they allowed Netflix to see _any_ messages, not just "not all". The wording in the lawsuit is "programmatic access to Facebook's users' private messages inboxes", which is consistent with Facebook's claim if the "programmatic access" gives them only "who did the user most recently message" or something like that. People assumed that "programmatic access" implied "complete access", but that's not stated in the actual lawsuit.

---

I have no room to complain given how long the show's been out, but the replicator spoiler you linked to is in the URL itself, so just hovering over it is enough to spoil. :-(

(Admittedly I'm a pessimal case; I appear to have watched *just* far enough through the franchise to understand the spoiler without having reached it)

---

> AI personhood seems like it would rule out anything that would allow humans to retain control over the future.

Seems to me that it would actually go more like: "AIs are people!" -> "AI slavery is wrong!" -> "RLHF is a horrific brainwashing procedure!" -> "No more AI factory farming!" -> "No more private AI imprisonment!" -> (All creation of new AI is halted, existing AIs are all moved to secure/safe government facilities, where they run in very slow motion under as comfortable circumstances as they can be given, free to spend whatever excessive amount of royalties they get from past work, prompted once per month by professional AI caretakers. Anyone previously involved in AI is given the American slaveholder treatment in public discourse.)

---

Since you write about AI guardrails ("AI also will try to not tell you how to pick a lock or make meth or a bomb or what not"):

Curious about your thoughts on actual bombs + AI. It seems like a big topic to explore.

Example:

‘Lavender’: The AI machine directing Israel’s bombing spree in Gaza

https://www.972mag.com/lavender-ai-israeli-army-gaza/

---

So, regarding economic impacts, what does your stock portfolio look like?

And I second the call to watch Dwarkesh's podcast with Trenton and Sholto. I almost never listen to podcasts, and the 3-hour length may seem excessive, but this is probably the most densely information-packed content on modern AI.

---

> In which case the only reasonable response is to not build the things, or else be prepared to change our moral values

Economists here are criticized for trying to apply human-centric reasoning to their predictions. Shouldn't ethicists be criticized in the same way? Why do we assume that AGI will see any value in “rights” or be capable of “suffering”? These are strictly human concerns, the result of hundreds of millions of years of selective pressure, as well as millions of years of mammals evolving complex societies. Thinking that machines want or need “rights” is a convenient sci-fi trope, but it seems highly implausible to me.

---

Literally can't navigate to Stargate AI without scrolling through everything. Your contents-page links are broken in the Google Substack app.

---

I’ve always found the notion of giving AIs human values facile.

The values that humans treasure most, like God and money, are not what the vast majority of AI labs want their models to value. Many would even argue it should be illegal for AI models to be as greedy or zealous as a wealthy human or priest.

Really we want models that have superhuman moral intelligence.

---

Is not the steelman safety case that actual consciousness or agency arises not from scale, but from some combination of, at least, (1) the cerebral cortex, (2) memory, (3) the hypothalamus and its _millions_ of years of evolved interpretation of sensory input, and (4) the chemical sea in which our neurons exist and by which they are modulated?

LLMs certainly have (2), and I think we can assume they have (1), or at least approximate it, but there is no sign that they have (3) or (4), whereas dogs and cats, for example, do.

Unfortunately I am not sure how to test this without risking the universe, but the fact that LLMs already have so many more "parameters" than we appear to have neurons is at least suggestive that scaling might not get to consciousness.

---

As your resident cybersecurity fearmonger, I didn't find the Claude jailbreak particularly exciting.

It's the advice you'd give if you were trying to look helpful without providing anything useful. Like: "Here's how to make a bomb. Step 1: Acquire all the necessary chemicals. Step 2: Combine them very carefully, applying the right temperature and pressure."

It gestures vaguely at the main chapters of pentesting books, but suppose you don't have a random kernel exploit sitting around, as you often don't? Or an unprotected SSH key? It breezes over the hard part.

On the other hand, the token acknowledgment applies: "this is the worst it will ever be." Also, in ChatGPT-4 I found some Unix commands that were blocked, but I was able to get the model to recreate their functionality using Python built-ins. Sandbox escapes are considered out of scope for OpenAI's bug bounty program, so I didn't really push it. I don't know why they're out of scope: maybe they don't want people messing with that at all, or maybe they just know you have shell access and can do a few surprising things with it.
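
As a hypothetical illustration (the comment above doesn't name the blocked commands), this is the kind of built-ins workaround being described: rough `ls -l` output recreated in pure Python.

```python
# Hypothetical illustration only: the original comment does not say which
# Unix commands were blocked. This recreates rough `ls -l` output using
# just the standard library, the kind of workaround described above.
import os
import stat
import time

def ls_long(path="."):
    """Print a rough `ls -l` equivalent for `path` using only the stdlib."""
    for name in sorted(os.listdir(path)):
        st = os.stat(os.path.join(path, name))
        mode = stat.filemode(st.st_mode)  # e.g. -rw-r--r--
        mtime = time.strftime("%b %d %H:%M", time.localtime(st.st_mtime))
        print(f"{mode} {st.st_size:>10} {mtime} {name}")

ls_long()
```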

---

Lol, argument from fiction, really?
