
On the OpenAI watermarking thing: I am skeptical that it would work. Not for any technical reason (about which I am ignorant). Rather, there seems to be a simple game-theoretic issue here. Consider the following scenario. Assume a future in which ChatGPT's output is watermarked. I prompt ChatGPT to write an essay on, say, orangutans on the moon. I take my watermarked output and pass it to Claude, telling Claude to rewrite the essay. Indications of watermarks disappear.

Now, if *every* LLM baked in watermarking capabilities, this obscure-by-iteration method wouldn't be a problem. All textual output from all LLMs would be watermarked.

But I doubt we'll ever live in a world in which all LLM text output is watermarked.

So, if OpenAI starts watermarking ChatGPT text output, then all a person has to do is figure out which LLM does not watermark its text output, pass the watermarked text to that watermark-free LLM, and have the watermark-free LLM rewrite the text.
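
For concreteness, here is a toy sketch of why a rewrite erases the watermark, assuming a "green list" style token watermark in the spirit of Kirchenbauer et al. (my assumption about the scheme, not anything OpenAI has confirmed):

```python
import hashlib
import math

# Toy "green list" watermark detector (an assumed scheme, not OpenAI's actual one).
# The generator nudges sampling toward tokens whose hash, seeded by the previous
# token, lands in the "green" half of the vocabulary. The detector just counts
# green tokens and computes a z-score against the 50% expected by chance.

def is_green(prev_token: str, token: str, key: str = "watermark-key") -> bool:
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # pseudorandom 50/50 split of continuations

def watermark_z_score(tokens: list[str], key: str = "watermark-key") -> float:
    greens = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)  # large z => likely watermarked

# A rewrite by a second, unwatermarked model resamples the tokens without the
# green bias, so the green fraction drifts back toward ~50% and the z-score collapses.
```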

author

I mean yes, obviously you can ask Claude to rewrite it; also, you could just use Claude instead of ChatGPT if nothing else.

However, we should not underestimate the trivial inconvenience factor. Most people are not going to run things through a second model, or find an obscure second product offering, most of the time.


It may be much easier to rephrase, removing the watermark.

e.g. if some future AI gives me a recipe for a chemical weapon of the acetylcholinesterase inhibitor variety, I can probably write up the synthesis much more easily than I could invent it in the first place (as this is not my specialist area).


Hmm... it may depend on your threat model...

a) The attacker is trying to generate vast quantities of low-effort spam cheaply ... human rewriting is not a problem.

b) The attacker has one very carefully generated AI text whose provenance they are trying to hide ... may be an issue. "Oh no! Q Clearance Patriot turns out to be an LLM, and not a real insider in the US government. I am shocked, shocked, to discover that the whole conspiracy theory isn't true."


Watermarking seems a great example of how hard it is for us to make sense of AI capabilities.

If LLMs are powerful, we can simply ask them to generate output that will not be detected by algorithms. Instead of adding "don't hallucinate", we end up adding "don't sound like an LLM" to our prompts.

If LLMs are, for some reason, consistently detectable by a watermarking algorithm, then they are, in some ways, failing the Turing test.

It's possible that the current generation of LLMs is detectable through algorithms. I think we all have a pretty good guess when we see something written by a hooman vs ChatGPT. However, I think that difference will go away really quickly.


What I've heard from a HS teacher friend is that a lot of students are too stupid and/or lazy to even cheat in a hard-to-detect way when watermarking isn't being employed. I think you're overestimating how large a fraction of the population of would-be cheaters is sophisticated enough to be aware of watermarking and to bother looking for or using online methods to circumvent it.


Love these, they're very well done, good work!


That's definitely the voice of Ron Swanson from Parks and Rec


Thanks for the heads-up about Askwho, and to Askwho for providing these! I was looking for the previous episodes in their "old" substack and was worried you might have stopped making them.


Thanks!

The old Substack is not defunct! I ended up converting pretty much every post by Zvi, as they were always worthwhile, so I created the dedicated feed, and I will be converting every post going forward. I still plan to convert interesting individual articles on the other feed (when I have slack in the audio credit budget).


"However [David Brooks's] core reason for why is essentially ‘humans are special,’ that it is not ‘all just information processing’ in the human brain. The AI won’t be able to match (his terms) our physical navigation of nature, our pursuit of goodness, or our ‘love,’ it ;acks consciousness, awareness, emotions, moral sentiments, agency, a unique worldview."

Even setting aside whether he's right about those individual parity elements: paperclip maximizers require none of them to destroy everything. How is an AI not feeling love supposed to save us?


A long time ago, I was applying to various universities for undergrad. At Warwick, I listened to a talk Zeeman gave to the assembled applicants: once, they actually did the experiment of interviewing applicants, ignoring the interview recommendations, and accepting purely on exam results. Comparing how the students were evaluated at the end of their course against the interviewers' recommendations: negative correlation. So they no longer do interviews...

So, this experimental result has been known for some time...


That's Sir Erik Christopher Zeeman (4 February 1925 – 13 February 2016).

And I got the exam grades, so I went to Cambridge rather than Warwick...


I wonder if the watermarking solution scales. If they anticipated that it didn't/wouldn't, they might be more likely to dismiss the idea of releasing it. Why present a pain point to many customers and a solution for others, that will then disappear/pole-switch with coming releases?

Not saying that's the correct evaluation in that circumstance, just a suspicion I have about their reasoning.


The LessWrong crossposts also have AI narration: https://www.lesswrong.com/posts/4GnsAtamtcrsTFmSf/ai-76-six-shorts-stories-about-openai


Please do give my feed a comparative listen. I think the human quality of ElevenLabs makes it easier listening for long periods, and I give every person quoted in each post their own distinct voice consistent within the post.


Yeah, great job! I listened to maybe 90 episodes of planecrash before switching to reading the text (though I often listened at 3x).

I think I’ve also listened to LW narration for >100 hours.

Two things that the LW narration has that seem good:

- The table of contents isn’t skipped.

- Some text features are narrated, such as “There’s a list of bullet points”. It’s confusing if different sections of the text aren’t announced as different sections, with specific titles.


Hmm... I guess you could view character.ai's War on Horny as a test of alignment, and they're doing it not because they really care about horny, but because the core IP that they hope to sell to Google is an anti-horny filter that has been extensively tested against real attackers.

[names of companies deleted to protect the parties involved] I might... uh... know of a case where a company was created purely to be a honeypot for attackers, in order for their real backers to see what the attackers did. Said company's product was never intended to make money.

Oh, and my group was involved in a DARPA hackathon where we held a mock election using voting machines and invited challengers to see what they could do. [Yeah, go on, find an exploit against CHERI and demonstrate it by changing the result of a simulated election using CHERI-based voting machines... we will love you for the information on what the attack was... those zero-day exploits are getting kind of pricey now]

So maybe character.ai was just an elaborate honeypot.

Problem: their anti-horny filter truly, truly sucks.


> Throw out all invalid outputs, and all the outputs that remain will be valid. Nice.

This technique has been implemented in OSS projects like llama.cpp for a long time, incidentally, including allowing arbitrary context-free grammars. One simple use case is for classification: you can constrain the output so that the only valid tokens are "good", "bad", or "neutral" (or whatever your categories are), instead of needing to try to convince the AI to do that itself. You can now do that with the new structured output in ChatGPT by specifying a JSON schema with an enum with a fixed list of values, but for more complex grammars it's not good enough.

With full control over the sampling you can do even more fancy things like constraining the output to be valid Python (or whatever). I think this sort of thing is the main advantage of local/OSS inference (well, other than porn, which is presumably still the actual thing most people are using it for).
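
As an illustration of the classification case above, here is a minimal sketch of constrained decoding over a generic per-step logits table (the token names and scores are made up, and this is not llama.cpp's actual API; there the same constraint would be written as a GBNF grammar such as root ::= "good" | "bad" | "neutral"):

```python
# Minimal sketch of constrained decoding for classification, assuming a generic
# per-step logits dict rather than any specific llama.cpp interface. The idea:
# mask out every token the grammar does not allow at this step, so only valid
# outputs can ever be sampled.

ALLOWED = {"good", "bad", "neutral"}  # hypothetical single-token categories

def constrained_argmax(logits: dict[str, float]) -> str:
    # Drop disallowed tokens entirely, then pick the most likely survivor.
    allowed = {tok: score for tok, score in logits.items() if tok in ALLOWED}
    return max(allowed, key=allowed.get)

# Even if the raw model would rather start a sentence ("The ..."), the
# constraint forces one of the three labels:
print(constrained_argmax({"The": 5.0, "good": 1.2, "bad": 0.3, "neutral": 0.9}))  # -> good
```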


"It is times like this that it is" -- truncated copy, joke I don't get, or something else? (Just before 'Rhetorical Innovation"


May I suggest a more extended treatment on the question "Is AI plateauing yet?" Almost every time I chat about AI this comes up as a crux, and I haven't found a great way to resolve the question or operationalize a forecast or whatever.


> IAPS claims that for only $1 million, we could set up for several years a delay-based pure-software monitoring system to see where the chips are located

Nope. If you sell chips to a Singaporean company and then the company disappears and the chips are now in China, you won’t know the chips are in China. Someone needs to want you to know where the chips are. They can then show that the ping to the chip from a bunch of locations is consistent with the claimed location of the chip.
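
To make the "consistent with the claimed location" step concrete, here's a toy consistency check (the vantage points, RTTs, and tolerances are my own illustrative assumptions, not numbers from the IAPS paper):

```python
import math

# Toy delay-based geolocation check. Each round-trip time to the chip
# upper-bounds its distance from the pinger, since the signal cannot beat the
# speed of light: distance_km <= (rtt_seconds / 2) * c. Pings from several
# trusted vantage points give several such balls; the claimed location must lie
# inside all of them. This can rule locations out, but only if the chip (and
# its keys) honestly answers the pings.

C_KM_PER_S = 299_792  # light in fiber is slower, so real bounds are even tighter

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def consistent(claimed, pings):
    """pings: list of ((lat, lon), rtt_seconds) from trusted vantage points."""
    return all(haversine_km(claimed, loc) <= (rtt / 2) * C_KM_PER_S for loc, rtt in pings)

# A 2 ms round trip from a Singapore vantage point bounds the chip to ~300 km,
# which a data center in, say, Chengdu cannot satisfy.
singapore = (1.35, 103.82)
print(consistent((1.29, 103.85), [(singapore, 0.002)]))   # claimed Singapore -> True
print(consistent((30.57, 104.07), [(singapore, 0.002)]))  # claimed Chengdu  -> False
```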

We don’t have tamper-proof pure-software ways of doing that, as a state actor might be able to scan the keys off of the chip and simulate its responses to the pings from the claimed locations, with the chips physically elsewhere. There are ways around that, but they’re complicated, and also, you could just randomly send people to check data centers and look at the chips’ serial numbers? Hard to fake on short notice and much more tamper-proof. Not sure what the software development cost would be for that one, but, like.

This research is not in any way usable for on-chip mechanisms (such as tamper-proof license-checking) we’d need for effective compute governance, AFAICT.

Correct me if I’m wrong.


Presumably, it has to be something like:

a) the chips just stop working if they don't get a response to a distance-bounding ping

b) some level of tamper resistance

c) scanning the key out of one chip gives you no help in activating other chips (unique per-chip key; see the sketch below)

Difficulty: high-end attacker.
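
A toy sketch of what (a) plus (c) could look like together, a per-chip keyed challenge-response with a response-time bound (all names and numbers here are illustrative assumptions, not anything from the paper):

```python
import hashlib
import hmac
import os
import time

# Hypothetical verifier for a fleet of chips, each with its own secret key
# burned in at manufacture. Extracting the key from one chip only lets you
# impersonate that one chip; the round-trip time bound is the crude
# distance-bounding part.

CHIP_KEYS = {"chip-001": os.urandom(32)}  # per-chip keys the verifier also knows

def chip_respond(chip_key: bytes, nonce: bytes) -> bytes:
    # What the chip itself would compute on receiving a challenge.
    return hmac.new(chip_key, nonce, hashlib.sha256).digest()

def verify(chip_id: str, max_rtt_s: float) -> bool:
    nonce = os.urandom(16)
    t0 = time.monotonic()
    response = chip_respond(CHIP_KEYS[chip_id], nonce)  # in reality, over the network
    rtt = time.monotonic() - t0
    expected = hmac.new(CHIP_KEYS[chip_id], nonce, hashlib.sha256).digest()
    return hmac.compare_digest(response, expected) and rtt <= max_rtt_s

print(verify("chip-001", max_rtt_s=0.01))  # True when the chip answers fast enough
```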


> the chips just stop working if they don’t get a response

The paper only talks about a measure that’d allow chip owners to show that the chips are approximately where they’re claimed to be. The paper doesn’t suggest anything like the chips refusing to work if they’re not where they should be. That’d be much harder, wouldn’t make sense as a software-only measure, would probably cost many orders of magnitude more than $1m to develop and implement, especially if you want it to be resistant to state actors that might want to set up factories to disable that measure at scale.


As someone who has worked on a chip hardware project ... I do not believe that you can do anything for a million dollars.


Zvi writes: "The chip companies will not take the initiative because they do not want to be able to track the chips. We will need to force their hand" ... Remember the late 90s when Intel wanted to include a unique Processor Serial Number burned into each chip, and the privacy folks went cuckoo and their stock was hammered?? Good times.


You say the robot tuner will be arbitrarily accurate, but it's trivially easy to imagine that the robot tuner is only 99% accurate, there's no market for the perfect thing, and correct piano tuning just dies out.


>Joel Miller asks five people in ‘the humanities’ the same questions about AI and how it relates to the humanities

Do you have the link for this?


You probably weren't aware of this, but Greg Brockman's wife is ill, so this sabbatical is independent of whether there are problems with the company.

Honestly, I'm not even sure why OpenAI departures are newsworthy. Top talent in cutting-edge fields moves around. If anything, Google has probably lost the greatest volume of talent over the last 10 years. Anthropic has done the best job of retaining talent; however, they are the newest and they are paying some hefty salaries to keep that talent around. I don't know if that is sustainable.

author

Ah, I did not know that.

The departures are newsworthy because of the pattern; individually they wouldn't be. And Brockman in particular wouldn't have looked so weird if it wasn't at the same time as Schulman.

(or, at some point, the benefit of the doubt goes away)
