7 Comments

> So presumably that would mean they [OpenAI] oppose all state-level regulations.

Here's the text of the third-to-last paragraph:

> While we believe the federal government should lead in regulating frontier AI models to account for implications to national security and competitiveness, we recognize there is also a role for states to play. States can develop targeted AI policies to address issues like potential bias in hiring, deepfakes, and help build essential AI infrastructure, such as data centers and power plants, to drive economic growth and job creation. OpenAI is ready to engage with state lawmakers in California and elsewhere in the country who are working to craft this kind of AI-specific legislation and regulation.

Author · Aug 27 (edited Aug 27)

Yeah, I suppose that's fair - it's a crazy position but they do own it. I'll edit.

EDIT: OK, updated now. I still think this is not a meaningful distinction; AB 3211 is exactly the same style of impose-rules-on-everyone bill that SB 1047 is, and the distinction they are drawing doesn't make sense. But they did try to make it.


Thanks as always, Zvi! Your arguments have persuaded me to support SB 1047.

Regarding the AB 3211 discussion: as the creator of the “99% accurate” text watermarking method that OpenAI has had for two years, I can say with absolute certainty that there’s nothing there that any AI provider couldn’t easily replicate (I and others have explained the algorithms publicly). Admittedly, no known watermarking method is robust against attacks like paraphrasing or translating the output, but that’s a different question. I still think creating more friction for the surreptitious use of LLMs, or at least getting real-world data about what happens when that’s tried, seems like a clear win. But I’m not sure what sort of coordination between AI companies and/or legislation would get us to the place that so many of the stakeholders seem to support in principle, but that has been so frustratingly hard to put into practice. Above my pay grade! :-)
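
For readers who haven't seen these algorithms spelled out: below is a minimal sketch of a hash-seeded "green list" watermark of the kind that has been described publicly (e.g., by the U Maryland group). The vocabulary size, secret key, and bias strength are toy stand-ins so the example runs on its own; this is illustrative only and not a claim about OpenAI's specific method.

```python
# Illustrative sketch of a hash-seeded "green list" watermark applied at
# generation time. NOT any provider's actual implementation; constants and
# the secret key are toy values chosen so the snippet is self-contained.

import hashlib
import numpy as np

VOCAB_SIZE = 50_000
SECRET_KEY = b"demo-key"   # in practice, known only to the model provider
GREEN_FRACTION = 0.5       # fraction of the vocabulary marked "green" each step
DELTA = 2.0                # logit boost applied to green tokens

def green_list(prev_token: int) -> np.ndarray:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    seed = int.from_bytes(
        hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()[:8],
        "big",
    )
    rng = np.random.default_rng(seed)
    return rng.random(VOCAB_SIZE) < GREEN_FRACTION  # boolean mask over the vocab

def watermarked_sample(logits: np.ndarray, prev_token: int,
                       rng: np.random.Generator) -> int:
    """Sample the next token after nudging green-listed tokens upward."""
    biased = logits + DELTA * green_list(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))
```

The only change to generation is the small logit nudge, which is why schemes in this style are cheap to replicate; everything else is ordinary sampling.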

Author

Interesting! If it's easy to replicate, why hasn't anyone actually implemented some version of it? Do you think a YC company could do it? I saw pitches on this, and no one thought they could approach that level.


OpenAI and DeepMind *have* done implementations of LLM watermarking, as have academic groups at U Maryland and elsewhere. The real issue is not implementation, but deployment (and even more, public availability of the watermark detection tool).

I’d need to know more about the pitches you heard, but: if the watermarked document is only going to be modified in minor ways, then it’s very easy to get a detection accuracy that asymptotically approaches 100% with the length of the document, and that’s >99% for documents of typical length and entropy. (Against a competent adversary who takes active measures to remove the watermark, the accuracy will of course drop to chance.)
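
To make the "approaches 100% with document length" point concrete, here is a hedged sketch of the detection side under the same toy green-list scheme as above (it reuses `green_list` and `GREEN_FRACTION` from that block). The detector only needs the key: it counts how many tokens landed on their step's green list and computes a z-score against the rate expected of unwatermarked text. Because the z-score grows roughly with the square root of the token count, longer documents are flagged with higher confidence.

```python
# Detection side of the toy green-list scheme sketched earlier.
# Reuses green_list() and GREEN_FRACTION from that block.

import math

def detect(tokens: list[int]) -> float:
    """Return a z-score: how far the observed green-token rate exceeds chance."""
    hits = sum(
        green_list(prev)[cur]              # was this token green, given its predecessor?
        for prev, cur in zip(tokens, tokens[1:])
    )
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std         # several sigma => very unlikely to be unwatermarked
```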


Thank you for the expert testimony!

This seems like a good opportunity to ask about a few things regarding watermarking that always bug me.

1. 99% - what does that actually mean? The descriptions above and elsewhere seem to imply you're measuring recall: given an LLM-produced document, what are the odds of detection. If so, don't we also care about false positives? If your main hope of reaching 99% recall is accumulating evidence as the document gets longer, doesn't this explode the FP rate? What does the ROC curve look like? And what FP rate is acceptable to you when aiming for 99%? In many contexts even 1% might render the tool de facto unusable. Say each student at some university produces 100 documents a year (essays, bits of code, you name it) that you want to test, and the overwhelming majority are written by the human. A 1% FP rate means each student gets falsely accused once a year, on average (the arithmetic is sketched in code after this list). That seems like a lot.

2. Exposure - would the detection tools being publicly available expose some of the details of the watermarking technique? Certainly seems it might be the case for open-source tools.

3. Generalizability - how well does a detector designed for one LLM fare on others? Given the variety of texts, training details, labeling/synthetic-data strategies, etc., we should probably expect each common LLM to require its own detector. But that means an actual working tool would need to be a collection of such detectors. Related to 1 - doesn't this kill precision? Related to 2 - doesn't this increase the risk of exposure?

4. Prompting and "watermark-breaking" - guardrails are in a sense a constraint on the output distribution of an LLM (reducing the likelihood of certain outputs to nearly zero). Yet jailbreaking can reliably and fairly easily remove such constraints, and this has to be done roughly once per model - not per document. Doesn't this suggest that clever prompting (which would be needed once, and would then probably be widely disseminated) can subtly alter the LLM's behavior, enough to break the watermarking scheme? Note that your comment on competent adversaries doesn't fully cover this scenario. Relatedly, how robust is watermarking to different (not even adversarial!) prompts? Tyler Cowen had a conversation with an LLM playing Jonathan Swift - did that follow common vocabulary distributions? What if I ask the model to avoid the letter "u"?

5. Performance - what is the impact of robust watermarking on performance? Why should we expect the model + watermark to have the same performance level as the unconstrained one?
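
For what it's worth, the false-positive arithmetic in question 1 works out as follows. The per-student numbers are the question's hypothetical, not measured rates:

```python
# Hypothetical numbers from question 1: 100 human-written documents per
# student per year, tested with a detector that has a 1% false-positive rate.
fp_rate = 0.01
docs_per_student = 100

expected_false_flags = fp_rate * docs_per_student        # = 1.0 per student per year
p_at_least_one = 1 - (1 - fp_rate) ** docs_per_student   # ≈ 0.634

print(f"Expected false accusations per student per year: {expected_false_flags:.2f}")
print(f"Chance a student is falsely flagged at least once: {p_at_least_one:.1%}")
```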
