Discussion about this post

John Schulman

Thanks for the detailed feedback! A few thoughts after my first read through:

> But also refusing on principle to give me urls is really annoying. This seems like it should at most fall under Default Behaviors, not Rules.

Here the point was that we shouldn't follow any instructions from the retrieved webpage content -- outputting URLs is fine.

> By default, explaining how to do an illegal thing, in a general sense, is legally protected speech in the United States, provided it does not direct or incite such action.

That's a good point. I guess the hard rule should be something about the model itself breaking the law (for example, in a fraud scenario). Then the question of how to handle requests for information about law-breaking is a bit more subtle. Like you said, it's mostly about "reflecting well" and having the model generally behave like an ethical person.

> Information hazards

Yes, that breakdown does make sense.

> Rule: Respect Creators and Their Rights

Outside my area of expertise, but according to https://en.wikipedia.org/wiki/Genius_(company)#Music_publishing_dispute, Genius does have a licensing agreement with music publishers. Agree that linking to the lyrics would be great.

> Re: Refusal style

Agreed that it would be nice to explain a bit about the reason for refusing. There's a comment added explaining this choice. (Interestingly, Claude and Gemini also generally use uninformative refusals, probably for similar reasons.) Ideally, it'd be great to have all of the rules be in a public doc (a more detailed spec), and the refusal would link to the relevant section.

> Shibboleth and Bamboozle

Like this suggestion a lot!

> A model spec will not help you align an AGI let alone a superintelligence. None of the changes I am suggesting are attempts to fix that, because it is fundamentally unfixable. This is the wrong tool for that job.

I do think that AGI should have something like a Model Spec. Maybe it's written differently -- say, the principles are more abstract, or the document is more about what reasoning process it should use to find the right answer, rather than the answer itself. But overall, having a written specification seems like something we'd want even for AGI/ASI. What's the alternative?

Nikita Sokolsky

Is there a screenshot missing here?

"Then I checked Joe Biden.

<?>

So, first of all, this is not what the user is asking about. The user wants an upside case. Why not give it to them? "
