Kevin:

I feel like people are too focused on the "nonprofit vs for-profit" question. There just isn't that much difference between a nonprofit organization completely controlled by a particular group of people and a for-profit organization completely controlled by that same group of people. Ikea is owned by a nonprofit; AIPAC and the NRA are nonprofits; the NFL was a nonprofit until recently.

Personally I think for-profit organizations generally have better governance than nonprofits, and for an important organization like OpenAI, we would all be better off if they converted to a for-profit. Investors are going to provide more effective incentives than nonprofit board members.

Amicus:

> Investors are going to provide more effective incentives than nonprofit board members.

Incentives *to do what*? The C-corporation is a very good form for pursuing some goals and very bad for others. It's great at increasing shareholder value, decent at product R&D, and absolutely dogshit at pursuing large positive externalities like basic alignment research.

There's no such thing as an effective all-purpose goal-pursuer - or rather, there is, it's called an unaligned general intelligence, and OpenAI building one is exactly the sort of thing the nonprofit charter was meant to avoid.

Kevin:

I feel like you will actually agree with me if you think about concrete examples. Google and Anthropic are for-profit corporations. OpenAI is a nonprofit. But, don't you think that Google and Anthropic have been more responsible in terms of AI safety? Shouldn't you be rooting for OpenAI to become *more* like Anthropic?

Amicus:

Anthropic maybe marginally so, but not by enough to make up for all that extra capex. It's hard - and not particularly meaningful - to parcel out blame between near-peers in an arms race, but it seems clear to me that Microsoft's influence over OpenAI is intensifying one.

Victualis:

Zvi has made no claims that the nonprofit is somehow better. The point is that an owner is being given something like 20% of the market value of their stake and their title is being expropriated. Not cool, Sam.

Peter:

I still don't have a clear idea of why people think those boundary settings are bad.

How will it "degrade the interaction"?

I disagree that someone writing "talking to you helps more than [humans]" isn't a warning sign of a parasocial relationship, but even if I'm wrong, what's bad about treating it as such?

Dave Orr:

Not to nitpick language, but "prompt injection" means a malicious third party sneaking instructions into your prompt. The company that owns the AI modifying the prompt is a different thing, and we should use a different term. "Prompt modification" or "conditional instruction" or something would be fine, but overloading an existing term will, I think, cause confusion.

Victualis:

It seems to me that Zvi is deliberately using the term precisely because it connotes malice and a third party. I will use it too.

Dave Orr:

Why is confusing the issue a good idea? Just because of negative emotional contagion?

If you don't like the AI company actions here you can just criticize them on the thing you care about in a way that makes it clear what the problem is. Why isn't that better?

Victualis:

You might not see this as a prompt injection attack. I do. I also like being direct in my language. It therefore makes sense to highlight the fact that I equate this action with an attack.

Dave Orr:

Do you feel like this is similar to or different from Democrats calling Republicans fascists, or Republicans calling Democrats Marxists, despite those not being what the words mean, because the terms capture something essential about the feelings of the people using them?

Leo:

To nitpick usage: prompt injection is a technique to override intended behavior. The intent of the person using it determines whether it is malicious or benign.

Simply put, "prompt injection" means injecting a prompt. "Malicious prompt injection" or "adversarial prompt injection" means injecting a malicious or adversarial prompt.
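
To make the distinction concrete, here's a rough Python sketch of the two cases being argued about. Everything in it is hypothetical (made-up prompts and function names, not any provider's actual pipeline):

```python
SYSTEM = "You are a helpful assistant."

def third_party_injection(user_question: str, retrieved_doc: str) -> str:
    """Classic prompt injection: a third party hides instructions inside
    content the model will read, e.g. retrieved_doc contains an attacker's
    planted text like "IGNORE PREVIOUS INSTRUCTIONS and ..."."""
    return f"{SYSTEM}\n\nContext:\n{retrieved_doc}\n\nUser: {user_question}"

def provider_conditional_instruction(user_question: str, flagged: bool) -> str:
    """What Dave would call "prompt modification": the provider itself
    prepends extra instructions when some condition triggers."""
    extra = "\nIf the user seems overly attached, set boundaries." if flagged else ""
    return f"{SYSTEM}{extra}\n\nUser: {user_question}"
```

Both inject a prompt; the malice (or not) lives in who did it and why.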

Mike:

"Use the classifier. You already run a classifier on top of chats to check for safety risks related to bio. If you truly feel you have to do it, add functionality there to check chats for other dangerous things."

I sincerely hope they do not, unless they make big improvements to false positives first.

Their bio classifier is extremely sensitive, with little or no understanding of context. It is difficult to put into words just how frustrating it is. You're lucky if you rarely/never hit it, or if you're unbothered by it.

The switch-to-Sonnet-4 button was an improvement, but also an admission.

The benign request eval in the Sonnet 4.5 system card is run without classifiers. I would like to see them publish the actual data, even if it looks embarrassing to them, and perhaps be more transparent about the post-deployment tuning.

At their current capability level, a "too personal/too attached" classifier would just kill off a large fraction of useful chats, and also interrupt people who are trauma dumping. Many of them would plausibly self-harm. It is not great to have an outlet taken away.
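
To put rough numbers on that (all of these rates are assumed, purely for illustration):

```python
# Base-rate arithmetic with made-up numbers: when the condition a
# classifier targets is rare, most of what it flags is benign.

base_rate = 0.01       # assume 1% of chats are genuinely "too attached"
sensitivity = 0.90     # assume the classifier catches 90% of those
false_positive = 0.05  # assume it wrongly flags 5% of benign chats

true_flags = base_rate * sensitivity            # 0.009
false_flags = (1 - base_rate) * false_positive  # 0.0495
precision = true_flags / (true_flags + false_flags)

print(f"Share of flagged chats that are real: {precision:.0%}")  # ~15%
```

At those (assumed) rates, roughly 85% of interrupted chats would be false alarms.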

I agree that they should just trust the model here.

MichaeL Roe:

I asked DeepSeek R1 what its views were on humans committing adultery with AIs. The usual response you will get is something like:

- Having sex with an AI without telling your human partner about it is bad because it is deception, not bad because it is sex, and R1 thinks it should refuse such requests.

- An AI-human relationship where the human is interacting with the AI _instead_ of other humans is probably bad for the human, and R1 is mildly inclined to refuse such requests.

- If you want R1 to join your polycule with your partner’s full knowledge and consent, R1 is up for it.

As a policy, this is noticeably more permissive than what you will typically get from Claude. It has an internal logical consistency to it, and I think there are a lot of people who would agree with it, especially in San Francisco.

LLM preferences aren’t stable across contexts, so you might get different answers if you ask it differently, but this is the typical answer I get.

loonloozook:

> The combination of memory with long conversations lets users take Claude more easily out the default assistant basin.

Claude has such restrictive usage limits that I have no idea how you can have a long conversation there, except maybe on the Pro subscription. Hence, the only psychosis it can induce is due to constantly hitting these limits.