Memory seems to be intensely recency biased. It’s interesting to have it make a picture of all your conversations to see what it thinks of you; in my case, most of the subject matter was stuff from the past week.
Full, multi-voice podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-112-release-the-everything
> I’m excited for both features, but long term I’m more excited for Google integration than for research. Yes, this should 100% be Gemini’s You Had One Job, but Gemini is not exactly nailing it, so Claude?
Have you enabled the Workspace App for Gemini and tagged @Workspace in your chats?
https://support.google.com/gemini/answer/15229592
Codex says: "Codex always runs in a sandbox first"
Codex also says: "Linux – there is no sandboxing by default"
So I'm confused.
(https://github.com/openai/codex)
Maybe they assume most people on Linux are already running it inside a sandboxed container, whereas direct macOS use needs a sandbox.
If you want to try o3 full, reply to this message with the prompt and I’ll try it.
> Each context is a “day.” Then, the model is retrained each “night” on the day’s data so that it has long-term knowledge of what happened (just as humans sleep).
It seems difficult to robustly align a model whose weights are guaranteed to drift off in unknown directions.
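Concretely, the proposed loop would look something like this. A minimal sketch with hypothetical names, assuming a Hugging Face-style causal LM; not anyone's actual training code:

```python
import torch

def run_day(model, tokenizer, prompts):
    """Serve a 'day' of traffic on frozen weights, logging every exchange."""
    model.eval()
    transcripts = []
    with torch.no_grad():
        for prompt in prompts:
            ids = tokenizer(prompt, return_tensors="pt").input_ids
            out = model.generate(ids, max_new_tokens=256)
            transcripts.append(tokenizer.decode(out[0]))
    return transcripts

def run_night(model, tokenizer, transcripts, lr=1e-5):
    """Fine-tune on the day's transcripts so they enter long-term weights."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for text in transcripts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(input_ids=ids, labels=ids).loss  # next-token loss on the day's data
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The weights are now different from last night's -- whatever alignment
    # properties were verified then may no longer hold.
```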
I uploaded a black-and-white 1956 Kodak snapshot of my toddler sister in a small pool, and it got it right. I am duly impressed and terrified.
In the "where was this photo taken" challenge?
Yes.
"Anna Gat is super gung ho on memory, especially on it letting ChatGPT take on the role of therapist."
We are living in a time when our personal data is even less secure than it was a month ago. I would be very wary of doing this. Yes, I know our personal data was never actually secure. But now it's even worse.
TWITTER is a trademark, not a copyright. Source: me, an IP lawyer who hates jokes.
If that were the case, and they’ve stopped using the mark for coming up on a few years now, have they abandoned it? Does having Twitter.com redirect to x.com count?
They have definitely not abandoned it. I don't know about other countries, but in the U.S. the rule of thumb is at least three years of non-use is required for trademark abandonment. In this case, for example, twitter.com redirects to X.com and that alone may be sufficient to show use in commerce given how well-known and popular the service is.
OK, thanks. I took IP law in 2003 and never use it, so I was curious whether an IP lawyer would think that was the case.
Many Thanks! I've bumped into IP law from the patent side a little, but very much appreciate the explanation!
One strange thing about Robin Hanson's view:
Yes, real interest rates have not risen significantly, which suggests skepticism about major near term changes.
But the valuations of the major AI labs are in the hundreds of billions of dollars.
_Both_ the investors in e.g. U.S. Treasury bonds _and_ the investors in major AI labs have "skin in the game", and presumably are making the most accurate bet they can. How can they _both_ be right?
Hanson seems to believe the former investors - but why them instead of the latter?
The argument would be that the Tbill market is multiples bigger than the AI market.
The counterargument is that the AI market is 100% people expressing their expected value of AI, whereas the treasury market is made up of many different buyers, a large number of whom are _not_ expressing an opinion on the future value of their treasury bills.
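To put rough numbers on both points (every figure below is an assumption for illustration, not a sourced estimate):

```python
treasury_market = 27e12   # assumed size of the US Treasury market, USD
ai_lab_value    = 0.5e12  # assumed combined valuation of the major AI labs, USD

# Size point: AI-specific capital is a thin slice of the bond market.
print(f"AI bets as a share of Treasuries: {ai_lab_value / treasury_market:.1%}")  # ~1.9%

# Composition point: suppose only a fraction of Treasury buyers are pricing
# long-run growth at all (vs. buying for liquidity, mandates, or collateral).
opinionated_share = 0.2   # assumed
macro_view_capital = treasury_market * opinionated_share
print(f"Capital actually expressing a macro view: ${macro_view_capital:.2e}")
```

On those assumptions, flat real rates are a diluted signal from a mostly indifferent pool, while AI valuations are a concentrated one.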
Both good points. Many Thanks!
We want our AI to not lie to us. Won't that make it an asshole? If you go to your grandma's house and she asks how you like her brisket, you tell her it's delicious. You don't tell her that it's dry and stringy. (And this is a white lie, not a pleasantry.)
If someone uploads a photo and asks if they are ugly or fat or their nose is too big, what do we want an AI to say? A non-asshole human would say a white lie. They would not say the truth, even gently using nice words.
This could be a culture thing, I would prefer brutal honesty where applicable. Zvi is a New Yorker, and Jewish, and both of those groups are famous for preferring brutal honesty too.
Now, to be clear, I do think "gently, using nice words" matters quite a bit! But it is better to know the truth than not; believing false things causes far more harm in the long run.
It is true that cultures vary quite a bit in this area. But I am pretty sure that no culture is 0% or 100% truthful. I don’t think you or Zvi would say “dry and stringy” if your grandma asked how her brisket was.
Also, even if YOU prefer 99.9% honesty, most people do not. Social niceties are necessary in most cultures to be considered not an asshole. So at best, this would be training AIs to be assholes in MOST cultures.
Re: Please send text tokens for greater training efficiency.
There’s Whisper for that.
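For the concrete version (assuming the open-source `openai-whisper` package and a hypothetical audio file):

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")      # small, CPU-friendly checkpoint
result = model.transcribe("call.mp3")   # hypothetical input file
print(result["text"])                   # plain text, cheap to tokenize
```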
> "Huh, Upgrades"
Well done.
I really would not draw any conclusions about AI (or human) deception from Mafia-style games like Among Us. The optimal strategy is sensitive to tiny tweaks in the rules. I would be more likely to conclude from Claude preferring deceit that that is optimal play for this variant of the game, than that Claude prefers deceit. An AI system (or a human) might well prefer deceit outside the game setting, but to measure that you first have to have a perfect strategy to measure against.
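A toy one-shot model of that sensitivity, with all payoffs assumed:

```python
def lie_beats_truth(p_caught, win=1.0, penalty=1.0, truth_value=0.3):
    """Expected value of bluffing vs. a safe truthful line."""
    ev_lie = (1 - p_caught) * win - p_caught * penalty
    return ev_lie > truth_value

# Variant A: short discussion phase, lies rarely detected.
print(lie_beats_truth(p_caught=0.2))  # True  -> bluffing is simply optimal play
# Variant B: one extra accusation round makes detection likelier.
print(lie_beats_truth(p_caught=0.5))  # False -> the same agent should go honest
```

An agent that lies in variant A and stops in variant B is just playing well; neither behavior tells you it "prefers" deceit.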
Gemini models integrate Google Docs fine at the moment, so I’m not sure what you mean about Google dropping the ball.
Also, Perplexity's Deep Research is another option that rivals Claude's deep research in depth, balanced with speed.