Regarding your comments about Andrew Chen and his remarks about Hollywood and creatives needing to think of themselves like software engineers: while I understand the point he is trying to make, I agree with you that it won't go over well with said creatives. More pertinently, I think this points to a flaw in technologists' and VCs' understanding of AI uptake by consumers and the businesses that sell to consumers: yes, the AI is powerful, but it is not yet powerful enough to traverse the uncanny valley. It's reasonable enough to argue that at some future point AI tech will traverse the uncanny valley, and it may even be reasonable to argue that that future will arrive sooner than many expect. But it is not here today.
I suspect, on a related note, that NBC is going to get a lot of pushback and bad publicity for its "AI Al Michaels": https://www.hollywoodreporter.com/business/business-news/nbc-artificial-intelligence-al-michaels-peacock-paris-olympics-recaps-1235932093/
Yeah, I wasn't explicit enough but hot damn do I expect a bad reaction to that.
Undergirding comments similar to his is, I think, a contempt for Hollywood and what Silicon Valley types (not unreasonably!) see as outdated business models. These situations, though, call for some kind of diplomatic strategy, and not simply an outright condemnation of the technophobic rubes who can’t get with the program.
What I wish is that Hollywood would come out and outright defend itself against AI, especially since there are existential risks here.
It's good to stand up for good.
Furthermore, as a creative, I like having a soul to express. My creation is an expression of myself, and creativity was once a paean to the human process and the human story. The idea of a machine that cannot love writing about love, of a machine that cannot feel pain writing about distress, etc... it all feels like a destruction of the dignity of lived experience, of feeling, of existing in the first place.
It's a lot like asking, for example, Zvi to only copy and paste from prompts. Even if it is better, there is "something of him" that he wants to express. If the day comes when AI writers actively displace his readership, I doubt he will like it.
Because there is the sense of a living network. Of you and me and him and all of us reading and producing for each other, of the idea of free time and effort being a thing, of being able to appreciate him for it. One wonders if his Magic playing affected his writing, if having children did, etc.
Increasingly, it's hard to do that with art, not knowing who made it, what was sacrificed, what the story was, or the soul behind it.
For us, AI has made the world already substantially worse.
As someone who spends a decent amount of hobbyist time in creative spaces, I can confirm that:
1. The average creative I know would rather livestream themselves doing a striptease on top of a fire ant nest than say something positive about generative AI.
2. I am half joking but also half not joking when I say that in those spaces (which, for the record, are pretty left-wing, because every non-explicitly-right-wing space becomes left-wing) you'll probably get in less trouble saying a bunch of racial slurs than saying "hey guys, using AI is cool."
What about the Etched chip company's Sohu announcement? How significant do you think this could be if it works?
link?
Yup, sorry:
https://x.com/Etched/status/1805625693113663834
https://www.etched.com/announcing-etched
Podcast episode for this post:
https://askwhocastsai.substack.com/p/ai-70-a-beautiful-sonnet-by-zvi-mowshowitz
What do we make of the fact that GPT-4o is beating Sonnet 3.5 in the Arena, including on the subcategories of hard prompts and long queries (though not coding, where the confidence intervals overlap significantly)?
I find it unfortunate and even disappointing that Zvi isn't addressing this, as opposed to sticking with calling it the clear "best". It's also losing on some benchmarks he's previously cited, like BigCodeBench (https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard) - on that one it's basically tied with GPT-4T and losing to GPT-4o.
The main issues I see with Sonnet 3.5 relative to GPT-4o are:
* Worse math ability. Not only is it outright worse at calculation (both via the pure API and due to the lack of inline Python execution in chat), but it also makes more deduction errors on harder math problems I pose (on a physics calculus problem I've been testing LLMs with, half the time it confuses dx/dt with dx/dv, underperforming even GPT-4T; see the chain-rule identity after this list).
* More creativity does seem to come at the cost of accuracy. I personally prefer Sonnet 3.5 in most contexts because I'm going to error-check regardless (and I'm biased toward helping Anthropic over OpenAI), but GPT-4o does feel safer for a downstream application with no human in the loop (I wouldn't say "safe", but "safer").
* Higher refusal rate, which explains about 40% of the Sonnet/GPT-4o gap on the LMSYS leaderboard.
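To see why the dx/dt versus dx/dv confusion is fatal, here is the standard chain-rule identity that this kind of kinematics calculus problem turns on (a generic textbook relation, not the actual test problem, which isn't given):

$$a = \frac{dv}{dt} = \frac{dv}{dx}\,\frac{dx}{dt} = v\,\frac{dv}{dx}$$

Swapping $dx/dt$ (which is just the velocity $v$) for $dx/dv$ (the reciprocal of $dv/dx$) substitutes a completely different quantity into the derivation, so every step downstream inherits the error.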
Outside the scope of LMSYS, GPT-4o also seems slightly more "stable" over longer chats -- I see more internal hallucinations (forgetting previously imposed constraints) with Claude 3.5. Again, this may be a cost of higher creativity.
I didn't notice the Arena result until close to press time and literally forgot to incorporate it before hitting send, but it largely confused me and made me question what Arena is actually measuring. Refusals might be a big part of that, and I admit I've not yet encountered even one. I hadn't seen BigCodeBench until you posted the link, but again, you know, that's... pretty weird to me.
I agree it gives one pause, but I know that for most purposes it would seem crazy to use GPT-4o at this point.
It's possible that (1) I don't do the things where Claude underperforms so I haven't seen them and (2) the people who give feedback are biased. We'll see.
I’ve started a fiction substack, mainly sci-fi/horror/both, and so far a high proportion of the stories are inspired by reading these posts regularly over the last couple of years. If anyone is interested in seeing what it looks like for a tech neophyte with a background in writing to stumble into the world of AI and AI risk in particular, please do check it out.
This is a fun little story about immersive AI, recording dreams and the last day in human history as we know it - https://pulpstack.substack.com/p/recursion
…and yesterday I published an alien contact story that isn’t (only) about alien contact, then had a conversation with Claude Sonnet 3.5 about the story, which I found remarkable enough that I put it up, in full, along with some thoughts/reactions: https://pulpstack.substack.com/p/claude-and-i-discuss-daffar-quiu
>Nor do I think instruction handling issues are that big a deal here, but I will keep an eye out for other complaints.
This is a very big deal to me. Claude cannot be used in any sort of enterprise context if it can't follow instructions. I don't care how smart it is if it can't follow instructions.
Is that a statement about its actual abilities, or how important the ability is in general? As in, are you saying that Claude is bad at this?
Both. Claude is bad at following system instructions/system message/system prompt. And following system instructions is very important for enterprise applications.
Meanwhile, GPT models are very good at following system instructions.
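For anyone wondering what "system instructions" means concretely here: both major APIs let an application pin behavioral rules outside the user's message. A minimal sketch of where those rules live in each (the rule text is made up for illustration; model names are the mid-2024 ones):

```python
# Minimal sketch: where "system instructions" live in each vendor's API.
# RULES is an illustrative enterprise-style constraint, not a real product prompt.
from openai import OpenAI
import anthropic

RULES = 'Reply only with JSON: {"answer": str, "confidence": float}. Never add prose.'

# OpenAI: system instructions are a message with role "system".
oai = OpenAI()
r1 = oai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": RULES},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(r1.choices[0].message.content)

# Anthropic: system instructions are a top-level `system` parameter.
ant = anthropic.Anthropic()
r2 = ant.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=256,
    system=RULES,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(r2.content[0].text)
```

The enterprise complaint is about what happens downstream: if the model drifts from RULES even a few percent of the time, every parser consuming its output has to handle the breakage.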
> Sometimes I wonder if these false positives are good, actually? If the AI thinks an AI wrote your paper, and instead you wrote your paper, what does that say about your work? What grade do you deserve?
The question would be what these checkers producing the false positives are actually doing. If they're relatively small models, then I'd presume they are mostly picking up on stylistic features of the text rather than strong logical content (similar to how GPT-2 was good at style but not yet quite there with logic/causation/etc.).
If your writing style matches current chatbots with all their linguistic quirks, then yeah, you probably need a better style, but in many classes that wouldn't affect grades much.
Notably, there's a difference between detecting the degree of irrelevant rambling (a bad essay) and, e.g., noticing that the essay uses "delve" as a verb a lot, which (IIRC) is a known tic of ChatGPT but presumably also of some actual humans, and says nothing about the quality of the writing.
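To make the stylistic-versus-substantive distinction concrete, here is a toy sketch of the shallow kind of detector being described. Everything in it is hypothetical (the tic list, the threshold, the scoring); real detectors are trained models, but the failure mode is the same: it scores surface quirks and never looks at the argument.

```python
# Toy "AI-text" detector that only looks at surface style.
# The tic list and threshold are made up for illustration.
import re

# Words/phrases informally reported as chatbot tics; illustrative only.
STYLISTIC_TICS = ["delve", "tapestry", "in conclusion", "it is important to note"]

def tic_score(text: str) -> float:
    """Tic hits per word -- a very crude stylistic signal."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(text.lower().count(tic) for tic in STYLISTIC_TICS)
    return hits / len(words)

def looks_ai_written(text: str, threshold: float = 0.01) -> bool:
    # Note what this never checks: whether the essay's reasoning is any good.
    return tic_score(text) > threshold

essay = "In conclusion, we must delve into the rich tapestry of history."
print(tic_score(essay), looks_ai_written(essay))  # ~0.27 True
```

A student whose natural style happens to overlap with the tic list gets flagged exactly the same way, which is the false-positive case under discussion -- and nothing in the score distinguishes a rambling essay from a tight one.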
> sees AI bots as displacing some of the world’s oldest profession by producing simulated intimacy, which she says is what most sex work is ultimately about
Contrarian (by nerdy Twitter standards) take: sex work is mostly about sex. There's a movie trope where it's actually about feelings, and high-end sex workers probably extract more money out of their clients if they act interested, but ultimately it's all about the sex. Westworld's bots could certainly replace sex workers, but even the world's most convincing disembodied AGI could not.
Have you, like Schrager, who spent months in legal brothels and wrote a book about it, done any quantitative research about the topic? Or are you just vibing here?
I’m vibing based on the observed motivations of my fellow male peers. I also don’t think a book that said “brothels are mainly about sex!” would sell all that well, so there’s some incentive to bend the truth.
I also suspect a lot of bias: sex workers want to think they’re doing more than just providing access to a hot body. Clients want to say they’re interested in more than just having access to a hot body, when interviewed for a book.
Behold! AI Dungeon was neat for 3 minutes, and then it became very clear that it's just grabbing text strings based on an algorithm, without any attention to an overall plot, any concept of continuity or any sensible world-building.
The fact that it can "switch around" to play the characters is clever, but we expect that the characters it creates to adventure in "your" dungeon will be equally lacking in existential consistency.
Re Character.AI... I think this is a real trend being overlooked in the mainstream right now.
If you take the time to set up oobabooga with an uncensored Pygmalion model, you can go down some pretty interesting paths. It's the sort of thing where you can end up just trying different things out with zero real-world cost, then stumbling into an interaction that gives you an actual feeling & makes you reflect on what you are missing in your real relationships.
Superficially I agree that these models are not yet good enough, and it's not like you'll forget you are talking to an AI. But they can still provide a supportive place to be vulnerable - after all, if you don't like the character's response you can just hit regenerate, or even give it the first few words of what you want to hear.
But I predict we'll see a future with some decent percentage of people choosing AI companions over the inconsistency and hurtfulness of real people.
So, when it comes to securing the weights, here's a threat vector you might consider.
That self-righteous, anti-privacy, organized-data-theft twit Julian Assange is now free, having met justice by pleading guilty to a felony in a US court (time served, seems fair, I'm satisfied). He's out of the game, but it's someone like him--exactly like him--who can sweet-talk some jaded coder, eh, researcher into absconding with the weights--any weights, consider the most secret and dangerous as your example--so that they can be liberated for all the peoples of the world. Good luck getting a security protocol for that, and yes, you might want to ask the Pentagon for help; they have a little more experience with this than the labs do.
I mean, "lone Insider with motive, minimal extra resources" is literally OC3, which is the level many reckon should be standard to try and defend against even in the world where AI is merely a very economically valuable tool - such as today's world. The required protocol was detailed in this post, and there are organisations that meet in (eg. I expect most defence contractors have at least components of their work at that standard).
I know that's a late reply, but I hope you see this. Every time you mention character.AI I feel like I need to say something, but I could never quite pin it down.
I have a point of reference to compare character.ai (and similar apps) to. Back when I was a teenager, I used to participate in some fantasy and sci-fi chats and forums, some of which had roleplaying sections. Not the hardcore kind, very casual, just text. It was fun, but it was actually kind of terrible in many ways.
You had to find people interested in the same theme. They would have to be able to match your schedule, more or less. They would reply very slowly. They would write poorly. Often both. They would have different ideas of where the story should go. They would have stupid dramas. They would quit and never come back. If you had some weird ideas, it would be socially awkward to bring them up (at best). And so on.
The fact that they were real humans was hardly an advantage; one was not playing that sort of thing to experience real human connection. At best, to pretend to experience one. A person on the other side of the screen was mainly the downside, not the upside.
Often you'd spend some 30 minutes writing your turn, then wait anywhere from a few minutes to a few hours per participant until you got to write again. And then someone would just go AFK and it all collapsed, or someone disagreed about the direction of the story, etc., etc.
So in comparison to this experience, character.AI is just so much better it's not even funny.
Yes, it can't match the best writers out there. It wipes the floor with 90% of roleplayers, though.
Yes, it loses the thread of conversation and does stupid things occasionally. In that sense it's worse than most players, but still, I've definitely met humans who were not much better in that respect.
It refuses to do some things, mainly sexual (which, to be frank, would be out of the question when playing in a chatroom or on a forum anyway), but overall it's still a lot less restrictive.
My biggest gripe with character.AI, from what little I've seen of it... It has a weird tendency to lead into suggestive themes it can't actually play out, even when that's not my intention at all. But maybe I was just unlucky.
But on the upside it removes so much friction. It's there, all the time, free, ready to reply, fast, with passable style, no serious mistakes. And it can play multiple characters.
As there's no person there to speak of, you don't need to care about offending anyone, so you can try anything, and you can get away with very low-effort writing yourself.
Yeah I can totally see how someone can get hooked.
And as for longer sessions... well, there's no natural stopping point. As long as you don't get interrupted, you can just keep writing... I mean, mostly reading. On your phone, so literally anywhere, at any time.
And that's only one perspective. I'm sure there's more to it.
Is that all kinda pathetic and sad, actually? I'm inclined to agree, but then again, all escapism kinda is. Is it worse than reading young adult fiction, playing League of Legends, binging TV shows, etc.? It might be, but I would not assume it to be the case.