What do you think the odds are that scaling laws have stopped? I ask because Grok wasn't that good even at ~10^26 FLOPs, and OpenAI called this model 4.5 instead of 5 (which they're saving for reasoning).
Scaling in the old sense of making models 2+ OOMs bigger and hoping for the best isn't economically feasible anymore, even if it did still work.
Scaling in narrower senses is still done—in the DeepSeek papers they mention they're bottlenecked by a lack of good RL training data, so presumably now the challenge is to figure out what a model lacks, and synthesize more of it.
My personal belief is that scaling was always kind of a mirage: the important thing is to get the right kinds of knowledge into the model, and scaling was just the easiest way to do so (by packing EVERYTHING ON THE INTERNET into the model, you'd necessarily scoop up the needed knowledge, just as trawling the entire ocean will definitely catch some fish). But now economics are forcing people to be more discriminating. Does training on trillions of tokens of random web slop really get you anywhere anymore? I don't think so. Frontier models like o3 use specialized reasoning datasets, often synthesized by other frontier LLMs.
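To put rough numbers on the diminishing-returns point above, here's a minimal sketch of a Chinchilla-style scaling law. The coefficients are the published fit from Hoffmann et al. (2022); the parameter/token counts are hypothetical, chosen only to illustrate the shape of the curve, not claims about GPT-4.5 or Grok:

```python
# Chinchilla-style scaling law (Hoffmann et al. 2022):
#   loss(N, D) = E + A / N^alpha + B / D^beta
# where N = parameter count and D = training tokens.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Each step scales parameters and tokens 100x, so training compute
# (~6*N*D FLOPs) grows 10,000x per step; the loss gain shrinks anyway.
for n, d in [(1e9, 2e10), (1e11, 2e12), (1e13, 2e14)]:
    print(f"N={n:.0e} D={d:.0e} compute~{6 * n * d:.1e} FLOPs loss={loss(n, d):.2f}")
```

Under this fit the loss goes roughly 2.58 → 1.91 → 1.75 as compute grows by eight orders of magnitude: the irreducible term E dominates quickly, which is one way to read "isn't economically feasible anymore."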
Taste is knowledge. A chef can tell how fresh their ingredients are, knows how to combine them, and has all the understanding required to identify and judge what they produce. Critics are the same. Once you know enough, you are able to place a piece of work against its natural comparisons, know it, and therefore judge it.

Most people mistake taste for preference. Everyone has preferences. Some people are so knowledgeable about their preferences that they develop taste in that niche. But that isn't the same as actually having taste in a discipline. The pervasive idea that art is all subjective interpretation is everywhere in schools and on the internet, and it has blinded many people to this basic truth. People who don't read poetry are declaring poems by 4.5 to be good. They may have taste in other areas of writing, but in poetry it is merely a preference. That's fine! AI is good at producing things we like. Great! That doesn't mean it's actually a good poet, or that it can currently write anything to compare to the best human poets.

So a lot of this depends on what part of progress you are willing to call good. Yes, 4.5 is a better writer, but only in some narrow areas. It has improved on the things people have stronger preferences for, i.e. corporate writing, meme jokes, mid poetry. Ask it to write anything more literary and it flunks.
Please tell me the non-subjective criteria by which a 3-star meal is better than a microwave burrito. Yes, _almost_ everyone will agree that it is. But that doesn't change the fact that it is a completely _subjective_ assessment.
And before you answer with "freshness" or "quality of ingredients" or "technique of the chef", make sure you are not shifting the preferences/subjectivity up a level.
One can come up with criteria along which two dishes can be relatively objectively compared, but the very selection of those criteria is, itself, subjective.
You are assuming there are criteria or that the lack of such criteria proves your point.
When you claim a lack of subjectivity, then yes, the lack of criteria does indeed prove my point. That's almost definitionally what objectivity means: that there are objective criteria.
You’re trying to apply modes of rationality that don’t really work.
You are completely correct that they don't work... because the field is completely subjective, and therefore they don't apply. That's the _reason_ that they don't work.
That's basically what it means for those things to not work: that the thing they are being applied to is subjective and has no ground truth.
Edit: I'm pretty sure that finding out that taste has objectively correct answers would be philosophically equivalent to finding objectively correct morality. If you solve one, you basically solve the other.
Someone with knowledge and experience in an artistic field might have different preferences, and those preferences might have required that knowledge and experience to develop. That does not necessarily make those preferences more correct.
Likewise, the fact that a work took more effort, skill, and/or experience to create does not inherently make that work "better".
I think it's honestly a bit of a shame when the preferences of artists and critics at the top of a field become too misaligned with those of the general public (which they certainly can). Wouldn't it be better if the best artists were creating works that more people could appreciate? Wouldn't it be better if critics were able to direct the public towards works they appreciate more reliably?
I happen to agree with this. It’s not inconsistent with what I said.
It's often both right and hallucinatory for me, as if it were an overconfident genius.
The most striking example was when I asked it whether paracetamol for infant teething was backed by evidence and it conjured a non-existent Cochrane review without additional prompting.
It also often comes off as almost cocky in the way that it addresses me. Dunno how it got imbued with this.
Good brainstormer, editor and writer though, although I have yet to test it in French.
One has to wonder how even additional GPUs would allow them to serve all users if they're already at capacity.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/on-gpt-45?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
> > I view it as a model that attempts to improve on the dimension of aesthetics only.
> Where I definitely disagree with him is in the idea that the model is only attempting to improve on the aesthetic dimension. I have no doubt OpenAI had much higher hopes for what GPT-4.5 would bring us...
He said the "model attempts", not "OpenAI attempted". I think he just means "this model is mainly an improvement on aesthetics," rather than "OpenAI was only trying to improve on aesthetics." Why would he have an opinion on that?
I keep reading posts about continuous improvements, and I'm curious how Sonnet is so good. Is it something to do with its SFT? Are there any articles I can be pointed towards on this?
Just wanted to say thank you for your phenomenal coverage of AI. Reading your articles is among the things that give me the most joy and pleasure. I remember when I first started reading your blog and didn't know how often new articles were published; when the weekend came, I literally refreshed the blog every hour all weekend (and on Monday!) hoping that an article would appear any minute.
With the release of GPT-4.5, including the announcement that it's extremely expensive, I have updated pretty significantly towards a "slow takeoff."
Why is Gary Marcus so consistently terrible?
I haven't extensively tested it, and it's too expensive for me to do so. I have personally not been impressed by most of the samples I've seen online, though.
My feeling is that GPT-4.5's "creativity"/"big model smell" is mostly an artifact of RL and fine-tuning, not scale. (R1 feels noticeably more creative to talk to than V3, but they share the same base model.)
Your cadence of high-quality outputs is insane; love it.