I like how on the LSAT question, all the X users agree that it's super simple and then they all pick different answers.
My experience studying for the LSAT was that my practice scores went up by about ten points when I stopped trying to think at level 2 and instead just went for the ostensibly correct level 1 answer. It was like the Lawrence Waterhouse ASVAB story in Cryptonomicon - it's definitely possible to overthink the test.
I don't have a problem coding on a laptop but I have tiny hands. Could that be it?
Yes, it is mostly about the keyboard, if you have a large-screen laptop. Of course, it is nowhere near as good as a top-tier multi-screen desktop setup, but much more portable.
On podcast transcripts and LLMs: I passionately hate podcasts, but on the rare occasion I am sufficiently interested in what a podcast guest has to say, I will dump the transcript into o1 pro and instruct it to parse the transcript and generate 10 questions whose answers will convey the important points. I then ask it to use the transcript to answer its own questions. In practice this collapses a multi-hour podcast into five minutes of prompting and one minute of scanning text.
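If you wanted to script that two-step workflow instead of pasting into the chat UI, a minimal sketch with the OpenAI Python client could look like the following. The model name, prompts, and `summarize_podcast` helper are all illustrative assumptions rather than the setup described above (o1 pro in particular lives in the ChatGPT interface, so a placeholder API model stands in for it here).

```python
# Rough sketch of the two-step transcript workflow: generate questions, then
# answer them from the transcript. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_podcast(transcript: str, model: str = "o1") -> str:
    # Step 1: ask the model for 10 questions whose answers would
    # capture the transcript's most important points.
    questions = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Read this podcast transcript and write 10 questions whose "
                "answers would convey its most important points.\n\n" + transcript
            ),
        }],
    ).choices[0].message.content

    # Step 2: ask it to answer its own questions using only the transcript.
    answers = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Using only the transcript below, answer each of these questions.\n\n"
                f"Questions:\n{questions}\n\nTranscript:\n{transcript}"
            ),
        }],
    ).choices[0].message.content
    return answers

# Usage: print(summarize_podcast(open("transcript.txt").read()))
```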
> Heaven banning, where trolls are banished to a fake version of the website filled with bots that pretend to like them, has come to Reddit.
Looks like this was actually made up by the linked X user.
Wonder how long until we see something like this, though...
Also, is the paragraph starting with "So I want to make it clear" missing a link?
Really curious what this was referring to.
Congratulations on reaching 100 AI posts! These continue to be fantastic resources, serving both as insightful analysis and as the definitive compendium of views from across the spectrum. This milestone post was particularly substantial, with the audio version running 2 hours and 17 minutes - spanning 128,000 characters (after image description / transcription).
I'd like to once again ask for support. For those who enjoy these ElevenLabs-quality, multi-voice conversions of Zvi's posts, please consider contributing to the production costs through a paid subscription to the podcast Substack (of course, first and foremost you should be subscribing directly to Zvi!). While ElevenLabs voice credits remain stubbornly expensive, in my opinion they truly enhance the listening experience for these long-form posts, with multiple voices allowing for clear differentiation of quoted content.
No matter what, I hope these podcast episodes of "Don't Worry About the Vase" have brought value to at least a few of you.
Here is that episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-100-meet-the-new-boss?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Intrigued by “What would a Virtuous Person Do?” as a possible mode of alignment that seems at least somewhat capabilities-agnostic while also being tractable based on existing datasets (also seems highly CEV-compatible).
People like to dunk on virtue ethics for being too heuristic-based but I’ve always kind of felt that in practice a *lot* of human behaviors round out to VE as the underlying implementation of whatever moral system is the nominal interface.
"The instinct when hearing that taxonomy will be to underestimate it, since it encourages one to think about going task by task and looking at how much can be automated, then has this silly sounding thing called ‘ideas,’ whereas actually we will develop entirely transformative and new ways of doing things, and radically change the composition of tasks."
So true. Imagine the "aha!" moment where you not only improve a process but stop doing that process entirely and invent a better one.
I’ve been doing voice-to-prompt when coding for months now, using SuperWhisper. It’s fantastic - feels futuristically lazy.
Minor: DYM: 'polity' here or was there a typo in the source?
re: "inferred present, inferred absent or absent for the _plity_ called ‘Funan II’ ..."
[Thanx for the comprehensive post.]
As a distributed systems engineer at a large company with access to ChatGPT, I have not been able to get much value out of it. The core issue is that the level of context I have is too great to bridge quickly. If someone added the context of all the written docs plus all the code, maybe it would be amazing.
Generally in my area, small changes (a few lines of code) can lead to millions or tens of millions in revenue. So far all I’ve managed is for GPT to write some glue code. Maybe in a couple years someone will improve the context piece. I don’t think we are at the point where it can really replace engineers until this is productized.
Absolutely, LLM coders are great for starting new projects, but are uninformed on large existing codebases. Much of programmers' time is spent understanding the thousands or millions of lines of code that influence their next two-line addition. There's also necessary context from meetings, Slack, and PM software that needs to be incorporated conveniently.
Some possibilities for SEs trying to use it better:
- Write tools to automate context extraction from every source you can, so that it can be fed to an LLM in a few keystrokes (rough sketch below).
- Modularize more, so future code can be lower context. (Good (re)architecting is a very hard problem though, and software companies dump tons of resources into it already.)
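As a very rough illustration of the first bullet, here is a minimal Python sketch of context extraction: walk a repo, keep files that mention some keywords, and concatenate them into one pasteable blob. The file extensions, keyword filter, and character budget are arbitrary assumptions, not anyone's actual tooling.

```python
# Minimal sketch: gather relevant docs and source files into a single blob
# that can be pasted into an LLM. All thresholds here are assumptions.
from pathlib import Path

INCLUDE_EXT = {".md", ".py", ".go", ".java", ".ts"}
CHAR_BUDGET = 200_000  # stay well under the model's context window

def build_context(repo_root: str, keywords: list[str]) -> str:
    chunks = []
    total = 0
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in INCLUDE_EXT:
            continue
        text = path.read_text(errors="ignore")
        # Crude relevance filter: keep only files mentioning a keyword.
        if keywords and not any(k.lower() in text.lower() for k in keywords):
            continue
        chunk = f"\n===== {path} =====\n{text}"
        if total + len(chunk) > CHAR_BUDGET:
            break
        chunks.append(chunk)
        total += len(chunk)
    return "".join(chunks)

# Usage: print(build_context(".", ["billing", "invoice"]))
```

In practice you would also want to pull in the non-code sources mentioned above (meeting notes, Slack threads, tickets), but the same "filter, concatenate, budget" shape applies.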
Coding on a laptop is insane? Is that because you’d prefer a larger screen? I understand that, especially now that my vim days are less productive than using Cursor, which I deeply regret.
> Pliny jailbroke it in 24 minutes and this was so unsurprising I wasn’t sure I was even supposed to bother pointing it out. Going forward assume he does this every time, and if he ever doesn’t, point this out to me.
I recommend that you do point out Pliny's jailbreaks each time, no matter how inevitable they seem. First, it makes it less likely regular readers will forget how ineffective safety efforts have been so far. Second, you may get irregular readers who arrive on the post by following a link. For example, I've shared a few with your non-subscribers.
Looking forward to your response to Scott Sumner’s recent post on alignment
https://scottsumner.substack.com/p/epistemology-in-the-age-of-ai
Let me try to steelman the position of people who you and patio11 describe as "seeing the narrowest of all possible bullseyes, that everything stops right here."
I think it's not exactly that they believe that AI progress will stop right here. It's that they believe that AI progress SO FAR has been mostly illusory, or else that it's very narrow.
Here are some things I think they believe:
NOT: "there will never be any progress on any benchmarks"
BUT RATHER: "progress on benchmarks in the future and also the past mostly reflects gaming the test/memorizing the answer rather than real intelligence"
NOT: "unable to read Patrick's essays and write convincing stuff about them"
BUT RATHER: "their writing is just superficially convincing, using tricks of fluency to give the impression of depth where depth never was"
and even:
NOT: "AI will never get smarter"
BUT RATHER: "smartness is not what currently constrains real value from accruing to AI, perhaps it's reliability or lack-of-hallucinations or some kind of deep understanding of the world"
To be clear, those three examples are potentially three different people of differing levels of sophistication.
I think that this steelman is nevertheless at least somewhat wrong, but I think it also contains an important kernel of truth: that LLM progress so far has been marked by the illusion of greater depth than the actuality of it. When you first looked at the output of, say, GPT3, it was like, "Woah, holy shit!" But when you dug in and looked past the fluency of the results, you saw that it wasn't very smart. LLMs have, from the start, seemed deceptive. They hallucinate (which looks a LOT like "lying"). They give fluent, subtly wrong answers. They do in fact memorize the answers of at least some tests, and give the impression of understanding when they do not in fact have that understanding.
I think that you, and probably most people reading this, would say, "Yes, there is an illusion of intelligence, but if you imagine the illusion of intelligence as a depth of water, and the actuality of intelligence as the seabed beneath the water, what has happened over time is that both the water level and the underlying seabed level have risen -- that is, there continues to be a layer of illusory intelligence, but it is masking a greater and greater amount of real intelligence."
But it's easy for people to say, "the water level is rising, but the seabed is not. The illusion of intelligence gets harder to work through, but fundamentally the actuality of intelligence beneath it is not rising, or is rising very little."
Or, a related but somewhat more sophisticated argument might be that sure, in some cases there is a rising amount of real intelligence, but real intelligence isn't what limits the utility of LLMs; reliability does, or lack of hallucination, or something, and that isn't really rising.
My takeaway from the AI literacy study is that a little AI literacy makes people stupider.