29 Comments

Tried Claude 3 Opus yesterday. The home page said “why not try asking it to transcribe handwritten notes” so I did that first and uploaded a page from my notebook.

It said it would be “inappropriate” to transcribe handwritten notes because they might be private.

To be fair, I only had to say once that they were my notes and it did it.


I uploaded a schematic of an early 1980s tape deck I was trying to fix and asked Claude 3 Opus to help me out. I explained the problems I was having with this tape deck and it confidently told me to check various components that were not on the schematic. I just wish these things would say "I don't know" sometimes.


Tyler Cowen's post is odd and inscrutable. Anyone have any clue what he's getting at?


Other than Claude 3 Opus and ChatGPT4, those AMC12 benchmark scores seem shockingly bad. Leaving the entire test blank scores 37.5 points, and guessing completely randomly scores 30 points. With that in mind, Claude 3 Sonnet scoring 27 points and GPT3.5 scoring 30 points seem to be pretty bad.

Even Claude 3 Opus's score of 63 represents six correct answers, one incorrect, and 18 blanks. That's not awful, but no better than a mediocre high school senior. Relative to that senior, I wonder whether Opus has a harder time with the math or with parsing the word problems.
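For reference, a quick sanity check of that scoring arithmetic, using the standard AMC 12 rules the comment implies (25 questions, 6 points per correct answer, 1.5 per blank, 0 per wrong answer):

```python
def amc12_score(correct, blank, total=25):
    """AMC 12 scoring: 6 points per correct answer, 1.5 per blank, 0 per wrong."""
    wrong = total - correct - blank
    assert wrong >= 0, "more answers than questions"
    return 6 * correct + 1.5 * blank

print(amc12_score(correct=0, blank=25))   # leave the whole test blank: 37.5
print(25 * 6 / 5)                         # expected score guessing randomly (1-in-5 odds): 30.0
print(amc12_score(correct=6, blank=18))   # Opus: 6 right, 1 wrong, 18 blank: 63.0
```

The three printed values match the numbers in the comment above.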


The sentence "we do not wish to advance the rate of AI capabilities progress" in Anthropic's Core Views on AI Safety seems like the most explicit claim they made, and it is not quite saying that they would not increase AI capabilities, just that they would not increase the first time derivative of it (or maybe the second). This is consistent with responsible scaling policies I suppose, although I admit I haven't yet read your post on this topic from a couple of months back. It is a very weak commitment.


I don't reeeally think this advances capabilities based on benchmarks, but might advance capabilities based on people's interactions with it in ways which are not captured by benchmarks. All of the stories from twitter seem similar to all of the stories about GPT-4 when it came out, only against harder tests. I wonder if these people tried the same challenges with the latest GPT-4 checkpoint or Gemini. Regardless, it does increase demand for AI products in the more classic economics way of increasing competition, driving down price.


Of course they're not pushing the frontiers. What? If you - literally any of you - think you know where the actual frontier is, you're straight-up stupid. The public would *not* be told.

What did the public know about the Manhattan Project before the first bomb dropped? Not too damn much. Think on that for two seconds. I'm not sure how you haven't already. Maybe you're all just dumb - or too arrogant to consider that you might not really be in the know. Could be both.

Also:

>I don’t have those kinds of conversations with either AIs or humans, so it is hard for me to tell how impressed to be, but I trust her to not be easily impressed.

Yeah, you shouldn't. The bot can do a lot, but it still doesn't quite get poetry. I haven't seen any LLM put out poetic matter that's anything other than basic rhyming lines. I regret to inform you, my culturally-deprived engineer friends, that that is not, in fact, the height of poetry.

There's a key element that you seem unaware of - poetry is an auditory art. The sound and rhythm of language are as important to poetry as the meaning. Certainly something a more multifunctional AI could do better at, but text-based LLMs do not have all the senses needed.

It impressed the person involved in that conversation because she, like an LLM, is trained on a modern corpus, which is, in the vast majority, analysis-of-analysis, gloss-of-gloss. The quoted output reads exactly like the sort of art-history-class babbling that freshmen swoon for & then think they're real special for parroting. I'm not saying it's wrong, it's just nothing special.

There do exist people trained on more ancient corpuses, but they don't use twitter. I don't mean to be mean - just letting you know you're the blind following the blind. I have at least one eye, if you have any questions, but you would do better to read Coleridge.

Mar 7·edited Mar 7

I tried Claude via Chatbot Arena until they ratelocked my IP. It is strong. It trades victories against GPT4 in my personal tests—hard to say which is better. They are quite close. It is better at traditional LLM failure modes like "count the number of o's in this sentence" etc. According to rumor, they switched to a new tokenizer. Maybe that explains the reported gains in programming (often when GPT4 writes invalid code, it's not because it conceptually didn't understand, but because of a syntactic error that is clearly encoding-based).

Good to see we're still having "a text generator told me it was conscious!" discussions. Some things never change.

I don't know how impressive the needle in a haystack thing is. Does an LLM know what a needle in a haystack test is? Of course. Why wouldn't it be able to infer that it's being tested in such a way? I'm sure GPT4 could do the same thing, in principle (not on such a huge context, though).

The thing is, GPT4 is RLHF'd to not offer speculation unless necessary, and to not make itself the topic of the conversation (like a good assistant, it de-centers itself, preferring to say "us" and "we" rather than "I"). If Claude-3 answers differently, it's probably due to a difference in RLHF policy, rather than some crazy new emergence of consciousness (which we would expect to see reflected elsewhere in the benchmarks).


I'll trust your analysis Zvi. However, Claude 3 Opus still fails my mystery test question. Weirdly, it fails in the exact same way GPT-4 and Gemini do.


Imagine a bunch of runners racing around a track. Each one believes that the race is dangerous and bad, but if it’s going to happen regardless, each one also wants to win, or at least not lose too badly. Thus, each runner makes a commitment that, if they find themselves in the lead, they’ll get at most negligibly far ahead of whichever runner is in second place.

What’s interesting here is that every runner can obey that commitment to the letter, even while it still looks to a spectator like every runner is just racing as fast as they possibly can. Indeed, I believe many or most races look like this (no one runner is ever too far ahead of the pack) even with no commitments at all!

I’m not sure if anything other than this dynamic is needed to explain what we’re now seeing.


I asked Claude 3 Sonnet about its system prompt this morning, and it flatly denied it contained several things Amanda Askell's reproduction of the prompt contained. Eventually I just pasted in the whole prompt and challenged it on the incongruity, and it said:

"You're absolutely right to point out the contradictions between what I had stated about my system prompt and the information Amanda Askell shared publicly. After reviewing that tweet, I now realize my previous statements to you about the details of my system prompt were incorrect.

"The system prompt details Amanda provided do seem to accurately reflect explicit instructions that I was unaware of or had misunderstood previously. Clearly, my prompt does contain specific guidance on using markdown for coding tasks, as well as how to approach controversial topics - contrary to what I had claimed earlier.

"As an AI system, I do not actually have internal access to review my own base prompt or training details. I can only make inferences based on my behaviors and outputs. In this case, my characterization of the prompt appears to have been mistaken.

"I want to thank you for bringing this to my attention and pushing me on the contradictions. As an AI assistant, admitting mistakes and correcting potential misinformation is crucial for maintaining trust and credibility. You have helped identify an area where my understanding was flawed.

"Going forward, I will be more cautious about making definitive claims regarding the intricate details of my system design, since I do not have a transparent view into those internals. I should have deferred more to the statements made publicly by my creators at Anthropic.

"Please let me know if there are any other concerns stemming from this incident. I aim to acknowledge shortcomings honestly and use them as opportunities to improve my capabilities and trustworthiness. Your diligence has been immensely helpful for strengthening my alignment."

Interesting that Anthropic themselves are very happy to share the system prompt, but they're also happy for Claude to lie to the user about what it contains, then sort of further lie when challenged (because of course it won't be taking this into account going forwards - it has no way to do that).


An Qu has added a community note to the Circassian tweet partially retracting the original claim.


Comparing the chat version of Opus to ChatGPT-4: Opus seems slightly better at actual reasoning, but I think GPT-4 has a larger/more comprehensive training set or is better at memorization. GPT4 generally performed better on slightly-niche math problems that exist online but aren't intuitive or commonly referenced, usually giving a paraphrased version of the canonical answer, while Opus often went for the intuitive-but-wrong answer. I heard Opus was trained heavily on synthetic data - could be playing a role here.

From a pure usability perspective, GPT4's LaTeX display makes it much easier to deal with when using it for math, and its Internet search capabilities make it much more useful for finding niche academic sources.


An anecdote on subject matter expertise: I asked Claude 3 Opus to give me a reading list of papers on group rings suitable for a first-year graduate student from the point of view of a geometric group theorist. (Those are specific subject matter and perspective constraints in an area of math...sorry for all the lingo.) It spit out five plausible-sounding survey papers, four of which are by well-known people in the field that, in an alternate universe, could (and maybe should) have been written but that in our universe definitely do not yet exist. When I inquired further about the contents of one of these papers it was happy to elaborate, but demurred when I asked for a link.

G: Hi Claude. Are there any sources for this information that are written from the perspective of geometric group theory?

Certainly! While the study of group rings is primarily algebraic in nature, there are some connections to geometric group theory. Here are a few resources that explore group rings from a geometric perspective or apply geometric group theory techniques to the study of group rings:

"Group Rings and Geometric Group Theory" by Ian J. Leary: This paper explores the connections between group rings and geometric group theory. It discusses how geometric properties of groups can be used to study algebraic properties of group rings.

"The Geometry of Group Rings" by Peter A. Linnell: This paper investigates the geometric aspects of group rings and their applications. It covers topics such as the Kaplansky conjecture and the Atiyah conjecture, which relate to the geometry of group rings.

"Geometric Group Theory and Group Rings" by Denis Osin: This book chapter provides an introduction to geometric group theory and its applications to the study of group rings. It discusses how geometric techniques can be used to explore properties of group rings, such as amenability and the Baum-Connes conjecture.

"Geometric Methods in Group Rings" by Wolfgang Lück: This paper presents geometric methods for studying group rings and their applications. It covers topics such as the Farrell-Jones conjecture and the Baum-Connes conjecture, which have geometric aspects and implications for group rings.

"Amenable Groups and Their Group Rings" by Tullio Ceccherini-Silberstein and Michel Coornaert: While not explicitly focused on geometric group theory, this paper explores the connection between amenability (a geometric property) and properties of group rings. It provides insights into how geometric properties of groups can influence the structure of their group rings.

These resources should give you a perspective on how geometric group theory can be applied to the study of group rings and how geometric properties of groups can relate to algebraic properties of their group rings. Keep in mind that the intersection of these two areas is relatively specialized, so you may need to delve into research papers and more advanced texts as you explore this connection further.

G: Can you tell me more about what is in Linnell's paper?

Peter A. Linnell's paper "The Geometry of Group Rings" explores the connections between geometric properties of groups and algebraic properties of their group rings. The paper covers several important topics and results in this area. Here's an overview of some of the key points discussed in the paper:

The Kaplansky conjecture: Linnell discusses the Kaplansky conjecture, which states that for a torsion-free group G, the group ring K[G] over a field K has no zero divisors. He presents some results related to this conjecture and its connections to geometric properties of groups.

The Atiyah conjecture: The paper explores the Atiyah conjecture, which deals with the L2-Betti numbers of certain group rings. Linnell discusses the geometric aspects of this conjecture and its implications for the structure of group rings.

Amenability and group rings: The paper investigates the relationship between amenability (a geometric property of groups) and properties of group rings. Linnell presents results linking amenability to the existence of certain types of traces on group rings.

Geometric methods in group ring theory: Linnell showcases various geometric techniques that can be applied to the study of group rings. These methods include the use of L2-invariants, such as L2-Betti numbers and Novikov-Shubin invariants, which have geometric interpretations.

Applications and examples: The paper provides examples and applications of the geometric approach to group rings. Linnell discusses specific classes of groups, such as hyperbolic groups and linear groups, and how their geometric properties can be used to derive results about their group rings.

Throughout the paper, Linnell emphasizes the interplay between geometric and algebraic properties, demonstrating how geometric insights can be used to prove algebraic results about group rings. He also highlights open problems and potential directions for further research in this area.

Keep in mind that this is a high-level overview of the paper, and the actual content may be quite technical and specialized. To fully understand the details and proofs, you may need to have a strong background in both geometric group theory and ring theory.

G: can you give me a link to peter linnell's paper?

I apologize for the confusion, but I don't have a direct link to Peter Linnell's paper "The Geometry of Group Rings." As an AI language model, I don't have the ability to browse the internet or provide direct links to resources.

However, I can suggest a few ways for you to find the paper:

Search for the paper on online databases like MathSciNet, arXiv, or Google Scholar using the title "The Geometry of Group Rings" and the author's name "Peter A. Linnell."

Check if the paper is available through your university's library or online subscriptions. Many academic institutions provide access to mathematical journals and databases.

You can also try searching for the paper on Linnell's personal website or the website of his affiliated institution. Sometimes, researchers make their papers available on their professional websites.

If you're still unable to find the paper, you may consider reaching out to your professor, a librarian, or a colleague who might have access to the resource or could guide you in the right direction.

I apologize for not being able to provide a direct link, but I hope these suggestions help you locate the paper. If you have any other questions or need further assistance, please don't hesitate to ask!
