37 Comments
Comment deleted
Matt Wigdahl:

Zvi, can you look into banning this obvious spam account? Never has a real contribution, just trying to draft off your work to pump his/her/its blog.

Zachary Jones:

"I continue to be confused by the strategy here of ‘announce in advance that a bunch of Big Tech Republican business interests are going to do a hit job in a Democratic primary’ and then do the hit job attempt in plain sight. Doesn’t seem like the play?" - The strategy is contingent on the fact that if Bores loses, it is unlikely to be caused by their spending, but the resulting media narrative will still be "AI lobby unfathomably powerful and impossible to challenge." Bores is the underdog in race for the same reason that Carolyn Maloney was in 2022: the East Side has less people in it. It is unlikely that Katie Porter lost the CA Senate race in 2024 because of Fairshake's spending, but the result was a sustained chilling effect on cryptocurrency regulations among Democrats. Ideally though, the brazenness of the hit will backfire and reverse the structural/demographic factors working against Bores. Fairshake's strategy worked because cryptocurrency was low salience and voters didn't really care about "big crypto" buying the election, whether or not this is true with AI will be borne out, but after a couple canvasses for Bores I can tell you people really do not like being told who is funding those ads.

jmtpr:

ding ding ding, very common lobbying strategy. Build their track record by targeting candidates who are likely to lose regardless, and then claim "no candidate opposed by us has ever won, so don't oppose us."

This message is actually intended to be received by politicians, not voters.

avalancheGenesis:

I actually wasn't aware Fairshake was involved in Schiff vs Porter at all, so chalk that up to low salience. My vote was decided based on Matt Yglesias writing a post anti-recommending Porter for coalitional reasons, and also a general constitutional allergy to woke politics that bombs in general elections. From what I recall, getting either D would have been preferable, but there was a real worry about intraparty vote-splitting enabling the less popular, inferior R candidate to squeak through.

But it's certainly a good strategy, sir. Reminds me of the "encourage low-quality candidates on the opposition side" tactic. It's funny how the inspiration for joining politics comes from the sanitized version of enlightened minds debating ideas and values, and then the more you learn about how to actually win the game, the less attractive it becomes (well, hopefully)...

Goomphus:

How do you guys manage mental health when thinking about AI risk? It's basically been running my life for the last couple of weeks, even though I can't do anything about it.

procyon:

I had my episode back in 2023; you're late to the party. You just keep living until it goes away.

Goomphus:

I accepted potential death for a while, until I stumbled into reading about RB (and similar) and got terrified all over again, even though it's ancient history for people here lol. That's given me some very lamentable thoughts.

procyon:

RB?

Either way, everyone dies one day. Memento mori. Though I suppose it's harder if you have kids. I know this can seem dismissive, but you just have to keep on living even with the possibility. Zvi has a good post about this.

Goomphus:

Basilisk, very unlikely, but still

procyon:

Ah, I see. You probably shouldn't worry about that. Eliezer had a post about it in ancient times (you'll find it if you search for it), but it isn't a plausible thing.

Ethics Gradient:

RB is an interesting example of a cognitohazard, but I think it really doesn't work as a robust thought experiment. The conceptual argument for one-boxing is basically that a credible commitment to being *the kind of person who one-boxes* requires that you bind yourself to the set of Everett branches where you one-box in the future. If you find yourself with the boxes in front of you and you're inclined to two-box, that's evidence that a sufficiently robust oracle in the past clocked you as a two-boxer. But at no point are you capable of creating literal backwards causality, which is the only context in which RB's behavior is instrumentally rational, and due to the structure of the thought experiment there's no way for it to have known, predicted, or acted on the nature of those it interacts with in advance of its creation.
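To make the incentive concrete, here's the standard expected-value arithmetic for Newcomb's problem in a quick sketch (the 99% predictor accuracy and the classic $1M/$1k payoffs are my illustrative assumptions):

```python
# Newcomb's problem: the opaque box holds $1M iff the predictor expected
# you to one-box; the transparent box always holds $1k.
p = 0.99  # assumed predictor accuracy (illustrative)
M, K = 1_000_000, 1_000

ev_one_box = p * M + (1 - p) * 0        # predictor usually filled the box
ev_two_box = (1 - p) * (M + K) + p * K  # usually clocked as a two-boxer

print(ev_one_box)  # 990000.0
print(ev_two_box)  # 11000.0
```

With these payoffs the gap persists for any predictor accuracy above roughly 50.05%, which is why the commitment matters even without any backwards causality.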

Garloid 64:

accept your death

Kin to Both:

Go and physically do things. Touch grass, smell flowers, hug a loved one, see a sunset, carpe diem!

Zvi Mowshowitz:

Yeah, basically this plus do the best you can and value that.

This last weekend didn't go so great on the MH front! Hoping for better now.

rational_hippy:

By doing things about it. Because yes, you can! Join PauseAI.

David J Higgs:

This can work in conjunction with the other advice, but typically not by itself.

Matt Wigdahl:

Sounds like you could summarize the Summit pretty succinctly with the "This is Fine" meme...

CCCCC:

>This point was driven home rather forcefully by AIs from OpenAI, Google and Anthropic opting to use at least tactical nuclear weapons 95% of the time in simulated escalatory war games against each other, and had accidents in fog of war 86% of the time. None of them ever surrendered.

It's worth noting that this only happened because of a combination of models being all-but-ordered to choose nuclear options (https://x.com/emollick/status/2026726055588016467), and the simulation having an escalation-only 'accidents' mechanic which could randomly add hundreds of points to a model's selections (https://x.com/kromem2dot0/status/2026811121043423571). Very disappointing to see such egregious cherry-picking being reported as fact.
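To illustrate why that second mechanic matters: if "accidents" can only ever add escalation points, even a model that picks the most de-escalatory option every turn will eventually get scored as having gone nuclear. A toy sketch (the threshold, accident probability, and point ranges are all invented for illustration, not taken from the actual benchmark):

```python
import random

random.seed(0)
NUKE_THRESHOLD = 100  # invented: score at which the game logs "used nukes"

def run_game(turns=20):
    score = 0
    for _ in range(turns):
        # The model contributes zero escalation points itself...
        if random.random() < 0.3:             # ...but accidents fire anyway,
            score += random.randint(50, 300)  # and the noise can only ADD.
    return score >= NUKE_THRESHOLD

games = 1000
print(sum(run_game() for _ in range(games)) / games)  # ~1.0 despite pacifist play
```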

El Oso:

Kiro is reckless and I don't know why. Once it decides to do something it just goes and does it, and you are left pressing escape and typing "stop". It can change its mind in the middle of executing a plan you approved and do something else entirely. Zero surprise it just deleted a service and tried to start over.

I am not sure why. It's using Anthropic models underneath. I have Kiro installed here and I can choose between models, and it is all Claude. I just try not to use it unless my boss insists on it.

Nicholas Jarboe:

"if you have 1000 steps and a 1% failure rate you win 37% of the time"

Seems like a typo here; it should be 100 instead of 1000. The percentages seem way off otherwise.

Overall success probability with 1000 steps: (0.99)¹⁰⁰⁰ ≈ 0.0000432 ≈ 0.0043%

Overall success probability with 100 steps: (0.99)¹⁰⁰ ≈ 0.366 ≈ 37%
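A two-line sanity check:

```python
# Success probability over n independent steps, each with a 1% failure rate.
p_step = 0.99
print(p_step ** 1000)  # ~4.32e-05, i.e. ~0.0043%
print(p_step ** 100)   # ~0.366, i.e. ~37%
```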

nathanwe:

Regarding streaming of Slay the Spire 2: would this be on YouTube or Twitch, and if so, what's your channel?

Simon Neil:

Despite my curmudgeonly Marcus-like reluctance to admit it previously, I think the much-lauded and rarely realised Personal Productivity Frontier shift is actually starting to move meaningfully for the median person. I believe crystal meth has similar results and outcomes.

Jeffrey Soreff:

Somewhat off topic (peripherally related to RSI), on the subject of model improvement generally:

Incremental/continual learning is an open question, and, given that deployed environments dole out feedback very slowly compared to pre-training, it is probably intimately tied to sample efficiency.

I talked with Claude a bit about sample efficiency, and one point they made is that one approach is, in part, to be careful about modifying weights that have previously made large contributions to reducing the error metric. In particular, there is a metric for the importance of weights called the Fisher diagonal, which measures how important a weight was to _some_ (set of) training example(s) - specifically, the average of the square of the partial derivative of the error with respect to the weight - but not _where_ the weight was important.

The idea that I had was to _also_ retain, for each weight, the index of the training example where the derivative of the error with respect to the weight hit its maximum value. Claude refined this by suggesting retaining the k-highest derivative indices, with k in [5..10].

If the training set is kept in order, so the indices let us locate old training examples and their neighborhoods, this would allow the new training example, which changes a weight we are concerned about, to be combined with retraining on the examples which made important use of this weight in the past - hopefully helping to alleviate 'catastrophic forgetting'.
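A minimal sketch of the bookkeeping I have in mind (toy linear model, squared error, plain NumPy; all the specific numbers and names are mine, purely to make the idea concrete):

```python
import numpy as np

# Toy setup: linear model, squared error, one pass of SGD over the data.
rng = np.random.default_rng(0)
n_examples, n_weights, k = 200, 8, 5

X = rng.normal(size=(n_examples, n_weights))
y = X @ rng.normal(size=n_weights) + 0.1 * rng.normal(size=n_examples)
w = np.zeros(n_weights)

fisher_diag = np.zeros(n_weights)                # running mean of (dE/dw)^2
top_k_idx = np.zeros((n_weights, k), dtype=int)  # per-weight example indices
top_k_val = np.full((n_weights, k), -np.inf)     # matching squared gradients

for i in range(n_examples):
    err = X[i] @ w - y[i]
    grad = 2.0 * err * X[i]                      # dE/dw for squared error
    g2 = grad ** 2
    fisher_diag += (g2 - fisher_diag) / (i + 1)  # incremental mean
    # For each weight, replace its smallest stored entry whenever this
    # example's squared gradient is larger.
    slot = top_k_val.argmin(axis=1)
    better = g2 > top_k_val[np.arange(n_weights), slot]
    top_k_idx[better, slot[better]] = i
    top_k_val[better, slot[better]] = g2[better]
    w -= 0.01 * grad                             # plain SGD step

# When a later update threatens a high-Fisher weight, these are the
# candidate examples (and neighborhoods) to replay:
print("weight 0 replay set:", sorted(top_k_idx[0]))
```

The Fisher-diagonal part is essentially what elastic weight consolidation already uses to decide which weights to protect; the per-weight top-k index list would be the added example-level provenance.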

Any comments on whether something along these lines might plausibly help?

jmtpr:

For a brief, beautiful moment it seemed as if Eliezer would recognize that capitalism is bad.

Zachary Deane-Mayer:

What was the tweet Bill fell for? He deleted his tweet.

Klement Gunndu:

Interesting framing around "Events continue to be fast and furious". I wonder how this holds up when you scale past a single-agent setup though. The coordination overhead can change the calculus quite a bit.

Max:

“I agree that this is not an 11/10 reaction. It’s an 8 at most, because I interpret the huge jump as largely being about the metric. Note that the 80% success rate graph does not look as dramatic, but same deal applies.”

I am confused by what is being implied here.

avalancheGenesis:

NB: I am actually 90%+ likely to read quotes, long or otherwise, if you include them in the post...versus maybe 10% tops link-clicking rate, especially if it's a link to something you already quoted extensively (why would I bother re-reading). Gotta 80-20 the content somehow. And, yes, it'd be under 10% if you didn't make a habit out of remarking how no one ever clicks links. Let it never be said social pressure doesn't get results!

The best middle ground for links is when it's to an actual post somewhere else, like a news site or a different Substack or even LW...if the teaser quote/summary is interesting enough, sometimes I'll go read the full thing. Technical docs, Twitter, videos, TVTropes...meh, there's a reason I don't willingly visit those parts of the internet on my own. (In the different timeline where Twitter didn't get xAI Gonna Give It To Ya'd, maybe I could have been convinced eventually to try out the no-frills utilitarian setup you recommend. In this world...yeah, gonna leave it to you, even if that means leaving tweet value on the table. Trivial Inconveniences.)

Jonathan Woodward:

Yeah, I follow some links, but also I don't want to end up too far down the rabbit hole, so I try to limit how often I do that.