20 Comments

Excellent, thank you. Regarding Wikipedia: I am a software professional who has worked in energy for the last decade, and Wikipedia's English-language pages on modern energy systems (not an obscure topic!) are dominated by semi-informed ideologues who keep them inaccurate and quite dated. Pigs, mud, they like it, etc.


Ironically, the same is true in topographic field research from mountaineering, which is a polar-opposite type of domain.


"They came for the topographers, and I did not speak out, because I was not a topographer ..."


Could you please give an example? I have zero knowledge in this field, which makes me a typical reader of such articles (i.e., reading out of curiosity). So it is interesting to see how these ideologues influence what an uninformed reader learns.


https://en.wikipedia.org/wiki/Nuclear_power_in_the_United_States begins:

"In the United States, nuclear power is provided by 94 commercial reactors with a net capacity of 97 gigawatts (GW), with 63 pressurized water reactors and 31 boiling water reactors.[1] In 2019, they produced a total of 809.41 terawatt-hours of electricity,[2] and by 2024 nuclear energy accounted for 18.6% of the nation's total electric energy generation.[3] In 2018, nuclear comprised nearly 50 percent of US emission-free energy generation.[4][5]

As of September 2017, there were two new reactors under construction with a gross electrical capacity of 2,500 MW, while 39 reactors have been permanently shut down.[6][7] The United States is the world's largest producer of commercial nuclear power, and in 2013 generated 33% of the world's nuclear electricity.[8] With the past and future scheduled plant closings, China and Russia could surpass the United States in nuclear energy production.[9]"

Try updating those absurdly outdated metrics with non-controversial information from authoritative 2024 sources and see what happens.


That's not an example. Can you tell us what happened when you tried to update it, please?


Yes, I bet the need for a "private" Wikipedia is underrated since the actual process doesn't necessarily converge on the best information.


Re: "To me there are mostly two speeds, ‘don’t care how I want it now,’ and ‘we have all the time in the world.’ Once you’re coming back later, 5 minutes and an hour are remarkably similar lengths.", to me this is only true if you're doing one-shot, get the result, take it or leave it, which is not at all how I use LLMs most of the time. I do a lot of refining: "this is good but change this and this and focus more on this", and often I am only satisfied with a result after 10+ prompts. At that point, the difference between 5 minutes and an hour becomes very prominent!


> Wikipedia curates the information that is notable, gets rid of the slop.

Not a good thing at all, given their standards for 'slop'. https://gwern.net/inclusionism


Statement: I believe the lack of visible safety testing for a highly intelligent internet-browsing agent (DR) is abnormally concerning, even as a reader who typically disagrees with your conclusions on safety.

I think the rushed nature of DR's release should receive more coverage.


Many Thanks! Great information!

I'm in the "plus" tier, so waiting for access to run my benchmark-ette.

As an aside, one of the questions in the benchmark-ette concerns the color of a CuCl2 solution in HCl, which contains the (CuCl4)2- ion. I used sci-hub myself to double-check that the d-d transitions of this ion are too low in energy to be in the visible and affect the color.
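The "too low in energy to be in the visible" check is just a unit conversion; here is a minimal sketch (the band position is an assumed illustrative value, not a number taken from the papers):

```python
# Sanity check: convert a d-d band position from wavenumbers to wavelength and
# see whether it falls in the visible range (~380-750 nm). The ~9,000 cm^-1
# value is an assumed, illustrative band position, not a literature number.
def wavenumber_to_nm(wavenumber_cm: float) -> float:
    return 1e7 / wavenumber_cm  # lambda(nm) = 10^7 / wavenumber(cm^-1)

band = 9_000.0  # cm^-1 (assumed)
nm = wavenumber_to_nm(band)
in_visible = 380 <= nm <= 750
print(f"{band:.0f} cm^-1 -> {nm:.0f} nm ({'visible' if in_visible else 'outside the visible'})")
# ~1111 nm is well into the near-IR, so a band there cannot set the solution's color.
```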

Out of curiosity, I tried o3-mini-high on this aspect of the question, telling it to look for papers using sci-hub. It refused, viewing sci-hub as illegal.

On the positive side, this can be viewed (from the point of view of the scientific journal industry) as dodging a bullet, in that it avoids "Agentic AI spawns a subtask from a user prompt and performs an illegal, malicious act."

On the negative side, this makes o3-mini-high significantly less useful to researchers (with presumably an analogous restriction in DR).

As you noted, this sort of behavior will make being able to use access credentials very important!


I wonder whether you can specifically ask it to use Sci-Hub, Anna's Archive, archive.is, etc. to kinda "solve" the paywalled sources issue.


Sounds legally tricky. The agent is not going to go there 'spontaneously' because those are all excluded from the search engine indices it's learning from by searching. You would have to *make* it go there... while you are well-aware (because that's the point!) that those are highly illegal sources to be downloading thousands of books & papers from all day long for potentially hundreds of millions of users.

If I made that suggestion to an OA lawyer, I would want to step back several meters first to clear the blast radius from the meltdown.


"Instead of giving me a giant report, give me something focused. But probably I should be thinking bigger."

The leap I'm waiting for is turning information analysis into visualization.

Wikipedia is great, but like much of the internet it's a wall of text. What I'd love is for my first interaction with information to be a visual representation. Then to drill down to deeper layers of visuals, and only arrive at text some layers down.

This goes both for information retrieved by the agent and for information I submit. You would input a large text, and be able to pull out visual representations of structure, much as an architecture program can let you pull out the electrical system or the load-bearing walls.

So if I'm writing a novel (or studying one), it would be amazing to request, say, "Show me all scenes with character X", and get a visual representation of where they appear in the larger structure. Or "Show me the arcs of the main character with respect to the mission and with respect to romance." Or "Here are the works of Plato. Show me a structural overview of his arguments about virtue". But obviously the same applies far beyond literature, it could be a huge ream of EU case law about transportation of meat products or whatever.

And then you can drill down into the structure to access more and more granular summaries and ultimately the original text. Humans – or at least a lot of us – are far better at assimilating the information in maps than a vast winding scroll of text.
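A minimal sketch of what I mean by the "where does character X appear" map, assuming a plain-text book with chapter markers; the file name, delimiter, and character name are all hypothetical, not references to any existing tool:

```python
# A rough sketch of a "show me every scene with character X" map, assuming the
# book is a plain-text file with chapters marked by lines starting with
# "CHAPTER". The file name, delimiter, and character name are illustrative.
import re
from pathlib import Path

def character_map(text: str, name: str, delimiter: str = r"^CHAPTER\b") -> None:
    """Print one row per chapter, with one '#' per mention of `name`."""
    chapters = re.split(delimiter, text, flags=re.MULTILINE)
    for i, chapter in enumerate(chapters[1:], start=1):
        hits = len(re.findall(re.escape(name), chapter, flags=re.IGNORECASE))
        print(f"ch {i:>3}: {'#' * hits}")

if __name__ == "__main__":
    character_map(Path("novel.txt").read_text(encoding="utf-8"), "Ishmael")
```

Drilling down would then mean swapping each row for progressively finer summaries, all the way back to the original text.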


I feel like all the rave reviews were basically "it's GREAT, provided you can't see the 10% inaccurate, made-up stuff." Nassim Nicholas Taleb must be having heart palpitations with all this.


> I notice I haven’t pulled the trigger yet. I know it’s a mistake that I haven’t found ways to want to do this more as part of my process. Just one more day, maybe two, to clear the backlog, I just need to clear the backlog. They can’t keep releasing products like this.

I'd skip it; I found Pro / Deep Research to be mostly useless.

You can't upload documents of any type. PDF, doc, docx, .txt, *nothing*.

You can create "projects" and upload various bash scripts and python notebooks and whatever, and it's pointless! It can't even access or read those, either!

Literally the only way to interact or get feedback with anything is by manually copying and pasting text snippets into their crappy interface, and that runs out of context quickly.

It also can't access Substack, Reddit, or any actually useful site that you may want to survey with an artificial mind.

It sucked at PubMed literature search and review, too. Complete boondoggle, in my own opinion.


After Deep Research, the next need is Deep Review.

And for code generation, Deep Security. And so on.

Anecdotally, I'm finding that Claude still seems to be a worse prompter than me unless I give it ~infinite context. There are just too many potentially incorrect assumptions to make about what I want. But the general idea that prompts are becoming more important seems true. Garbage in, semi-garbage out. I wonder how that will change as we approach AGI.

Zvi, you know the product releases are actually going to get faster rather than slower, right? That's how this all works.


> Suppose this integrated the user’s subscriptions, so you got paywalled content if and only if you were paying for it. Credentials for all those academic journals now look a lot more enticing, don’t they?

How long until someone teaches DeepSeek Deep Research or something how to use SciHub?

(and in certain fields, basically anything worth reading nowadays is either open access or available on the arXiv or somewhere in a near-final version)
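(For the arXiv part you barely need an agent at all; the public arXiv export API covers it. A minimal sketch, with the title serving only as a placeholder:)

```python
# A minimal sketch of checking the arXiv for a near-final version of a paper
# via the public export API. The query title is an illustrative placeholder.
import urllib.parse
import urllib.request

def arxiv_search(title: str, max_results: int = 3) -> str:
    query = urllib.parse.urlencode({
        "search_query": f'ti:"{title}"',
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")  # Atom feed with titles, abstracts, PDF links

print(arxiv_search("Attention Is All You Need"))
```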


I guess this is the right place to put this...

tl;dr: ChatGPT deep research 02/26/2025

7 questions; results: 3 correct, 4 partially correct.

About the same net results as o3-mini-high: (b) improved, (g) deteriorated.

a) correct
b) fully correct
c) mostly correct (two errors)
d) correct
e) initially incorrect; two prods gave the correct result (some deterioration)
f) misses a lot, but accepts additional entries smoothly
g) partially correct: finds correct examples, but includes incorrect ones too

Full description at https://www.astralcodexten.com/p/open-thread-370/comment/96473557
