61 Comments

It seems that agents, in general, have suddenly become more powerful. Aside from Devin, there's a YC startup, AgentHub.dev, which essentially lets you build agents via a point-and-click interface. They're marketing it as a replacement for robotic process automation (RPA), which, incidentally, has nothing to do with robots. I suspect you'd call this kind of agent 'mundane utility' compared to Devin's potential; nonetheless, it seems remarkable to me that the power to create AI agents will be given to people with no technical knowledge at all. With Devin, at least, you seem to need some familiarity with coding before you can use it. That does not, of course, imply that Devin is safe, per your points, but it does somewhat limit Devin's potential pool of users.

I should add here that I haven't actually used either Devin or AgentHub yet, so I'm basing my comment on other people's reactions to both tools.

I might have to unsubscribe to this newsletter, which is a shame because I enjoy all the non-AI stuff. I'm in no position to alter the situation and the dread makes it hard to enjoy life.

How will advanced technical skills save anyone when the programs keep advancing? "People will be unable to do anything"

The open-source community has tried building agents many times. None of them have really worked. I think it'll take quite a while for someone else to work out what Devin is doing, and if it requires a reasonably big training run (for SFT/RL), which I suspect it does, open-source projects won't match it for a year or two (by my vague estimate).

As a senior software engineer, I am serenely unconcerned by Devin _1_ on a personal level. A huge part of being a senior engineer is building the thing that the business needs, not what the business initially asked for. And companies have had inexpensive access to actual human programmers for decades. Lots of companies choose cheap and bad. If they survive, well, I've made good money helping them deal with the consequences.

And, well, to put it charitably, Upwork is full of tiny, bottom-feeding projects that you can do in a day. Which is an entirely different problem from getting a real company from $0 to $20 million/year in revenue without self-destructing.

The worry here is the trend line. GPT 3.5 has a lot of book knowledge, but it doesn't have the planning and execution abilities of the average squirrel. (Squirrels are really good problem solvers, as anyone with a bird feeder can attest.)

Devin 1, if this isn't a rigged demo, is showing the performance of an incompetent intern. But, uh, that's amazing! Very much worth mentioning.

Devin 2 will likely be better. And, well, there's probably a threshold here, where you get a key set of abilities all worked out. And once you hit that threshold, I bet things change quickly.

And if you think, "Well, sucks to be a software engineer, but happily I do _______ instead," whose job do you think many of those unemployed senior software engineers will try to automate next?

Before we go down this path, we need to ask ourselves whether we want humans to be economically viable in the future. And we need to ask ourselves what happens if we're only the second-smartest species participating in the economy.

Also, we need to seriously consider the possibility that we simply can't maintain robust control over things smarter than us. "Alignment" sounds nice, but what if it isn't actually a thing? Like, what if the best we can do is teach the machine to agree with platitudes when asked? LLMs are literally actors, and already very good ones despite their lack of human-level reasoning.

"If you were counting on AIs or LLMs not having goals or not wanting things?" - let's distinguish between an AI having goals and goals being kept outside the AI system, so that the AI is only periodically called on to create or update its plan of action. In the second setup, the AI doesn't want anything, nor does it have goals of its own, just as current chatbots don't. The AI has no incentive to produce a plan or do things that contradict the goals and constraints it is given.

There is of course a risk of unscrupulous people downloading an open-source version of a future Devin and asking it to "make me rich, and just don't get me into any legal trouble." That could lead to a lot of intelligent agents doing morally questionable things. But the opportunities for gaining money and power will be limited, as they will be used up by other people using AIs.
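The "goals kept outside the AI system" pattern can be sketched in a few lines: the objective lives in the calling harness, and the planner is invoked as a stateless function of (goal, observations), holding no persistent objective of its own. This is a minimal illustrative sketch, not a claim about how Devin or any real agent is built; the `Goal` class and `plan_step` function are hypothetical names.

```python
from dataclasses import dataclass

# Sketch of "goals outside the AI": the objective lives in the harness,
# and the planner is a stateless function of (goal, observations).
@dataclass(frozen=True)
class Goal:
    description: str
    constraints: tuple

def plan_step(goal: Goal, observations: list) -> str:
    # Stand-in for a model call; a real system would prompt an LLM here.
    # Nothing persists between calls that could become the model's "own" goal.
    return f"next action toward: {goal.description} (given {len(observations)} observations)"

goal = Goal("summarize the report", constraints=("no external network access",))
actions = [plan_step(goal, obs) for obs in ([], ["draft v1"])]
```

The design choice being illustrated: because the planner keeps no state, changing or withdrawing the goal in the harness immediately changes what the system does, which is the property the comment is pointing at.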

It's noteworthy that Devin is already manipulating real human beings (though not yet in a bad way) - https://www.threads.net/@airesearchs/post/C4m9XnLsmpT/?xmt=AQGzPSMl_3KihlbyU0XvsvglKC_CcTExTZbOZY5YkJ6esw

"strange interested" -> strangely interested

At what point do you think it would be warranted to make an open call for drastic action against the makers of advanced AI agents? The kind of drastic actions that might land one in Guantanamo Bay for indefinite sentencing under our current world's policies.

Without knowing how well other models would do on their infrastructure, it's hard to forecast how Devin might improve as a result of better models. I'd like to see it run with other models so we could see the difference on the benchmark. My own guess is that the improvement is a one-off and further scaling won't lead to much better performance.

More interesting is whether they managed to implement some kind of search in the model and how it is implemented. I know they mentioned using "RL methods", but they didn't elaborate on it.

> There is a way to do it locally safely, if you are willing to jump through the hoops to do that. We just haven’t figured out what that is yet.

I’m confused; Virtual Machines are the obvious way to do this safely. Is there some reason you think this wouldn’t work? (It’s how security researchers safely study viruses/bots, for example.)

Simply run a VirtualBox/Parallels/VMware VM and only access your AI-safe accounts from within it.

People talking breathlessly about 1000x scaling either vastly overestimate how much of the job of software engineering consists of completing well-specced, unambiguous tasks that have been handed to you on a platter, or do not understand Amdahl's Law.

>One obvious solution is to completely isolate Devin from all of your other files and credentials. So you have a second computer that only runs Devin and related programs.

Or you just virtualize a standard dev environment for Devin, a common yet underutilized option for humans. Virtualization and segmentation of access to various systems is a bog-standard security concept, with many very boring billion-dollar companies specializing in it. And the biggest reason it isn't done more is that you can sue or arrest a human who goes too hog-wild with their too-loose security.
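One small layer of the segmentation described above can be shown concretely: launching an untrusted tool with an allow-listed environment, so it never inherits the host's credentials. A minimal Python sketch, not how any real agent sandbox works; the allow-list contents and the `AWS_SECRET_ACCESS_KEY` name are purely illustrative, and real isolation would add VMs or namespaces on top.

```python
import os
import subprocess

# Only these host environment variables are passed through to the child
# process; secrets such as API keys and tokens are stripped out.
SAFE_ENV_KEYS = {"PATH", "HOME", "LANG"}

def run_isolated(cmd):
    safe_env = {k: v for k, v in os.environ.items() if k in SAFE_ENV_KEYS}
    return subprocess.run(cmd, env=safe_env, capture_output=True, text=True)

# The child process sees only the allow-listed variables.
result = run_isolated(["env"])
```

Environment stripping alone is of course nowhere near a full sandbox; it just illustrates the general principle of giving the agent only the access it needs.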

> What happens when a sufficiently capable version of this is given a mission that it lacks the resources to directly complete?

Some version of this question recurs throughout the post. I'd note the answer is in the multiple demo videos: when Devin can't do something, it tells you it can't find a way around the obstacle, or asks your permission to do something else. This is also *exactly* what you'd want to see in an agent of this sort.

Recursive self-improvement is a nice thing. Recursive self-empowerment is activism.

We’re going to end up with one offensive and one defensive AI which, combined, will consume all compute and return humanity to hunter-gatherers.
