The big news this week was that OpenAI is not training GPT-5, and that China’s draft rules look to impose crippling restrictions on its companies’ ability to develop LLMs. After all that talk of how a pause was impossible and working with China was impossible and all we could do was boldly rush ahead, the biggest American player and our biggest foreign rival both look to be slowing down anyway.
Hey, I just started reading this, but when I click on something in the table of contents in the app it takes me to a different page instead of staying within the same page, which is very annoying. Substack is usually good on technical stuff; I’d reach out to them about this.
"This is Zvi Mowshowitz, my blog itself is funded but if you fund me I will hire engineers at generous salaries to try out things and teach me things and build demonstration projects and investigate questions and other neat stuff like that, maybe commission a new virtual world for LLM agents to take over in various ways"
I support this, but I don't have "hiring engineers" kinds of money! What else can I do?
On the Larry Page thing: the more I see the ways some of the central figures talk about AI, the more I think these people are completely insane.
The link to your podcast appearance appears to be broken:
I had not heard of Duolingo Max. I will likely sign up for it the instant it becomes available for English speakers learning German (I already use Duolingo). However, it doesn't have the one feature I think would be super useful: longer-form reading that sticks primarily to the vocabulary Duolingo knows that you know. The Role Play could serve a similar function, but I think having content to consume that is just reading, is level/vocab appropriate, and is longer than their current "stories" feature would be great. Especially for _very_ early language learners, finding medium-length content to consume is _extremely_ difficult. It seems like GPT-4 should be able to generate such content pretty trivially.
"It is still early. The key claim of Fan’s is that the problems of AutoGPT are inherent to GPT-4 and cannot be fixed with further wrapping. If we are getting close to the maximum amount we can get out of creating a framework and using reflection and memory and other tricks, then that seems rather fast. We are only doing some quite basic first things here. Perhaps the core engine simply is not up to the task, yet there are definitely things I would try well before I gave up on it."
Agreed that there is a lot more that can be done by wrapping things around GPT-4, and that it would be surprising if this did not yield some results. That said, for many values of "the task", I do think GPT-4 is probably not up to it. When building a complex workflow out of multiple LLM invocations, if any one of those invocations goes wrong (i.e. hallucinates or otherwise generates bad output), it's hard to avoid spoiling the entire workflow. Meanwhile at the planning level, it's easy to get caught in loops or other bad attractor states. I am somewhat skeptical that a "lightweight" layer on top of an LLM, such as AutoGPT on GPT-4, can robustly manage complex tasks in general. Of course there will always be some simpler cases where this is less of a problem, and some of those will be useful to automate.
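The point about any one bad invocation spoiling the workflow can be made concrete with a toy calculation. Under the simplifying (and hypothetical) assumption that each step in a chained workflow succeeds independently with probability p, the chance the whole chain survives is p^k, which decays fast:

```python
# Toy model: a workflow chains k LLM invocations, each of which
# independently succeeds with probability p. If any single step fails
# (hallucinates, emits malformed output), the whole workflow fails.
# The 0.95 figure below is an illustrative assumption, not a measurement.

def chain_success_rate(p: float, k: int) -> float:
    """Probability that all k independent steps succeed."""
    return p ** k

for k in (1, 5, 10, 20):
    print(f"{k:>2} steps at p=0.95 -> {chain_success_rate(0.95, k):.2f}")
# Even a 95%-reliable step drops to roughly 60% reliability over
# ten chained calls, and to about 36% over twenty.
```

This is of course a sketch; real steps are not independent and a good framework can retry or verify intermediate outputs, but it shows why "lightweight wrapper on a fallible core" struggles as task length grows.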
FWIW, I recently gave some serious thought to the question of what it would take to stretch GPT-4 to the point where it could undertake more complex tasks, such as a nontrivial software engineering project. I wrote up the results here, if you're interested (and I'd love to hear any thoughts you might have): https://amistrongeryet.substack.com/p/can-ai-do-my-job. A few relevant snippets:
> When a problem is too difficult to solve in a single intuitive leap, you need to undertake a process of exploration. ... Exploration entails a mix of activities: generating new ideas, modifying and refining existing ideas, breaking a problem into subproblems, exploring the context (e.g. reading through existing code), gathering new data, asking for help. There are always an infinite number of potential next steps, so judgement is constantly needed: is it time to give up on an idea, or should I keep tinkering with it? What information would help me make a decision? Have I reached a dead end, do I need to push back on the project requirements? Is this result good enough, or is it worth the effort to optimize it further?
> ...There are many ways for exploration to go wrong, so another important skill is self-monitoring, noticing when you’ve fallen into a bad pattern and need to change course.
> ...Arguably, the ability to effectively, efficiently, and astutely explore and refine a complex idea is the fundamental difference between shallow and deep thinking, and one of the critical elements missing from current LLMs.
To clarify, there aren't enough GPUs in the world to meet current AI demand; that is definitely the high-order bit. But Azure is also going to have a lot more capacity come online as orders get delivered. The 12-16 months is paraphrasing a comment Scott Guthrie made: there's no fast-twitch muscle that lets you wave a wand and have a new datacenter appear. That's the absolute minimum time between going "oh, we need a new building to put the chips in" and having that building ready.
Why do you believe anything China says?
I think the medium-long term utility of AI is very high, but I am really struggling to get good utility out of these models in their current form. GPT is very good at coding quick prototypes or interpreting APIs but loses usefulness (for me) after an hour or two into a months long process. It is also good at automating busywork like data whitespace adjustments or something, I don’t do that very much but maybe some people do. I’d say maybe I’ve gotten like a 5% productivity increase? I tried to get it to summarize too, but I think most of the stuff I read is either already pretty concise or you lose a lot of it if you turn it into bullet points.
Not to be a dick, but I notice that a lot of people saying they’re getting 3x or 10x out of the current tools have jobs where I could not describe what they actually do at their job aside from generating a lot of emails and meetings.
I don't believe OpenAI or any other major player if they say they are pausing or even slowing down. I'm more likely to believe the opposite: that they are accelerating their efforts towards the next big thing in secret. There is an AI race now, and in a race it makes sense to use every trick to confuse and screw with your opponents, which means every other AI company and also those people that want slow progress, or at least prefer it be done more thoughtfully. All the companies involved will lie if they think it can gain them an advantage, and they face no real penalties for doing so.
Re #7: They Took Our Jobs: I've looked at different studies and they call out various occupations that will be impacted and by how much. Of course, A) it's too early to have a lot of confidence in these, B) there will be positive as well as negative ones and C) as you say, some jobs will get better while others will be eliminated (like with globalization).
That said, I think it's pretty clear women will be impacted more than men, simply because a higher percentage of working women are in white-collar jobs (~70% white collar vs. ~30% blue collar), whereas for men the split is roughly 50/50. This was intuitive to me, but I did run the following analysis on the Goldman Sachs study as a test.
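The direction of the effect follows from the occupation splits alone. A back-of-envelope version, using the ~70/30 and ~50/50 splits above and purely hypothetical automation-exposure rates for each job class (the 0.45 and 0.15 figures are illustrative, not from the Goldman Sachs study):

```python
# Hypothetical exposure rates: share of tasks exposed to automation
# in each job class. Illustrative numbers only.
WHITE_COLLAR_EXPOSURE = 0.45
BLUE_COLLAR_EXPOSURE = 0.15

def expected_exposure(white_collar_share: float) -> float:
    """Average exposure for a group given its white-collar share."""
    blue_collar_share = 1 - white_collar_share
    return (white_collar_share * WHITE_COLLAR_EXPOSURE
            + blue_collar_share * BLUE_COLLAR_EXPOSURE)

women = expected_exposure(0.70)  # 0.70*0.45 + 0.30*0.15 = 0.36
men = expected_exposure(0.50)    # 0.50*0.45 + 0.50*0.15 = 0.30
print(f"women: {women:.2f}, men: {men:.2f}")
```

Whatever the true per-class exposure rates, as long as white-collar work is more exposed than blue-collar work, the group with the higher white-collar share comes out more exposed on average.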
Thanks as always, best content around. Driverless cars: is a component of the problem how to rapidly make *most* cars driverless, which will then presumably make being a driverless car easier? Or perhaps how to make all cars able to communicate useful data about speed, direction, etc. to each other, which can be used to overrule the humans? My own car, a newish VW Tiguan, does a surprising amount of overruling me: lane change protection when I want to change lane, automatic braking if it thinks it's going to hit something (*even when it is not!*), etc. Feels like something manufacturers are unlikely to do unless incentivised, perhaps by the promise of lots of driverless subscriptions / subsidy handouts. Maybe linked to electrification too.
Aaronson strikes me as someone who does what he feels like doing then finds post-hoc rationalisations for why what he feels like doing is ok. (I mean, more than average.) He also definitely wants to see general AI in his lifetime more than he wants to make sure it doesn't kill anyone.
When I read his blog I imagine him as a cross between Dennis Nedry and Bishop.
It should be noted that the FDA does not certify anything. They "clear" medical devices and they "approve" drugs, but they do not do any independent assessments of the products they review. Self-audit amounts to self-assurance, which is not safe or reliable. "Certifications" are statements of assurance validated and verified by an independent third-party audit. Presumably, in China self-audit doesn't mean quite the same thing it would mean in the West because the CCP is so deeply embedded in every company in the country, so in this context it may mean something more like "internal political review".