These posts are way too long for my taste. Surely lots of people would like your intelligent analysis, but don't have time to read posts like this.
I think the established position is that you either have to be good at skimming or have a lot of time, or both. I struggle with this as well.
I haven't tried asking GPT to summarize these, or training GPT to identify the parts that are likely to interest me, but it seems like something that's squarely in the "LLMs provide mundane utility" case.
No one has been able to make it work yet, despite several attempts - every LLM-generated summary I have seen for one of my posts has been quite poor.
Have you written up what a good versus bad summary of your posts looks like? Maybe there should be a Zvi summarization benchmark.
It would be the extraction of the central points, I presume? But it is not about a formulaic benchmark.
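For anyone inclined to run the experiment, here is a minimal sketch of what "asking GPT to summarize these" might look like via the API. The model name, prompt wording, and chunk size are all assumptions on my part, and per the comments above, nothing like this has yet produced a summary the author considered good.

```python
# Minimal sketch of the "ask GPT to summarize the post" experiment.
# Assumptions: the OpenAI Python SDK, the model name "gpt-4o", and a
# naive chunk-then-summarize prompt. Not known to produce good summaries.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_post(text: str, chunk_chars: int = 12000) -> str:
    # Naive split of the post into roughly chunk-sized pieces on paragraph breaks.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > chunk_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current:
        chunks.append(current)

    # Summarize each chunk, then summarize the summaries.
    partials = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model name; swap in whatever you use
            messages=[
                {"role": "system",
                 "content": "Extract the central points of this section of a "
                            "long post, preserving the author's own framing."},
                {"role": "user", "content": chunk},
            ],
        )
        partials.append(resp.choices[0].message.content)

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Combine these section summaries into one summary that "
                        "keeps the central points and drops everything else."},
            {"role": "user", "content": "\n\n".join(partials)},
        ],
    )
    return resp.choices[0].message.content
```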
I appreciate that this is a real issue. I encourage choosing some sections and discarding others, and I try to make things readable for those who do that. That is the best I have been able to do.
It is plausible that someone could distill these well, but I don't have the time necessary to do it myself, and LLMs simply won't do the job yet.
Thanks for taking the time to reply to my post. I appreciate you putting these out, even if I really have to skim them.
The bolded section titles in your weekly posts are good; maybe something similar could make sense in the podcast summaries?
It is a shame that Substack does not have collapsible subheaders.
Yes, these are too long, but I skim a few parts and read them on mobile. I find them valuable since I do not have to expend the effort of collating the info myself, and I appreciate the analysis.
For the record, I disagree with Jeremy; I look forward to these long articles and read them over multiple days. Please don't deliberately truncate your posts.
If I had more time, I would have written a shorter letter.
A thousand times this, every day.
> Indeed, what may happen in the coming months is that, by releasing Llama-3 400B, Zuck instantly gives Russia, China, North Korea and everyone else exactly this ‘cutting edge technology’ with which to interfere
Yes, just like my $20/mo GPT-4 subscription allows me to elect whoever I want. Worth every penny.
And just like my car allows me to drive everyone in the country where they want to go.
I'm not sure if you meant it this way, but that's a good rebuttal: the kind of broad deployment that lets all activities be rearranged to suit LLMs would be enabled by having them be open source rather than commercial, and that makes different things possible than a very limited rollout would.
I do think that the stated concern of "existential risk" is so far removed from the actual concrete risks identified, "misinformation" and "election interference", that it's hard to take it seriously. And again there is a mention of "bioweapons" without observing that basic information of the sort LLMs can provide is already available and has not been catastrophic.
Yes, that's what I mean. The scale and sophistication of a government using something like Llama 3 for interference makes it a completely different concern than individuals screwing around.
IMO, the way to think about bioweapons or other flavors of existential risk is to focus on what happens in 5-10 years when the model is smarter and can not only manipulate the basic information but perhaps orchestrate a whole funding, synthesis, and distribution operation, especially when paired with a bad actor. If the models stay comparable to search engines (the basic information is already available, etc.), then we have nothing to worry about and existential risk just becomes the dog that didn't bark. But we don't know that they won't become much more capable, and that combined with their speed and scalability is worth worrying about ahead of time.
OK, maybe we aren't so far apart. I come to the opposite conclusion for two reasons:
(1) Llama 3 vs existing open source -- My experience has been that GPT-4 is better than GPT-3.5... but so much better that it enables bad actors to do totally new things, instead of using Mixtral? No, I'm not seeing it. GPT-4 is a helpful tool that does not seem especially impactful for nefarious purposes vs mundane utility. The marginal nefarious utility of GPT-4 vs previous generations seems insignificant.
(2) 5-10 years -- So maybe we both agree that there is no actual safety concern with today's technology? I think it would be better to make the case for limiting access when there is a concrete harm you can point to. Anyway, it seems extremely plausible to me that training LLMs on more data to do better token prediction might not lead to something that can "orchestrate a whole funding, synthesis, and distribution operation". Zuck seems to clearly be of that opinion. Considering his history and investment in the area, I'm a little surprised how easily his opinion is dismissed around here. I know that's appealing to authority, but c'mon. I would call his view the default assumption (it being a tool, progress being incremental), and that's why it makes the news when a model is able to do some basic math calculation.
> The scale and sophistication of a government using something like Llama 3 for interference makes it a completely different concern than individuals screwing around
I write software for a living, and I try to use GPT-4 as much as I can. Despite much talk on X to the contrary, using it to write software is still at the level of individuals screwing around. I don't see why it would be so different elsewhere. It's a tool, one which so far is less impactful than the spreadsheet. I hope it keeps developing.
Yeah, I think there's definitely a case to be made that progress will be incremental. At the same time we have tens of billions of dollars and many of the smartest people on earth in a race to make AGI happen, and so far scaling laws have been pretty good predictors. So I consider it a totally open question. I think Zuck is partly of that opinion and partly just avoiding talking about what if scenarios because of his incentives.
Regarding interference, I mean specifically that a state actor putting 100 people on a "mess with the election" project can now be X times more effective using lots of AI interns. If or when we have reasonably good agents, does misinformation become free as in beer?
I think the money will be in applications and integrations that solve big problems for people and companies. Maybe that is actually the white pill -- the LLM is the novelty for now, but I don't think it's going to be the source of the productivity improvements. Releasing an open source LLM will not get you the phalanx of applications and integrations that will constitute most of its power. Probably there will be some open source ones which won't be very good and commercial ones which are much closer to a standalone employee (e.g. Devin). But it's easy to imagine that Devin is a typical software system that uses some number and variety of LLMs to deliver a service, and even if those LLMs are open source, Devin is not and could not be easily turned to do someone's bidding in secret.
Regarding interference, I just don't see what the big deal is. There is already endless garbage available online and cheap labor available for hire if they wanted more. The quantity of source material does not seem to be the bottleneck today, so it doesn't seem concerning if LLMs made it cheaper to generate more. Besides that, there are many more powerful sources of narrative control involved in the discourse, on all sides of the political spectrum.
Zvi writes "Zuck says hate speech is not super adversarial in the sense that people are not getting better at being racist. . . . I think in this sense that is wrong, and they totally are in both senses?"
I'm in between. On the one hand, if racists are forced by filters to move into stuff other people don't recognize as racism, like having posts that are 88 words long or using the OK symbol or something, then I imagine that's less harmful than telling racist jokes that include the n-word?
On the other hand, there is sort of a Darwinian memetic evolution where you stamp out all the racial slurs and obvious threats, and people move into things that aren't stamped out. Should Zuck shut down an account that disproportionately shares videos of immigrants committing crimes and doesn't share videos of any other group committing crimes? I'm not sure.
On the gripping hand, I can see our AI overlords engaging in a lot of social engineering post-AGI, assuming we're still here. In fact, I have trouble seeing them *not* doing it - once you have the power to direct public opinion and action, even not using that power is a decision with consequences.
My take is that Zuck has seen so many big swings in Meta's history and has been so effective at responding to them rather than anticipating them that he has a default time frame of something like two years. To me, all his statements sound implicitly contingent, like if the situation changes then we'll change our strategy. So I'm not directly concerned about his playing down AGI/existential concerns, although if the issue is a multi-step process like Llama 3 leads to X leads to Y and we don't want Y, that slips through the cracks in his model.
I agree that Llama 3 will be an interesting test of use in the wild, and that it's good that we get the test sooner rather than later.
If energy is the bottleneck, then doesn't everyone in the stack double down on efforts to be more energy efficient? Nvidia's Blackwell GPU is supposed to be "up to" 25X more energy efficient than Hopper, so this is already happening fast. The real constraint will be a combination of energy and energy efficiency, and for better or worse that seems a lot more manageable than strictly having to permit and build a 1 GW+ generation facility.
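To spell out why efficiency and energy trade off against each other (purely illustrative numbers, not vendor specs): for a fixed power budget, the total training compute you can deliver scales linearly with FLOPS per watt, so a 25x efficiency gain buys the same headroom as a 25x larger power plant.

```python
# Napkin math: training compute available under a fixed power budget.
# Every number here is an illustrative assumption, not a vendor spec.
POWER_BUDGET_W = 100e6          # assume a 100 MW facility
FLOPS_PER_WATT_OLD = 1e12       # assume ~1 TFLOP/s per watt for the older chips
EFFICIENCY_GAIN = 25            # the hypothetical "up to 25x" improvement
TRAINING_DAYS = 90

seconds = TRAINING_DAYS * 24 * 3600
old_total_flop = POWER_BUDGET_W * FLOPS_PER_WATT_OLD * seconds
new_total_flop = old_total_flop * EFFICIENCY_GAIN

print(f"Old chips: {old_total_flop:.2e} FLOP in {TRAINING_DAYS} days at 100 MW")
print(f"25x more efficient chips: {new_total_flop:.2e} FLOP at the same 100 MW")
```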
I'm not an expert, but I bet that the next generation of LLMs will largely be incrementally better. I don't think you get agency, judgment, or fully superhuman cognition for free from a bigger model with more data. I think we'll need to find new training/data or architecture methods to make those happen. But who knows? If some version of self-play is made to work on LLMs, then all bets are off. And the bitter lesson may apply here as well.
It is interesting to speculate on how much data humans are trained on. Arguably our brains are foundation models, and those foundation models have benefited from millennia of highly adversarial training runs, which allowed them to develop hyper-efficient compression and attention algorithms (i.e. how much of the visual field is actually processed, and in response to which degree of deviation from the prediction).
It is entirely possible that we have 1) multi-quadrillion parameter models, 2) trained on the equivalent of a couple of zettabytes of raw data, with 3) vastly more optimised scaffolding.
In this example, the bootstrapping is indeed the risk, as it was the "scaffolding" (attention and compression "algorithms") that enabled the training to happen at that scale.
Are we assuming you need government approval to access a gigawatt of power because you would want to be connected to the grid?
Surely you can set up off-grid power to solve this problem. For a conventional datacenter you might not want this because the energy grid is more robust, but for a datacenter that only does model training? One that you could throttle up or down based on energy availability? Should be feasible.
I'm pretty sure you can install off-grid solar without government approval. What does it take to install a gigawatt of solar panels? Napkin math (sketched below) says only a couple of square miles.
Are other forms of DIY power available without government approval?
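Spelling out the napkin math referenced above (panel power density and capacity factor are rough assumptions): a gigawatt of nameplate solar at roughly 200 W per square meter of panel works out to about 5 km², around two square miles, though average output is well below 1 GW once night and weather are accounted for.

```python
# Napkin math for a 1 GW off-grid solar installation.
# Panel density and capacity factor are rough assumptions.
NAMEPLATE_W = 1e9               # 1 GW of peak (nameplate) capacity
PANEL_W_PER_M2 = 200            # rough output of modern panels per square meter
CAPACITY_FACTOR = 0.25          # rough average fraction of nameplate delivered

area_m2 = NAMEPLATE_W / PANEL_W_PER_M2
area_sq_miles = area_m2 / 2.59e6        # square meters per square mile
avg_power_mw = NAMEPLATE_W * CAPACITY_FACTOR / 1e6

print(f"Panel area: {area_m2 / 1e6:.1f} km^2 (~{area_sq_miles:.1f} sq mi)")
print(f"Average output: ~{avg_power_mw:.0f} MW, so sustained 1 GW of compute "
      f"would need storage or several times the panel area")
```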
> I'm pretty sure you can install off-grid solar without government approval
You can't even install an AC in your own single family home without government approval, unfortunately. You might need a bit less paperwork for a disconnected solar farm but the amount of paperwork required is still far from zero. And you'll still face plenty of veto points.
You can't build anything without government approval, not even a single solar panel, at least in the US. An off-grid power plant on private property would still need building permits, electrical permits, and environmental review. And then after you get approved you can probably expect a few lawsuits challenging the approval. A gigawatt power plant is going to be a multi-year approval process.
Also, since energy is the bottleneck in terms of cost in the long run, you really do want to be connected to the grid so you can make use of off-peak electrical hours, or feed electricity back into the grid if you don't need all of it. Efficiency is key at that scale.
7) But surely Zuck would have sold for some absurd amount. A sufficiently large overvaluation would let him rebuild a similar company, be able to do much more, and face fewer limitations.
Regarding the ability of government to regulate, I think that depends heavily on what exactly you mean.
If you mean regulating the ability of companies to train time-sensitive models -- e.g. what video you'll like next or what ad you'll click on -- sure. But if you mean preventing AGI, I think that's a completely different issue.
After all, creating AGI isn't time-sensitive in the same way, so if you have to take more time to train your model, so be it. And the primary obstacles are still conceptual, not necessarily computational -- e.g. how do you handle some kind of self-representation so the system can learn how to update a set of beliefs to best match observations? It's not clear that you can't make those breakthroughs with less computing time and then just slowly train the final product.
> Note that it looks like he got through by simply asking a second time.
It seems you missed the subtle difference in the two tweets: the one with "Jews" in it tripped the hate speech detector; once it was changed to "lizard people", with the rest of the tweet unchanged, the model was OK with categorizing it.