30 Comments
Apr 22 (edited)

These posts are way too long for my taste. Surely lots of people would like your intelligent analysis, but don't have time to read posts like this.

Apr 22 (edited)

> Indeed, what may happen in the coming months is that, by releasing Llama-3 400B, Zuck instantly gives Russia, China, North Korea and everyone else exactly this ‘cutting edge technology’ with which to interfere

Yes, just like my $20/mo GPT-4 subscription allows me to elect whomever I want. Worth every penny.


Zvi writes "Zuck says hate speech is not super adversarial in the sense that people are not getting better at being racist. . . . I think in this sense that is wrong, and they totally are in both senses?"

I'm in between. On the one hand, if racists are forced by filters to move into stuff other people don't recognize as racism, like having posts that are 88 words long or using the OK symbol or something, then I imagine that's less harmful than telling racist jokes that include the n-word?

On the other hand, there is sort of a Darwinian memetic evolution where you stamp out all the racial slurs and obvious threats, and people move into things that aren't stamped out. Should Zuck shut down an account that disproportionately shares videos of immigrants committing crimes and doesn't share videos of any other group committing crimes? I'm not sure.

On the gripping hand, I can see our AI overlords engaging in a lot of social engineering post-AGI, assuming we're still here. In fact, I have trouble seeing them *not* doing it - once you have the power to direct public opinion and action, even not using that power is a decision with consequences.


My take is that Zuck has seen so many big swings in Meta's history, and has been so effective at responding to them rather than anticipating them, that he has a default time frame of something like two years. To me, all his statements sound implicitly contingent: if the situation changes, then we'll change our strategy. So I'm not directly concerned about his playing down AGI/existential concerns, although if the issue is a multi-step process (Llama 3 leads to X, X leads to Y, and we don't want Y), that slips through the cracks in his model.

I agree that Llama 3 will be an interesting test of use in the wild, and that it's good that we get the test sooner rather than later.

If energy is the bottleneck, then doesn't everyone in the stack double down on efforts to be more energy efficient? Nvidia's Blackwell GPU is supposed to be "up to" 25X more energy efficient than Hopper, so this is already happening fast. The real constraint will be a combination of energy and energy efficiency, and for better or worse that seems a lot more manageable than strictly having to permit and build a 1 GW+ generation facility.
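
To make the power-vs-efficiency point concrete, here's a minimal back-of-envelope sketch in Python. The 25x figure is just Nvidia's headline claim, and the 100 MW power budget is an arbitrary number picked for illustration:

```python
# Back-of-envelope: what a fixed power budget buys when per-chip
# energy efficiency improves. All numbers are illustrative assumptions.

power_budget_mw = 100.0      # hypothetical fixed datacenter power cap
hopper_efficiency = 1.0      # normalized baseline (compute per watt)
blackwell_efficiency = 25.0  # Nvidia's "up to 25x" headline claim

hopper_compute = power_budget_mw * hopper_efficiency
blackwell_compute = power_budget_mw * blackwell_efficiency

print(f"Compute at the same power cap: {blackwell_compute / hopper_compute:.0f}x")
# -> 25x. Under a hard power cap, efficiency gains convert directly
# into effective compute, so the binding constraint is the product
# of available energy and efficiency, not energy alone.
```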

I'm not an expert, but I bet the next generation of LLMs will be largely incrementally better. I don't think you get agency, judgment, or fully superhuman cognition for free from a bigger model with more data; I think we'll need new training methods, data, or architectures to make those happen. But who knows? If some version of self-play is made to work on LLMs, then all bets are off. And the bitter lesson may apply here as well.


Are we assuming you need government approval to access a gigawatt of power because you would want to be connected to the grid?

Surely you can set up off-grid power to solve this problem. For a conventional datacenter you might not want this because the energy grid is more robust, but for a datacenter that only does model training? One that you could throttle up or down based on energy availability? Should be feasible.

I'm pretty sure you can install off-grid solar without government approval. What does it take to install a gigawatt of solar panels? Napkin math says only a couple of square miles.
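
Checking that napkin math in Python; the panel power density is an assumed typical value, and this counts only panel area, ignoring row spacing, inverters, and storage:

```python
# Napkin math: panel area needed for 1 GW of nameplate solar.
# Assumes ~200 W/m^2 peak output, a typical figure for modern panels.

target_watts = 1e9             # 1 GW nameplate capacity
panel_watts_per_m2 = 200.0     # assumed panel power density

area_m2 = target_watts / panel_watts_per_m2
area_km2 = area_m2 / 1e6
area_sq_mi = area_m2 / 2.59e6  # square meters per square mile

print(f"Panel area: {area_km2:.1f} km^2, about {area_sq_mi:.1f} sq mi")
# -> ~5 km^2, roughly 2 square miles of panels.
# Caveat: at a ~25% capacity factor, average output is only ~250 MW,
# so matching 1 GW of continuous load needs ~4x the area plus storage.
```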

Are other forms of DIY power available without government approval?


7) But surely Zuck would have sold for some absurd amount. A sufficiently large overvaluation would let him rebuild a similar company and be able to do much more with fewer limitations.


Regarding the ability of government to regulate, I think that depends heavily on what exactly you mean.

If you mean regulating the ability of companies to train time-sensitive models -- e.g., what video you'll like next or what ad you'll click on -- sure. But if you mean preventing AGI, I think that's a completely different issue.

After all, creating AGI isn't time-sensitive in the same way, so if you have to take more time to train your model, so be it. And the primary obstacles are still conceptual, not necessarily computational -- e.g., how do you handle some kind of self-representation so the system can learn to update a set of beliefs to best match observations? It's not clear that you can't make those breakthroughs with less computing time and then just slowly train the final product.


> Note that it looks like he got through by simply asking a second time.

It seems you missed the subtle difference between the two tweets: the one with "Jews" in it tripped the hate speech detector; once that was changed to "lizard people" with the rest of the tweet unchanged, the model was fine with categorizing it.
