> It seems like Tyler is thinking of greater intelligence in terms of ‘fitting together quantum mechanics and relativity’ and thus thinking it might cap out, rather than thinking about what that intelligence could do in various more practical areas.
Maybe the implicit premise is that intelligence is not the bottleneck in practical areas, but rather e.g. coordination problems? Of course, there's probably a sufficient level of intelligence that could figure out how to coordinate people/institutions.
>I believe that with proper attention to what actually causes success plus hard work small business can have higher private returns than a job for a great many people
Have you written more about this somewhere?
In bits and pieces but I've never offered a guide per se.
All of Cowen's statements on AI seem to reflect an underlying belief that there is something inherently special about human intelligence that cannot be significantly improved upon. He also kind of dodges that question when it is put to him directly, even when asked to assume that it is possible.
That does seem to be his position. Or, to be more charitable, that there is only so much intelligence can do, and we are already doing most of it (but are not 'special').
You are tremendously skilled at sharing your brain. I came here for the AI, stayed for the intellectual rigor. I really want to debate you about fertility reduction, population. Perhaps there will be an opportunity.
I am not tech, but you make the subject easy to digest. One question, please explain to me "model weights".
An LLM consists of a set of model weights, arranged in layers, that determine how the calculations to find the next token are done. If you have the weights of a model, you can run it locally and you can fine-tune it; it is yours to use. But that doesn't mean you could recreate it from scratch, since you don't know what data or algorithms were used, so open model weights is distinct from open source.
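For the non-tech reader, here is a minimal sketch of what "having the weights" lets you do, assuming the Hugging Face transformers library and some open-weights model (the model name below is only an example):

```python
# Minimal sketch: running an open-weights model locally with the Hugging Face
# transformers library. Assumes the weights are downloadable; the model name
# is just an example of an open-weights release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # example open-weights model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # loads the weight files

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))

# Having the weight files lets you do this (and fine-tune them), but it does
# not tell you what data or training code produced them.
```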
Okay, so a weight in this sense means a preferred tendency to indicate? More or less a programmed bias toward the direction of the next token? If so, I can get a grip on that.
PREFACE: I'm not an expert, and would love to be corrected on any of my misconceptions. The below is roughly my conception from being an amateur follower of AI over the last year.
----
In a metaphoric sense, the weights are a mathematically encoded "soul/personality/instincts/raw capability/in-built memory" of the AI model (as Eliezer puts it, "inscrutable matrices of numbers"). When you "run" a model, you pass the inputs (your user prompts, system prompts, other context) to the system the model is hosted in (the computers, infrastructure, operating system, and other code), which turns the input into numbers so that, for all intents and purposes (i.e. something definitely much more complicated than this), it can be "multiplied" by the weights, and out the other end comes the output. This running process is "inference"; inference is obscenely cheaper than training the model, which is why OpenAI can run 100+ million conversations a week. For an LLM that's working, the output is usually text that has something to do with your text input; for an image model like Stable Diffusion, it's text in --> inference --> image out, etc.
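A toy sketch of what "multiplying by the weights" means, with made-up numbers rather than anything from a real model; the shapes and layer count here are arbitrary assumptions for illustration:

```python
import numpy as np

# Toy illustration, NOT a real LLM: the "weights" are just matrices of numbers,
# and "inference" is passing the input through them with multiplications.
rng = np.random.default_rng(0)
vocab_size, dim = 1000, 64

embedding = rng.normal(size=(vocab_size, dim))   # weights: token -> vector
layer_1 = rng.normal(size=(dim, dim))            # weights: one "layer"
layer_2 = rng.normal(size=(dim, dim))            # weights: another "layer"
unembed = rng.normal(size=(dim, vocab_size))     # weights: vector -> next-token scores

def next_token(token_id: int) -> int:
    x = embedding[token_id]        # look up the input token's vector
    x = np.tanh(x @ layer_1)       # "multiply by the weights", layer by layer
    x = np.tanh(x @ layer_2)
    scores = x @ unembed           # one score per possible next token
    return int(np.argmax(scores))  # the model's "prediction"

print(next_token(42))  # meaningless output, since these weights were never trained
```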
The BIG training runs, like the one OpenAI is supposedly doing these days for GPT-5, grind an unformed set of weights against a target problem (predicting the next segment of text) for months and months (in GPT-4's case) on a huge number of computers, constantly refining the weights until what is left at the end is a whittled-down set: supposedly a quite manageable file of numbers, on the order of gigabytes. (NOTE: a potentially easily stealable thing that fits on a USB drive, if you ever managed to access the weights directly. Check out the file sizes of the models/weights on Hugging Face for grounding on the size of less impressive models.)
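A toy sketch of what "grinding the weights against next-token prediction" looks like, shrunk down to a five-token vocabulary; the data, learning rate, and update rule here are illustrative assumptions, not how the big labs do it, though the big runs follow the same basic loop at absurd scale:

```python
import numpy as np

# Toy "training run": start from random weights and repeatedly nudge them so
# the model gets better at predicting the next token in the data.
rng = np.random.default_rng(0)
vocab = 5
data = [0, 1, 2, 3, 4] * 200          # a trivially predictable "corpus"

W = rng.normal(scale=0.1, size=(vocab, vocab))  # weights: current token -> next-token scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(200):
    for cur, nxt in zip(data[:-1], data[1:]):
        probs = softmax(W[cur])       # the model's guess for the next token
        grad = probs.copy()
        grad[nxt] -= 1.0              # how wrong the guess was (cross-entropy gradient)
        W[cur] -= lr * grad           # nudge the weights to be less wrong

print(np.argmax(W[2]))  # after training, the weights "know" that 3 follows 2
```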
The weights are an encoded distillation of everything the model learned in its training run (fine-tuning is some additional, more minor training that you apply to the weights after you have a working base model). So the weights contain the knowledge that New York is in America; the model "knows" a lot of facts and has internalized a lot of skills (writing and formatting text, coding, translating, intuitions about decisions to make when presented with a problem, *insert capabilities*) and some internal knowledge. The "better" the model, the more of this it has (usually bigger is better, but there is a serious amount of secret sauce in the whole process to get to GPT-4 level).
Also note that while it's true that the weights have knowledge in them, they are NOT a database of literally everything the model has read. Imagine if you had super speed and had read every book in the world (the "data" the model is trained on): you would be really good at a lot of things, possibly an expert in many, and you'd have a lot of takeaways, but you are still not a robotic database. More frequent and common things would have stuck, but you would still just be a normal person, so you wouldn't be able to recount all of Wikipedia word for word, though you might remember, to some degree or other, anything you thought was interesting or simply anything that came up enough times.
This is why most models can tell you where New York is or know what The Simpsons is, but they really break down on details the more you zoom in or go off the beaten track, if what you are talking about wasn't hammered into them by the data in the training run. The weights are gigabyte-scale distillations and so couldn't possibly contain the whole internet for perfect recall. This is why companies and people are building other parts of the overall system and have geared their models to reach out to the internet and consume only a small part of it related to your prompt, in a size the model can digest within its "context length" (its short-term memory of the conversation, which varies from roughly 1,500 words for smaller models to 150k+ words for Claude 2.1).
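A hypothetical sketch of the retrieval pattern described above: grab some relevant text, keep only what fits in the context window, and put it in front of the question. The function and constant names here are made up for illustration:

```python
# Hypothetical sketch of retrieval: the model only ever sees a small slice of
# the internet, trimmed to fit its context window.

MAX_CONTEXT_TOKENS = 8000          # varies wildly by model, as noted above

def rough_token_count(text: str) -> int:
    return len(text.split())       # crude stand-in for a real tokenizer

def build_prompt(question: str, retrieved_pages: list[str]) -> str:
    context, used = [], rough_token_count(question)
    for page in retrieved_pages:   # assume these came from a prior search step
        cost = rough_token_count(page)
        if used + cost > MAX_CONTEXT_TOKENS:
            break                  # the model can't "digest" more than its window
        context.append(page)
        used += cost
    return "\n\n".join(context) + "\n\nQuestion: " + question

# prompt = build_prompt("Who founded the town of X?", search_web("town of X history"))
# answer = model.generate(prompt)  # search_web and model are hypothetical here
```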
Thanks for taking the time, I found that pretty useful, good expansion piece to the short form answer Zvi gave.
Hurts my brain a little, but I like to stretch it out when I can.
Also, another intuition pump: weights are much closer to being "grown" than "constructed". It's more like growing a useful novel bacterium in a petri dish (where you affect the environment it grows in at a macro level and give it some pruning occasionally) than like "assembling" something out of Lego/Meccano to make a bigger structure.
Weights being a black box, and the concern that we are getting capabilities faster than we are getting understanding of them, stem from the fact that we are like 15-year-olds who have never had a biology class: they know the mechanics of how to have kids and have some concept of "heritability" from seeing family features passed down around them, but they are miles away from understanding DNA or being able to account for genetic diseases, manage instincts, etc. at any decent level.
We're selectively breeding Smart Fungus in computer test tubes to get it better at doing Clever Ape Things. Opinions on the wisdom of this are mixed.
I am surprised to read that Tyler thinks Argentina's cyclical hyperinflation is a mystery. Adam Tooze summarizes a thoroughly reported explanation of the political economy here: https://adamtooze.substack.com/p/chartbook-144-the-energy-shock-and . Short version: it is caught in a policy trap, heavy energy subsidies to the most powerful, which repeatedly snap under macroeconomic pressure then are restored, causing pressure to build until the next snap. (The argument could be wrong of course.) Perhaps the takeaway is that neither LLMs nor polymaths such as Tyler save us from the challenge of knowing whether we have thought everything through, a sort of research equivalent to the halting problem.
Thanks for your wonderful newsletters. I find them very rewarding.
Came here to write this. I would add the simplest explanation: the Argentines print too much money. They could have had a functional currency board like Hong Kong's, but they CHOSE to break it. I am terrified that the US is on its way to joining Argentina and Brazil, and that is a one-way trip.
Re ancestors and descendants: the flip side of compound interest is exponential fading.
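One way to make that symmetry concrete (on the assumption that "fading" refers to how much of any one ancestor persists in a given descendant): money compounds like (1 + r)^n, while your share in an nth-generation descendant roughly halves each generation, fading like (1/2)^n. A quick illustration:

```python
# Illustration: compounding grows geometrically, ancestral share shrinks
# geometrically (roughly halving per generation, ignoring inbreeding etc.).
for n in range(11):
    growth = 1.05 ** n   # $1 at 5% after n periods
    fading = 0.5 ** n    # approximate share of one ancestor in an nth-generation descendant
    print(f"gen {n:2d}: compounded = {growth:6.3f}, ancestral share = {fading:8.5f}")
```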
"very long, very detailed, and very good. Interesting throughout!" Tyler Cowen rightly about the post above :) "Zvi Mowshowitz covers my podcast with Dwarkesh Patel
by Tyler Cowen February 3, 2024 at 2:59 pm" Also: a great interview.
On VCs and ROI, as you probably know, Marc Andreessen thinks of whaling as a precursor to the VC funding mechanism. Out of curiosity I did a bit of looking around and found an article that looked at 11K whaling voyages between 1800 and 1899 and found: "During the nineteenth century, US government bonds, a risk-free asset, returned an average of 4.6%; whaling, a risky asset, returned a mean of 4.7%. This shows 0.1% as the risk premium for whaling over US government bonds."
Here's the blog post where I report that: https://new-savanna.blogspot.com/search?q=whaling
I note that US government bonds, while they didn't default in our timeline, were hardly risk-free throughout the 19th century! Many countries do default on their bonds in times of hardship, and that could easily have happened to the USA in the War of 1812 or in the Civil War.
But we did not. I am seriously thinking about knocking some Congressmen's heads together if the Republicans pull any more default games. They could turn us into Argentina tomorrow.
Dwarkesh has the best in-depth podcast that I know of.
> So in the end, if you combine his point that he would think very differently if international coordination were possible or others were rendered powerless, his need for a hegemon if we want to achieve safety, and clear preference for the United States (or one of its corporations?) to take that role if someone has to, and his statement that existential risk from AI is indeed one of the top things we should be thinking about, what do you get? What policies does this suggest? What plan? What ultimate world?
So your interpretation is that Cowen thinks Straussianism is instrumentally rational for public intellectuals who forecast AI capabilities? This didn't draw any attention in the comments here or on LW, which surprises me, so I think there's a good chance I've misunderstood.
Anyway, if my interpretation of your interpretation is correct, do you think it's because Cowen wants to maintain his personal ability to influence key decision-makers, publicly and/or privately?
Or would it be because he wants to justify a maximal US lead in AI capabilities over China, for a broad readership which has diffuse political influence on whether this will happen? If this were the case, maybe he thinks the most likely scenario for effective international AI safety cooperation is the US gaining full hegemony in global politics, so that it can punish defectors.
I believe that Cowen strongly believes in Straussian takes across the board, both that they are good to use and that they are common. And I also think that there are clear implications of Cowen's stated views, that if stated explicitly would be things people would not take kindly to, and that he is aware of this.
"I essentially read this as Tyler engaging in superintelligence denialism..." I interpreted Tyler's statement more literally. I thought he was saying that the universe is not legible to intelligence. Do you disagree?