Zvi's Thoughts on the Survival and Flourishing Fund (SFF)
I recently served as a recommender deciding how to distribute a bunch of money to charities. This is my report. Read it if and only if it is relevant to your interests!
Epistemic Status: The views expressed are mine alone. They are various degrees of strong and weak, and are various degrees of weakly or strongly held. Due to time constraints, I’m not confident I was careful to mark every claim here with the proper ‘I think that’ style caveats. If something seems to you like it has a huge ‘citation needed’ sign on it, I’m probably not claiming to have proven anything.
I was one of the recommenders for the most recent round of grants for the Survival and Flourishing Fund. In accordance with our recommendations, $9.6 million was allocated for distribution to various charities. The process involved four hours-long meetings where we discussed various questions, several additional discussions with other recommenders individually, many hours spent reading applications, doing research and thinking about what recommendations to make, and a number of meetings with various applicants.
This felt simultaneously like quite a lot of time, and like far less time and far fewer resources than the scope justified. $9.6 million is a lot of money, and the share of that directed by my recommendations was also quite a lot of money.
Getting both the process right and the answers right are rather big deals.
The Survival and Flourishing Fund is related to the Effective Altruism (EA) movement, in that some of the people involved in SFF have also been involved in various EA activities, like EA Global, and SFF is trying to answer a question (how to do good with money) that is historically core to EA thinking and discourse.
It is also related to the EA movement in that, despite no official relationship between SFF and EA, despite the person who runs SFF not considering himself an Effective Altruist (although he definitely believes, as I do, in being effective when being an altruist, and also in being effective when not being an altruist), despite SFF not being an EA organization, despite the words ‘altruist’ or ‘effective’ not appearing on the webpage, at least this round of the SFF process and its funds were largely captured by the EA ecosystem. EA reputations, relationships and framings had a large influence on the decisions made. A majority of the money given away was given to organizations with explicit EA branding in their application titles (I am including Lightcone@CFAR in this category).
Before going further, it is very important to make two things clear.
First: I am not an Effective Altruist.
And second, to reiterate: SFF is not in any way formally related to or part of EA.
If you are not familiar with EA you’ll want a basic familiarity before continuing (or deciding not to continue, if this post is not relevant to your interests). Without at least some sort of an introduction to and basic familiarity with EA, a lot of this isn’t going to make a lot of sense. The Wikipedia article on it gives a reasonable first explanation. If you want the EA perspective on themselves, and want to read their pitch knowing it is a pitch, this is their essential pitch.
I know many EAs and consider many of them friends, but I do not centrally view the world in EA terms, or share the EA moral or ethical frameworks. I don’t use what seem, for all practical purposes, to be their decision theories. I have very large, very deep, very central disagreements with EA and its core components and central organizations and modes of operation. I have deep worries that important things are deeply, deeply wrong, especially epistemically, resulting in an increasingly Goodharted and inherently political and insider-biased system. I worry that this does intense psychological, epistemic and life experiential damage to many EAs.
Some of that I’ll gesture at or somewhat discuss here, and some of it I won’t. I’m not trying to justify all of my concerns here, I’m trying to share thoughts. If and when I have time in the future, I hope to write something shorter that is better justified.
I also want to make something else clear, for all my disagreements with and worries about it: These criticisms of Effective Altruism are comparing it to what it can and should be, and what it needs to be to accomplish its nigh-impossible tasks, rather than comparing it to popular alternatives.
If you read my Moral Mazes sequence, you’ll see how perversely I view most of what many people do most days. I critique here in such detail because, despite all our disagreements and my worries, I love and I care.
This post is an attempt to do a few different things.
Share my experience of the S-process we used to allocate funds.
Think about how the S-process works, what places it creates weird incentives or results, and how the S-process could be improved.
Encourage worthy applicants for future rounds, since it was clear that organizations generally underapplied.
Share some of my thoughts on various organizations that applied, in the hopes this information can be useful to others.
Share some of my heuristics and models of the Effective Altruist space in general and how it distributes funds, and how that became a large part of what happened at SFF despite SFF not having any formal or intended relationship to EA.
Anything else that seems salient, keeping in mind that a lot of this stuff goes infinitely deep and requires going over actual everything.
How the S-Process Works
The S-process works like this, skipping over some minor stuff:
Jaan Tallinn (with some input from other funders, or a decision to delegate to Jaan) chooses candidates he thinks would be good recommenders, and asks them if they’d like to participate. He then uses several heuristics plus a source of randomness to select the final group. Heuristics include having a variety of perspectives, and having a mix of repeat and new participants.
The candidates look over the list of applications to get a sense of the landscape, and to check for and declare any conflicts of interest.
Meeting one. We meet, we vote on all the declared conflicts of interest (and do this again if something new comes up), go over the schedule and how the process works, and who is interested in investigating various organizations. If you have a conflict of interest you can answer questions but otherwise stay out of discussions on the group in question, and you can’t fund them (which matters, but less than you’d think it would, because funding is based on who wants to fund you most, not on a vote, so others can and often do pick up the slack).
We also go over the goal, which is survival and flourishing over the long term, as the funders (mostly Jaan, the largest funder) would understand such things. This is of course open to interpretation. Jaan made himself available in case we had any questions.
Everyone goes over the list of organizations and does an initial pass, including concrete evaluations of how much good it would do to give them various amounts of money.
Some applications come in late. Recommenders are free to ignore late applications, or to treat them normally, or anything in between.
For each organization, we have three knobs to turn: Value of their first dollar, how many dollars would be useful at all, and what the concavity of the curve between them should be. This is elegant and forces concreteness, but has issues I’ll discuss later (an update to the app is planned to allow inputting an arbitrary monotonically decreasing function). This included a virtual organization called “hold” which represented not giving out the money in this round, instead advising holding onto the funds for future rounds.
Various recommenders will do various forms of further investigation, including meetings with applicants, as seems worthwhile.
From this point on, we all continuously revise our funding decisions until they are locked late in meeting three. After a while, this focuses on decisions that would plausibly change allocations - I think we all did what we could to make our ‘background’ evaluations robust and fix clear mistakes, but didn’t think carefully about things unlikely to matter.
Meeting two. We vote on what seem like the highest value discussions, and use our time as best we can. The software helps highlight disagreements to suggest discussions.
Meeting 2.5 is an extra optional similar meeting, more discussions.
Meeting three is another set of similar discussions, except that at the end our decisions are locked, so it was focused on things that would change decisions within that time frame.
Funding worked by a formula that was effectively: Each of our virtual representations took automated turns allocating $1K to whatever we thought had the most marginal value if funded, until we were finished allocating funds. You are funded based on who most is excited to fund you, not based on a consensus on what to fund. I’ll discuss the implications more later.
The funders decide which recommenders to give how much money to, so you do need to ensure your allocations are robust to changes in who gets how much money. This round, there was a factor of ~2x difference in funding between the recommender who ended up directing the most funding and the recommender who ended up directing the least, if you exclude money returned to the funders for use in future rounds (3x if you include money returned to hold).
Or, from an outside view that excludes internal steps:
Jaan chooses a group of recommenders.
Organizations apply for funding.
Recommenders evaluate applicants.
Recommenders generate a payoff function for funding each of them based on (value of first dollar, number of net useful dollars, concavity of the resulting curve connecting those points).
Funders adjust how much and when funding should flow through each recommender.
System allocates funds by (virtually) giving funds to each recommender, and then we take turns allocating $1K to our top choice until all the money has been allocated.
Money is donated.
Hopefully good things.
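To make the allocation step concrete, here is a minimal sketch of the turn-taking described above. It is not the actual SFF software. The names (Curve, allocate, the recommenders and organizations) are hypothetical, the curve is the default linear decline with the concavity knob left out, and I am assuming that marginal value is judged against the total an organization has received so far from everyone. The “hold” option is modeled simply as money a recommender leaves unspent once nothing has positive marginal value.

```python
from dataclasses import dataclass

STEP = 1_000  # dollars allocated per turn, as described in the text


@dataclass
class Curve:
    """One recommender's view of one organization (linear decline assumed)."""
    first_dollar_value: float  # value per dollar of the very first dollar
    last_useful_dollar: float  # funding level where marginal value reaches zero

    def marginal_value(self, already_funded: float) -> float:
        if already_funded >= self.last_useful_dollar:
            return 0.0
        return self.first_dollar_value * (1 - already_funded / self.last_useful_dollar)


def allocate(budgets, curves):
    """Recommenders take turns giving STEP dollars to their current top pick.

    budgets: {recommender: dollars}
    curves:  {recommender: {org: Curve}}
    Returns (grants per org, dollars each recommender left unspent).
    """
    remaining = dict(budgets)
    grants = {}
    active = list(budgets)
    while active:
        for r in list(active):
            if remaining[r] < STEP:
                active.remove(r)  # out of money
                continue
            best_org, best_value = None, 0.0
            for org, curve in curves[r].items():
                value = curve.marginal_value(grants.get(org, 0))
                if value > best_value:
                    best_org, best_value = org, value
            if best_org is None:
                active.remove(r)  # nothing left worth funding: the rest is held
                continue
            grants[best_org] = grants.get(best_org, 0) + STEP
            remaining[r] -= STEP
    return grants, remaining


# Hypothetical example: alice is excited about OrgA, bob is lukewarm on it.
budgets = {"alice": 10_000, "bob": 10_000}
curves = {
    "alice": {"OrgA": Curve(1.0, 20_000)},
    "bob": {"OrgA": Curve(0.1, 5_000), "OrgB": Curve(0.5, 4_000)},
}
grants, remaining = allocate(budgets, curves)
print(grants)     # {'OrgA': 10000, 'OrgB': 4000}
print(remaining)  # {'alice': 0, 'bob': 6000}
```

In this toy run OrgA gets $10K purely because one recommender is excited about it, even though the other barely values it, and bob returns $6K rather than fund things he sees no value in. That is the “funded by whoever wants to fund you most, not by consensus” dynamic in miniature.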
This was intense. There were millions of dollars being allocated, and both my decisions on funding and the arguments I made in discussions made a big difference to how the money got allocated.
The Recommenders
Jaan chose a very strong group of recommenders given the task at hand. Everyone took the job seriously, everyone was doing their best to be cooperative rather than strategic, and everyone helped everyone else think through the issues, gather information, share models and considerations, and reach our own opinions based on our individual epistemic perspectives, world models and values, as reflected in the desire to improve long term survival and flourishing.
Whether or not they would consider themselves EAs as such, the other recommenders effectively thought largely in Effective Altruist frameworks, and seemed broadly supportive of EA organizations and the EA ecosystem as a way to do good. One other member shared many of my broad (and often specific) concerns to a large extent; mostly the others did not. While the others were curious and willing to listen, there was some combination of insufficient bandwidth and insufficient communicative skill on our part, which meant that while we did get some messages of this type across on the margin and this did change people’s decisions in impactful ways, I think we mostly failed to get our central points across more broadly.
To the extent one thinks any or all of that is wrongheaded or broken, one would take issue with the process and its decisions, especially the resulting grants which ended up giving the majority of the funds distributed to explicitly EA-branded organizations.
From many of the valid alternative perspectives that do think such things about EA as it exists in practice, being unusually virtuous in executing the framework here doesn’t make the goings on much less horrifying. I get that.
But given the assumptions of the core EA framework to the extent it was relied upon, I have nothing but praise for all the recommenders and their efforts. That’s important.
Three of us decided to make our names public: Myself, Oliver Habryka and Beth Barnes.
The others have chosen to remain anonymous. I’m sad that the others made that choice, but of course it must be respected.
Incentives of the S-Process for Applicants
The S-process has several interesting impacts on someone considering applying for funds. There’s some advantage to such a process having ‘security through obscurity’ where no one knows what the strategic moves are and thus plays more honestly, but in general this is an error and it is better to be transparent. Most importantly, outsiders not knowing how it works will cause insiders to be favored over outsiders, which is already a failure mode I am worried about.
Here are the things an organization should do if they want funding. These are not the ten things that I would prefer organizations do, or that anyone else would want, rather they are the ten things that will cause them to be more funded more often under similar conditions:
Apply at all.
Ask for lots of money.
Have a credible case that they could usefully spend a ton of money.
Be legible and have a legible impact story. Fit into existing cause areas and stories of impact.
Establish legible threshold levels of competency and execution.
Establish threshold levels of anticipated and current cause alignment.
Associate with insiders without generating clear conflicts of interest.
Avoid salient potential harms.
Avoid negative associations that make people uncomfortable.
Don’t be caught lying.
This is due to a mix of the implementation details of the S-process and how people interact in practice with those details, especially anchoring effects, and the ways in which EA insiders in this type of situation currently decide how to allocate funding to projects.
A simple model of the default S-process behavior, with respect to an application whose paperwork is in order, is to ask a series of questions that is something like this.
What cause area does this fit into? Do I care?
What exactly are you claiming to do? Do I care? Would that do anything?
Are you credibly going to attempt to do this thing? Can we trust you?
Can you execute? Do you have a track record? Do we know you?
Could doing this backfire? Should I worry about that?
Does anything about this feel uncomfortable? What vibe do you give off? OK to associate and fund?
How much money are you asking for? How much capacity do you have?
Are there game theory or political considerations in how much to give you?
The most important driver of this is that there is more demand to allocate funds than there is supply of high-quality legible known places to put those funds. Too much money is chasing too few known good places to put it. This round of the SFF had enough money, and arguably it had TMM, or Too Much Money.
If you had given any one recommender the full allocation of money, all of us would have given at least a substantial chunk of that money back to the funders rather than spend it, because there weren’t enough worthwhile places to put the money. By combining the places any one person thought were worthwhile, we almost managed to spend it all.
That is why the questions above look like negative selection, a series of filters to pass through, rather than positive selection. To get funded, including to get funded for quite a large amount, it helped to convince at least one person to be super excited by your project, but it wasn’t necessary. All you had to do to get funded was convince someone that your option was better than doing nothing, and that was mostly good enough. It wouldn’t get you quite as much as someone who was super excited, but depending on how much you could claim to have a lot of capacity, that could easily matter far more.
It is an interesting question, among the organizations worth funding, which organizations are more impactful and deserving of funds. For a small individual donor, that is the most important question, you want to find the best thing to do and do that.
But for the purposes of something like the S-process, under current conditions, it mostly does not matter much. If you are good enough, there’s no reason other than game theory not to fund you up to your level of capacity to spend money. If you’re not good enough, there’s no reason to fund you regardless of who else is applying. Yes, the more excited we are, the more money we’ll find a way to give you, but the test is mostly pass or fail.
In order to get to that threshold, the correct strategy seems to me to be that you should follow the above principles. You are doing something where you’ve proven you can be trusted to do the thing you’re saying you will do, and that you can execute on that, and that you have capacity to spend this kind of money, where that something you are doing can be quantified as being clearly good and also clearly not bad. The something you’re up to can be pretty vague, depending on trust in the group, ability, capacity and cause area in question.
Going over the individual points:
1. Apply at all.
This one’s a slam dunk. The amount of money per organization, even per organization excluding those who got grants in the past, is very high. The cost to apply is very low. At a minimum, you should make an honest low-effort application, because people want to find things to fund and will find ways to overlook such things. I know a group that applied despite knowing they almost certainly wouldn’t get funded, and they didn’t, and it was still a very good expected value play to apply.
2. Ask for lots of money.
You can overdo this, but it’s hard to do so unless you fail at number three. Asking for less does make it more likely you’ll get your full ask and more likely you’ll get anything at all, of course it does, but when people are deciding how much to give you, they have to set a maximum.
In addition to the standard anchoring effects, by default the S-process assumes that the marginal utility of money declines linearly from the first dollar to the last dollar. Thus, if you can get me to say you could spend $1 million, then the marginal dollar at $500k is still half as valuable as the first dollar spent. Whereas if you ask for $100k, but have no ability to spend more, then it’s hard to even give you the $100k because the system will assume utility of money to be declining so fast. The ability to spend the extra money makes it much easier for the process to naturally give you a lesser amount. There are ways to try and adjust for it, but they clearly didn’t come close to fully adjusting.
Let’s say you as a recommender in the S-process think that some organization, GoodCause, can use $200k, but at that point it has no ambitions beyond that. You have three knobs to turn: First dollar value, last useful dollar, and concavity. If you enter the last useful dollar as $200k, you can use concavity to almost give them $200k, but you’re going to give them less. If they make up some story about what they’d do with more money, even if it’s ‘we will not need to raise money next year’ or something, then you can enter $400k as the last dollar, and have a good shot of getting them at least $200k, but unless you’re hacking the numbers to get to exactly $200k you’ll likely give them a bunch extra.
I do get why the process works that way, to force it to make logical sense and avoid very weird curves that don’t reflect reality all that well, but it has some weird effects that have predictable distortions.
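The GoodCause arithmetic can be sketched in a few lines. This is an illustration under stated assumptions, not the S-process formula. The function names are mine, the concavity parameterization m(x) = first_dollar_value * (1 - (x / last_useful_dollar) ** concavity) is a hypothetical stand-in for the real knob, and the “cutoff” stands for the marginal value of a recommender’s next-best use of money (for example, hold), below which they stop funding.

```python
def marginal_value(x, first_dollar_value, last_useful_dollar, concavity=1.0):
    """Assumed marginal value of the dollar at funding level x.

    concavity = 1 gives the linear decline described in the text;
    larger values keep the curve high for longer before it falls to zero.
    (Hypothetical parameterization, not the real app's formula.)
    """
    if x >= last_useful_dollar:
        return 0.0
    return first_dollar_value * (1 - (x / last_useful_dollar) ** concavity)


def funded_amount(first_dollar_value, last_useful_dollar, cutoff, concavity=1.0):
    """Dollars granted before marginal value drops below the cutoff."""
    if cutoff >= first_dollar_value:
        return 0.0
    share = (1 - cutoff / first_dollar_value) ** (1 / concavity)
    return share * last_useful_dollar


# With a $1M ceiling, the dollar at $500k is worth half the first dollar.
print(marginal_value(500_000, 1.0, 1_000_000))            # 0.5

# GoodCause, with a cutoff at a quarter of the first-dollar value:
print(funded_amount(1.0, 200_000, 0.25))                  # 150000.0
print(funded_amount(1.0, 200_000, 0.25, concavity=4.0))   # roughly 186,000
print(funded_amount(1.0, 400_000, 0.25))                  # 300000.0
```

Under these made-up numbers, an honest $200k ceiling yields $150k, turning the concavity knob gets closer to but not all the way to $200k, and doubling the stated ceiling overshoots to $300k. The exact figures depend entirely on the assumed cutoff and curve shape; the direction of the distortion is the point.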
3. Have a credible case that they could usefully spend the money they ask for.
You need that credible case in order to get people to be willing to put in high last-dollar values with straight faces. The ‘real’ curve often has an additional inflection point (or two) in it. There’s the first $X, which you actually have good uses for, then there’s the next $Y, which you could spend if you wanted to but it’s expendable so there really is a phase shift at some point, although given the uncertainty of other funding sources a strict discontinuity is suspicious. But the lack of one is also suspicious, especially given the many instincts running around.
Note that if you have a case that you can spend money, and you get the money, that puts a lot of pressure on you to actually spend it rather than bank it for the future. If you don’t, then people will stop giving you money and fall out of that habit too, and you’ll have less money, whereas if you scaled higher you could instead raise more money.
This is commonly a problem within corporations and government agencies as well, where it is very clearly destructive. Departments are careful to always spend exactly their budget.
We tried to emphasize to a few places ‘here is some money but please don’t feel obligated to spend it any time soon,’ which hopefully will have some effect, but this is an ongoing problem.
The history of getting funding is used to get more funding, and the history of raising and then spending money generates more money. Thus there are strong incentives for organizations to go bigger and expand beyond their needs or ability to keep quality high, taking on additional tasks as needed, and few incentives to stay small or to disband once the initial mission is complete.
4. Be legible and have a legible impact story. Fit into existing cause areas and stories of impact.
When an organization had a legible standard-form way to claim they had impact - e.g. ‘we will write a bunch of AI policy papers in which we point out how safe actions are good, unsafe actions are bad, but unsafe actions are cheaper, perhaps saying the words Prisoner’s Dilemma a lot’ - there was a lot of ‘oh all right that’s something that does something, I guess’ or similar. Generic claims to gain ‘influence’ or to be a place to regrant, or to raise additional funds, had similar effects.
The influence case is interesting because it illustrates that the division is unnatural and you can get strange classification decisions. It’s actually not clear at all that this cashes out in anything, yet it has now seemingly been classified as legible. There’s a kind of conventional wisdom among EAs as to what set of actions ‘counts’ as legible positive action, and which ones don’t, and it’s your call how coincidental it is that it largely correlates to increasing the influence and power of EA, and/or to giving money to things EA does but outsiders don’t do.
5. Establish legible threshold levels of competency and execution.
If people are worried that you’re not competent in general, or in particular that you can’t execute on your plans, then they won’t want to fund you.
Since this means a lot of people who don’t have infinite time and largely know each other are trying to figure out who is competent and can execute, it becomes important to create general impressions that you can be competent and execute. A vague sentence claiming concerns about this can sink you quite a bit. Thus, this type of thinking causes a lot of risk aversion, and a desire to find ‘normal’ concrete wins, whereas (as Eliezer Yudkowsky says in his recent dialogues) the things most worth doing are mostly things that are doomed to probably fail, but have enough payoff that they’re worth trying anyway.
6. Establish threshold levels of anticipated and current cause alignment.
If people think you won’t know to stick to officially approved good causes and avoid related but alas bad versions of such causes, or worry you’ll pivot into unrelated things, that’s also a reason not to fund you.
These two problems are also central in academia, getting and keeping a position and getting funding for your work. This isn’t surprising since both areas are working on a system where people give you money and you’re expected to do a thing, then you ask for money to do another thing, and mostly it’s a Boolean where you either get it or you don’t. If you do a bunch of small but concrete and legible things, people go ‘oh that person or group can execute, and has evidence of impact’ or other similar things, and you get funded. If you do a few moonshot projects, and can’t point to ‘evidence of impact’ or anything concrete that came out of it, then people start to doubt you and think there’s something wrong with you. Or there’s some sort of bad vibe around you, and that sinks you. And people see all this coming after a while, and adjust accordingly.
7. Associate with insiders without generating clear conflicts of interest.
There are three key dynamics promoting association with insiders.
The first is informational for the applicant. If you associate with insiders, you’ll know to apply for things like SFF, and you’ll know more about what it takes to improve your chances. To some extent this is unavoidable, but I also think it is on us to reach out to those who don’t know about such things, and let them know. We can say things like ‘that should be their responsibility’ all we like, but that doesn’t accomplish anything beyond turning us into a kind of Venture Altruist that is giving out money to those who demonstrate skill at seeking money, a pattern I worry about that generalizes quite far.
The second reason is informational for the recommenders. If some of the recommenders know you personally, or know people they trust who know you personally, that goes a long way, whereas if we don’t have that then it would take a bunch of time. Time was one thing none of us had in abundance, there was no way to do a remotely complete look at all the reasonable organizations given our schedules. It would have required being full time on the project. Maybe that’s what should have happened, but it’s a much more expensive proposition in several senses.
In the meantime, ‘cached thoughts’ and impressions about various organizations and people were largely used because they were quick and we didn’t have anything better.
The third reason is making people comfortable and allowing them to trust you. Things that are explicitly EA-labeled in ways that are credible seemed to get benefit of the doubt on many levels. There was definitely a vibe that such people would know to play by the rules of the game, avoid things that made people uncomfortable or raise alarms about potential harms, and generally keep up the game of playing at being good, and encourage more game playing, in addition to any real good that may or may not get done.
The flip side is that if you got too close to the recommenders, we’d have to recuse. I was recused from Lightcone, which I wasn’t sure was necessary but makes sense, and cost them a decent amount of money that I believe I would have given them had there not been a conflict of interest. I was also recused from Median Foundation, which I did agree was completely necessary. It’s an unfortunate side effect, but I don’t see a way around it without making things worse.
8. Avoid salient potential harms.
When you’re doing something that’s clearly importantly good and impactful, some amount of harm, or risk of harms, is both acceptable and usually inevitable. Big things tend not to be universally beneficial, and if they are then it’s even weirder that we have the opportunity to fund or do them.
When you’re doing something that isn’t as impressive, there’s much more worry about harms. A common mode of thinking was something like ‘this might not do all that much good, but it seems highly unlikely to do harm, so sure, why not’ especially when the operation was cheap to fund.
Whereas when there were salient potential harms, this caused a lot of reluctance to fund. And while we were free to fund things others thought would be harmful, there was definitely a spirit of cooperation and taking such opinions into account. There were several organizations people were initially looking to fund that others thought were harmful, and many of them ended up not being funded. There was clearly also risk aversion involved here.
One debate we inevitably had was the ‘is science/progress/growth/knowledge bad, actually?’ question that gets asked periodically. Given the threat of AGI, there’s the hypothesis that good things are actually bad, whereas bad things are actually good, because good things make AGI timelines shorter and nothing else matters. There are also those who are concerned (in ways I consider at best wrong, but more centrally I consider confused and not even wrong, although I won’t defend that here) about S-risks and negative utilitarianism (or how coming down from the trees wasn’t such a good idea and agriculture was even worse, or what not, which didn’t come up during the S-process but definitely does happen).
That’s potentially fully distinct from the motive ambiguity phenomenon or the reversed morality of Moral Mazes, where good things are bad and bad things are good because if you support bad things it proves your loyalty and dedication and focus, making you a good ally who won’t be distracted or stopped by moral considerations.
It’s also all potentially not distinct from the motive ambiguity phenomenon, depending on your model of what people are thinking and how charitable you intend to be.
I do think that some people come to the ‘science and technology are bad’ conclusion for the ‘right’ reasons, but I also think that some people come to it for the wrong reasons, and often both are in play.
In this case, we did manage to effectively agree to treat science, technology, progress, economic growth and other neat stuff like that as neat stuff. The exception was when things differentially impacted AGI development, which everyone agreed was quite bad and very much not neat. Otherwise, still neat, but with a ‘discount’ based on such worries.
An implicit theme of many discussions was which potential harms should be considered salient and taken seriously into account, versus which ones weren’t understood or justified sufficiently to enter into evidence, beyond worries about progress. How worried should we be about things such as…
Very very broadly and also specifically, as a kind of catch-all: Various incentives?
Rewarding/punishing or failing to punish/reward bad/good past behavior or results?
Impacts on EA or cause area culture?
Perceptions of projects, or of cause areas, or of EA in general?
Information cascades?
Information flow? Polluting the information stream, or cutting off info, or (in places like AGI or biological risks) failure to keep things secret?
Game theoretic impacts?
Concerns about S-risks that are often misinterpreted in harmful ways, or that might be harmful even if not misinterpreted?
Various ethical or moral considerations?
EA becoming about promoting EA rather than doing actual work?
EA cycling money around in ways that disguised what was happening or allowed people to pass the buck?
Wasting the time of people who could do something important elsewhere?
Noting that this list is incomplete, and that #1 encompasses a LOT of stuff.
One of my greatest frustrations was the difficulty in conveying many of the downsides/harms I saw from a combination of the organizations and their projects, and of the S-process and EA money distribution structures more generally. I definitely got through in some places, and I definitely failed to get through in others. Where I failed, it was some mix of ‘this person genuinely doesn’t care about the thing I care about here,’ ‘I failed to convey my intuitions and model, and that’s a hard thing but it’s also on me’ and probably some amount of me having confused or invalid concerns for various reasons.
One recommender noted that I always seemed to have unique concerns they never would have anticipated.
9. Avoid negative associations that make people uncomfortable.
Several organizations clearly set off various alarm bells and made people uncomfortable. Things the people there had done, or had been associated with, were considered red flags. Without getting into details, there was a large incentive to avoid this happening to you, which is some mix of ‘don’t do stuff like that’ and ‘don’t let this become the perception’ which in practice meant ‘don’t cause anyone to loudly complain about you’ and ‘be careful who and what you associate with.’ It’s not entirely bad, it definitely is a filter for bad actors and you need that, but it wasn’t exactly a robust justice system either.
10. Don’t be caught lying.
Being caught lying was clearly quite bad. There was one organization that was going to get a lot of money, that did get a lot of money, but that got (at least in expectation) substantially less money because of a concern that their claims of impact were based on invalid calculations, and they didn’t correct them when alerted to the problem. I was willing to mostly overlook that in the end because I saw the system of grantmaking as putting a ton of pressure on that organization to do something similar, because the mistake wasn’t obviously a mistake even though I agree that it was one (even once it’s pointed out, there’s a counterargument), and because when I did the calculation on my own it was clear there was plenty of impact (and others did the same and got the same answer). It still did sour me on the whole enterprise, although it’s hard to know exactly how much.
If they hadn’t made the claim/calculation in question it wouldn’t have been an issue at all. We spent a bunch of time on that, and what else it might indicate and how we’d need to react to that, and I’m not sure if we reacted to it too little, too much or about the right amount. I don’t know where the lines should be drawn.
I do know that if the claim had been more brazen, or if there had been a lie that had been material that had clearly occurred, it would have been a severe black mark. Which seems good and right to me.
Incentives of the S-Process for Recommenders
The S-process is an opportunity to direct a lot of money to any organization that applies, both by directing your funds and by persuading others. Your decisions and arguments will of course be listened to carefully, and if you go sufficiently rogue your allocation could be dramatically reduced, and you might not be invited back in the future.
The time pressure was real. I devoted what time I could.
The most important core tension was ‘spirit of the process’ and ‘get invited to keep playing the game’ versus ‘get the allocation you care about for this round.’
A secondary core tension was ‘figure out the right answers’ versus ‘only so many hours I can spend on this.’
One can consider the process in three stages. First, there’s the individual evaluation, then the discussion, then the adjustments in light of the broader picture.
For the first stage, there are questions like these:
If you think purely in terms of how many dollars are useful at all to an organization, you reinforce the biases discussed above towards claiming to have capacity. If you don’t, you’re not reporting your model accurately.
If you actually attempt to measure relative impact of dollars, your estimates of impact should be orders of magnitude different for different organizations, leading to a clear hierarchy of preferences. To not do this mostly represents one trying to seem reasonable or hedge one’s bets, but is not a reasonable EV perspective. No one put in different orders of magnitude, and we all did more of a rank ordering thing.
There are lots of weird inflection points in our instinctive desire to give money, but the curves don’t allow inflection points. What to do?
If you want to allocate a particular amount of money to a particular organization, the obvious thing to do is to select an extreme concavity so that you aggressively allocate that many dollars but no more. But that’s a hack and rather dishonest if you take the process inputs seriously.
Once you know you’re not going to fund something, how much attention do you pay to getting their curves right or to bothering to enter non-zero numbers?
Where to prioritize one’s time, in general?
How do we think about the value of holding onto money, especially factoring in that we won’t be the one allocating it in the future?
Essentially, there’s a lot of tension between trust the process and input individually accurate things versus think about what this does to the allocation process and what distribution this results in.
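To make the concavity hack mentioned above concrete, here is a minimal sketch. It assumes a made-up curve shape and a made-up rule that funding continues until marginal value falls to some threshold. The actual S-process parametrization is not described in this post, and the names marginal_value, dollars_funded, peak, cap and k are all hypothetical.

```python
def marginal_value(x, peak, cap, k):
    """Hypothetical value of the x-th dollar: starts at `peak`, reaches zero at `cap`.
    Larger k keeps the value near `peak` for longer, then drops it off a cliff."""
    return peak * max(0.0, 1.0 - (x / cap) ** k)


def dollars_funded(threshold, peak, cap, k):
    """Dollars granted if funding continues until marginal value falls to `threshold`.
    This is the closed-form solution of marginal_value(x) == threshold."""
    if threshold >= peak:
        return 0.0
    return cap * (1.0 - threshold / peak) ** (1.0 / k)


# Same organization, same peak and cap, two different curve shapes (assumed numbers).
honest = dollars_funded(threshold=5.0, peak=10.0, cap=1_000_000, k=1)
hacked = dollars_funded(threshold=5.0, peak=10.0, cap=1_000_000, k=50)
print(round(honest), round(hacked))
```

In this toy setup the gentle curve gives out half the cap, and the amount moves a lot as the threshold moves. The extreme curve gives out roughly the whole cap almost regardless of the threshold, which is what makes it a way of dictating a dollar figure rather than reporting a belief.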
Then there’s also the question of ‘where I think the money should go and how I evaluate things’ versus ‘what I think Jaan and the other funders think about such things’ since it’s their money we’re giving away. To some extent they want us to substitute our own work and judgment and world model and values, but also to optimize for their world model and values.
I decided I was going to ‘trust the process’ for as long and as much as possible, and go with the spirit, whenever possible. I think everyone else did too. Decision theory agreed. That didn’t make it easy.
For the second stage, the questions are things like:
How much do I strategically steer conversations to try and make the allocations closer to what I think would be better?
How much do I filter information flow and emphasis based on what outcomes I want?
Are the things I’m curious about a good use of group time or should I let others raise their questions instead?
Again, I did my best to be as non-strategic as possible while still making the case for my views, and I strongly feel that others did the same. I did my best to direct curiosity where I felt curiosity was appropriate, while also making sure my big concerns got discussed since I did feel they were important, and attempting to convey my world models. These types of tensions are probably impossible to fix, and need to be navigated using a positive-sum cooperative culture.
For the third stage, the tensions get more explicit, because we can see a likely approximation of the final outcome that is about to happen, and we can figure out how changes we would make would change that allocation. So we’re trying to update our evaluations as we change our minds without letting strategic questions distort us, but also it’s very salient how much money is changing hands based on where parameters are set.
All of that requires a commitment to the process being more important than the opportunity to redirect the allocation of the money, even when there are very large swings because of the dynamics involved.
The central dynamic is the question of ‘who is going to fund this?’ Often there will be several people who agree that a given organization is worth funding, and would give them similar amounts. Who funds them as their top priority then often determines whose other priority also gets funded, and whose gets left behind, and people’s rank orderings. Strategic options were abundant. This was especially tempting when someone else was considering funding something I thought was harmful.
The Unilateralist’s Curse
There is a concern known as The Unilateralist’s Curse, where it only takes one person to fund ConflictCause or WeirdCause, or to do pretty much anything else in the world.
There’s a time and a place for this. At one point a few months ago, in one of my Covid-19 posts, I mentioned something non-Covid-related that some other people thought I shouldn’t be drawing attention towards. I was asked to take down the note. My inside view disagreed, but the argument was reasonable, so I gave deference to the request and took down the note.
In general, I think we have a general bias against action rather than in favor, a bias towards regulations and prohibitions and trying to centrally decide and plan things, and otherwise tell people what they must or can’t do far more than is wise, both culturally and as a matter of law. I’d rather make fewer things political and social, and let people do more things more easily.
The S-process, however, takes this to an extreme position, where six people are deciding to distribute money, and it’s very easy for projects that the group collectively thinks are very harmful to get funded. All that matters is the person most enthusiastic about giving funds, and the process mostly cares not the difference between the opinions ‘this isn’t that great’ and ‘this is actively terrible,’ except insofar as people could use social persuasion.
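A toy illustration of that dynamic, under a deliberately oversimplified assumption: an organization receives whatever the single most enthusiastic recommender wants to give it. This is not the real S-process algorithm, the function name funded_amount is hypothetical, and the negative numbers stand in for privately held beliefs about harm rather than anything the process actually accepted as input.

```python
def funded_amount(desired_amounts):
    """Toy rule: the grant equals the largest amount any one recommender wants to give.
    Opinions at or below zero have no effect on the result."""
    return max(0, max(desired_amounts))


# Six recommenders, one enthusiast. In the first case the other five merely think
# 'this isn't that great'; in the second they think 'this is actively terrible'.
lukewarm = [1_000_000, 0, 0, 0, 0, 0]
hostile = [1_000_000, -500_000, -500_000, -500_000, -500_000, -500_000]

print(funded_amount(lukewarm), funded_amount(hostile))
```

Both cases produce the same grant. Under a rule like this, the only place the difference between mild indifference and strong opposition can show up is in conversation, not in the numbers.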
There was one tool, the ultimatum, that could have been used in a sufficiently extreme situation, and I believe it would have been used if one particular organization had been about to be funded, and possibly one or two others. I am curious if it would have had teeth in practice, but the bar for using it was clearly very high.
The other tool was to prevent endorsement, but we decided to remove that. Initially, there was a rule that if any one person disagreed with a grant, the grant would still happen, but it would not appear on the public list. I would have felt obligated to invoke the veto for one organization. If I had the option to veto and hadn’t used it, that would have been interpreted as an endorsement. There were two others that almost got funded by others, that I also would have felt the need to veto if they were about to be funded.
I observed this, others made similar observations, and we collectively agreed unanimously to take away our right to veto. That way, we could maintain transparency, which we all agreed was good, without feeling like we had individually given our approval where we didn’t want to give it.
There’s no great solution to this, either in the S-process or in general. If we weren’t in a process together, we mostly wouldn’t have even noticed. Having people who care about each other’s concerns and update on them somewhat even when they don’t agree or fully understand helps somewhat, but we’d ideally like more than that, and we’d like to take into account how many people want to fund something. You could argue that three people each thinking GoodCause could make use of a million dollars shouldn’t be that different from one person thinking that, if none of them think it would be useful to give more, but that seems like someone made a mistake somewhere.
I’ve tried to brainstorm mathematical solutions to this, but all of them have horrible strategic incentive issues, and none of them deal with the weirdness of what happens when you don’t happen to be in the S-process at the same time.
The core issue is that there’s no clear relation between the values that I put in for impact, and the values that a different recommender puts in. All we have are the relative values I put in, and even those are mostly only meaningful as a rank order, because our real beliefs should involve different orders of magnitude but our written numbers don’t do that. And that’s when there’s no incentive issues. Add in the incentive issues of letting those numbers cross over, and things get even stranger.
On another level, the core issue is that our true preferences over distributions and outcomes, when asked, are not all that well-represented by smooth curves. Yet we want to capture many of the benefits of the smooth curves. That includes the correction of large biases in our instinctive preferences, and the combination of the two gets hacky no matter what you do.
I’d also note that our discussions of when donations would be harmful mostly were about whether the organizations would do harmful things with the money. There’s a lot of other ways for a donation to be harmful, because it impacts the ecosystem more generally, and the incentive gradients involved. Giving people money for doing mostly worthless things that look right gives you more of what you reward, and in the bigger picture, that too is harmful. No one (or at least no one wise) said playing for keeps to change or save the universe was going to be easy.
Concretely, my suggestion would be something like:
Allow and encourage people to enter negative numbers into the S-process, to represent harm, and have that be highly visible to others.
Allow others to choose how to represent the impact of such concerns, and draw a distinction between model updates that changed their inside view, versus adjusting for knowing someone else’s conclusion even if they disagree.
Before allocating money to individual recommenders, if there are organizations that most people’s unilateral allocation (e.g. what they’d do if they gave out all the money by themselves) would fund, fund those first, so no one has to worry they won’t be funded or that by funding there they’ll be giving away their leverage to do other things.
Given the stakes, think about it more.
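The first and third suggestions above can be sketched roughly as follows. This is one possible reading, not a worked-out mechanism: the function name consensus_first, the strict-majority test, and the choice to grant the median of everyone's unilateral amounts are all my own assumptions, and it does nothing about the strategic incentive issues discussed earlier.

```python
from statistics import median


def consensus_first(unilateral, budget):
    """unilateral maps recommender -> {org: amount they would give acting alone}.
    Negative amounts represent believed harm. Returns (grants, harm_flags, remaining)."""
    n = len(unilateral)
    orgs = sorted({org for alloc in unilateral.values() for org in alloc})
    grants, harm_flags = {}, {}
    for org in orgs:
        amounts = [alloc.get(org, 0) for alloc in unilateral.values()]
        # Make harm concerns highly visible instead of silently treating them as zero.
        flaggers = [r for r, alloc in unilateral.items() if alloc.get(org, 0) < 0]
        if flaggers:
            harm_flags[org] = flaggers
        # Fund up front only what a strict majority would fund on their own.
        supporters = sum(1 for a in amounts if a > 0)
        if supporters * 2 > n:
            grant = min(median(amounts), budget)
            if grant > 0:
                grants[org] = grant
                budget -= grant
    return grants, harm_flags, budget


example = {
    "A": {"GoodCause": 1_000_000, "WeirdCause": 500_000},
    "B": {"GoodCause": 800_000, "WeirdCause": -200_000},
    "C": {"GoodCause": 1_200_000},
}
grants, harm_flags, remaining = consensus_first(example, budget=3_000_000)
print(grants, harm_flags, remaining)
```

In this made-up example GoodCause is funded off the top at the median amount, WeirdCause is left to the individual allocations with one recommender's harm flag attached, and the remaining budget is what would then be split among recommenders as before.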
The Time and Resource Gap
Very little time was wasted during the process. The case was made that our first meeting could have been shorter, and there were a few places where we could have dealt with various things faster with a better process, but the gains to be had there are relatively minor.
The big issue is simpler than that. There’s a ton of information to find and to process, about a ton of different organizations, and one starts with very little to go on. This forces us to evaluate quickly. Cached impressions have big impacts. Previous results get copied in information cascades. Investigations only happen for some of the organizations, each usually by one person and not lasting very many hours. A number of applicants got six-figure or seven-figure grants with remarkably little backing those decisions up.
One could say ‘well then you all should have spent more time on this’ but our time is stretched thin as it is. It would not have been practical for me to double my time investment, and I am guessing the same is true for most of the other recommenders as well. You could of course pay me enough to free up that kind of time, at the expense of other things, you could make me quit my job at some price, but none of it would be cheap.
A suggestion has been made to have other people do at least preliminary investigations in advance. I think that’s pointing in exactly the right direction.
One of the great frustrations in my life is that, as far as I can tell, concierge services, assistants and secretaries are useless. With notably rare exceptions, even when they are provided free of charge, I have never been able to get more out of them than the time I put into them. I am confident that this would change for a sufficiently skilled and high-level person, and I am confident that I am lacking key social technology to hire well and to direct such people well.
In this context, in particular, it seems like delegation is clearly The Way. Thus, if I were to do this again, I would hire assistance to do at least the following:
Do a preliminary investigation of every organization, before the recommenders even start looking. Do later deeper dives on the ones that are potentially getting large funding.
Assemble key information about the organizations into good and consistent form.
Let us ask questions, and attempt to answer them, with or without contacting the organizations for answers as appropriate. Investigate particular questions recommenders are curious about. Summarize papers. Compile histories. Fact checks.
Do a sanity check on whatever we write in our notes, and on our evaluations, to look for things that are mistaken, or don’t seem to make sense. Think about what questions we would want to be asking, based on what we’re thinking.
Help schedule meetings for us to talk to people at the orgs, as needed.
The main worry I’d have is this might pull evaluations even more towards the things such people could evaluate, but the hope would be that the extra time and resources allowing deeper exploration is the dominating factor.
It might be a good instrumental cause of its own, of course, to simply have an organization that finds and trains good people capable of providing concierge assistance to people, and then provides it to select people or for help with select tasks when that seems worthwhile, free of charge. This would be highly related to a lot of what Lightcone is up to, only kind of the ‘next level up.’
Too Much Money
Professional poker player Antonio Esfandiari would often say he had TMM, or Too Much Money. Having enjoyed great success at the poker table, Antonio found himself with the ability to buy everything he ever wanted. Money outside of the poker table lost meaning to him. This resulted in some dumb decisions, which even became the topic of a TV show he created with friend and fellow poker pro Phil Laak.
Another form of TMM is when you feel obligated to find ways to spend it, especially when it is your budget or the money you’ve been given. A charitable organization can end up feeling like they need to be reenacting Brewster’s Millions.
TMM is not a fixed amount. TMM is a state of mind and a set of dynamics, and depends on a particular context. It happens when, in terms of spending, your grasp exceeds your reach. The reason Antonio had TMM, despite not being anything close to a billionaire, is the lack of a bigger goal.
Whereas Elon Musk is a billionaire but he is also trying to get civilization to Mars, so in important senses he very much does not have TMM, but his willingness to move large amounts of stock in highly inefficient ways in order to be a better Twitter troll is some combination of bespoke prioritization and having personal TMM.
It is often better to have what I refer to as EM, or Enough Money. That’s a sweet spot where money holds its meaning, and you care about value at all, but lack of money doesn’t hold you back from your goals. Of course, if you had orders of magnitude more money, perhaps you’d have different goals, or at least different methods to seek your goals. I know I would. But in a given local context, you can still have EM.
The other thing you can have, of course, is NEM, or Not Enough Money, or you can even be what Kanye West refers to as Broke Phi Broke.
(And interestingly, in my model, NEM and TMM are states of mind and it’s possible to have both at once, which is what happens when people are buying gold plated toilets.)
Anyway, I mention all this because there are several senses in which the process and the system around it could be considered to have TMM, or Too Much Money.
EA has TMM.
SFF had TMM.
A lot of people in crypto have TMM.
SFF Grants that were too large might cause organizations to have TMM.
If one thinks about the broad range of things one can do with money when playing for keeps and playing to win, then it is crazy to say that EA has TMM.
As a proof by example, rather than the best possible use of such funds: There are a lot of companies out there that one could purchase, and then run as public goods and in the service of important cause areas. Not everything is for sale, but many things are. Twitter’s market cap alone is around 33 billion. Pfizer is over 300 billion. Again, this is purely an existence proof.
However, if one is limited, for whatever reason, to giving away money to charitable organizations that already exist and which legibly fit the mold of EA causes and frameworks, then in the context of funds earmarked for such things, EA does seem from where I sit to have TMM.
In that context, SFF also had TMM. If you looked at each individual recommender’s allocation, everyone gave away substantially less than all the money. When I went looking for additional organizations to encourage to apply, I did find (or think of) one and with time likely could have found more, but my guess is that the organizations that didn’t apply despite being legible EA causes I’d have been excited to fund, did so because they didn’t need the funds.
The core reason for this is crypto. If you invested early in crypto, there’s a very good chance you have TMM. You see the signs of this all over the space, and it’s not a coincidence people are paying premium prices for bored apes. I didn’t think of it in time to get in early, but in hindsight, the moniker fits.
Regardless of how you would score the community’s performance on crypto, enough people who are EA/rationalist adjacent enough did buy enough that they’re in position to give such causes quite a lot, and also that’s where many of our giant funders got their bankrolls. If anything, as a group we are now overinvested, even if you are bullish on the space.
These idiosyncratic large grants are, in my view, a very good thing, but when looking to allocate SFF funds it means there’s a lot more to distribute and fewer places available to distribute to.
Meanwhile, what happens when an organization gets TMM? There are several potential problems, stuff like this:
The organization may no longer have to prove itself in order to get more funding, which can have many effects both good and bad.
The organization is no longer something others can usefully fund, which… makes those other potential funders sad? Makes them less engaged? People seem to care about this.
The organization is under pressure to spend the money, and to expand its scope and mission. Potentially under pressure to do a lot of this and quickly, and in ways they’re not capable of doing well. This can end up destructive to your production, or it can lead to wasting time of people who could otherwise do something valuable.
The organization could become the target of those who care mostly about money.
The organization is in more danger of fights over money and power, to be brought ‘in line’ in various ways, or of the dynamics involved in Moral Mazes.
Other organizations might lose sight of their missions in order to chase these kinds of funds.
The hope was that if an organization was explicitly told ‘we are giving you more money than you can usefully spend right now, please do not be in any hurry or feel under any obligation to spend it’ that this would help, but I have no idea the extent to which this will turn out to actually help. They have to take it to heart, and have to believe that we believe it, and they need to not think that spending the money will unlock similarly large grants from others now that they’ve seen SFF’s grant.
That brings us back to the question of whether there’s TMM floating around in general in the space, and what ways the money in the space drives organizations to act, in general. I’m sure I’ve gestured broadly at much but not all of it, but I want to stay within scope so I will decide to cut it off here.
And the Nominees Are
Such a post would be incomplete if I did not share at least some of my thoughts on the organizations that applied for money. Not sharing such thoughts makes it that much harder for others to make good decisions, and in a real sense wastes the work that was done. Despite that, there are dangers of doing this, the most salient of which are:
Information cascades. To some extent you want to cause an information cascade when sharing such information, but in an even more important sense information cascades are harmful. It’s important that you, yes you, when making decisions of what to work on or give money to, think for yourself, model the world as best you can and come to your own decisions.
Discouraging others from investigation. Similar to an information cascade, if someone else has done the work, you might be tempted to skip the work, whereas the work is the most valuable thing most people can do here - using their decisions as costly signals to communicate local or unique information.
Mistakes. I’m gonna mess up slash did mess up a lot of this and it’s not a great look and it’s going to be embarrassing and people are going to complain about it and it would be an additional mistake to deny the giant ugh field this generates.
Politics. Saying in public who should and shouldn’t get large piles of money gets political, and it gets political fast, and oh my do I not want to go there if it can be avoided. Another giant ugh field. But it also can’t be used as a threat to suppress important information, especially asymmetrically by bad actors.
Distraction. Even if the resulting discussions manage to avoid being political, the term ‘demon thread’ still applies, and I could lose unlimited amounts of time and attention, along with a lot of stress, while the other points that were the prime motivation for writing this could get mostly forgotten.
Hackability. Revealing too many details about your ‘choose whether to give people money’ algorithm can provide good incentives and motivation to people, but it can also encourage them to fake it and hack you, and make you worry that your interactions with people are fake. I hate this, and have only some idea how much it sucks for people like Jaan.
All right, that’s all noted, time to just go ahead. A powerful mantra.
There were a bunch of applicants. We can sort them into a few broad categories.
Unserious Applications
Firing an application into the void was made intentionally easy, subject to the need to include sufficient information for the recommenders about new organizations. One of the costs of this is that there will inevitably be some people who apply because there’s no downside and maybe you get a check, and who will be wasting your time.
This is a situation in which, once you have no business getting a check, worse is better, because it lets us dismiss your application faster. Thus, I appreciate that all these applications were very very obviously unserious and thus didn’t waste much time, and some of them put a smile on my face in a ‘nice try, kid’ kind of way.
Orthogonal Applications
As an example of this category, there was an application to take kids in Flint, Michigan fly fishing. I am not against taking kids in Flint, Michigan fly fishing. Quite the opposite, I’m all for it and I’d rather fund that than light the same amount of money on fire, but this was in no way relevant to our interests or a plausibly efficient use of funds.
This category also included plausibly good uses of funds, but for cause areas that weren’t relevant to the mission of improving the long term future. You could (or could not) conclude that it was a good idea to buy a bunch of malaria nets because you believe they save lives directly, or give money to poor people to make them less poor. Again, I have no problem with any of that, but it wasn’t relevant to our interests.
If we’d had any animal welfare charities, which we didn’t, they too would have gone into this category.
The interesting border case is economic growth. I certainly do not buy the Tyler Cowen model from Stubborn Attachments that all that matters long term is economic growth. Then again, there’s always that good old chestnut of ‘what if more economic activity means faster doom because AGI or other tech.’ Undifferentiated economic growth, especially catch-up growth that wouldn’t involve meaningful innovation or change in the culture or ability to do worthwhile things, didn’t seem relevant to me.
Thus the question of the Charter Cities Institute. Creating charter cities that operate under First World rules seems like a good thing. People should totally do that, and totally be willing to support it. I talked to them hoping it would be aligned with the mission.
The call was great, because they were honest with me and told me they weren’t doing the thing I wanted them to do. This is The Way. I didn’t do a good job hiding what answers I wanted to hear, and they said ‘nope, sorry, that’s not what we do here.’ Bravo. We also talked about a bunch of other stuff.
What I was looking for on that call was the ability to do things you can’t do in First World countries. In particular, challenge trials seemed like a strong litmus test. If your charter city allows the world to do challenge trials, then it’s super valuable. If it doesn’t, then you might be helping the particular people, but you’re not mostly doing the thing I care about. One of the things I learned on the call was that Prospera messed this up due to the way they intertwine existing legal systems, and a lot of other similar things too, which are the places that matter most to me.
That’s because there was a broad different category that I do think matters a lot, which is hard to describe in exact words, but involves changing the cultural landscape to favor the ability to think, communicate, innovate, produce and act in meaningful ways, and show that it is possible to do real things. To make there be more people in the world, in the sense that there are not very many people in the world.
To fight the blight.
Innovation Station
A common theme of several applicants was finding better ways to Do Science, and improve levels of innovation. As noted several times there’s the risk that Actually Innovation Is Bad, Yo, the same way there’s the question of whether economic growth is actually bad. So there was a lot of asking about the extent to which such things would differentially advantage the ‘good’ innovation and science in its race against the ‘bad’ innovation and science.
I’m going to broadly gesture at A Thing in a bunch of ways and hope it’s sufficiently good training data that you can get an idea of what I’m pointing at, in the kind of mode where one attempts to transfer intuitions and models rather than prove anything.
My thinking about this heavily rhymes with what Eliezer Yudkowsky in his recent series of discussions (that came out after SFF was done) calls ‘shallow’ versus ‘deep’ patterns.
In this model, GPT-3 works by memorizing a ton of different shallow patterns, then uses this to predict text. Most existing ML systems do broadly similar things. They don’t do this other ‘deep’ pattern recognition, which one could also call ‘thinking’ or ‘actually thinking.’
Not doing deep thinking isn’t an attribute limited to artificial intelligence. One can model most humans most of the time as using entirely shallow patterns, and as doing something much closer to running a more-coherent and more-state-retaining version of GPT-3 (GPT-4?) than one might otherwise think. This is remarkably good at getting one through the day, provided the training data has pointed you to sufficiently resonant and appropriate shallow patterns.
One can also model organizations like academic institutions or corporations as mostly also running shallow patterns. Of broader society, and what I call the Implicit Conspiracy, existing entirely in shallow patterns. And all of them as evaluating and rewarding people with shallow patterns, and thus as pushing them towards the exclusive use of shallow patterns. The shallow patterns get ahead, and it looks like many things are crumbling around us if we look at them.
Thus, for a central example, you have this mockery that claims the name ‘science’ that is executing (relatively, in context) shallow patterns designed to produce ‘scientific output’ in the form of papers and grants. It occasionally finds something useful, but not that often, and decreasingly often, and it is displacing the actual act of doing science.
This is both directly relevant to ML in the sense of the question ‘if you put together enough memorization of shallow patterns do you get AGI?’ and for the question ‘if you execute a bunch of shallow patterns as AI researchers do you end up with AGI?’ My tentative gut answer to both questions is no (although you might still end up with a bunch of dumb machine learning systems that do a lot of damage or even get us all killed), but with a lot of very scary uncertainty.
Favoring deep patterns over shallow patterns, and enabling people to execute deep patterns at all and show others that they too can execute deep patterns at all, is one way to think about one of the things I believe is currently super important. It is highly related to the danger of Moral Mazes, the fight against the Blight and the Implicit Coalition (also known as Moloch’s Army), the decline in the discourse, the increased fakeness, scamminess and falseness of everything around us, the general inability of almost anyone to do almost anything real, and even more than in other places it is a vital weapon in our ability to successfully work on AI Safety. We need to culturally establish the act of Doing Actual Thing.
Anyway, when I look at something that is promising to create innovation or otherwise enable the Doing of a Thing, and ask whether it’s net good to encourage that, the question in my mind is whether it’s favoring shallow or deep patterns of thought and action.
If one could reinvigorate science for real, that seems clearly on the good side, so to the extent that I saw promising such attempts I was excited.
There were several proposals in this category looking to directly reinvigorate or enable science of a sort: NewScience, PrivateARPA, SocialMinds@CMU and Ought.
NewScience, SocialMinds and PrivateARPA seemed like they were good ideas if we were optimistic about execution. I was able to get there on NewScience, but not on PrivateARPA or SocialMinds. Somehow my notes on PrivateARPA were not saved, and I’m worried that I relied too much on others' vibes here and made a mistake not funding. For SocialMinds, I wasn’t sufficiently convinced on execution, but would have been onboard if I had been so convinced.
Ought was a weird case, where I had the strong initial instinct that Ought, as I understood it, was doing a net harmful thing. My thinking was something like this. They are using GPT-3 to assist in research, to do things like generate questions to ask, or classify data, or do whatever else GPT-3 can do. The goal is to make research easier. However, because it’s good at the things GPT-3 is good at, this is going to be a much bigger deal for those looking to do performative science or publish papers or keep dumping more compute into the same systems over and over again, than it will help those trying to do something genuinely new and valuable. The hard part where one actually thinks isn’t being sped up, while the rest of the process is. Oh no.
On top of that, it would be evaluating papers, along with extracting information from them, and thus encouraging papers to align themselves with such shallow patterns, and would also be tying a black-box research assistant into the scientific process in ways that couldn’t possibly go wrong in the sense that when they did go wrong they would be nearly impossible to get at or repair. I was also confused why this was a non-profit, their defense of which was ‘to avoid bad incentives’ in various ways, which on reflection is at least reasonable.
A lot of others’ positivity seemed to reflect knowing the people involved, whereas I don’t know them at all. A lot of support seemed to come down to People Doing Thing being present, and faith that those people would look for net positive things and avoid net bad things generally, and that they had an active eye towards AI Safety. With time, I’ve forgotten a number of details here, and also notice that I’m not confident I understand what they’re actually doing (I wasn’t the only one confused) and should probably talk to them more about this at some point.
When I read a comment on LessWrong by Jessica Taylor questioning why one of MIRI’s latest plans wasn’t strictly worse than Ought, I realized her question didn’t make sense if I’d been understanding Ought correctly, so I presumed that I was confused and asked her, which helped me understand better:
They're trying to convert Paul Christiano's alignment research (e.g. humans consulting HCH) into near-term testable models. E.g. Paul hypothesizes that if you're trying to solve a big task, it's possible to break it into lots of small tasks each of which can be solved by someone thinking for a bounded amount of time (e.g. 1 hour).
They're considering the feasibility of training machine learning models to solve these specific tasks. They're trying to build AI tools that help people with breaking a problem into tasks and using AI to help solve sub-tasks, e.g. by predicting what someone is likely to approve of. These AI tools might cognitively enhance people by providing them with advice they would consider good upon reflection.
This frames the whole thing on a meta-level as a way to test a theory of how to build an aligned AI. As per Paul’s theory as I understand it, if you can (1) break up a given task into subcomponents and then (2) solve each subcomponent while (3) ensuring each subcomponent is aligned then that could solve the alignment problem with regard to the larger task, so testing to see what types of things can usefully be split into machine tasks, and whether those tasks can be solved, would be some sort of exploration in that direction under some theories. I notice I have both the ‘yeah sure I guess maybe’ instinct here and the mostly-integrated inner-Eliezer-style reaction that very strongly thinks that this represents fundamental confusion and is wrong. In any case, it’s another perspective, and Paul specifically is excited by this path.
I wouldn’t be surprised to learn this was net harmful, but there was enough disagreement and upside in various ways that I concluded that my expectation was positive, so I no longer felt the need to actively try to stop others from funding. Since I was confident I also wasn’t going to get excited enough to become the one funding them, I mostly stopped there to save time. I notice I’m still confused in various ways.
EuroBiostasis also fell into this category for me, because they were clearly actually doing the thing of figuring out which physical techniques would work and which ones wouldn’t, in order to accomplish a clear goal (do cryopreservation properly so people can actually be revived and it’s therefore a real thing). I also did buy the claim that by showing cryonics could work, we could motivate people to care more about the future - I’m not going to get into it here but ‘get people to think there is a future and they should care about it’ is pretty important as part of this whole thing, and if that means we have to (checks notes) solve climate change so people can stop thinking of the future as non-existent and doomed (in ways other than the way it actually might be doomed) then maybe that’s actually a good idea purely for that reason aside from the direct benefits. And I did notice that signing up for Alcor had a ‘care about the future more’ impact on myself, although I am unsure of magnitude. But largely this was ‘f*** around and find out’ in all the right ways, with the potential to show people that you could Do Thing and in particular that the kinds of things people who are worried about other important causes say are important to do can be done in particular, and maybe listen to them about other stuff, and so on.
The applicants for this round of the S-process were a sufficiently ‘weak class’ that this was enough for me. I definitely had a vibe of ‘I should be able to do better’ but I couldn’t, so to some extent I went with it.
Finally, Emergent Ventures India was the one applicant I managed to bring in when I realized I was allowed to go do that, and that the current pool didn’t have enough things I was excited to fund. I’ve been super impressed by Tyler Cowen’s ability to select people who have the potential to have big impacts, and to make them more ambitious and more likely to have those big impacts. Giving money differentially to innovative people likely to do innovative things seems likely to favor the deep over the shallow, and the restriction in our ability to do this type of granting is lack of targets rather than lack of funds, whereas Tyler is very good at identifying targets. They’re not EA insiders and they’re not speaking EA buzzwords, but that really really shouldn’t be the thing that matters here. I hope that having raised this one to attention, it can help find other EA funding sources as well. The idea that this is funding constrained at current operational margins seems nuts to me.
I also contacted Robin Hanson to see if there were any prediction market projects we could explore, including the fire-the-CEO markets he said would be what he’d do with a million dollars, but they weren’t shovel-ready. He needs a founder type to actually execute, and there’s a shortage of those so I couldn’t give him one.
There were a few others that were in various ways Doing Thing that I was in theory ready to fund, but there were red flags raised in various ways or upon examination I didn’t think the Thing in question would work in the relevant sense. I’m making a judgment call and not naming them.
Nuclear War
There are other ways for things to go terribly wrong, but none of them make the possibility of nuclear war go away. Nuclear war could be extremely bad. There was the inevitable ‘but would it be that likely to keep us down permanently?’ along with the also inevitable ‘it might help stop AGI’ and all that, but it felt obligatory rather than true objection territory.
That isn’t always the case, and there are reports that those funding long term causes often say things like ‘nope, that’s not an existential risk, that only kills most people and we don’t care about that very much’ and that’s where the issue would otherwise fit, making it difficult to secure funding. This certainly should matter at least somewhat in terms of ability to get funding, but I sense that this is one of those ‘you don’t fit into any of our slots’ issues that ends up being far more annoying than it should be. Others would know better than I, however.
We were given a bunch of applications that involved preventing nuclear war, and one, Alliance to Feed the Earth in Disasters (ALLFED), which was about mitigation in the aftermath of a nuclear war, centering on but not limited to finding practical ways to keep everyone fed during nuclear winter.
The plans that involved preventing nuclear war were certainly aiming at a goal I considered highly relevant, but none of them seemed at all promising in terms of having any effect. There’s a long history of people who don’t like nuclear war, and of some of those people saying ‘but look nuclear war would be really bad, everyone!’ over and over. They try to ‘raise awareness’ and all that. I don’t see how this leads to a lower probability of nuclear war. It might create feelings of hope, but like hope it is not a strategy.
None of the candidates here seemed like they were even implementing well, so none of them got any consideration for funding.
As an example, Strategic Risks wanted to create a show called ‘Radioactive Road Tripping.’
That left ALLFED, which was a very different case, and where I ended up working with them a bit after this post was originally put up, in ways that addressed the main concerns I had at the time.
ALLFED noticed something few others had noticed or done much about, that being ready could make a huge difference if the nukes did fly in terms of people not starving to death and civilization holding together, and that almost no effort was being made to get ready. While amateurs talked strategy, they studied the logistics, and got others to notice them, with the hope that solutions could be found. Some academics are working on solutions, but ALLFED is especially interested in very cheap, practical solutions that aren’t going to be fun for anyone, but would promise to get the calories into people, and be able to be implemented at scale when the time comes.
The parallel to ‘actually try in advance to deal with something similar in some senses to but far worse than the Covid-19 pandemic’ was not lost on me.
I bought the case that the cause was super neglected and in danger of not getting funding, and could have a huge impact even if that was with small probabilities multiplied together. When I did Fermi calculations, this was a very good investment.
My worry, before talking to others, was whether their technological proposals were feasible, and made sense to work on. I tried a bit to get those who would be in better position to look into this for me, but not as hard as I should have, and I got a bunch of ‘I don’t know either’ back. I fell back mostly on my priors, which was that they were doing the types of things that had any chance of working in practice at all, and as the people who noticed the problem it seemed only reasonable to let them try and solve it, so while I had uncertainties, I was excited to fund them.
When we discussed ALLFED as a group, there were several concerns. I’m going to document the whole thing with the aim of giving the senses I got rather than an aim of maximum charity. Please don’t take this as a criticism of any particular person or their actions or decisions, or anything like that.
Capacity. Could ALLFED scale? Could it remain effective, hire and manage well, and so on? Was it mostly the one person who produced value?
Amateurism. Basically a ‘yes, thank you, you founded the space, but now we should leave this to the professionals no?’ kind of vibe thing.
Feasibility. Are their ideas good? I had this too, as noted above.
Honesty. There were concerns, especially around impact calculations.
On the flip side, there was the consideration of a potential ‘hindsight grant,’ the idea that we should give ALLFED money because of what they had already done, to align incentives for things like starting hugely valuable new fields, even if we didn’t have high expectations for what they’d accomplish with the money. I don’t know to what extent ‘the founder of the thing should be by default trusted to figure out how to keep doing it, or at least given the tools to be one of the people trying’ factored in for others but it definitely did for me.
The capacity argument wasn’t invalid. There’s a capacity concern with every organization, otherwise it would be easy to choose the best one and write one check. The unilateralist issue comes up here, as we each had an idea of how much funding room there was and the biggest number is the one ALLFED got.
The amateurism thing seemed wrong to me. Academics are working on some solutions, sure, but they’re working on much less efficient, much more expensive solutions that would be more difficult to implement in a crisis, and they’re doubtless doing it in a very academic way, and by assumption focused on paper writing and grant getting. I’m not saying to can the academics, but there is no sense in which ‘don’t worry, the adults in the room are on it’ is ever going to give me comfort anymore, or cause me to think that now someone else will and therefore I don’t have to.
The feasibility thing wasn’t explored enough, and I’m sad we didn’t get a better handle on the physical-world landscape. More research is needed and all that.
The weird one was the honesty concern. There were reports that were essentially of bad vibes around this issue, a general sense of a lack of epistemic rigor and honesty. There was a potentially big grant here, so a lot of questions were asked, and the concrete thing that got identified was their impact statement and how it was calculated. In particular, the calculation surrounding the likelihood of nuclear war, and on top of that the general sense that their estimates of how much of the impact of such wars they were preventing and their overall impact calculations seemed unreasonable.
The first thing I did in response was make sure I’d done my own impact calculation and wasn’t using theirs, and that came back ‘yeah, this is overdetermined to be a good idea, that’s not an issue.’ I think a lot of the concern actually boiled down to something like ‘they’re claiming all this impact and that’s a really big status claim’ and ‘they’re claiming all this impact which would force me to draw conclusions so I’m looking for a way to avoid thinking that’ and ‘they’re claiming all this impact without being sufficiently insider or going through all the proper channels and laying the foundations.’ That’s paraphrasing and somewhat uncharitable, but also my best attempt to be accurate.
The substantive complaint was that they did an invalid calculation when calculating the annual probability of nuclear war. They did a survey to establish a range of probabilities, then they averaged them. One could argue about what kinds of ‘average them’ moves work for the first year, but over time the lack of a nuclear war is Bayesian evidence in favor of lower probabilities and against higher probabilities. It’s incorrect to not adjust for this, and the complaint was not merely the error, but that the error was pointed out and not corrected.
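To make the shape of that complaint concrete, here is a minimal sketch of the adjustment, under loudly stated assumptions: the survey answers below are hypothetical placeholders (not ALLFED's actual data), each answer is treated as an equally likely hypothesis about a constant annual probability, and anthropic considerations are ignored entirely. The function name and the 75-year figure are illustrative choices of mine, not anything from the original calculation.

```python
def posterior_mean_annual_probability(estimates, years_without_war):
    """Reweight surveyed annual probabilities by the war-free years observed.

    Each estimate starts with equal prior weight. The likelihood of seeing
    no war for N years under annual probability p is (1 - p) ** N, so higher
    estimates lose weight as the war-free years accumulate.
    """
    prior = 1.0 / len(estimates)
    weights = [prior * (1.0 - p) ** years_without_war for p in estimates]
    total = sum(weights)
    posterior = [w / total for w in weights]
    return sum(p * w for p, w in zip(estimates, posterior))


# Hypothetical survey answers, chosen only to illustrate the effect.
survey = [0.001, 0.01, 0.05]

# The unadjusted move: a plain average of the survey answers.
naive_average = sum(survey) / len(survey)

# The adjusted move: the same answers, reweighted after 75 war-free years
# (an illustrative number, not a claim about the right window).
updated = posterior_mean_annual_probability(survey, years_without_war=75)

print(f"naive average:  {naive_average:.4f}")
print(f"after updating: {updated:.4f}")
```

With zero war-free years the two numbers coincide, which matches the point that the choice of averaging is arguable for the first year; the gap only opens up as the years without a war accumulate.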
I reflected on this. It certainly wasn’t good but I noticed I wasn’t overly bothered by it, and was only imposing a moderate-sized penalty, so here are what feel like my intuitive reasons why I only imposed a moderate penalty.
The EA space’s focus on ‘impact’ and in particular on putting together numbers to quantify impact is essentially telling everyone to find the way to write down the highest possible number, and not to worry about whether that number corresponds all that well to reality. When someone tells you to lie to them or hide information or be misleading or p-hack or what not, and then you notice people doing it, it’s a mark against them but it’s kind of on you.
Expanding on that: In particular, the EA space pattern seems to often or by default be that things that can’t be quantified don’t get counted in many funding decisions, and the numbers get taken overly seriously, or at least the way people talk to outsiders about such decisions makes it sound like that. Meanwhile there are a lot of benefits that are hard to quantify, and it seems like you’re competing to put up a higher number against others who are ‘playing the game’ in these ways.
The calculation is wrong and I instantly saw it was wrong and why when the calculation was pointed to, before anyone explained what the error was, but that kind of thing is my comparative advantage. I live for this stuff. My guess is that most scientists wouldn’t see this. Most scientists don’t even understand Bayes’ Rule. That doesn’t mean they can’t do good work, and the original mistake doesn’t reflect all that badly on them when I think about the context. Also, given a nuclear war would kill a lot of people, depending on how you view Anthropic Bias, it can all get a lot murkier. Also, it’s very different to notice you’re making a (somewhat motivated to not be noticed) mistake with the default way to estimate something (in general, ask ten experts and average their guesses is a good heuristic) than it is to realize what the mistake is when it’s pointed at. So I don’t dock them many points for making the mistake, at all.
When the mistake was pointed out, they made a decision not to fix it. That’s the key issue - but see note below that they did fix it after I discussed it with them, and that I believe their failure to fix it came from a lack of understanding their error. When I put myself in their shoes, I see the request that I alter this as a kind of isolated demand for rigor from an insider targeting an outsider, telling them to lower their key numbers for technical reasons in a way that’s not being applied to others and isn’t obviously even correct. And it’s a demand that I go back and fix what isn’t broken to make myself look worse and hurt my cause, and which would require me kind of admitting I did a wrong thing that then everyone would point at forever, and all of that might mean tons of people die in a nuclear war and I’m not excusing it or anything but what the hell did you expect to happen.
So that’s one angle, when I ask whether this means I expect them to be unusually untrustworthy in other ways, I answer no, and I also answer that if you look at OpenPhil or GiveWell analyses with a similar standard I expect to find lots of things like this or worse, and them not to be eager to retroactively correct everything when you point them out and they would lead to conclusions the people involved didn’t like. There’s a kind of person who values ‘getting it right’ enough and such people are sadly very out of fashion and rare in 2021. They don’t seem more common in EA in general.
The other question is whether this represents a strategic mistake, or some sort of bad cultural fit, as in: You should have known to fix your error because we’d have liked it when you fix your errors in this way, thus we should penalize you for being the type of person who makes mistakes. Or who doesn’t understand the cultural codes for interaction in this space. This is like when VCs see a founder screw something up, and they dock them points way way out of proportion to the thing itself because it’s a marker for future such things and the perception of future such things that results from future such things, and so on. Except that now it’s even more of a raw norm enforcement thing, and it’s also a punishment for not listening to what I say you should do over what you should think would actually work or even what actually does work. Thus, the people reacting in this way are doing what is often done to outsiders-in-context. If they did fix the numbers the response would be ‘now the numbers are too low and you’ve admitted you inflated your impact numbers, so we can’t fund you.’ If they didn’t fix the numbers, the response is ‘You didn’t fix it.’
This seems like a good place to have gone into that detail because here there seemed to be broad agreement to fund them anyway, based on the potential impact, neglectedness, track record and uniqueness considerations, among other things, but it likely did matter and counterfactually could have mattered a lot more.
Postscript: ALLFED Corrects Its Estimates
After I wrote this post, Dave Denkenberger of ALLFED reached out to me regarding their estimates of inadvertent nuclear war. He wanted to clear up both what had happened and also the estimate itself. We spent several hours on the phone talking about how this had to work, and I also spent some time in email correspondence with the person who originally pointed out the error to gain additional context.
Those discussions made it clear that the things required to get this calculation correct were indeed difficult to understand for someone who lacked the background doing them, but Dave showed an interest in actually understanding them, which I hope will transfer into general improvements in thinking, and in the end I do think he got it. One can quibble with any answer to a question like this, but I do not have any worries about lack of good faith.
I am satisfied with the correction. That correction has now been made, and can be seen here. One could disagree with the answer by arguing about what the prior should be, but it is reasonable and also this is standing in as a proxy for all nuclear war and thus missing important things such as a different pair of countries (USA/China, Russia/China, perhaps India/Pakistan if it in-context counts, etc) and also the possibility of intentional nuclear war. Which is much more salient now in March 2022 given Russia's invasion of Ukraine and its explicit threats to escalate to nuclear weapons and calling lack of control of Ukraine a 'threat to the existence of the state'.
Like many others, I am far more interested in causes that minimize the impact or probability of nuclear war than I was a year ago, and ALLFED remains the best known-to-me long-term way to do that. But it will need to be supplemented by other approaches and taken more seriously in many other ways. There's a real chance for example that Russia will soon have a civil war or breakaway regions, and securing all the nuclear weapons will not be easy in such cases. There are six thousand and it's scary to lose even one.
AI Safety Paper Production
I consider AI Safety and related existential risks to be by far the most important ‘cause area.’ That’s even more true given the focus of SFF, and I am confident Jaan feels the same way. The problem is that saying the words ‘AI Safety’ doesn’t mean you’re making the AI situation safer, and to the extent there are obviously good things to do it would be weird to not find someone already doing them. So when something like SFF gets applications, there are negative selection effects.
There’s also the problem that as AI Safety becomes more of ‘a field’ there’s more of the traditional pressures that turn previously real things into largely fake things. And the problem that the actual AI Safety problem is an impossible-level problem (as in Shut Up and Do the Impossible) without clear intermediate signs of progress or publishable papers, and all the stuff like that. And one in which a solution might not exist or might come far too late. It’s easy to see how most efforts end up dancing around the edges and working on shallow easy problems that don’t much matter, or in many cases not even doing that, rather than working on things that might possibly work.
It’s hard to find things that might possibly work in the AI Safety space, as opposed to plans to look around for something that might possibly work.
Thus, I was excited to fund late applicant Topos Institute. As far as I could tell, they’re people with strong mathematical chops working on difficult math problems that they think are most important to solve, along the lines they think might actually work. I wouldn’t have chosen many of the details of their focus and approach, and they don’t even buy the concerns over AGI the same way I or Jaan do, but I want them to do what they think is the right thing to do here, and I’m thrilled for any and all efforts of this type, by as many people as possible, so long as they both have the chops and are aligned with us in the sense that they have their eyes on the prize. All sources I asked confirmed that they count. On reflection I regret not giving them more than I did, and I believe this was due to the S-process default curves and them only asking for a reasonable amount of money.
CHAI@BERI also seemed clearly worthwhile, and they got a large grant as well.
I should also mention CLTR@EAF. I was excited by the detailed contents of what they are working on, relative to the baseline the applications set for excitement, but their focus on s-risks was concerning to me. I don’t want to have the debate on this, but I consider concerns about s-risks a bigger thing to be concerned about right now than actual s-risks. They do have a reasonable plan to mitigate the risk of concern about s-risk, and are saying many of the right things when asked, so I came around to it being worth proceeding.
In contrast with the ones above, there were a number of organizations that all looked alike to me. This will be fair to some of them, and less fair to others, but they seemed to centrally be writing papers that model the AI Safety space in this way.
Building an AI does stuff.
But is not ‘safe.’
Unless, of course, you push the button marked ‘safety.’
Alas, pushing that button is costly relative to not pushing it.
We can model the problem as two players in an iterated prisoner’s dilemma who can defect (not press safe) or cooperate (press safe) each round.
Sometimes IPDs go quite badly.
Do something! Regulation?
I read enough of these papers for my eyes to glaze over quite a bit.
There’s a steelman of what they’re doing in these cases, which is that in order to get people to listen to you, you have to write exactly the right official paper in the right place with the right emphasis and tone twenty times for twenty different subgroups, after which perhaps they’ll pay attention to you at all, or something like that. I’m not fully discounting this, but I notice I don’t have any expectations that meaningful things will result.
Then there’s ‘field building’ in the sense of things that make more people put ‘AI Safety’ into their job descriptions, but without anything I found plausible that would cause this to result in AI becoming safer. Diluting the field with a bunch of shallow work doesn’t seem like it should count as helping.
I don’t feel any need to point out who fits into these categories, but if you’re considering funding an organization based on it helping with AI Safety I’d urge you to check to see if they’re actually doing things that are useful. Part of that is that given our time constraints, I’m reluctant to start a potential negative information cascade with this high a risk of being mistaken about the particular claims or organization.
Then there’s the people who think the ‘AI Safety’ risk is that things will be insufficiently ‘democratic,’ too ‘capitalist’ or ‘biased’ or otherwise not advance their particular agendas. They care about, in Eliezer Yudkowsky’s terminology from Twitter, which monkey gets the poisoned banana first. To the extent that they redirect attention, that’s harmful. The avalanche has already begun. It is too late for the pebbles to vote. To the extent that they do get control over existing bananas that aren’t fully poisoned, or future ones that are, the goals are in my model even worse than that.
I do feel the need to mention one organization here, AIObjectives@Foresight, because they’re the only organization that got funding that I view as an active negative. I strongly objected to the decision to fund them, and would have used my veto on an endorsement if I’d retained the right to veto. I do see that they are doing some amount of worthwhile research into ‘how to make AIs do what humans actually want’ but given what else is on their agenda, I view their efforts as strongly net-harmful, and I’m quite sad that they got money. Some others seemed to view this concern more as a potential ‘poisoning the well’ concern that the cause area would become associated with such political focus, whereas I was object-level concerned about the agenda, and about giving leverage over important things to people who are that wrong about very important things and focused on making the world match their wrong views.
Getting deeper into that would be an even longer thing, and maybe it’s worthwhile but I’m going to stop there. In general, the group of applicants here made me despair about the work being done and its prospects for being useful.
Access to Power and Money
In my model, one should be deeply skeptical whenever the answer to ‘what would do the most good?’ is ‘get people like me more money and/or access to power.’ One should be only somewhat less skeptical when the answer is ‘make there be more people like me’ or ‘build and fund a community of people like me.’ The more explicitly and centrally this is what one is doing, the more skeptical one should be. The default reasons people advocate for such things are obvious, regardless of how conscious or intentional such paths might or might not be.
The art must have an end other than itself. By its fruits ye shall know it, the shining city on a hill. Power corrupts, if you gaze into the abyss it gazes into you, we are who we pretend to be and our virtues are that which we practice. If we are functionally about seeking power and money then we’ll turn into the same thing as everyone else who is about seeking power and money. Be wary of anyone saying “only I can fix it.” And all that. The more EA funds are giving to other EA funds and those funds are about expanding EA, the more one should worry it’s a giant circle of nothing.
These could be split into a few categories.
Some organizations focus on access and influence. If you can get people with power to listen to you and adopt your ideas, that’s valuable. The best example of this was Alpenglow@CEA. They had a solid case that they were successfully getting meaningful access for people who would use that access in ways that matter.
This was kind of the best case scenario for this sort of thing, where there was relatively less danger of corruption or wasted money compared to the potential for tangible benefit. The bar for such efforts should be quite high. I still think we overfunded because there are others out there and I think SFF overpaid versus its ‘fair share’ here, but that’s not the biggest mistake. I wish we knew how to do such things ‘safely’ in terms of keeping ourselves intact in the process. Until then, I’ll continue to be deeply uncomfortable in such waters.
A second category is regrants. I have a strong aversion to giving out grants in order to give out grants (in order to give out grants?) without a damn good reason for this, especially to organizations that seem like they shouldn’t be running into funding issues, or that seemed like they would be able to make lower-quality decisions than we would. The argument in favor boiled mostly down to ‘the other places can use lower-cost labor and thus make smaller-size grants and find things we don’t have time to evaluate.’ Which isn’t nothing, especially if one can be confident the money only flows in one direction. I do worry about a lot of self-dealing and double-counting here, and the various things it might be messing up.
I don’t think we should have been anything like this eager to give money to the EA Infrastructure Fund. Ironically, it was a question I asked at one of the meetings that led someone to fund them at this level, as I clarified the situation in their mind. That’s how the process is supposed to work, even if I feel a little sick about it - the whole goal is to be strategically unstrategic (or is that unstrategically strategic? Both?), as it were. That’s not to say I think the fund shouldn’t exist or have money, and especially that if we believe Buck in particular is very good at finding good small targets and small things to do that Buck shouldn’t have the ability to go do that, but this felt very much like overkill and a kind of giving up, especially given the goal of ‘infrastructure.’
Then again, I do think ‘fund individuals’ is a great thing to be doing if executed well, far better than funding organizations, so maybe I was wrong about this. I’d need to look into it in more detail to know. I do know that if I was given money for ‘infrastructure’ in this sense I would expect it to be well-spent relative to current margins, but also that it would look and feel weird.
Along similar lines there was the LTFF@CEA, or Long Term Future Fund. They have some clear wins on their book (e.g. John Wentworth) and my notes indicate I thought the bulk of their targets seemed reasonable, although on reflection that makes me worry about the extent to which ‘seem reasonable’ was an optimization target. It’s another case of ‘find individuals and other places to put small amounts in ways that seem plausibly good and do it’ and it seems like something like SFF should be able to do better but if the applicant pool is this shallow maybe we can’t.
As an isolated thing, almost all small grants of these types that are issued without forcing people to apply first seem like they’re net good, but they also end up warping the space and culture around the seeking of such grants, whether or not formal applications have to be involved. There’s a lot of ways this can turn toxic and ruin things, and the technology to avoid this doesn’t seem to exist anywhere - it’s not a unique failure of EA, or an unusual lack of skill, although there are doubtless places that know how to do somewhat better.
The limiting factor on such efforts, in any case, should be the ability to find good small targets without the process of finding them overly corrupting the process or the more general ecosystem. To the extent funding didn’t reach that point, and the process is sufficiently non-corrupted, it seems reasonable to give it more funds. I’m in no position to evaluate where we are on such scales, but I am at a minimum skeptical that we have untapped pools of ability-to-find-good-targets that aren’t being used but that we could tap at reasonable effort and cost.
The comparisons to ‘field building’ in AI Safety seem relevant.
Finally, there were the two explicit pyramid schemes slash plans to use money to extract more money. One wanted to target founders of companies and convince them to pledge to give money, the other wanted to go after heirs of fortunes.
These seemed deeply terrible. If you think the best use of funds, in a world in which we already have billions available, is to go trying to convince others to give away their money in the future, and then hoping it can be steered to the right places, I almost don’t know where to start. My expectation is that these people are seeking money and power, largely for themselves, via attempting to hijack that of others, especially for the one targeting heirs, with the goal here (and in a few other places that were more about power than money, but mostly similar) being to become the court advisor/wizard that is the power behind the throne, and then we hope that this is used in the right ways, but the people who seek that position tend to be power seekers, or become that over time. It’s weird and surprising when one of them cares about The Realm. Even if they do, their ability to steer their targets will be limited in the best of circumstances. To the extent that such projects even have positive returns on capital, which isn’t clear, the vast majority is likely to go to causes that don’t matter, and the vast majority of money that goes to causes that matter as measured by what it says on the tin will go to fake versions of those. And with such people directing their funds in these places, incentives will shift towards being fake and appealing to fakeness, and away from doing real things, so the money likely does net harm on top of everything.
I’d like to think the virtue/ethics/moral considerations mattered in the end, but my read is that a practical ROI calculation ended up carrying the day - it looked like there was willingness to be what I would view as the villain in the play, but that the calculations said that for our purposes this type of strategy didn’t pay even if you discount such concerns, and so the strategy was not funded, whatever anyone would have chosen to call it.
I am happy about this particular outcome, but sad about the process. Looking over everyone’s comments again it seems clear that my concerns mostly were not shared, and the whole project didn’t give people the willies, and that in turn does give me the willies. When we talk about how various moves evaluate in terms of connections and money and power and all that rather than trying to Do the Thing, we have lost The Way. I wish I had a better way to communicate what I find so deeply wrong here, to make people really Look, and my inability to effectively do that and the inability of many others to see it is the thing I find most deeply troubling and wrong, if that makes sense.
Lightcone Infrastructure
As a final note I should likely mention Lightcone Infrastructure.
We decided I had a conflict of interest here, so I didn’t have the option to fund them, but if I’d had that option I would have happily done that. I do think there’s good reason, especially from Outside View, for me to not have that option, as I write on LessWrong quite a bit and know the people well, and Raymond Arnold has been a close friend for a while, and also on reflection yes, recusing here was the right decision for other reasons too, as I’ll note in a bit.
I do think I would have funded Lightcone as a full outsider, but it’s impossible to be confident of such things.
While this is the team behind the current LessWrong, which I believe to have high value, that’s not what grants would fund at the margin. I do think that we’re likely underinvesting in LessWrong itself but that doesn’t mean we have an obvious way to turn money into a better LessWrong, as it requires the right people to work on it.
There’s also the fact that the Lightcone plan is more than a little bit something like ‘make life for a class of people Zvi at least kind of belongs to much better in the hopes they get to do more useful things faster’ and on reflection, yeah, I probably shouldn’t be making the funding decision on that, what do you know.
I mostly want to mention Lightcone because I do find the new thing that Lightcone is trying to do compelling, and I would love to see it expanded.
We can model the world as consisting of a limited number of people who are Doing a Thing that we consider relevant to our interests, another group that at least shows signs they could and might do things that are relevant to our interests at some point in the future, and then the vast majority of people who are not doing that and show no signs of ever doing so.
For current Lightcone, ‘relevant to our interests’ mostly means ‘AI Safety work’ but the argument doesn’t strongly depend on that. I certainly think all non-fake AI Safety work should count here, regardless of what else and who else counts as well.
For the people who are Doing Thing, a lot of their time is spent on Stupid Stuff, and they must Beware Trivial Inconveniences and the expenses involved in various actions. This eats up a large portion of their time and cognitive efforts.
Even remarkably small stupid stuff can eat up a remarkably large amount of time that could otherwise have been useful. Having a good, well-equipped and stocked place to work, where you get meals taken care of and can meet with people and interact spontaneously and other neat stuff like that is a big game. So can ‘stop worrying about money’ either in general or in a particular context, or ‘not have to run these stupid errands.’ Life is so much better when there’s other people who can take care of stuff for you.
A lot of work and potential work, and a lot of community things and events, and many other things besides, are bottlenecked by activation energy and free time and relatively small expenses and stuff like that. Even I end up worrying a bunch about relatively small amounts of money, and getting time wasted and rising stress levels about things where the stakes are quite low, and fussing with stupid little stuff all the time. You really could substantially increase the productivity (although not at zero risk of weirdness or distortions of various kinds) of most of the people I know and almost anyone not high up in a corporation or otherwise super loaded, if you gave that person any combination of:
A credit card and told me ‘seriously, use this for whatever you want as long as it’s putting you in a better position to do the stuff worth doing, it doesn’t matter, stop caring about money unless it’s huge amounts.’ More limited versions help less, but still help, and for someone more money constrained than I am, all this doubtless helps far more.
A person I could call that would be an actually useful secretary and/or concierge, especially someone who could run around and do physical tasks. We have a nanny for the kids, and that helps a ton, but that alone doesn’t come anywhere near what would be useful here. The only concierge service I know about, which I somehow got access to, is completely useless to me because it assumes I’m super rich, and I’m not, and also the people who work there can’t follow basic directions or handle anything at all non-standard.
A person I could have do research for me and figure things out, assemble spreadsheets, that sort of thing.
A nice office space and hangout space I could use and take people to, especially where other interesting people also often went, and where everything was made easy, ideally including free food.
And I think that applies to basically everyone who hasn’t already gotten such things handled. And it’s a shame that we can’t find ways to get these sorts of things usefully into the hands of the People Doing Thing that we think are special and the limited resource that actually matters.
That doesn’t mean this is an easy problem. The first item is especially dangerous, you can’t go around handing that out to people without huge amounts of moral hazard and risk of corruption of the entire space, so that’s out, or at least out outside of a few special cases.
The other items, however, hold more promise, and are where Lightcone’s strategy comes into play. If you’re Doing Relevant Thing you get access to nice office space, free food there, and people to take care of at least some of your logistical needs, and a place to gather and meet people. That’s a big game, and to me seems like an Obviously Correct Move as long as you can make reasonable decisions about who gets it, and deliver in a reasonable way, and otherwise avoid things getting too corrupted, which is why you presumably don’t hand out any company credit cards at least outside of special circumstances.
Over and over again, as the years have gone by, I’ve seen communities fail, connections not happen, projects not get done, people get overwhelmed, and other similar things happen in ways that can be solved with relatively small investments of resources, if you can apply them well, and identify the people worth helping in these ways.
It’s also plausible that this is actually a hidden Universally Correct Strategy for society as a whole and we should be giving everyone who is 22, passes some basic checks and asks nicely either these types of resources or a job providing them to those who get the resources, or something, encouraging them to start businesses and unique projects and the world ends up a much better place, or something like that, although I haven’t gamed that out fully, but it isn’t obviously stupid.
There’s a lot of space available in the ‘spend money to enable Doing of Thing’ in various ways, and I’m excited to explore them at some point, but as you can imagine with the Covid posts and the Omicron variant on top of having a job and kids, I’m currently super busy.
But seriously, that whole section makes it very clear I have lots of conflicts of interest, so please don’t take me as an objective source here and draw your own conclusions.
Conclusion
This got even longer than I expected, despite a large number of places I was tempted to say far more than I did, and a lot of places I gestured at stuff rather than finding ways to properly explain it. It was definitely a case of writing a longer letter because I didn’t have time to write a shorter one, and/or it was about ten posts combined into one.
Hopefully it was illustrative of my perspective on things, and on the things themselves, in ways that were helpful. I know this wasn’t an ideal way to present all this information despite it being important, in a similar way to Eliezer Yudkowsky’s recent writings not being the ideal way to present their information despite being important, with similar reasons likely being behind both decisions.
I can easily see this generating some mix of a ton of useful discussions and ideas that are great, and a lot of nasty demon threads, and also a bunch of stuff that should be distinct response articles or have its own discussion sections. I encourage use of ‘header comments’ to organize thoughts and topics, and spawning off distinct posts if that seems strategically like the right thing.
Also, there will inevitably be four different copies of this post, if not more - My Substack and Wordpress copies, the LessWrong copy, and then EA Forum. I apologize in advance for the inevitable lack of engagement in some or all of those places, depending on how it goes.
To be honest, I'm relieved that you and the other recommenders all found too few good places to put money for the ultra-long-term. Almost all such charities I've seen smell. It's a daunting scale to optimize for, and concrete proposals seem few and far between. There's no way to show that 'raising awareness' and 'investigating possibilities' and whatnot are actually productive in any way, and so I generally assume they're not. On the other hand, most of the organizations you mention by name are new to me, so I'm clearly uninformed about a lot here.
The one particularly that interested me was ALLFED though. Preparation for life after apocalypse seems like an underexplored X-risk approach; it allows for your research time to be helpful in mitigating many different X-risks as well as being helpful in major non-existential crises (like nuclear war or an out-sized solar flare burning out much of what's connected to electric lines.) I'm looking forward to doing some research on them.
In the spirit of this sort of thing: Is there any meaningful grant-giving for red teaming, currently?
Concretely: If the authors of "Universal and Transferable Adversarial Attacks on Aligned Language Models" were inclined to pioneer further attacks on LLMs, as a method of demonstrating that they are fundamentally unsafe in as many situations as possible, to whom would they apply for grants, and would those grants be likely to issue? Let us assume, for the sake of argument, that the authors will in the alternative of getting such a grant do something completely different or non-technical.
One expects that (of course) someone could make more money in private industry than they could from grants. To have an incentive to go to private industry instead of producing useful research is different from having no financial incentive for producing useful research.