My Covid-19 Thinking: 4/17
Epistemic Status: Even farther out of my supposed expertise than previous Covid-19 posts.
A little over a week ago I started writing up my general thinking about Covid-19. I then ended up pulling out from that post first The One Mistake Rule and then Seemingly Popular Covid-19 Model is Obvious Nonsense. I did not get back to finishing the post, and it's now been a week.
In Covid-19 terms, a week is a long time. Rather than continue where I left off, I'll restart and change styles. The goal here is to get thoughts down on paper and share them, in hopes they are useful, and not lose too much more to the huge speed premium these days. Can always follow up later.
This is built upon the same type of thinking as On R0 and Taking Initial Viral Load Seriously. Its conclusions are broadly consistent with On R0.
If anything, the past week has been surprisingly unsurprising. I would characterize what we have seen this past week as mostly the good news we would have expected given there was going to be good news, plus a large reporting bump around Easter.
All the numbers are terrible and unreliable and flirt with being meaningless. But the numbers also mostly fit together at this point into a coherent story of what is happening.
This post is an attempt to lay out what I think is happening, and likely to happen, and explain the thinking behind that. I've laid it out in conversations with many people I trust, and incorporated their models and feedback. I mostly seem to be able to convince these people that my picture is broadly reasonable.
This is not intended as a proof of any kind, nor am I going to attempt to provide all the sources that shaped my conclusions - better to get the information out there than to let perfect be the enemy of the good. Early on I did the opposite, kept quiet, and that was clearly a mistake.
How I Read Data at This Point
Different people have different views on what data is accurate, what data is meaningful, and what data is useless.
My starting assumption is that essentially no one is outright lying or putting down imaginary numbers. When you see a number it is the product of a process that observes something in the physical world and reports back. That does not mean that the number means what it says it means.
We know that the number of positive test results, on its own, is not very useful once you have other data. Positive test numbers, on their own, primarily tell you how many tests are being run.
Number of tests being run tells you how much testing we're doing, which is worth knowing. More would be better.
The percentage of tests that are positive is highly useful. I consider this to often be the best tool available.
Knowing how to interpret that number depends, of course, on the method of choosing who gets tested. The hope is that this method is roughly constant over time, except for variation in how many tests are available and how many people are how sick. Across different locations this is not a good assumption, but within a single location it seems like it probably mostly is a good assumption. Even across locations, it's almost always going to mostly be some form of first testing health care workers and others with priority (which is a roughly similar group in most places), then people who need the test to determine hospital treatment, then others.
The tests being used are mostly only sensitive during a short window. If you take the test too early or too late, you'll come back negative. There's also a substantial pure false negative rate, but a very small false positive rate. I'm building in something like a 25% false negative rate for actually positive-right-now patients.
Effectively, there seems to be a cap at or slightly above 50% on what percentage of tests will ever be positive. Which makes sense - if you could pick out a 90%+ positive sample without doing any tests, you almost don't need to run the tests, and there are substantial numbers of false negatives or people who take the test outside their response window.
What that means to me is that, once we get to 50% or so positive rate, the numbers stop meaning much. Tests tell us about individual patients but they don't tell us about how many people are infected beyond telling us that number is above a threshold that is a function of our testing rate. Until we drop down again, that's all we know.
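To make the cap concrete, here is a minimal sketch of how the observed positive rate relates to the true share of infected people among those tested. The function name is made up for illustration, the 25% false negative rate is the figure I'm building in above, and the false positive rate of zero is a simplifying assumption standing in for "very small". It ignores people tested outside their response window, which would pull the observed rate down further.

```python
def observed_positive_rate(true_prevalence, false_negative_rate=0.25,
                           false_positive_rate=0.0):
    """Expected share of tests that come back positive in a tested group.

    true_prevalence: share of tested people who are actually positive right now.
    false_positive_rate=0.0 is an assumption (the post only says "very small").
    """
    detected = true_prevalence * (1.0 - false_negative_rate)
    spurious = (1.0 - true_prevalence) * false_positive_rate
    return detected + spurious


for prevalence in (0.25, 0.50, 0.67, 0.90):
    rate = observed_positive_rate(prevalence)
    print(f"{prevalence:.0%} truly positive -> {rate:.1%} of tests positive")
```

Under these assumptions, a 50% observed positive rate already implies that about two thirds of the people being tested are truly positive, which is why readings near that level stop telling us much.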
It's not clear exactly how lagged the positive test data is, since lab delays and delays before getting a test at all vary a lot, and there's even dispute over the time it takes to show symptoms. The delay seems clearly at least one week, and might be as high as two weeks.
Then we have death data. The death data can be assumed on average to lag infections by roughly three weeks according to what I have seen, with a lot of noise attached.
The bigger problem is that death data mostly only counts confirmed positive cases that then die. In practice, this means that if you die in a hospital you are counted, if you die in a nursing home you might be counted, and if you die in your house you probably won't be counted.
Thus, if hospitals are not above capacity, death data is good but lagged, although you need to adjust it for the health of the infected population, the quality of care and testing, and so on, if you want to backtrack to infections.
Even in the best of places, this will be an undercount - this effect is much, much bigger than any unjustified ascribing of Covid-19 deaths to people who die of other things while also having Covid-19.
However, if hospitals are above capacity, the death data stops meaning anything beyond establishing a minimum baseline. This happened in Iran, Italy and Spain, and in New York City. If more people died, the death counts that were reported would not change much if at all, unless it caused other things to change. Italy's undercount of deaths seems clearly massive. New York City claims theirs is at least 40% or so. Given the state of New York's medical care I'm guessing that is if anything low.
This means that we enter a kind of dead zone once things get sufficiently bad. First we lose our test data, then we lose our death data and are left blind. We only learn where we are when numbers drop substantially from where they peaked. Then we can attempt to figure out where we were prior to that.
We can also use our knowledge of what physical behaviors people were doing at various times, create a model of what that implies in terms of doubling times given a serial interval of roughly 5 days (see On R0). That especially gets combined with observed changes in the death and positive test rates at various points in time.
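As a rough sketch of that conversion: if we assume every infection happens exactly one serial interval after the one that caused it (a simplification, not something the data establishes), then a reproduction number maps to a doubling time as below. The function name is hypothetical.

```python
import math

SERIAL_INTERVAL_DAYS = 5  # the rough figure used in this post


def doubling_time_days(r, serial_interval=SERIAL_INTERVAL_DAYS):
    """Days for new infections to double when each generation is r times the last.

    Assumes discrete generations spaced exactly one serial interval apart.
    A negative result is a halving time; r == 1 never doubles.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if r == 1:
        return math.inf
    return serial_interval * math.log(2) / math.log(r)


for r in (4.0, 2.0, 1.2, 0.9):
    print(f"R = {r}: {doubling_time_days(r):.1f} days")
```

Under this toy model, a doubling time under three days requires R above roughly 3, while R = 0.9 corresponds to new infections halving only about once a month.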
Here I include a much faster than generally assumed drop-off in R0 based on the percentage already infected. Who gets infected is very non-random. The amount of exposure each person gets seems best approximated by a power law. The amount you are exposed to infection by others, and the amount you expose others to infection, are very highly correlated. To make conservative (to me) statements, I would be surprised if the first 10% infected reduced the forward-looking infection rate much less than 30%. I would be very surprised if the first 50% didn't cut infection rates by >75%.
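Here is a toy simulation of the direction of that effect, not of its size. Everything in it is an assumption for illustration: exposure levels are drawn from a Pareto distribution with an arbitrary shape parameter, people get infected with probability proportional to their exposure, and each person's contribution to forward transmission is taken to be exposure squared (because exposure to others and exposure of others are treated as the same number). It does not establish the 30% or 75% figures above; it only shows that removing the most exposed people first cuts transmission by more than the share infected.

```python
import random


def remaining_transmission(fraction_infected, n=20_000, shape=2.5, seed=0):
    """Share of forward transmission potential left after a fraction is infected.

    Toy model: exposure ~ Pareto(shape); infection probability is proportional
    to exposure; transmission contribution is exposure squared. All assumptions.
    """
    rng = random.Random(seed)
    exposure = [rng.paretovariate(shape) for _ in range(n)]

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # give each person the key u ** (1 / weight) and take the largest keys.
    keys = [rng.random() ** (1.0 / w) for w in exposure]
    order = sorted(range(n), key=lambda i: keys[i], reverse=True)
    infected = set(order[: int(fraction_infected * n)])

    total = sum(w * w for w in exposure)
    left = sum(w * w for i, w in enumerate(exposure) if i not in infected)
    return left / total


for fraction in (0.10, 0.50):
    cut = 1.0 - remaining_transmission(fraction)
    print(f"first {fraction:.0%} infected -> transmission cut by {cut:.0%}")
```

The size of the cut depends heavily on the made-up shape parameter, so the printed numbers should be read as illustrative only.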
Lock down conditions probably make these contrasts even stronger, since a lot of people's exposure profile drops close to zero out of household, while others are forced to or choose to accept a lot of exposure. Lock down conditions also seem like they have to, as discussed in On R0, get rid of the vast majority of the vast majority of exposure sources, especially once there's been enough time to work through secondary household infections from pre-lock down infections.
The infection fatality rate, and the asymptomatic infection rate, are also importantly unknown even under a fixed set of conditions (patients with a given age distribution and level of co-morbidity, and a known level of care). It is another set of variables that one needs to vary as needed in order to make everything make sense - we have a bunch of equations that have to hold.
I have been using a soft upper bound on the IFR of 1% in cases where the hospitals have not fully collapsed. My guess is that it's lower than that, and my best guess is more like 0.3%-0.5%, but that is based on a ton of different things pulling in different directions.
Combine it all, find the versions that seem non-absurd, and that's the range of plausible situations.
Where Is The Data?
I have decided to use the data from Covid Tracking Project. They seem to have good data in nice form and reliably update. It's important to be consistent and this allows that.
I look at positive test numbers, negative test numbers and deaths, and then do basic math on them. I gather the data into this spreadsheet for easy viewing and calculations. Here is a link. You can look but not edit.
There are often obvious errors or random things in the data. Mostly I just leave them there, but I am starting to try to note them and do what I can when I see them, such as when New Jersey spent a day only reporting positive tests on 4/14 (or they forgot to copy the number into the spreadsheet), or when California reported tons of negative tests all at once on 4/3.
I've decided to break out "New York State plus New Jersey" as the epicenter that is distinct from the rest of the country. New Jersey's situation is quite bad, and it makes sense to me to treat it as effectively a part of New York here. This leads to my numbers being slightly different than the numbers you'll see discussed elsewhere, since almost no one else is doing the same exact calculation.
How Are Things Going in New York and New Jersey?
If we smooth out 3/28 with 3/29, NY/NJ hit the test positive rate maximum on 3/28. If not, we hit on 3/30. From there until 4/6, rate was steadily between 46% and 50.9% each day. One could argue the decline starts on 4/2, if we want to look at small changes. Starting on 4/7, positive rate has been under 46% each day. Starting on 4/12 it has been under 44% each day. On 4/14 and 4/16 it was under 40% for the first time in weeks. No matter how the numbers are plotted, it is clear the news is good.
On 4/1 I made my guess that that day was the peak in marginal infections in New York City. That now seems like the late end of a several day range of possible peaks.
That is very consistent with New York fully locking down on 3/22, and the peak coming several days later once secondary household infections had time to play out.
Deaths lag infections by three weeks. NY/NJ death counts have been almost entirely flat from 4/7 until 4/16 when this was written. That corresponds to infection times of about 3/19 to 3/28.
The earliest plausible time for new infections to have peaked was 3/22 itself, for obvious reasons. That would correspond to a six day delay in test reporting. In the three days before that it looks like test positive results went from ~30% to ~50%, and New York doubling times before lock down in general look like they were well under three days.
Thus, the 'good' scenario for New York would be that there was one final doubling after 3/19 (i.e. after the death rate hit its current observed rate, which maxes out the hospitals). Then New York started declining, but has not yet dropped a full 50% back to 3/19 as of at least 4/10 or so, since more tests yielded a higher positive rate through the 4/16 data.
If we take the death peak as roughly the same as the number of deaths that would have resulted from good medical care, and state that as a 1% IFR, that seems like a lower bound on infections at the time. On April 9, three weeks later, deaths in New York State were counted at 7,067. Thus we would get 700,000 infections in NYS on March 19, 1.4 million on March 22. If we get steady decline in new infection rate from there, by 4/17 we still have 26 additional days at more than half that rate, so at least that many infections every 5 days, or 5 more times, for 4.9 million infected out of 19 million residents, or a roughly 25% statewide infection rate. You can lower that a bit, or at the upper end you can go up to and including hitting saturation (50%+) without making any assumptions that seem impossible. New York City would then be substantially higher than the state, and you adjust for the undercount and for your preferred IFR.
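The arithmetic in that paragraph can be written out as a short sketch. The function and variable names are mine; the inputs (7,067 deaths, 1% IFR, one final doubling between 3/19 and 3/22, 26 days treated as 5 serial intervals, 19 million residents) are the ones stated above. Each later interval is credited with half of the 3/22 cumulative total, which is my reading of "at least that many infections every 5 days".

```python
def nys_infection_estimate(deaths=7_067, ifr=0.01, population=19_000_000,
                           days_since_peak=26, serial_interval=5):
    """Back-of-envelope cumulative infections for New York State by 4/17."""
    infected_march_19 = deaths / ifr                  # about 700,000
    infected_march_22 = 2 * infected_march_19         # one final doubling
    per_interval_floor = infected_march_22 / 2        # at least this many per 5 days
    intervals = days_since_peak // serial_interval    # 26 days -> 5 intervals
    total = infected_march_22 + intervals * per_interval_floor
    return total, total / population


total, share = nys_infection_estimate()
print(f"{total:,.0f} infected, {share:.0%} of the state")
```

Changing the IFR is the biggest lever here: halving it doubles every number downstream, which is how the upper end of the range gets close to saturation.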
Alternatively, we can use today's official state death count of 12,192, assume 1% IFR (if the official death counts are close to right, it means hospitals were barely ever overwhelmed), back out 21 days, and then ask about the plausible rate of infection from there given Cuomo's official estimate of R0=0.9 and a serial interval of 5 days. Again, this is us living in 'official world.' So we have four serial intervals, we can assume previously half of people got infected last interval (600k) and then we can generously drop that 10% each time. That should give us a hard lower bound (short of rejecting that IFR). So we get roughly 2.4 million infected, or 12%, plus whatever you think the undercount is on death totals.
Zucker, in a Cuomo briefing, said he thought 10%-20% infected two days ago. Something like 10%-40% for the state, 15%-60% for NYC seems like a reasonable range of possible worlds. The higher end involves a less deadly virus, but if that's true then I don't see how we avoid infection rates that look super high.
It's also worth noting that New York City is much younger than America in general (4% over 75 vs. 6% elsewhere) and generally more healthy, so we should expect a lower IFR than is typical, which means more infections for a given death rate, and it's almost half of New York State. And also that a lot of people left New York City. A lot of things push towards very high infection rates there.
At that point, things start to self correct and burn out. We should see a steady drop in R0, unless there's something pushing back against it via a different selection effect. That seems unlikely. The very fact that we didn't see a big decline yet, only a modest one, in turn points towards things having been far out of the measurable range at least briefly.
Another sanity check, the only NYC testing data point I could find that is a remotely random sample:
Between March 22 and April 4, 2020, a total of 215 pregnant women delivered infants at the New York–Presbyterian Allen Hospital and Columbia University Irving Medical Center. All the women were screened on admission for symptoms of Covid-19. Four women (1.9%) had fever or other symptoms of Covid-19 on admission, and all 4 women tested positive for SARS-CoV-2 (Figure 1). Of the 211 women without symptoms, all were afebrile on admission. Nasopharyngeal swabs were obtained from 210 of the 211 women (99.5%) who did not have symptoms of Covid-19; of these women, 29 (13.7%) were positive for SARS-CoV-2. Thus, 29 of the 33 patients who were positive for SARS-CoV-2 at admission (87.9%) had no symptoms of Covid-19 at presentation.
Presumably late stage pregnant women tend to go outside much less under normal conditions than others, and even less on a relative basis under pandemic conditions, and thus are much less exposed. The case against is perhaps they have large household size and their other kids mean they're more exposed, especially before the schools closed. And of course, they're a pretty young group, which is less susceptible, and also females get the virus less. Could in theory go either way but I have a hard time imagining women nine months pregnant not in general being far safer than the population. Also, this study was in The Bronx area, which is relatively poor, which could limit ability to stay home.
It's possible they visited doctors' offices for checkups and caught it there? But I would assume they would mostly do everything to not do that during the previous few weeks, and before that it was safe to go. Some effect, I guess.
Still, we see a 13.7% positive rate on the test via swabs. Allowing for something like a 25% false negative rate, at the time we get roughly 17% of the asymptomatic women positive, 19% overall positive. There isn't that big a window to test positive, and even less of one to do so while asymptomatic. If we believe this result at all, it's really hard not to get a very, very high number for infection rates in NYC. If 19% are infected during one window, the total number now has to be more than double that, likely a lot more than double.
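For anyone who wants to check the adjustment, here is a small sketch using the counts from the quoted study. There are two ways to apply "something like 25% false negatives": scaling the raw rate up by 25%, which reproduces the rough 17% and 19% figures above, or dividing by 0.75, which is the stricter correction and lands a little higher. The function names are mine, and applying one false negative rate to the whole group is a simplifying assumption.

```python
WOMEN = 215                    # all deliveries in the study window
SYMPTOMATIC_POSITIVE = 4
ASYMPTOMATIC_POSITIVE = 29
ASYMPTOMATIC_RATE = 0.137      # as reported in the quoted study
FALSE_NEGATIVE_RATE = 0.25     # rough assumption used throughout this post


def scaled_up(rate, fnr=FALSE_NEGATIVE_RATE):
    """Rough adjustment: add fnr on top of the observed rate."""
    return rate * (1 + fnr)


def corrected(rate, fnr=FALSE_NEGATIVE_RATE):
    """Stricter adjustment: observed = true * (1 - fnr), solved for true."""
    return rate / (1 - fnr)


overall_raw = (SYMPTOMATIC_POSITIVE + ASYMPTOMATIC_POSITIVE) / WOMEN

for label, raw in (("asymptomatic", ASYMPTOMATIC_RATE), ("overall", overall_raw)):
    print(f"{label}: raw {raw:.1%}, scaled {scaled_up(raw):.1%}, "
          f"corrected {corrected(raw):.1%}")
```

Either way the adjusted share sits in the high teens to about 20%, so the conclusion does not hinge on which version is used.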
Given these numbers, New York and New Jersey should be able to reopen in mid-May or certainly in June. Maybe things are hovering around 0.9 now, but we're rapidly building herd immunity, so that number should drop substantially week by week. As that happens things can start returning to the new normal, if allowed to by Cuomo. I expect people to mostly keep taking strong precautions when their living conditions allow it, and I expect those whose conditions do not allow it to have the virus burn through those areas quickly.
In general, this 'stuck in limbo where we're able with all our effort to just barely win' thing doesn't really parse for me. It's possible, but it's a strange coincidence, and it's only stable with low infection rate bases. So it's more possible to get stuck in limbo as California, but I don't buy it for New York.
I expect to emotionally feel personally safe here within between one and three months, with the upper range of that being a mix of some things go slow and some failure to have appropriate emotions.
How About Everywhere Else in America?
Tons of regional differences, of course, that I'm going to mostly ignore because I'm not looking at them all that carefully.
As I explained previously, the model that says the worst is behind America and we'll top out at 60k deaths doesn't make sense. Another amusing takedown has now been written of the same model, which contains gems like this:
The chief reason the IHME projections worry some experts, Etzioni said, is that “the fact that they overshot” — initially projecting up to 240,000 U.S. deaths, compared with fewer than 70,000 now — “will be used to suggest that the government response prevented an even greater catastrophe, when in fact the predictions were shaky in the first place.”
Which is a really rich thing to complain about, changing one's projections as new data comes in, as opposed to having a number and sticking with it. We are all so doomed.
A much better point they make would be this one:
Even the predictions of daily deaths “have been highly inaccurate,” said statistician Sally Cripps of the University of Sydney, who led a team that examined IHME’s up-and-down projections. “It performs poorly even when it predicts the number of next-day deaths: The true number of next-day deaths has been outside the 95% intervals 70% of the time.” If the 95% calculation correctly reflects a model’s uncertainty, then textbook statistics say the true numbers can fall outside that range no more than 5% of the time.
Your 95% confidence interval for the next day really, really should be able to get more than half of the data points. We'd like to have some standards. Ideally more like 95%, but if it's 30%, consider that maybe you have a problem.
All the models are bad. Which isn't unreasonable, modeling this is super hard and it's not like I can offer a robust one in return here. If I had more time and some help, and especially if I thought people would then listen, I would try to make all this more formal and accurate.
In any case, let's ask what's happening out there. Again, we have deaths and we have infections.
Deaths clearly had an artificial drop around Easter, which may have shifted extra deaths forward. So it's not crazy to say just from the death data that we're looking at roughly 2000/day for the last six days, or 1000/day without NY/NJ, and the trend isn't obvious and definitely isn't exponential. Seems fair as a first step before looking at infections. We certainly can't claim any sort of decline, but maybe things aren't getting worse.
Using what for now is Column X, % positive rate ex-NY ex-NJ, we see the peak was on 4/1 to 4/3. Then something weird happened with the data reporting and a whole bunch of negative tests got dumped out by California and generally we had a low tide for a few days. It's likely that these two events represent time-shifted reports in some form. Seems too early in any case, based on what was happening at the time. 4/8 to 4/13 seems mostly stable, as are the test counts. Then there's a distinct drop on 4/14-15 to 12.0% and 12.6%, but then on 4/16 it's back to 14.2%.
In a few days, we'll know whether that's a blip in a good trend or there isn't much of a trend at all. The weekend should cause a substantial update no matter what happens. Nothing much is changing out there, so whatever conditions exist are continuing to exist, at least for now.
The good scenario is that things peaked around 4/1-4/5 in terms of new infections, and they're on a slow downward trend now in most or all places. There's a lot more room here for either regional variation, or for us to by coincidence be just below 1 on R0. We could see a peak early next week at something like 2500 deaths in a day, then slow declines.
The bad scenario is that this represents various logistical failures of allocation and implementation, with places good at testing doing a lot of tests that are increasingly negative while places bad at testing do a few tests, more of which are positive, and things aren't getting better. Or at least, aren't getting better in a few key places. Potentially we outright fail even while trying, or more likely at least some regions give up.
What's the plausible range of infection rates?
Again, the USA overall has 30,209 reported deaths as of this morning (4/17), roughly 15,000 ex-NY/NJ. Again, we assume an undercount. We back track that three weeks.
At best the peak is about one doubling from then, at 3 million infections, then we maintain slightly less than that pace from there and we get something like 6 million, or 2% infected. The path down will be slower than the path up, but we could be already somewhat down from peak, so maybe we can get away with this being the halfway point in infections now on 4/17 and have the death toll around 120k outside of the epicenter? Again, as an optimistic scenario but not quite the absolute most optimistic. And of course, maybe we find (by which I mean, we confirm way too slowly and then way too slowly actually use) better treatments, in which case IFR drops and fewer people die. That certainly seems possible.
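Spelled out as a sketch, the optimistic scenario looks like this. The 15,000 deaths and the 1% IFR come from the text; the population of roughly 300 million for the country outside New York and New Jersey is my round-number assumption, and "slightly less than that pace" is approximated as one more doubling's worth of infections. The function name is hypothetical.

```python
def rest_of_us_scenario(deaths_so_far=15_000, ifr=0.01, population=300_000_000):
    """Optimistic back-of-envelope for the USA excluding NY/NJ as of 4/17."""
    infected_three_weeks_ago = deaths_so_far / ifr     # about 1.5 million
    infected_at_peak = 2 * infected_three_weeks_ago    # one doubling: 3 million
    infected_now = 2 * infected_at_peak                # "something like 6 million"
    eventual_infected = 2 * infected_now               # if 4/17 is the halfway point
    eventual_deaths = eventual_infected * ifr          # about 120k
    return infected_now, infected_now / population, eventual_deaths


infected_now, share_now, eventual_deaths = rest_of_us_scenario()
print(f"{infected_now:,.0f} infected so far ({share_now:.0%}), "
      f"{eventual_deaths:,.0f} eventual deaths")
```

A lower IFR leaves the eventual deaths in this sketch unchanged but raises the implied infection counts, since the whole chain is anchored on deaths already observed.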
It's also very possible for this to be a large undercount even if IFR doesn't change. I find it plausible that a lot of places are in crisis but not testing much and missing a ton of deaths as well, and things have been building in such places for a while by now.
A 5% infection rate wouldn't be surprising to me. I would be surprised but wouldn't be shocked by a 10% overall infection rate, but of course that would imply a lower IFR. More than that seems unlikely. If it was true, then NYC would have already hit herd immunity and the numbers would be crashing, and we'd see the same in Italy and Spain, but we don't.
What Happens From Here?
I want to wrap this up so I don't lose even more to the speed premium, so I don't have time to give my full thoughts on this today. In On R0 and throughout this I should be making clear what I think the physical landscape looks like. This will all be the same picture I gave then, just more explicit/direct/compact.
Over time, people get better at reducing risk while not sacrificing too much to do so. Masks get worn and improved in type and quality, delivery services scale, people stop acting like 6.1 feet is safe and 5.9 feet will kill you (seriously, this one stings every time), companies implement temperature checks and new workplace designs, the people who aren't with the program learn the program, and so on. Testing will expand eventually, including by corporations supplying for their employees. People who are immune are placed in exposed positions increasingly with antibody testing, and even those without testing in those positions have mostly been infected anyway at this point if they have been unable to take good precautions.
There is pushback from people who are sick of distancing and who need to go work and earn a living, but they mostly do so in sensible ways. There's some risk of crazy town if things are handled badly. So far only Michigan and Ohio show signs of it that I've heard about, which is likely because Ohio reacted so well that people don't get why they had to, and Michigan reacted poorly and people are mad about the implementation details being terrible.
Some areas are not set up to handle things well. In those areas, things get briefly bad, but they also burn out relatively quickly. Can't handle it translates into particular groups being unable to handle things or choosing not to, so those groups burn through infections until they're handling it.
I do think that there's a lot of slack for private agents to step up and help a lot more, but it's currently illegal to do so. When there is need of it, it will happen. Corporations and individual people have come out of all this looking really good.
Trump will continue to do his best to screw things up, and it will continue to screw things up, but I expect this to not be in ways that matter that much in the big picture. The real damage done was already done early on. He's done an admirable job of not trying to actually do things, except for some banditry and piracy involving medical equipment. Could have been a lot worse. He briefly flirted with saying he had absolute authority, but backed down as is his pattern before we started once again saying "The United States are" and I don't expect him to try again in a serious way.
We "reopen" the country in places in mid-May and others in June or July, but that doesn't mean open. It means people are free to do what they want. Schools mostly or entirely won't reopen this semester, so places that 'open' will still be largely closed slash in the new normal as we adjust. Mostly things shut down because private citizens and corporations shut them down, and they'll reopen the same way. If Cuomo "reopened" New York I doubt much of anything would actually change the next few weeks.
Likely by end of year we've found a treatment protocol that makes this all a lot less deadly. I'm guessing combining the stuff we've already found that works and tossing the parts that don't would do a lot.
The supply chain looks like it's straining in places, and needs deregulation of the food supply so people can pivot to residential provision faster. My guess is things get somewhat worse for a bit, then this is allowed to happen slash people find workarounds as some restaurants become de facto grocery stores. When it gets bad we'll figure it out quickly.
Eventually there's a vaccine and it all finally ends.
What would I bet on right now, as one commentator asked? I don't really want to bet on such matters, especially as I don't feel good about betting on more people dying. But certainly I'd bet against the things I think are essentially can't happens, given reasonable odds. More detail is needed to know what else I would do; trading is very context dependent.
All right, that's where I'm at right now. I'd prefer to learn and expand on this with debates and conversations on phones and video chats rather than comments, so I don't plan on engaging much in the comments, but don't let that discourage you from saying things.