Really? A network designed to have unbounded connectivity is the same thing as one designed to have very small connectivity?
The only problem with final takeaway (3) is I don’t have the resources to move to the Moon.
I really want a shirt that says "Bro do you even Bayes?"
I notice a few things:
1: You seem to assume that you can accurately tell when survey evidence is misleading. Perhaps that falls under the heading of "well-handled," but identifying whether the survey evidence is misleading seems to be terribly difficult. Sure, if you can tell it is misleading by verifying it against evidence you are more certain of, that tells you something (a la #4 in your list), but how do you get there? If information has a very high chance of being misleading, but you can't be sure, probably the answer is not to update on it, no? (I think that is Adam's point, but I wouldn't want to presume.)
2: I think your discussion under-weights the selection effects of Twitter. If your question is along the lines of "what do my followers, and the followers of their retweets, think?" then you are maybe fine, although you still have all the caveats of your sample being people who respond to surveys, people who check Twitter often enough to notice the poll and respond before it closes, etc. If your question is "What do humans in general in my country think?" - nope, right out. Even something like "What do humans who care about this topic and are on Twitter think?" won't be well sampled, because the poll still has to reach them for them to answer. Beyond the problems of adjusting for sample bias (like having 75% women, or whatever, which you can't adjust for because you don't have all the demographics - see the reweighting sketch after this comment), you still run into the "survey at a shopping mall" effect: the people even exposed to the possibility of taking your survey are a fraction of the group you actually care about, but your brain wants to lie to you about that and overgeneralize.
3: That brings up the next problem: you are always trying to persuade someone with evidence, and that someone is yourself. Evidence that could be misleading but that you aren't sure about - the non-transparent non-transparencies, as David Levy puts it - is the most dangerous kind. A transparent non-transparency (a known lie) you can work around, but if you are not sure it is a lie, the information is extra dangerous because you will always want to interpret it in a way that supports what you already want to believe. In that case it might be best to assign it a very, very small update, possibly zero, or even a slightly negative one just to be sure. After all, if the evidence might be misleading, and you don't know exactly how likely that is, there is a good chance that the proper direction to update is the opposite of the one the evidence/brain interaction pushes you toward.
Just some thoughts on that. I dabbled in survey design and that kind of work in my graduate student years, and... whew... it wasn't a field I wanted to get much involved in afterwards. The difficulty of getting information that doesn't actively mislead is surpassed only by the difficulty of telling whether it is misleading or not, and whether it misleads about magnitude, direction, or both.
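A minimal sketch of the reweighting idea from point 2, in Python, with entirely invented numbers (the 75/25 gender split, the response counts, and the 50/50 population target are all assumptions for illustration): if you happen to observe one demographic margin you can post-stratify toward the population, but this does nothing about everyone who never saw the poll.

```python
# Hypothetical Twitter poll: 800 respondents, 75% women, answering a yes/no question.
# Suppose the population of interest is roughly 50% women (assumed, not known).
# Post-stratification reweights respondents so the sample margin matches the
# population margin -- but only for the one demographic we happen to observe.

sample = {
    # group: (number of respondents, share answering "yes") -- all invented
    "women": (600, 0.70),
    "men":   (200, 0.40),
}
population_share = {"women": 0.50, "men": 0.50}  # assumed target margins

n_total = sum(n for n, _ in sample.values())

# Raw (unweighted) estimate: dominated by whoever happened to answer.
raw = sum(n * yes for n, yes in sample.values()) / n_total

# Reweighted estimate: each group's answer counted at its population share.
weighted = sum(population_share[g] * yes for g, (_, yes) in sample.items())

print(f"raw estimate of 'yes': {raw:.1%}")       # 62.5%
print(f"reweighted estimate:   {weighted:.1%}")  # 55.0%

# Limits: this only corrects for margins you actually observe, and it cannot
# touch the deeper problem that everyone in the sample is someone who saw the
# poll and chose to answer it -- the "survey at a shopping mall" effect.
```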
Also a (very light) dabbler in survey design, and I agree. There’s sophisticated work on Bayesian methods for interpreting biased surveys, e.g. by Andrew Gelman. In a situation where the evidence-generating model is clearly pretty complex (and involves many unknowns, e.g. respondent demographics), I don’t see a lot of value in saying “Bayes” without proposing any details of an actual model. The ratio of philosophizing to object-level math is disappointingly high in this post.
Sure. I would love to do a deep dive on advanced statistical techniques some time, which is the kind of thing that falls under 'put together a team that would allow me to do things like that in reasonable time.' A secret of blogging is often knowing what takes how long.
Fair enough. I like when your writing gives me motivation to “shut up and do the math” in situations where I’d ordinarily not bother. Here I think your example goes in the opposite direction.
Thanks for the piece -- still musing on it
"Everything is evidence" -- striking, hyperbolic but the heart of your claim maybe
-- my first thought was this is an angel's rationality, unbounded by time or cognitive load
-- for us mortals, often there is noise
-- is there some signal in the noise, pointing? the fact of the noise itself? a little nudge in a few directions? when is disentangling warranted?
-- I am sympathetic to things not requiring a certain sort of purity to be useful
-- it is easy to be intoxicated by purity
I think the issue here is that we tend to overestimate how good we are at handling misleading information. We also tend to overestimate how good we are at estimating how good we are, and to overestimate how good we are at estimating how good we are at estimating how good...
You mention legal rules of evidence, and I think that's a good example: Two classes of information are banned from being considered in court cases - hearsay, and proof of past bad acts. So you can't ask the desk lady "did the mailman say he saw the defendant enter the building?" And you can't ask the defendant "Isn't it true that you've been convicted of other robberies in the past?"
These are both *obviously* evidence. I don't even think they're "weak" evidence or "bad" evidence. If a mailman says he saw the defendant enter the building, she probably did. And if the defendant has repeatedly committed robberies in the past, the idea that she committed these robberies as well goes way way up.
The issue is that they're evidence that tends to bounce off of existing biases. If I hear someone is a thief, as much as I might try to say "okay but let's keep Bayes' Rule in mind" when I look at the defendant, I'm likely to see a thief. And if I get an uncritical statement that a witness "saw the defendant" I don't think about things like "can this guy pick the defendant out of a crowd of people," or "was the defendant looking through a dirty screen and a leafy tree at dudes 200 feet away with no glasses?" Which is why you can often remedy hearsay issues by just calling the dude in for cross-examination.
Twitter polls tend to cause the same issue - they aren't even "weak" or "bad" evidence, they're just "evidence you're likely to over-interpret no matter how hard you try not to."
More broadly this is probably my major issue with the rationalist project - the idea that if we just try really really hard we can purge biases and make better decisions than people who rely on tradition, authority, and general social values. I align more closely with people who try to do that than those who don't try at all, but it's always a balancing act, and one you can't perfect while living in your own brain.
Thanks for taking the time to dissect this thread in so much detail. I think it's made some fault lines a bit clearer for me.
> the important distinction between evidence, as in Bayesian evidence or a reason one might change one’s mind or one’s probabilities, and the rules of evidence in a given format of debate or discourse
One thought on this point: the rules-of-evidence threshold for "admissible in discourse" is probably influenced by the fact that actual statistical competency is quite uncommon. So in some sense, most people operate on heuristics which are not precise enough to admit any sort of small update like you are proposing, since that sort of world modeling requires an understanding of Bayes etc. I suppose this is another argument in support of the Rationalist project, or for teaching Statistics instead of Calculus as the first "difficult mathematics" in school. I'm not making any claims about your audience here; that's a very selected group. But the general rules of discourse bleed through from the general population.
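To make the "small update" concrete, here is a toy Bayes calculation; every number in it (the prior, the likelihoods, the 60% chance the source is misleading) is an assumption chosen purely for illustration. It also illustrates the discounting move suggested earlier in the thread: marginalizing over "the source might be misleading" pulls the update toward zero.

```python
def posterior(prior, lik_h, lik_not_h):
    """Bayes: P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|~H)P(~H))."""
    num = lik_h * prior
    return num / (num + lik_not_h * (1 - prior))

prior = 0.50           # hypothetical starting credence in H
p_misleading = 0.60    # assumed chance the source is misleading

# If the source is honest, the evidence is twice as likely under H (invented).
p_e_h_honest, p_e_noth_honest = 0.6, 0.3
# If the source is misleading, assume the "evidence" shows up regardless of H.
p_e_misleading = 0.5

# Taken at face value (source assumed honest):
face_value = posterior(prior, p_e_h_honest, p_e_noth_honest)

# Marginalizing over whether the source is misleading:
p_e_h    = p_misleading * p_e_misleading + (1 - p_misleading) * p_e_h_honest
p_e_noth = p_misleading * p_e_misleading + (1 - p_misleading) * p_e_noth_honest
discounted = posterior(prior, p_e_h, p_e_noth)

print(f"taken at face value: {face_value:.2f}")  # 0.67
print(f"after discounting:   {discounted:.2f}")  # 0.56 -- a genuinely small update
```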
> All evidence is evidence. All evidence is net useful if well-handled.
This is one of those things that seems obviously true to me, and yet I often encounter those who think it's obviously false.
> Those who deny this are likely epistemically hostile and/or operate in a highly hostile epistemic environment. Treat accordingly.
I think it's worth exploring conditions where "well-handled" does not obtain. One case in which I got a lot of pushback for the "more information must be useful" position was in a discussion on medical imaging, where the concept of iatrogenics comes into play; much of the medical system seems to operate as what might meet your definition of a "hostile epistemic environment." This is due to a dynamic where specialists optimize for, say, the percentage of successful cancer treatments, even when successfully treating a minor cancer leads to more disutility on net (e.g. if the risk of the operation is higher than the risk of leaving the cancer untreated). There is no "optimize for QALYs" mechanism in place in the US, and so in some cases information can trigger harmful actions.
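A toy expected-value sketch of that dynamic, with invented probabilities and QALY figures (none of them are real clinical numbers): when the actor receiving the scan result optimizes for "treat everything found" rather than for expected QALYs, accurate information lowers expected outcomes; handled well, the same information helps.

```python
# All numbers are invented for illustration; only the structure matters.
# Two kinds of findings a scan might turn up:
findings = {
    # name: (probability, QALY loss if treated, QALY loss if left alone)
    "aggressive cancer": (0.002, 0.5, 3.00),   # treating clearly helps
    "indolent cancer":   (0.030, 0.5, 0.05),   # treating clearly hurts
}

# Policy A: no scan -> every finding goes untreated (because it is never seen).
ev_no_scan = -sum(p * untreated for p, _, untreated in findings.values())

# Policy B: scan, and treat everything found
# (the specialist optimizes "cancers successfully treated", not QALYs).
ev_always_treat = -sum(p * treated for p, treated, _ in findings.values())

# Policy C: scan, and treat only when treatment beats leaving it alone.
ev_well_handled = -sum(p * min(treated, untreated)
                       for p, treated, untreated in findings.values())

print(f"no scan:            {ev_no_scan:+.4f} expected QALYs")
print(f"scan, treat all:    {ev_always_treat:+.4f} expected QALYs")
print(f"scan, well-handled: {ev_well_handled:+.4f} expected QALYs")
# Under policy B the scan's information is net harmful; under policy C it is
# net useful -- the same evidence, differing only in how it is handled.
```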
I wonder if there are other environments that are non-maliciously "epistemically hostile" and whether it's interesting to enumerate those further.
> Mostly compare information from hostile or biased sources to expectations.
I think this is an insightful point I've heard you say before, but this concise framing is helpful.
I knew it was over for Adam when Agnes literally called in Aristotle
Just to state the obvious: Twitter polls only poll people on Twitter. I'm not on Twitter, and I don't know anyone who is.
This comment is untimely, I know - but I want to revisit "prejudicial" briefly, because I think the legal analogy has useful content here very much in line with your analysis. TL;DR: there are two senses of "evidence" at play here, "probative value" and "admissibility", and Twitter polls might be considered to have the former but not the latter. (Obligatorily: I'm not a lawyer, I just watch cases sometimes.)
A court, considering whether to admit X (a fact, document, testimony, or whatever) into evidence - meaning the set of things the jury is provided with - considers several factors. Among those factors is the "probative value" of X; how much does it contribute to a rational estimation of the likelihood of the fact under investigation? For the reasons you detail when saying "everything is evidence", just about everything has *some* probative value. The court will typically set a threshold of probative value for exactly what can be shown to the jury, but I believe this is typically a very low bar to clear. If X can conceivably bear on the case, why not let it in?
To answer that rhetorical question, one reason you might *not* allow X into evidence is that "prejudicial" element. X may be likely to "prejudice" the jury, changing their minds for a bad reason, based on unsound or invidious bias (two different categories). For example, information on past crimes of a defendant ("propensity evidence"), or the odd sexual proclivities of a witness, or the results of a polygraph test, typically involve significant risk of introducing prejudice and muddying the waters more than they clarify them. So despite just about everything having probative value (past offenders are in fact far more likely to commit more offenses!), a court will set limits on *admissible* evidence to avoid prejudicing the jury. (Prejudicial value could be considered to "introduce more heat than light" in rationalist parlance, though it's not a perfect correlate.)
To revisit polygraph tests for a moment, since I think they're the best parallel to Twitter polls: though not every U.S. state bans them outright, the general norm is not to admit them into evidence in criminal cases. They have lower reliability than the commonly-applied "lie detector" label implies, meaning a jury is likely to be easily misled about the proper weight to give a polygraph result. A polygraph still has *some* probative value, of course - but the apparent reliability outweighing the actual reliability is typically enough, by itself, for a judge to consider polygraph evidence prejudicial to the jury in a criminal proceeding.
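To put rough numbers on "apparent reliability outweighing actual reliability" (both accuracy figures below are assumptions for illustration, not measured polygraph statistics): a juror who treats the test as 95% accurate updates far more than a modest real accuracy would justify.

```python
def posterior(prior, accuracy):
    """P(guilty | test reads 'deceptive'), treating the test as symmetric:
    P(deceptive | guilty) = accuracy, P(deceptive | innocent) = 1 - accuracy."""
    num = accuracy * prior
    return num / (num + (1 - accuracy) * (1 - prior))

prior = 0.30              # hypothetical prior that the defendant is guilty
apparent_accuracy = 0.95  # what the "lie detector" label suggests (assumed)
actual_accuracy = 0.65    # a much more modest signal (assumed for illustration)

print(f"juror's likely update: {posterior(prior, apparent_accuracy):.2f}")  # 0.89
print(f"warranted update:      {posterior(prior, actual_accuracy):.2f}")    # 0.44

# The gap between those two numbers is, roughly, the prejudice the court is
# trying to keep out: the result has some probative value, but the weight a
# jury will predictably give it is much larger than that value supports.
```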
I could see a reframing of Adam Gurri's argument in these terms. Twitter polls are "evidence" in the weak sense that Twitter polls have probative value. Just about everything is "evidence" in that weak sense. But maybe Twitter polls shouldn't be *admissible* evidence in reputable discussions. They could be prejudicial, due to the strong temptation to lazily treat them as if they were representative - when of course they're not. The *apparent* probative value outweighs the *actual* probative value, so they shouldn't be considered "admissible evidence" in discussions with reasonable evidential rules.
Is this parallel perhaps clarifying?