Note: this is the first of several posts discussing the 2020 Meta Election research partnership and the resulting papers, in advance of a keynote panel at IC2S2 on the topic next Thursday, July 18.
The 2020 Election research partnership between Meta and a team of seventeen independent academics is the most ambitious social science of social media to date. Five papers have been published so far — but the fact that each of these is simply “a paper,” from the perspective of a lit review or the intuitive sense in which academics think about evidence, does them a disservice. Social media platforms have their own “physics,” defined by their infrastructural affordances. To an extent that is impossible in the physical world, these studies experimentally manipulated gravity.
My aesthetic appreciation for these experiments cannot be overstated. They are simply beautiful. Thinking as I sometimes do of social science experiments as performance art, the craft and the vision on display is deeply satisfying.
This is also an important innovation from the perspective of the “industrial organization” of science. Years ago, I wrote about the Theory of the Academic Firm: how the rigor demanded to answer certain questions is simply incommensurate with what a lone academic or small team of collaborators can accomplish. These papers, with co-author lists that start to look more like what you see in high-energy physics than in political theory, are a necessary step in this direction.
The idea of an “Academic Firm” is somewhat jarring in this case — the “firm” is composed of independent academics and Meta employees. You can tell they’re independent because the term “independent” appears 18 times in the Meta press release.
I’m teasing — I agree with the assessment by (independent!) observer Mike Wagner:
the team conducted rigorous, carefully checked, transparent, ethical, and path-breaking studies
but also that
Though the work is trustworthy, I argue that the project is not a model for future industry academy collaborations. The collaboration resulted in independent research, but it was independence by permission from Meta.
There are of course ~~ethical considerations~~ that come out of all this. The relevant question to me is not whether this work was Ethically Bad vs. Ethically Acceptable (and thus whether it should be banned or allowed) — it’s at the other margin. That is, is this work Ethically Acceptable or Ethically Obligatory? Should it be legally mandated?
But that’s a later post! We’re here to talk about scientific evidence. The most important dimension in evaluating the papers produced by this collaboration is “What did we learn?”
In the narrowest sense, what we learned was numbers. This gigantic apparatus was spun up, millions of dollars spent, thousands of hours by some of the best researchers in the field — and the answer is a number (or, really, a range of numbers). For example, “The point estimate for the effect of Facebook deactivation on Trump vote is a reduction of 0.026 units (P = 0.015, Q = 0.076, 95% CI bounds = −0.046, −0.005).”
Even more specifically, we didn’t learn those numbers at all — we simply learned whether the range of numbers intersects 0, after a statistical adjustment: “This effect falls just short of our preregistered significance threshold of Q < 0.05.”
We learned a lot of those numbers and the relevant comparisons with zero. Including all of the analyses of heterogeneity in all of the appendices, there are hundreds of such comparisons.
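To make those comparisons concrete, here is a minimal sketch of the kind of false-discovery-rate adjustment that separates a raw p-value from a Q-value. It uses the Benjamini–Hochberg procedure via statsmodels; the list of p-values is hypothetical apart from the 0.015 quoted above, and the papers’ actual adjustment procedure may differ in detail.

```python
# Minimal sketch: how a preregistered Q < 0.05 threshold can flip a
# "significant" raw p-value to "not significant" after adjustment.
# The p-value list is hypothetical, except for the 0.015 reported for the
# deactivation effect on Trump vote quoted above.
from statsmodels.stats.multitest import multipletests

# One focal p-value plus hypothetical sibling comparisons, standing in for
# the many outcomes and subgroups tested across the papers' appendices.
p_values = [0.015, 0.21, 0.03, 0.48, 0.09, 0.002, 0.74, 0.33]

# Benjamini-Hochberg false-discovery-rate adjustment: each corrected value
# plays the role of a Q-value compared against the 0.05 threshold.
reject, q_values, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

for p, q, r in zip(p_values, q_values, reject):
    verdict = "significant" if r else "not significant"
    print(f"p = {p:.3f}  ->  q = {q:.3f}  ({verdict} at Q < 0.05)")
```

The point of the toy example is only that a p-value of 0.015 can clear the conventional 0.05 bar on its own and still miss it once adjusted alongside its siblings, which is the pattern in the quoted result.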
The meta-scientific question is: so what?
My first meta-science paper argues that “I no longer believe that one-off field experiments are as valuable a tool for studying online behavior as I once did. The problem is that the knowledge they generate decays too rapidly because the object of study changes too rapidly.”
How do we generalize our findings from one place to another, from the past to the future? How do we ensure that this generalization is done at the same level of rigor as the original study? If we cannot, the rigor of the original study is simply performative — beautiful, I maintain, but not scientific.
The traditional approach has been replication — we know we can transport knowledge if other people can replicate the study and produce the same result. As the scope of rigorous social science has expanded, this has become obviously impossible: we can’t “replicate” the Vietnam War Draft Lottery to check the generalizability of the studies of the effects of being randomly drafted. And neither can we “replicate” the Meta 2020 research collaboration.
For one thing, institutionally, it’s not going to happen. The stars aligned for this one to work, and the environment at Meta (and other tech companies) is dramatically different than it was when this project kicked off in 2018. The low interest rates are gone; Trump isn’t President; Zuck is cool now.
More fundamentally, it’s ontologically impossible to replicate these experiments. Facebook and Instagram are already dramatically different than they were in 2020. Notably, the experiment in which some subjects had their algorithmic recommendation feeds replaced by old-school chronological feeds was conducted before the dramatic shift to a heavily algorithmic Instagram feed in the summer of 2021 in response to pressure from TikTok. Concerns about Instagram’s feed have only really arisen in recent years; this study, conducted in 2020 but published in 2023, is already historical.
So if we can’t replicate these studies, how do we rigorously apply their results? External validity/generalizability is the most important question in social science methodology — especially when studying something that changes rapidly.
These papers do not address this problem except to specifically punt on it. These hedges are the least beautiful part of these papers. They’re so inane that they’re obviously pro forma. And they all boil down to saying “This study doesn’t tell you what you actually want to know.” To which my question is, again: what does it tell us?
our design cannot speak to “general equilibrium” effects, because doing so would imply making inferences about societal impact
We can’t make rigorous inferences about societal impact — so what are we doing?
replications in other countries with different political systems and information environments will be essential to determine how these results generalize
Absent the impossible replications, then, we can’t know how these results generalize.
Our results may also have been different if this study were not run during a polarized election campaign when political conversations were occurring at relatively higher frequencies, or if a different content-ranking system were used as an alternative to the status quo feed-ranking algorithms
This one is my favorite: “our results might have been different if we had run a different experiment.”
readers should be cautious about generalizing beyond our specific sample and time period…we do think our results can inform readers’ priors about the potential effects of social media in the final weeks of high-profile national elections
This one is the most informative, because it exposes an incoherent philosophy of statistics. The statistics implemented in the study — the same deactivation study that compared the effects of deactivating Facebook on self-reported Trump vote to the pre-registered Q-threshold discussed above — are flamboyantly frequentist. This philosophical paradigm has not yet solved the problem of generalizability, nor will it, absent a solution to the problem of induction. So the punt is to Bayesianism: reading this study should “change your priors.”
That’s fine, as far as it goes. Some form of Bayesianism is unavoidable. In fact, I think we should take it dramatically more seriously. The first step is arithmetic: the change in our beliefs is equal to our beliefs after reading the study minus our beliefs before reading it, the posterior minus the prior.
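To make the arithmetic concrete, here is a minimal, purely illustrative sketch of a normal-normal Bayesian update. It treats the quoted deactivation estimate and its 95% CI as the data; the prior centered on zero is entirely hypothetical, and nothing about this toy model comes from the papers themselves.

```python
# Minimal sketch of the "change in beliefs" arithmetic under a standard
# normal-normal (conjugate) Bayesian update. The study estimate and CI are
# the ones quoted above; the prior is hypothetical and purely illustrative,
# since the whole point is that it is personal.

# Hypothetical prior on the effect of Facebook deactivation on Trump vote:
# centered on zero ("no effect"), fairly diffuse.
prior_mean, prior_sd = 0.0, 0.05

# Study estimate: -0.026, with the standard error backed out of the
# reported 95% CI of (-0.046, -0.005).
estimate = -0.026
se = (-0.005 - (-0.046)) / (2 * 1.96)  # roughly 0.0105

# Precision-weighted (conjugate) update.
prior_prec, data_prec = 1 / prior_sd**2, 1 / se**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
post_sd = post_prec**-0.5

print(f"prior:     mean {prior_mean:+.3f}, sd {prior_sd:.3f}")
print(f"posterior: mean {post_mean:+.3f}, sd {post_sd:.3f}")
print(f"change in beliefs (posterior minus prior mean): {post_mean - prior_mean:+.3f}")
```

Even in the toy version, the size of the update depends entirely on the prior you feed in, which is exactly the term the papers leave unexamined.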
So: what exactly are those two terms? Here again we find a failure of rigor. There’s a massive, even majestic amount of care and rigor put into the production of these papers and the statistical infrastructure to evaluate them — and essentially zero attention paid to the question of how humans synthesize the information contained in those papers.
This is the best avenue to begin rigorously answering the question at issue: what did we learn?
I want to call attention to the final sentence in this paper, one of the only parts to which I strongly object:
Notwithstanding its limitations, we believe that this study can usefully inform and constrain the discussion of the effects of social media on American democracy.
It’s one thing to gesture towards the vague idea of people changing their minds after reading this paper. It’s something else entirely to say that the study can “constrain the discussion” of what social media does.
What exactly were people saying before that they can’t say now? And how would the answer change if the effect on Trump vote had come in under the preregistered Q < 0.05 threshold?
The thing about priors is that they are personal. So the conclusion is something like “if you are open to the idea that social media influences politics, but not committed either way, this evidence should shift your beliefs in the direction of believing it.”
It’s the absence of any such impersonal, and therefore pseudo-objective, measure in the Bayesian framework that accounts for the persistence of the classical hypothesis testing framework, even though it’s mostly voodoo.