In the seventeen years since Twitter was founded, social scientists have published millions of papers about it. Twitter is perhaps the best-studied instance of digital media. And most everyone seems to have thought that this was a fine thing to do: we could study Twitter with the standard methods of social science, like we would American elections, or international trade.
I disagreed. There seemed to be a disconnect between what I was taught in grad school about how science works and how the scientific method was actually being applied to the study of social media. After five years grappling with this problem and thinking about how to make science work better, my peer-reviewed article Temporal validity as meta-science has just been published.
And I confess that I feel vindicated. The timing prompts a re-consideration of the status of our knowledge of Twitter, now that “Twitter” no longer exists. To what extent does our knowledge of “Twitter” generalize to the platform “X”? And more importantly, what is the scientific basis for our answer to the previous question? How much rigor can we bring to bear, here?
The “folk theorem” of how (quantitative, positivist) social science works is that a decentralized community of scholars conducts independent studies of a phenomenon, responding to each other and building on previous work but not formally coordinated, and that through this process knowledge of the phenomenon “accumulates.”
The first problem with this “folk theorem” is that in practice, this “accumulation” is useless. I’ve used variations on this metaphor before, and I thank Sean Gailmard for pointing out the original formulation:
“Science is built of facts the way a house is built of bricks: but an accumulation of facts is no more science than a pile of bricks is a house” (Henri Poincaré)
The current industrial organization of science incentivizes brickmaking far more than it does housebuilding. The scientific apparatus is insular, lacking external feedback from “the world” or even “the market”; like a command economy, we are able to ramp up production of measurable commodities, and now we have a massive pile of bricks.
The second problem with the “folk theorem,” though, is most salient for fast-moving objects of inquiry like Twitter: the knowledge doesn’t just accumulate, it also decays. The world in which the knowledge was produced is different from the world of today. I argue that linguistic continuity and the technology of in-line citation have masked the true rate of change r in the phenomena of interest on social media.
Consider a hypothetical example. Assume that studies XXXX (2014) and YYYY (2022) were both perfectly executed RCTs in which a random sample of US Facebook users were paid to stop using Facebook for a month. We might then encounter the following sentences in a research article: “XXXX (2014) finds that Facebook desistance causes an increase in partisan affective polarization; however, YYYY (2022) finds the opposite, that Facebook desistance causes a decrease in partisan affective polarization. Future research is needed to adjudicate which of these is correct.”
The example is intended to be absurd: r is far too high in this context, so that accumulating knowledge about the effect of “Facebook use” is meaningless. Another perspective on this problem is that “Facebook use” is a poorly-defined construct, that it bundles together too many disparate treatments and mechanisms. This is likely true, absent any efforts at construct validity (Esterling et al., 2021). But citizens and researchers alike treat it as a meaningful construct, one that structures their understanding of the world. The related policy questions are certainly salient: “Is Facebook good or bad? Should I use Facebook?”
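To make the role of r concrete, here is a minimal simulation sketch in Python. The functional form, the drift rate, and every number are my own illustrative assumptions, not estimates from any actual study; the point is only that two flawless RCTs run eight years apart can yield opposite signs because the estimand itself has drifted.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_effect(year, r=0.05, baseline=0.30, ref_year=2010):
    # Hypothetical effect of Facebook desistance on affective polarization:
    # positive in the early 2010s, drifting downward at rate r per year.
    return baseline - r * (year - ref_year)

def run_rct(year, n=2000, noise_sd=1.0):
    # A perfectly executed two-arm RCT fielded in a given year:
    # returns the estimated average treatment effect.
    treated = rng.normal(loc=true_effect(year), scale=noise_sd, size=n)
    control = rng.normal(loc=0.0, scale=noise_sd, size=n)
    return treated.mean() - control.mean()

for year in (2014, 2022):
    print(year, round(run_rct(year), 2))
# Roughly +0.10 in 2014 and -0.30 in 2022: both studies are internally valid,
# neither is "wrong," and averaging them answers no question anyone is asking.
```

Under this toy model, “the effect of Facebook desistance” is not a fixed quantity to be triangulated; it is a moving target, which is exactly what makes the quoted “future research is needed” sentence absurd.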
But thanks to Elon Musk, we now have an example of linguistic change. The virtual world of academic concepts and in-line citation has ruptured. Twitter research is now historical research. “Twitter,” “Facebook,” and “X” are different platforms.
Of course, the reason that “Twitter” and “X” are different platforms is not that Elon Musk changed the name. In reality, “Twitter” was never just one platform; it was constantly changing, in its userbase, technological affordances, norms of use, formal rules, and its relationship to the rest of the world. Musk’s takeover of Twitter certainly drew attention to the way in which each of those things could change in response to new management.
Absent the name change, however, the media technology of academic knowledge production did not have to (and perhaps could not) recognize these gradual changes. Only if a specific study set out to show that, say, Twitter’s switch from a 140-character to a 280-character limit had a particular effect would that change be enregistered into the realm of academic knowledge.
The name change forces us to confront the way in which all of these relevant parameters have been drifting, unmeasured, across all of the studies of Twitter. And methods sections of papers will look different moving forward. Instead of saying “we accessed the Twitter REST API…”, they will say “we accessed the X API…”
Just kidding! Musk shut down academic access to the API! Of what use are the thousands of methods papers and dozens of R packages premised on this particular data source?
I don’t mean to sound gleeful; I genuinely wish that normal science worked, here, and I feel especially bad for PhD students halfway through dissertations which are now both out of date and nearly impossible to complete. But I also wish that I could dunk a basketball. The phenomena we seek to study have no obligation to fit within our preferred paradigms or research budgets.
So, what do we do? My paper does not propose a solution to the problem of induction; our knowledge remains imperfectible. But I conclude by offering a number of meta-scientific improvements to our capacities, including a greater focus on quantitative description and on the synthesis and application of knowledge in the form of prediction.
Ironically, in the interim between when the paper was written and when it was published, I have become somewhat less bullish on prediction. Long-term predictions are simply too difficult, and the mapping between a given “unit” of scientific effort and the “lift” (gain in predictive accuracy) is too difficult to pin down. Short-term prediction, about what will happen in a given study or series of studies, is still a useful disciplining of our intuitions — but this is essentially just pre-registration, already well-discussed in the meta-science literature.
Part of my hope for prediction, I now believe, was thinking that it was a free lunch — that “the market” in the form of prediction markets would do the hard work of knowledge synthesis and aggregation for us. This could work, in principle, but it definitely isn’t a free lunch. Our everyday experience of “the market,” as consumers surrounded by posted prices backed by government regulation and firm reputation, disguises the massive amount of work that went into creating “the market” in the first place. The most fundamental such work is the creation of money: all of the technology and legal apparatus undergirding the financial system, which produces a single dimension that lubricates the interchangeability of commodities and labor.
For prediction markets to function for knowledge aggregation, we need a lot more of this kind of work. They’re fine for predicting election outcomes — there, they piggyback on the extensive legal and technological apparatus which transforms the beliefs of millions of people into a single, definite outcome.
But trying to predict the outcome of a given scientific study reveals just how much contextual detail is required. What, exactly, is the experimental protocol? When and where is it being conducted? On what populations? How are the outcomes measured? Any predictor needs to have the same amount of contextual knowledge as the researcher for this to work.
The value of this exercise, then, has nothing to do with it being a market — there aren’t nearly enough “traders” for any of the arguments for market efficiency to apply. No, the value is the precision. It’s a kind of adversarial pre-registration, or at least a collaborative one.
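As an illustration of what that precision buys (my own sketch, not a procedure from the paper): a registered short-term prediction only becomes scoreable once the protocol, population, and outcome measure are pinned down, and at that point something as simple as a Brier score can discipline intuitions.

```python
def brier_score(predicted_probs, outcomes):
    # Mean squared error between forecast probabilities and realized binary
    # outcomes; 0 is perfect, 0.25 is what an uninformed 50/50 forecaster earns.
    return sum((p - o) ** 2 for p, o in zip(predicted_probs, outcomes)) / len(outcomes)

# Hypothetical registered predictions for three pre-specified study outcomes,
# e.g. "the preregistered ATE will be positive and significant at alpha = .05".
forecaster_a = [0.80, 0.20, 0.30]   # a researcher's stated probabilities
forecaster_b = [0.50, 0.50, 0.50]   # an uninformed coin-flip baseline
realized     = [1, 0, 0]            # what the studies actually found

print(brier_score(forecaster_a, realized))  # about 0.06: better than the baseline
print(brier_score(forecaster_b, realized))  # 0.25: the coin-flip benchmark
```

The baseline row is the point: a forecaster who cannot beat 50/50 on precisely specified study outcomes has learned less from the literature than they think.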
Prediction might still be useful, but now I think it’s the cart that must be put behind the horse of knowledge synthesis. Prediction offers some rigor at the end of the chain of knowledge production and application. But the middle of this chain—the normal science we have inherited—should be restructured as well. An anonymous reviewer asks “why Bayesian updating fails...why we can’t gradually update our priors based on past knowledge when a new study comes out, which is implicitly what we are doing now?”
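To make the reviewer’s question concrete, here is a minimal sketch of the implicit procedure, using normal-normal conjugate updating; the drift process and all numbers are my own assumptions, continuing the toy model above. Treating every new study as another estimate of one fixed effect produces a tidy posterior, but the posterior describes an average of past worlds rather than the present one.

```python
import numpy as np

def update_normal(prior_mean, prior_var, obs, obs_var):
    # Conjugate normal-normal update that treats every study
    # as an estimate of a single fixed effect.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

rng = np.random.default_rng(1)
mean, var = 0.0, 1.0                             # diffuse prior over "the" effect
for year in range(2010, 2024):
    true_effect = 0.30 - 0.05 * (year - 2010)    # same drifting effect as above
    study_estimate = true_effect + rng.normal(scale=0.05)
    mean, var = update_normal(mean, var, study_estimate, 0.05 ** 2)

print(round(mean, 2))                          # near the 2010-2023 average (about -0.03)
print(round(0.30 - 0.05 * (2023 - 2010), 2))   # -0.35: the hypothetical effect today
```

In this sketch the updating is not failing in any formal sense; it is answering the wrong question, because the stationarity assumption is doing all the work.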
My personal prior is that this Bayesian updating is performing poorly. As good Bayesians, we need to confront our priors with evidence. However, we have barely begun to bring rigor to this crucial link in the knowledge production chain; see, for example, the pioneering work by Little and Pepinsky (2021). In other words: how much time do social scientists spend reading studies as they come out? Is this the right amount of time? How much heterogeneity is there in the priors of scholars in a given subfield? How do they update in response to new information?
I have no idea! And neither does anyone else! Anecdotally, though, I hear social scientists saying that they spend very little time reading published research. The brick-counting apparatus doesn’t reward this activity, so we under-invest in it.
At a political level, one thing we could do is slow down digital media. The present rate of change overwhelms not just social science but all the homeostatic, regulatory functions of our economy, government and society. Never forget “move fast and break things.” Any kind of collective response to new technology requires the ability to understand what it is and what it does, and to deliberate over whether and how we think it should be used. If the only actors capable of changing social media are the CEOs of social media companies, our knowledge is destined to be useless—or worse, useful only to those CEOs.
My personal response has been to stop churning out marginal papers on topics low in temporal validity and instead to read broadly in the history, philosophy and sociology of science, to pursue institutional reforms like founding the Journal of Quantitative Description: Digital Media and releasing the Upworthy Research Archive, to write this blog, and to bring up the topic of meta-science (what exactly are we doing?) to colleagues as often as possible.
If you have thoughts on what we are or should be doing, I encourage you to share them—but for the love of God, not on X. Science is nothing if not a collaborative effort, and my intuitions and opinions on this topic are far less interesting than what we as a community can develop.