Everyone Knows That Water Is Wet

Apr 13, 2023

Political science needs more books like Theory and Credibility: Integrating Theoretical and Empirical Social Science, a recent and important book by Scott Ashworth, Christopher R. Berry, and Ethan Bueno de Mesquita [hereafter, ABBdM]: books that outline a research program. This one clarified some elements of the formal theory paradigm and its application to research design that had remained opaque to me for years.

The book begins with the premise that “The essence of formal theory is the crafting of models that embody mechanisms and reveal the all-else-equal implications of those mechanisms. The essence of the credibility revolution is the crafting of research designs that make credible the claim to have held all else equal, at least on average” (p7).

Given the way the paradigmatic winds are blowing, it’s a good move to hitch one’s wagon to the causal inference train. The quoted motivation is clear, and at least internally consistent.

However, this clarity only reinforces my belief in the contingency of this method, its inability to produce transcendent Knowledge. With that off the table, I also argue that it cannot be justified on pragmatic grounds, on the usefulness of the knowledge it produces.

On the latter point, let’s begin with the triumphant conclusion of the book: an explication of the analysis in Bueno de Mesquita and Tyson’s 2020 APSR article. They consider a model where the empirical research design uses rainfall as an exogenous shock to protest and argue that this design is flawed because it assumes that the regime doesn’t understand that rain raises the cost to protest. On the merits, their argument is correct. Formalization clarifies that the assumptions of this research design—that rainfall only affects protestors’ cost function, but not the regime’s awareness of that cost function—are implausible. The authors establish the importance of considering the interaction between a research design and the behavioral context being modeled.

Given the stakes facing the status of expertise—and more broadly, the way that knowledge industries operate in our democratic society—we should not be satisfied with the (correct) claim that this argument represents an incremental improvement along the branching path that social science has taken. Let’s take a step back. “Political scientists have been going about the business of accumulating knowledge by combining theory and empirics for generations” (p11) we’re reminded, and this marquee 2020 paper, (correctly) critiquing causal inference practice, literally makes the contribution that everyone knows that water is wet.

Is it a defensible task to formally diagnose the plausibility of the research designs used for causal identification? Yes, of course; it is necessary to exorcise the “Phantom Counterfactuals” of Slough 2023.

But ABBdM have bolder ambitions. They baldly assert their belief that “the primary goal of social scientific theorizing is to explain social phenomena in terms of mechanisms.” They deign to acknowledge the existence of other, non-primary goals (forecasting, evaluating policy, and so on), “but, in keeping with our focus on the canonical goals of political science, we put explanation front and center.” (p45)

I am someone who thinks the goal of social science is to predict what will happen in response to the greatest range of interventions. But Theory and Credibility is a serious effort to advance The Primary Goal of social science, explicitly targeted at winning the minds and syllabi of the next generation of graduate students. I agree that generational replacement is The Primary Way that science advances, so I think it’s important to engage with the program that ABBdM advance.

I have three objections.

Section I presents long-standing axiomatic debates that will never be resolved. I hope to demonstrate that within the verstehen tradition (on which more later), the distribution of intuitions and beliefs is ultimately responsible for what a given epistemic community comes to decide is the best method for achieving their goal. Though ABBdM provide no criterion by which a rational choice theorist can argue that their tradition is superior to other interpretative/hermeneutic traditions, this community has been content to ride the coattails of John von Neumann’s model of nuclear deterrence through the postwar establishment of modern economics to their current position as the highest-status and best-paid verstehen-enjoyers in American academia.

Say what you will about the axioms of rational choice theory, but at least it’s an ethos – and the minds of the kinds of people who end up as quantitative social scientists tend to prefer even the most flawed system to the messy, democratic pluralism of reality. I hope to establish that its prominence is a historical contingency and thus that other communities have equal a priori claim.

At a minimum, I seriously doubt that our colleagues working in political theory or state politics would ever invoke an opportunity “to write the inevitable APSR paper.” I suspect that this footnote is a joke, but the premise of the joke is a celebration of the status of marginal advances in formal modelling as worthy of publication in our discipline’s flagship journal.

Section II is the heart of my argument. ABBdM advocate for a scientific process that is a bizarre contortion of actual scientific practice, one that only makes sense in light of their axiomatic commitments. The key bridge between formal models of mechanisms and empirical research requires an evaluation of the similarity between the two. They repeatedly argue that the test of similarity is whether the predictions derived from the model match the empirical results. If these match, the model and context are similar; if they don’t, they aren’t.

This cannot work outside of the friendly confines of the lab. Indeed, when they provide a real example from the empirical literature, they abandon this framework and instead simply assert that formal models which have actors called “voters” are similar to the real-world contexts in which citizens vote on politicians.

Some appeal to similarity is inevitable. Social science is not perfectible. It can never reach objective, transcendent Truth. Any honest account admits that we need (fallen, subjective) humans in the loop. This is what my esteemed methodological nemesis called “meta-scientific dread”...and I think it’s best we come to terms with it.

Section III aims to establish that, at the margin, the quality of the knowledge production process within quantitative social science can be improved by shifting some of our focus away from statistical and formal rigor towards meta-scientific questions—and that the most immediately important intervention is a re-integration of qualitative and quantitative description.

First, I’ll propose a framework to explain how we got here and where we should go instead.

A Stylized History

(“Our” school of) social science was born with aspirations of grand theories of everything. If you set out to understand the Protestant Work Ethic, or Capital, or Suicide, you need to understand the entire process, how the micro and macro interact. Taking the monograph as the unit of knowledge production enforces a structure with transitions between sections (concepts) and a coherent narrative arc. The reader consumed the entire argument. Individual sub-arguments might not hold up; Adam Smith’s anthropology was super wrong, for example. But the proof of the knowledge was in its ability to reveal new things about the world to the reader, to ultimately inform their action.

The methodological specialization of social science necessitates the creation of mutually uninterested (and now unintelligible) communities who each work on one stage of the process of creating knowledge. The unit of knowledge production is now the academic paper, with concomitant specialization: each new paper is a refinement of previous work within that community, with little reference to other communities and only the vaguest gestures towards the ultimate verification of this entire distributed knowledge system.

The original system was limited by the capacity of the Great Thinker; one person can only read, see, think, write so much. Specialization is necessary. But academic social science now lacks a top layer, an entity to provide synthesis, verification, direction. We have techniques but no strategy. Duncan Watts’ metaphor describes “throwing papers over the wall for someone else to apply…[but] there’s no one on the other side of the wall. Just a huge pile of papers that we’ve all thrown over.”

The classic metaphor is about chains and links. This has unfortunate binary implications (the chain breaks or it doesn’t), but it’s useful for now.

The Great Thinker had to convince themself, and their reader, that the entire evidentiary chain worked. They were always wrong; social science is hard. But the ultimate standard was whether the argument was useful: whether readers found that seeing through this lens, thinking with this technology, allowed them to accomplish their goals better. This is what provides crucial feedback to the entire system. The successes and failures of actors using previous Great Theories are then fodder for new Great Thinkers.

Contemporary academic social science entails distinct communities trying to perfect their individual link in the chain. Sometimes these links simply don’t connect: there’s no one fitting them together, just piles of papers thrown over subdisciplinary walls. Even when they do connect, however, we never test the entire chain to see where it breaks. The sociological reality is that social scientists don’t care if it breaks or holds. We don’t care whether the knowledge we have created is both valid and useful.

Formal modelers are a community that has refined its link in the chain for decades. It is a specialized link; narrow, finicky, with few applications. But it is strong.

Empirical research design has recently followed a similar trajectory. In the days of cross-country regressions, the link was easily connected but extremely weak. Today, reduced-form causal empiricists insist on gold: it’s rare and expensive, but (metallurgical facts be damned) extremely strong as well.

Theory and Credibility is an effort to refine the connection between these two links. It aims to demonstrate how these two fit together nicely and can form a coherent section of the chain, to make it as strong as it can be.

In most of this essay, I will accept the premises of ABBdM but argue that they have failed. But the far more important problem is that even if they had succeeded, we would be no closer to a resilient chain. It is an absurd allocation of social science energies and resources to continue to refine what are already the strongest links in the chain.

If we actually care about diagnosing the evidentiary chain, about demonstrating that our knowledge actually works, we need metascience. The literatures on “publication bias” and the “file drawer” problem have started us down this path; we already consider the aggregate output of social science as our quantity of interest. We must thus consider how we allocate a fixed number of social-scientist hours, both in training and in professional careers.

The most important dimension on which I disagree with ABBdM is in their implicit model of how we should apply those scarce social science resources.

I. Verstehen? Hermeneutics? Are you sure we need all this math?

ABBdM’s stated (axiomatic) goal is “intentional understanding”: “we understand a model only when we have a successful intentional explanation of it” in terms of the “opportunities, beliefs and desires that make the behavior in question comprehensible” (p53). This is elaborated in fn3, which says we should “ask whether individuals in an empirical circumstance actually perceived and thought about the main tradeoffs faced by the actors in the model.” They admit that there are other related traditions, but those traditions can only, per ABBdM’s definition, produce explanation, not understanding.

In more formal terms, they cite previous claims that rational choice “falls in the hermeneutic rather than the positivistic camp of social science methodology (Guala and Steel 2011)” and, per Bates (1996), “the tools of [Rational Choice Theory] cannot be applied in the absence of verstehen.”

I only recently heard the word verstehen for the first time, and I had only encountered “hermeneutics” in the context of literary criticism. Indeed, “verstehen” tends to be more associated with qualitative methodologies, following a recent debate in the Sociology literature.

It turns out that ABBdM’s “verstehen” is an odd duck. The full quote from Bates (1996) is illustrative:

“The use of such methods requires precisely the kinds of data gathered by ethnographers, historians, and students of culture. It requires knowledge of sequence, perceptions, beliefs, expectations, and understandings. The tools cannot be applied in the absence of verstehen.”

This seems to me to imply that intention-based modeling in the absence of qualitative knowledge is nonsense. Theory and Credibility does not, however, spend very many words discussing the link between ethnography/history/cultural studies and rational choice.

Again, I'm in the “social science as prediction” camp, the one which believes that knowledge only exists as it is being used and that we can thus evaluate the success of our social science by making and evaluating predictions about the future.

Given my perspective, it has always been difficult for me to understand how this epistemic community decides what constitutes good research, or more broadly, how they determine if they are successfully accomplishing their social scientific goals. Thankfully, Chapter 4.4 provides “Some Guidance on What Makes a Good Model.” The two guidelines are that better models are transparent and substantively interpretable.

The latter is discussed in detail below. Towards the former, ABBdM praise incrementalism in the selection of models, noting that theorists don’t want to abandon canonical models even if they have some “assumptions that they don’t like” (p59). The reasons why a theorist might like or dislike an assumption are again based on the appeal to similarity, but note that this section implies a preference in the tradeoff between verisimilitude and verstehen: “incrementalism is in part the price of really grasping a model’s mechanisms.”

This is not a price that seems worth paying...but that’s coming from someone who has never really grasped a model’s mechanisms.

Relatedly, we should prioritize the use of “well-understood mechanisms”; we should be excited to see mechanisms which have been applied in multiple contexts applied in a new context. “Recognizing such patterns deepens our sense that we have explained what is going on—‘feeling the key turn in the lock’ as Peirce put it” (p57).

This entire discussion, then, rests on the identity of the minds whose locks are being turned. Or, in Ian Hacking’s discussion of this famous Peircean phrase,

Explanations are relative to human interests. I do not deny that explaining - 'feeling the key turn in the lock' as Peirce put it - does happen in our intellectual life. But that is largely a feature of the historical or psychological circumstances of a moment. There are times when we feel a great gain in understanding by the organization of new explanatory hypotheses. But that feeling is not a ground for supposing that the hypothesis is true. (Representing and Intervening, p53)

I agree that there is a sizeable community of scholars who derive this mental satisfaction from the process of thinking about the world through the lens of formal models. How could we discover whether this is a mistake, whether (more? better?) verstehen could be achieved by allocating the same societal resources towards an alternative method?

There is no non-circular justification for the method of formal modelling if the goal is understanding. This is why ABBdM lean so heavily on appeals to authority and tradition (“primary goal”; “canonical goals”) in the passages quoted above.

If an epistemic community decided that they wanted, starting from scratch today, to explain human behavior, what are the chances that they would decide to prioritize this method? The answer to this question depends on the distribution of the identities, preferences, experiences and cognitive styles of the members of that community.

The lengthy tenure of formal models and in particular rational choice at the center of powerful institutions in American social science is endogenous to the fact that there are today many powerful people who think that rational choice is a useful framework through which to understand human behavior. Rational choice is an example of the kind of “cognitive map” which Doug North (as discussed in Paul Pierson’s Politics in Time) explains as causing positive feedback loops in institutional development.

Most Political Science PhD students encounter rational choice concepts and logic in their coursework; it is a well-established mode of explanation. This is roughly analogous to the method of Derridean deconstruction in comparative literature departments: some students ignore it, others are familiar with its logic, others deploy it frequently in conjunction with other methods, and for others it’s the whole enchilada.

By what criterion can we decide whether rational choice or deconstruction is a better method? No criterion is offered by ABBdM. Both have (self-)serious practitioners among professional academics whose explicit aim is hermeneutic.

Other political scientists who identify as positivists, who aim to produce knowledge that helps improve human decision-making by making more accurate predictions, should be aware that rational choice as described by ABBdM cannot be falsified because it cannot be tested. My hunch is that many of these positivists would conclude that rational choice (like textual deconstruction!) is therefore non-scientific.

II. The fictional scientific process of establishing similarity

One of my guiding lights in the philosophy of social science has been Nancy Cartwright. Her work entered recent social science debates explicitly through a seminal paper she co-authored with Econ Nobel laureate Angus Deaton about the epistemic status of Randomized Controlled Trials [RCTs]. She argues that there is no *qualitative* advantage that RCTs have over other research designs; all designs rely on assumptions, and they are valid iff those assumptions are met. The RCT enthusiast can argue that the assumptions of RCTs are more plausible than the assumptions underlying other designs, but then we’re in the world of weighing this increased plausibility against the other disadvantages of RCTs. They are not, in her estimation, a gold standard.

More broadly, I associate her thinking with two ideas. The first is the fact that a chain of evidence is only as strong as its weakest link, as I discussed above. How many stages are there? That is, how many links are there in the chain? I don’t think we really know, but this is the kind of question I think we should be asking. If we actually cared about applying our knowledge, we would need to formalize this entire evidentiary chain and work on strengthening the weakest links first. A necessary task for political methodologists working within the emerging subfield of Metascience!

Regardless, some of these links correspond to well-specified components of validity. In these links, numbers are fed in, math is performed, and numbers are spit out; it is at least conceivable that zero variance is introduced in these stages, that they are perfectible. However, Cartwright’s second key idea is that at least one (and probably many) of these links are fundamentally *subjective*, that they require a human or humans to perform some act of comparing what they observe with some internal state of their mind.

(This is analogous to the “Oracle problem” encountered by prediction markets and smart contracts. These systems may be perfect, internally, but they ultimately require a human to translate their output into the real world or, vice versa, to translate the real world into an input they can accept.)

Reading Cartwright's philosophical formalism caused a key to turn in the lock of my mind; so much of the performative rigor (what she calls the “vanity of rigor”) applied to certain stages of the research process is allowed to evaporate in less rigorously contained stages. Strengthening one link (through increased rigor of a better statistical method, sampling procedure, or measure) is pointless unless all the other links are already sufficiently strong.

So I was surprised to see ABBdM appeal frequently and crucially to “similarity” in their model of the research/knowledge production process. (And then, on p17, to see them claim that “a chain is only as strong as its weakest link”!) But it seems that the foundations they invoke in the discussion of similarity come from the philosophy of natural science.

The natural sciences are only about prediction and control; there’s no analogue of verstehen here, unless we mean the creation of more poetic metaphors to describe the dance of the celestial orbs. There is no meaningful sense by which we can say that heliocentrism is correct and geocentrism is incorrect if our criterion is which one makes the lock in our brains turn. So I fail to see why ABBdM prefer to ground their verstehen-aimed method in a philosophy of science that embraces the analogy to the natural sciences.

“Exactly what similarity amounts to is a question of considerable philosophical interest (Frigg and Nguyen, 2020).” Correct! Having acknowledged that this is an open question, ABBdM proceed to beg the hell out of it. They assert that design similarity means that the “Measures and assumptions of research design credibly represent relevant features of the target” and that model similarity means that the “Model meaningfully represents relevant actors, situations, and mechanisms in target.”

The former is later elaborated: “We say that a research design is substantively identified if it includes arguments and evidence that would convince a reasonable interlocutor that the assumptions linking the statistical procedure and the estimand are plausible in the particular target under study.” (p73)

I don’t see how appeals to “credibility” and “meaningfulness” and “reasonability” are an improvement on appeals to “similarity.” This central issue is not seriously addressed.

This book is a testament to the centrality of the modeling link and a comparative lack of interest in the necessarily qualitative links. This is not a merely “academic” quibble because it reflects an analogous dismissal of the practical importance of the subjective link in the chain. The following quote illustrates why this is a problem; the model cannot fail, it can only be failed.

“Suppose that estimates from the research design did not agree with the implications of the model. This reduces our confidence in the similarity between the model and the target under study” (p31).

ABBdM are insistent on this point:

“to evaluate the similarity claim, we can see whether the model’s pertinent implications agree with empirical findings” (p48).

“If a theoretical model implies some relationship, then you might proceed by looking for a credible research design for estimating a commensurable quantity in order to assess the similarity of the theoretical model to the target.”

These three quotes all assume perfect empirics. The model is axiomatically good and useful; if some empirical results disagree with the model, then the context of the empirical exercise must not be similar to the model. In reality, of course, we don’t know whether there was a coding error, whether a threat to inference has not yet been identified by statistical methodologists, or whether some assumption of the research design will turn out to be substantively incorrect. Yet the test leaves no room for these possibilities: if the implications of the model match the empirical results, voila, similarity; if not, no similarity.

We then turn to the well-trod ground of American voter behavior. This was the site of a previous generation’s battle over the pathologies of rational choice theory. Back then, it bears noting, the rational choice camp argued that their models were in fact useful for predicting human behavior. But after decades of expensive effort, the payoff was extremely small; today, as we saw above, ABBdM wisely assert that rational choice is not interested in prediction.

More recently, American voter behavior has seen a high-powered back-and-forth over whether voters are “rational.” The crucial examples have to do with whether voters make decisions based on random events that are unrelated to politicians’ performance, whether negative moods from shark attacks or positive moods from soccer matches spill over into the voting booth. I won’t rehash this here.

ABBdM take for granted that the models that they have labeled with the words “voter” and “politician” are similar to the citizen filling in her mail-in ballot. Earlier they claim that “to evaluate the similarity claim, we can see whether the model’s pertinent implications agree with empirical findings.” Here, we find a case where the model’s pertinent implication did not agree with empirical findings. They do not use this evidence to evaluate the similarity claim.

Sociologically, what happened is that this extremely smart, prestigious, and well-resourced group worked hard to overcome each new empirical objection to their beloved axiom that voters are rational. The possibility that their canonical models of voter behavior are not similar to the context of real-world voter behavior is never considered. Instead, they provide novel theoretical objections or point out empirical flaws in order to conclude that “the evidence on offer does not entail the conclusion that voters are irrational” (p149).

As a practical matter, diagnosing “similarity” by comparing theoretical implications with empirical findings is ridiculous. It only makes sense as a defensive feint, a fictional scientific procedure which keeps formal modelling unshakably necessary. As we see, ABBdM are unable to maintain this farce for even a hundred pages.

So how does this crucial stage in the scientific process actually work? The intuitive way, which they ultimately admit: “Evaluating similarity is in large part a matter of substantive knowledge of the phenomenon in question” (p150). This leads us to the most important and least-rigorously-considered aspect of the knowledge production process.

III. Knowing what the fuck is going on

Formal modelers read books and manipulate Greek symbols; there’s no reason to expect them to be at the cutting edge of substantive knowledge, especially for phenomena (like joining a violent rebel group) with which they have no experience. So it seems unlikely that they should want to rely on their own intuitions to make decisions about similarity. We should perhaps enroll qualitative scholars, ethnographers, for this task — as Bates (1996) argues.

With our library of mechanisms, we might select a few plausible candidates and ask the ethnographer, with her intimate, thick knowledge of a social context, to tell us which mechanism(s) are at play. After spending two years embedded with guerillas who might have begun predating on oil or gold—depending on price shocks to these commodities and in light of the relative ease with which these commodities can be physically moved and exchanged for cash—she is best suited to tell us “whether individuals in an empirical circumstance actually perceived and thought about the main tradeoffs faced by the actors in the model.”

(Brief digression: anyone who has tried to teach the rational choice paradox of voting is overwhelmed with evidence that voters do not “actually perceive and think about the main tradeoffs faced by actors in the model.” If voters already thought about voting in terms of costs, benefits, probability of decisiveness and civic duty...it would be way easier to teach this topic.)

That is a little unfair; realistically, none of us high-status quantitative researchers wants to give any sort of veto power over our research to qualitative researchers. They’ve been saying that quantitative social science is impossible for decades and we’re still going at it (but see the Seawright chapter in the recent Handbook). As a result, we allocate far too little rigor to this stage, to the crucial practice of evaluating similarity.

This deficiency becomes an acute threat to the social scientific enterprise because social scientists are all weirdos who don’t understand how other humans experience the world. Our intuitions aren’t just underdeveloped, they are actively biased. This bias is increasingly obvious in the study of social media. Academics are far more likely than the average American to exist in online echo chambers; our shared intuitions point towards echo chambers as a major driver of polarization. Academics who study digital media come from the very top of the distribution of digital literacy; absent extensive qualitative evidence, we have no way of understanding how Americans at the bottom of this distribution experience digital politics and we are thus unlikely to develop theories that capture this experience.

Returning to Theory and Credibility: how does it look in practice, the incorporation of substantive knowledge into the research process? The mystery of the term limits illustrates.

Around the turn of the millennium, scholars of American elections encountered a puzzle. Term limits used to cause governors to shirk and perform less well, but then, in the 1980s and especially the 1990s, term limits ceased to cause this. Besley and Case (2003), cited on p199, conclude that: “it seems likely that some omitted variable is responsible for the change in behavior observed for governors working under a term limit. This is an area ripe for future research.”

The solution comes from Alt, Bueno de Mesquita and Rose (2011) “who argue for a resolution of this puzzle that combines a version of the model of electoral accountability ... with some institutional knowledge about the nature of gubernatorial term limits in the United States.”

That “institutional knowledge” is the descriptive fact that, over time, state term limits ceased to be a mixture of one- and two-term limits and became all two-term limits. The resolution to the mystery of the term limits was provided by someone knowing what the fuck was going on, through a detailed, ongoing knowledge of the social phenomenon under study, so that empirical attributes which might have at one point been considered “auxiliary” or which did not vary within the initial scope of the analysis began to become causally relevant. In this case, the relationship between the numbers that appeared in the dataset and the phenomenon of interest changed over time.

So what do we achieve by formally establishing that all of these contexts were similar to a model which included a mechanism called “electoral accountability” but that only the two-term contexts were similar to a model that also included a mechanism called “competence”?

Personally, not much. I’m fine with the “folk theories” of human behavior that people learn stuff about the stuff they do and they work harder when they might get fired; I don’t need that dressed up in math. But there is a large, powerful community of academics who do need these theories dressed up in math, who in fact explicitly argue that the primary goal of social science is to translate as much of human behavior into math as possible.

In conjunction with the “water is wet” example, we can trace the tragicomic arc of postwar quantitative social science, the dialectic between undertheorized empiricism and overtheorized rational choice. Theory and Credibility aims at synthesis but in so doing lays bare the vast gulf between actually-existing social science and the hypothetical scientific processes/institutions which are necessary to satisfy our absurd ambitions and vanities of rigor.

Where are we going?

I am grateful to ABBdM for laying out their research program so clearly. Today, social science feels unsettled, the result of many interlocking trends: the replication crisis, causal inference, the internet, novel computational tools, and the crisis of expertise facing democracies around the world. We need to be having this kind of high-level, strategic conversation about what social scientists should do, rather than complacently extending the theories, models, institutions and practices of the past.

I have argued that ABBdM’s approach has shortcomings that I hope can be considered and debated. But what we need are alternative models of what social scientists should do, other coherent projects. I don’t have one yet, but I think that motivating metaphors are a good first step.

A better, Bayesian metaphor for knowledge production highlights where we should allocate our efforts. Social science is like an electric line (with intended resonances with entropy / information, per Kubinec (2020)). Each community is a wire. The ideal wire offers zero resistance, conducts electricity perfectly. In practice, each wire and each connection dissipates some of the electric current. But a single low-quality connection can dissipate disproportionate energy.

That is, the variances introduced at each stage of the knowledge creation/application process are multiplicative. In practice, social scientists tend to treat each stage as separate and to consider each step a binary success if the variance is kept within tolerable bands.

Consider the construction and validation of a measure based on human coders. Multiple coders code the same (say) tweet as being either liberal or conservative, and their codes are then compared. If the resulting inter-coder reliability score is “high enough,” the measure has been validated and codes are deterministically assigned based on the majority vote of coders.
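As a rough illustration of what the majority-vote step throws away, here is a minimal Python sketch. The labels are invented, and the agreement measure (simple pairwise percent agreement) is an assumption chosen for brevity, not the reliability statistic any particular study uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels: each row is one tweet, each column is one coder.
codes = [
    ["liberal", "liberal", "liberal"],
    ["liberal", "conservative", "liberal"],
    ["conservative", "conservative", "conservative"],
    ["conservative", "liberal", "conservative"],
]

def pairwise_agreement(rows):
    """Share of coder pairs, pooled over all items, that gave the same label."""
    agree = 0
    total = 0
    for row in rows:
        for a, b in combinations(row, 2):
            agree += int(a == b)
            total += 1
    return agree / total

def majority_label(row):
    """Most common label for one item (ties are not handled in this sketch)."""
    return Counter(row).most_common(1)[0][0]

def vote_share(row):
    """Fraction of coders who chose the majority label for one item."""
    return Counter(row).most_common(1)[0][1] / len(row)

agreement = pairwise_agreement(codes)
labels = [majority_label(row) for row in codes]
shares = [vote_share(row) for row in codes]

print(f"pairwise agreement: {agreement:.2f}")
for label, share in zip(labels, shares):
    print(f"{label:<12} supported by {share:.0%} of coders")
```

Once the labels are assigned deterministically, a tweet that two of three coders agreed on looks exactly like one that all three agreed on; the per-item vote share is the information that the binary "validated" check discards.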

Another “stage” here could be the sampling procedure; the ideal quantity might be a truly random sample of tweeters. Is our sample truly random? Impossible to say, but we might be able to demonstrate that the demographics of our sample match (...ish) the demographics of the Pew report from 3 years ago about who uses Twitter. If this satisfies the reviewer, this stage gets a binary check mark of validity; if it does not, it gets a big red X and is shoved in the file drawer.

The errors in these two stages are nonzero, impossible to know, and *multiplicative* with the reported variance of the key estimator in the regression table. Different communities have different models of how they allocate their rigor. Psychologists tend to care more about construct and measurement validity, and they tolerate the lower external validity created by convenience samples; political scientists tend to prefer the opposite side of this tradeoff.

If we model the entire social scientific electric grid and accurately propagate the information dissipated at each connection and link, I believe that we will find the current very weak indeed.
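To make the arithmetic behind the metaphor concrete, here is a small Python sketch. It assumes, purely for illustration, that each stage retains some fraction of the signal and that these fractions multiply; the stage names, the numbers, and the 0.7 threshold are all hypothetical, and this is one simple reading of "multiplicative," not a formal model.

```python
from math import prod

# Hypothetical fraction of the signal that survives each stage
# (1.0 would be a perfect conductor). These numbers are invented.
stages = {
    "sampling": 0.90,
    "measurement": 0.80,
    "estimation": 0.95,
    "generalization": 0.70,
}

# Hypothetical review threshold: a stage "passes" if it retains at least this much.
THRESHOLD = 0.70

def surviving_signal(retained):
    """Overall fraction retained if losses compound multiplicatively."""
    return prod(retained.values())

# Every stage earns its binary check mark on its own...
all_pass = all(r >= THRESHOLD for r in stages.values())

# ...yet less than half of the signal reaches the end of the line.
overall = surviving_signal(stages)

# A single low-quality connection dominates the total.
weak_link = dict(stages, measurement=0.30)
overall_weak = surviving_signal(weak_link)

print(f"all stages pass: {all_pass}")
print(f"overall retained: {overall:.3f}")
print(f"with one weak link: {overall_weak:.3f}")
```

Under these assumed numbers, four stages that each look acceptable in isolation jointly retain under half of the signal, and one bad connection cuts that to under a fifth.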

One path forward requires us to technologically engineer our way out of meta-scientific dread, to minimize human subjectivity. If we want to maintain high levels of rigor everywhere, high levels of validity throughout, maximum conductance throughout the electric knowledge grid, I believe that we will need to dramatically restrict the freedom of individual scientists. Think Registered Report, for everything. We can see what this looks like in the form of the Controlled Vocabularies and Ontologies that have enabled significant progress in the field of computational biology; systems modeled structurally, with the concepts, terms and actors all fixed, huge communities of scholars working on a strictly defined framework.

I believe that because social science is not like natural science, because our object of study changes over time and none of the relationships can be taken as fixed, this approach is a dead end. In terms of intellectual pleasure, it would also just suck.

I prefer to be more agnostic about process, more pluralistic, and to ultimately rely on humans to figure out how best to learn about the world. Positively, my philosophy of science is primarily Feyerabendian; I am increasingly against method.

The rigor only comes in at the end, when it really matters, by making predictions about the future.

A common objection: There are many large-scale or long-term processes that are not amenable to prediction. This is true, of course — and the solution is to forgo the vanity of rigor entirely, admitting that these questions are outside of the scope of social science. The scope of social science in a free society must be constrained by what we can honestly aim to achieve; there is no technocratic escape from the messy realities of democracy.

This also means abandoning the search for transcendent Truth in social science. Meta-scientific dread could produce nihilism, but that’s no good to anyone; instead, we need to embrace the democratic belief in our capacities and those of our fellow citizens.

A character in John Barth’s The End of the Road says, “Energy's what makes the difference between American pragmatism and French existentialism—where the hell else but in America could you have a cheerful nihilist, for God’s sake?” (quoted in Pragmatic Politics by John McGowan).

Only by learning to live with meta-scientific dread can we achieve an honest account of what social science can do, and what we social scientists can do as part of a democratic society. Here I am embracing the American Pragmatists, especially Dewey, who in his last major speech notes that

we have had the habit of thinking of democracy as a kind of political mechanism that will work as long as citizens were reasonably faithful in performing political duties... we can escape from this external way of thinking only as we realize in thought and act that democracy is a personal way of individual life

I will be working on developing this component of my philosophy of science over the coming years.

Thanks for reading/scrolling all the way to the bottom. Brief updates:

I deleted my Twitter account on January 1 and I am incredibly happy about it.
My paper on Temporal Validity has been accepted at Research & Politics; I’ve been working on this for nearly five years, so that’s a relief.
I will be spending AY 2023-2024 on a fellowship at the Princeton Center for Information Technology and Policy. Hit me up if you’re in NYC or central Jersey.

Never Met a Science