In a speech in 2019, Duncan Watts discussed his regrets over the course of what is by any account an impressive career for a social scientist:
"For 20 years I thought my job was as a basic scientist. Publish papers and throw them over the wall for someone else to apply. I now realise that there's no one on the other side of the wall. Just a huge pile of papers that we've all thrown over."
There is no way to evaluate the quality of quantitative political science research without understanding what it does in the world. Appealing to previous literature simply means building towers higher and higher with no conception of the quality of their foundations or their utility as super high towers. Rather than starting at the beginning, with a political scientist reading the literature, thinking of an idea, and testing it, we have to start at the end: what are we trying to do, and how do we get there?
The former model is flattering to the intuitions of political scientists---"of course we can identify what's important and produce the relevant knowledge"---but the recent loss of faith in expertise has made such an approach indefensible. This loss of faith is also a major motivation for the causal revolution that has recently swept the social sciences, and particularly for the rise of RCTs.
This focus on producing credible estimates of (geographically and temporally) local treatment effects has been a necessary corrective to decades of poorly executed quantitative work. It was good to abandon cross-country regressions that could not transcend problems of confounding. But this previous model had another feature that has not been as easily addressed as has the improved identification strategy: if your research (at least ostensibly) describes the entire world, there is no need to worry about external validity.
In contrast, RCTs or natural experiments make no claim to global knowledge. But simply taking the results of one of these intensively internally validated studies and assuming that they will hold in a “similar” context---perhaps one with blunt covariate adjustments---is an absurd allocation of rigor. It's like designing and executing a moon landing and then sending the same ship to Mars with triple the fuel and assuming things will work out.
So: the biggest problem in quantitative methodology is about external validity / generalizability / transportability: how does the knowledge generated in one context (or aggregated across some number of contexts) inform action in a novel context? This fact is far from lost on the Political Methodology community; there were several excellent presentations on the topic in last week’s PolMeth conference, which I’ll discuss in more detail in later posts.
"with a large number of internally valid studies across a variety of contexts, it is reasonable to hope that researchers are accumulating generalizable knowledge, i.e., not just learning about the specific time and place in which a study was run but about what would happen if a similar intervention were implemented in another time or place. The success of an empirical research program can be judged by the diversity of settings in which a treatment effect can be reliably predicted."
The research program that I've been describing, currently ascendant, is what Cyrus Samii calls "Causal Empiricism." And it has the desirable property of a properly defined outcome: we know that there are some *predictions* we'd like to make, and we work backwards to produce the knowledge that best improves our predictive capacity.
An analogous intellectual movement is sweeping the practice of social science itself. Just as we've come to realize the limitations of cross-country regression, we're realizing the limitations of the practices of social science. This movement is sometimes called the “credibility revolution.” This has been conflated with the “causal revolution” cited above, but they are distinct. The former has had the largest impact in social psych, for example, while the latter has been a bigger deal in economics and political science.
The causal revolution's criticism of inherited research designs produced a move to more careful studies that then necessitated a research program based on synthesis that can only be evaluated through its impact on the world. Thus far, the credibility revolution's criticism of research practices has inspired some reforms. But the can of worms has been opened (for the better!) and we need to follow this path where it leads.
I have thus far been working backwards, which is begging the question. So now I will take the traditional path and work forwards from an example: the file drawer problem.
One of the most well-developed areas of meta-science relates to meta-analysis of the results of multiple studies on a single topic. Leaving aside the issue of commensurability, the meta-analyses that have been conducted to date tend to overestimate the true effect size. This is due to the “file drawer problem”: studies with positive results (that reject the null hypothesis) are more likely to be published than are studies with null results.
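The mechanism can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not a model of any real literature: the true effect, sample sizes, number of studies, and the probability that a null result gets published (`null_pub_prob`) are all hypothetical values chosen for illustration, and "positive result" is taken to mean a positive estimate that clears a conventional 1.96 threshold under a normal approximation.

```python
import random
import statistics


def simulate_studies(true_effect, n_per_arm, n_studies, seed=0):
    """Simulate two-arm studies; return (estimate, significant) per study."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        se = (
            statistics.variance(treated) / n_per_arm
            + statistics.variance(control) / n_per_arm
        ) ** 0.5
        # Assumption: a "positive result" is a positive estimate with z > 1.96.
        significant = estimate / se > 1.96
        results.append((estimate, significant))
    return results


def file_drawer_demo(true_effect=0.1, n_per_arm=50, n_studies=2000,
                     null_pub_prob=0.1, seed=0):
    """Compare the mean estimate across all studies with the mean across
    'published' studies, where null results are published only with
    probability null_pub_prob (a hypothetical parameter)."""
    studies = simulate_studies(true_effect, n_per_arm, n_studies, seed)
    rng = random.Random(seed + 1)
    all_estimates = [est for est, _ in studies]
    published = [est for est, sig in studies
                 if sig or rng.random() < null_pub_prob]
    return statistics.fmean(all_estimates), statistics.fmean(published)


if __name__ == "__main__":
    all_mean, published_mean = file_drawer_demo()
    print(f"mean estimate, all studies:       {all_mean:.3f}")
    print(f"mean estimate, published studies: {published_mean:.3f}")
```

Under these assumptions, a naive average over the published studies lands well above the average over everything that was run, because the filter keeps exactly the draws that happened to come out large. Setting `null_pub_prob=1.0` publishes everything and removes the gap.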
To address this problem, social scientists have advocated for results-blind publication, either informally by advancing a norm of celebrating journals that publish null results, or formally through the creation of the registered report article format.
These are clearly methodological innovations in that they have been shown (to some degree of confidence) to improve our ability to produce knowledge. Compare this to the canonical methodological contribution: a novel statistical estimator that has superior properties under conditions that have been empirically shown to obtain. This improves knowledge at the level of the individual paper, a unit of knowledge production that the higher-towers model of social science reifies but the predict-treatment-effects model doesn’t yet know what to do with. This latter model is more interested in the aggregate output of social science knowledge. It wants to ensure that the body of knowledge being produced has desirable statistical properties, so a fix for the file-drawer problem is the kind of methodological contribution that it finds most useful.
(We—and to be clear, I endorse causal empiricism to the extent that I endorse positivism—are of course standing on the shoulders of giants, and this step would not be possible without the work that has already been done to improve the validity of individual papers. My point is about the allocation of the marginal hour of the methodologist’s energies.)
Once we engage in this level of critique, the institutions of academic knowledge production enter the purview of political methodology. We can expand the scope gradually.
The format of articles (e.g., the presence or absence of results-blind review) in academic journals is part of the process, and so it must be that the number and scope of academic journals are as well. A discipline could have a small or a large number of journals. There is evidence of asymmetric developments on this dimension across different fields of social science. Communication, for example, has seen a dramatic expansion in the number of journals in recent decades, while Political Science has not. As a result, the submission numbers for the established Political Science journals have gone through the roof, and acceptance rates are abysmal.
It bears mentioning that there have been several attempts by the American Political Science Association over the past decade to create new journals. In the 2013 meeting, a proposal to create a new “e-journal” for short articles was tabled for two years (these quotes are all from the Minutes of the 2013 APSA meeting):
“given the heated discussion among members of the council and the relatively even division of the number of council members on each side of the issue, it would be disruptive and damaging either to institute a new association-wide journal or to defeat it by such a slim majority, and that it was best to defer a vote.”
There is no political science without politics! (I expect this to be a running theme of my blog: in many realms of scientific inquiry, politics is or will soon be a binding constraint.)
Some notes on the debate of the e-journal:
Many of the arguments in favor had young scholars in mind: “timeliness (reduced R&R) creates opportunities for assistant professors’ publication needs”; “improved publication process for younger scholars”
It is suggested that “Blogs different form of communication, distinct from journal,” so we should all be blogging as a replacement! The Political Methodologist served this function for a while, and even had a special issue on peer review that I found very useful.
“empirical evidence of need has not been demonstrated” !
“an imbalance in APSA journals has been a source of conflict in the association earlier; desire to avoid repeat of those earlier conflicts”:
The major objection in 2013 (in addition to general concerns about opportunity costs, over-stretching the budget etc) was that the “proposed format is biased toward quantitative and formal methodologies.”
So the scope of a field---which in the case of Political Science (and basically all other academic disciplines) is largely a historical artifact---and the specific structure of the governing body of that field are all inputs into the aggregate production of knowledge, and thus cannot be excluded from the purview of methodology except arbitrarily.
There is some irony here; meta-institutional critique is more commonly associated with constructivist epistemologies than positivist ones. I don’t have an answer here; no easy answers exist. This is a political question, and one which I think would benefit from open deliberation.
Bringing this back down to Earth (by which I mean a discussion of statistical estimators), there are several steps that we can take to study the peer review process as it is.
The peer review process is a function mapping inputs of varying quality into a binary space of acceptance or rejection. However, this function has never been specified, despite its centrality to the practice of contemporary social science (and science more generally), and despite longstanding gripes about how the practice operates.
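To make "a function mapping inputs of varying quality into a binary space" concrete, here is one purely hypothetical specification. Nothing in it is estimated from data or proposed as the true form of peer review: latent quality is assumed to be standard normal, each reviewer is assumed to observe quality plus independent Gaussian noise, and the editor is assumed to accept when the average score clears a threshold. The names `review_decision` and `simulate_journal` and all parameter values are illustrative.

```python
import random


def review_decision(quality, n_reviewers, noise_sd, threshold, rng):
    """Map a latent quality to accept (True) or reject (False).

    Assumption: each reviewer reports quality plus independent Gaussian
    noise, and the paper is accepted if the mean report meets the threshold.
    """
    scores = [quality + rng.gauss(0.0, noise_sd) for _ in range(n_reviewers)]
    return sum(scores) / n_reviewers >= threshold


def simulate_journal(n_submissions=5000, n_reviewers=3, noise_sd=1.0,
                     threshold=1.0, seed=0):
    """Return (acceptance rate, mean latent quality of accepted papers)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_submissions):
        quality = rng.gauss(0.0, 1.0)  # assumed quality distribution
        if review_decision(quality, n_reviewers, noise_sd, threshold, rng):
            accepted.append(quality)
    rate = len(accepted) / n_submissions
    mean_quality = sum(accepted) / len(accepted) if accepted else float("nan")
    return rate, mean_quality


if __name__ == "__main__":
    for sd in (1.0, 3.0):
        rate, quality = simulate_journal(noise_sd=sd)
        print(f"noise_sd={sd}: acceptance rate={rate:.2f}, "
              f"mean accepted quality={quality:.2f}")
```

Even a sketch this crude shows why the function matters: holding the threshold fixed, noisier reviewing changes both how many papers get through and how good the accepted ones are on average. Which parameters describe real journals is exactly the open empirical question.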
To develop this area of scholarship, we need three things:
Data from journals about the peer review process, across time and across journals
Theoretical work modelling the relationship between the quality of papers published and: institutional design, the characteristics of reviewers, and the
Experimental variation (either by existing journals or new journals) in these parameters
A positivism which uses non-positivist methods for evaluating its own success lacks the confidence of its convictions.