Note: this essay sketches the possibility of a novel philosophy and practice of social science and is the crux of my next book project. I think this is some of the most interesting stuff in the world, so please tell me what else to read or why I’m wrong.
Randomized Controlled Trials are sometimes called the “gold standard” research
design by social scientists because they require the fewest modelling assumptions; that is, their advantage is epistemological. This “gold standard” status has been criticized by a variety of practitioners and philosophers of science. The true advantage of RCTs, however, is not epistemological but ontological: they create novel states of the social world. RCTs that are implemented by the same real-world institutions that will ultimately apply the results avoid the wasteful process of “representing” knowledge, which is instead embodied directly within the social organization. My argument for RCTs involves large-scale experimentation with open-ended evaluations, both qualitative and quantitative. Epistemologically driven social science’s excessive focus on experimental “control” wastes the true potential of experimentation.
Social Science and Epistemology
Standard social science epistemology envisions a brain, sensing the social world
through our instruments of research. These instruments are imperfect, and methodologists are tasked with improving them. In recent years, the causal inference revolution has paid particular attention to issues related to causality; it turns out to be a much thornier problem than previous generations of social scientists had realized.
But when we run Randomized Controlled Trials (RCTs), we can be very sure that our instruments are picking up a causal relationship. This imperious social science epistemology has for decades marshalled more and more of society’s resources to satisfy its own ritual imperatives. With RCTs, it enlists fellow humans as puppets in a kind of epistemological theater. This development has generally been embraced: experimentalist researchers have won Nobel Prizes, the method is now ascendant in the disciplines of Economics and Political Science, and government policy is often informed by their results. The informed public has come to see RCTs in the behavioral sciences as the “gold standard” of evidence, and to follow academics in their skepticism of any findings not derived from RCTs. Charitable giving is increasingly motivated by the results of RCTs.
The rise of RCTs has seen considerable pushback as well. There have been a host
of ethical concerns raised by both practitioners and the general public. Among the
disciplines adopting RCTs, the primary concern is that this method means abandoning
“big questions” in favor of extremely narrow empirical results. More technically, the concern is about the external validity of these narrow results: we don’t really care about what happens in a specific time and place to a specific group of people when we deliver a “treatment.”
RCT proponents are optimistic about the possibility of external validity–that within
a reasonable amount of time, their efforts will have created a large enough store of
knowledge to make accurate predictions about what will happen after a given intervention in a given time and place. RCT skeptics think this won’t work, for a variety of reasons: it’s too expensive; it’s anti-democratic; human behavior is too heterogeneous (Friedman, 2019); even RCTs require assumptions about implementation that don’t always hold (Deaton and Cartwright, 2018); RCTs are only ever conducted in contexts in which it is possible to conduct an RCT, and these contexts don’t generalize to all contexts (Allcott, 2015); the social world is too complex (Yarkoni, 2022); the social world is changing too quickly (Munger, 2018).
This debate fundamentally misunderstands the value of RCTs by taking too narrow
a view of social science. We need not be in the business of creating Knowledge. Scientism (or naturalism), the aping of the methods and assumptions of natural science, is our original sin. We have inherited what John Dewey called “that confirmed species of intellectual lockjaw called epistemology” from the long history of Western philosophy (Dewey, 1958): the Cartesian epistemological problem of a disembodied mind deriving knowledge by observation and logic. Tabling the larger question, I will argue that this tradition is inapplicable and indeed deleterious to the goals of social science within a democracy.
I ground my argument in Dewey’s instrumentalism, especially as read by philosopher
of science Ian Hacking, who summarizes the central point: “Dewey distinguished his
philosophy from that of earlier philosophical pragmatists by calling it instrumentalism. This partly indicated the way in which, in his opinion, things we make (including all tools, including language as a tool) are instruments that intervene when we turn our experiences into thoughts and deeds that serve our purposes” (Hacking et al., 1983).
This framework enables social scientists to play a key role in the complicated practice
of 21st century governance. The epistemology delusion (what Dewey derisively calls
“the spectator theory of knowledge”) obscures the fact that RCTs are uniquely useful for
empowering both individuals and groups to better achieve their desired ends, for two
reasons:
1. RCTs are “ontologically” useful because they create novel states of the world.
Other empirical methods can only learn from extant states of the world. The extant
world is only a small subset of the set of nearby possible worlds. Humanity has become increasingly powerful in our capacity to manipulate our environment. New technologies and the growth in the absolute number of humans mean an explosion of possibilities for how we organize our social worlds. However, we have barely begun to explore these possibilities, and certainly not in any systematic fashion. RCTs encourage and reward creativity and entrepreneurship in the social and governmental realms, in contrast to the spectator theory of knowledge, which requires the “existence of a leisure class, who thought and wrote philosophy, as opposed to a class of entrepreneurs and workers, who had not the time for just looking” (Hacking et al.,
1983).
In a capitalist society, this kind of creativity and dynamism is possible in the economic sector thanks to the informational and feedback properties of the market. Capitalists have powerful systems in place for testing and exploiting the economic potential of each new technological development or social innovation. The start-up model is designed for exploration and failure, with successful companies adapting “atheoretically” to the complex space they’re exploring and unsuccessful companies ceasing to exist. RCTs that involve a whole community or group of communities also explore as-yet nonexistent social possibilities, starting from where they are.
2. RCTs provide feedback to the entire social organism. The fact of acting in the
world demonstrates the relevant capacities of the actors and the actual structure of the
world at the most relevant points. RCTs train social organisms to achieve their goals. The knowledge that RCTs produce is inherently local (generated in the same context in which it is to be applied, obviating the need for “generalizability”) and tacit, distributed among the actors and institutions that comprise the organism. Because this knowledge need not be rationalized and fed into the central Brain of social science before being applied in every relevant context worldwide, the epistemological objections to RCTs are avoided.
This does not mean that the local, tacit knowledge generated by RCTs is trapped
and thus unable to be shared. Each of our “hands” and “environments” is different, so
there’s not one set of centralized commands that a brain could send out; the assumption that the only mechanism for knowledge diffusion is a universal, objective language is a residue of the social science epistemology. Other “nearby” social organisms can “watch” each other, see how others act in the world, and choose to adopt attractive actions into their menu of experimentation. Any generalizability is therefore intrinsically qualitative.
This perspective on the value of RCTs requires a significant re-orientation to enable them to act effectively. Most importantly, it will require a radical scaling up of
the number of RCTs—which means increasing the number of both experimenters and
experimental subjects. In turn, the way these RCTs are conducted should emphasize the creation of tacit, local knowledge throughout the social organism doing the
experimenting. Finally, the ethical status of experimentation needs to be re-negotiated: in what Don Campbell famously called “the Experimenting Society,” the fact that many people are experimenters and everyone is the subject in someone else’s experiment shifts the power and knowledge imbalance inherent in the current elitist approach to experimentation (Campbell, 1991).
Indeed, much of what I propose is directly inspired by Campbell’s work. The time
for “the Experimenting Society” has come, in my view, thanks to the developments
in the theory and practice of RCTs—and to an even greater extent, thanks to the
communication and information processing capacity of modern technology. Online behavior serves as an ideal case for the Experimenting Society because of how
malleable the online social world is (Matias and Mou, 2018). Code can act at scale with
zero marginal cost, and the measurement of outcomes is trivial.
Learning by Doing
One of the standard goals of science is prediction. It is generally assumed that this requires knowledge: that this prediction has to pass through a human brain. Further, the structure of contemporary social science requires that this knowledge be general. In contrast to the world of merely subjective mental states or human consciousness, scientific knowledge must exist outside of a human brain (in a .pdf, say, or perhaps in the .csv and the code used to analyze it)—what Popper called the “third world,” “a world of books and journals stored in libraries, of diagrams, tables and computer memories” (Hacking et al., 1983).
With the knowledge produced and encoded, it is assumed that it can be applied in
a variety of times and places. Indeed, the social sciences tend to be entirely focused on
the production of knowledge, with little attention to the synthesis, application or interpretation of that knowledge. There are counter-movements and alternative traditions within the social sciences, but the dominant approach is (broadly) positivist, quantitative, and naturalist.
Within this dominant approach to the social sciences—the more empirical side of political science, sociology and economics—there has been a “credibility revolution” (Angrist and Pischke, 2010). The current state of the art argues that field experiments
or Randomized Controlled Trials (RCTs) are the “gold standard” research design
because they require the fewest assumptions. Through the power of randomization, the research design ensures that the treatment group and the control differ only in terms of whether or not the treatment was delivered, and thus that we can infer that any average difference between these groups was caused by the treatment rather than some unobserved difference.
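The logic of randomization can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the function name `simulate_rct`, the hidden trait, the effect size of 2.0, and the self-selection rule are all hypothetical and chosen for illustration, not drawn from any study discussed here. When assignment is a coin flip, the difference in group means lands close to the true effect; when people with a high value of the hidden trait select into treatment, the same comparison is badly biased.

```python
import random
import statistics


def simulate_rct(n=10_000, true_effect=2.0, randomized=True, seed=0):
    """Return the difference in mean outcomes between treated and control.

    Every quantity here is hypothetical and exists only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # An unobserved trait that also shifts the outcome.
        hidden = rng.gauss(0.0, 1.0)
        if randomized:
            # Coin-flip assignment, independent of the hidden trait.
            assigned = rng.random() < 0.5
        else:
            # Self-selection: people with a high hidden trait opt in.
            assigned = hidden > 0.0
        outcome = 3.0 * hidden + rng.gauss(0.0, 1.0)
        if assigned:
            treated.append(outcome + true_effect)
        else:
            control.append(outcome)
    # The difference in means is the usual estimate of the average effect.
    return statistics.fmean(treated) - statistics.fmean(control)


print(f"randomized estimate:    {simulate_rct(randomized=True):.2f}")
print(f"self-selected estimate: {simulate_rct(randomized=False):.2f}")
```

The only difference between the two runs is how assignment is decided, which is the sense in which randomization removes the need to model the unobserved difference.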
But experimentation is uniquely valuable for another reason, one directly tied to
how humans learn. We are not disembodied minds deliberating over platonic entities;
we are embodied, and our cognition and learning are based heavily on actions and
sensory feedback. The baby likes to observe, but even more to manipulate, to find out
whether things fit in its mouth. This kind of learning does not look like education, does not conform to what most philosophers call knowledge. It lives in the hands as much as it does the brain. Individuals learn through experimentation, by acting on the world and observing what happens. This is what Dewey calls instrumentalism.
The analogy here is that academic social science is The Brain of society, that our
purpose is to acquire knowledge, centralize it, verify it, synthesize it and then disseminate it. Each school has a preferred mechanism by which these steps, particularly the verification and synthesis, should take place, but all have themselves at the center of the knowledge system. This approach pays insufficient attention to the lower levels of the knowledge system, the “hands” in the analogy to the individual.
An alternative intellectual lineage provides some insight into how we might otherwise organize social science. Industrialization and factory production gave rise to
the disciplines of management and operations research; WWII kickstarted huge investments in the management of large enterprises without the benefit of the price system that manages capitalist economies. My account of this history borrows heavily from the doctoral dissertation of J. Nathan Matias.
The crucial limitation for organizations of this scale is information flow: the way in
which the problems faced by the lowest level of the organization can be transmitted to
the appropriate higher level to address. One early approach to this problem is Taylorism, the scientific management of employees designed to minimize their individual agency in favor of maximum legibility and control by higher-ups. The justification of Taylorism was efficiency, both in terms of the practices it enabled and in terms of the ability of managers to generate “scientific” knowledge about what worked. This paradigm was explicitly authoritarian, but also technocratic: workers could not be trusted to make decisions.
This approach directly mirrors contemporary social science epistemology as embodied in RCTs. The researcher is a representative of Science, tasked with choreographing the bodies of research subjects in order to maximize the efficiency with which they produce Knowledge. There have been many RCTs which have “failed” because of the lack of control over the subjects, who responded in unexpected ways. Failure in this case means that the knowledge produced is insufficiently pure: the measurement of the effects of actions taken by research subjects cannot be unambiguously distilled down to the language of Theory in which the researchers speak.
Alternatives to Taylorism soon arose. The Great Depression dealt a general blow to
belief in the scientific management of society, and World War II called for a rapid and
radical re-orientation. With more of society organized in hierarchical, authoritarian
structures, the social scientists tasked with evaluating organizational success were able to observe the limitations of this approach. Organizational psychologist Kurt Lewin was an early advocate for a more democratic structure and scientific approach: “Efficient democracy means organization, but it means organization and leadership on different principles than autocracy.” (Lewin, 1944). Lewin specifically advocated for what he called “action research”: rather than passively observing how the organization functions, his approach called for taking actions.
Lewin’s research treated the men he studied as more than mere cogs in an organizational machine; he enlisted their help in designing the experiments that were
run and analyzed, proving that social science and democratic control were not
incompatible, as the Taylorists had believed.
Another intellectual tradition spawned by WWII was cybernetics. Invented by
mathematician Norbert Wiener to enhance the accuracy of anti-aircraft guns, this interdisciplinary field aspired to unify disparate areas of inquiry. Cybernetics’ central insight proved useful in a variety of contexts: success requires adaptation through the timely incorporation of feedback. Cybernetic systems are inherently dynamic; “knowledge” in these systems is embodied, local and distributed.
This approach provides a more direct path between one action and another, a path
which avoids what sociologist of science Andrew Pickering calls the “detour through
knowledge” (Pickering, 2010). For systems as complex as human behavior, we cannot
hope to generate enough knowledge, or knowledge that is sufficiently up to date.
Here, again, by knowledge, I mean the kind of thing that can be externalized to
Popper’s third world of books and computer hard drives. Without the flexibility of the
fully-engaged and embodied human mind as a conduit for knowledge, there is simply
too much entropy in the process of knowledge production, transmission, synthesis and
application for science to succeed at understanding human behavior.
Local, Tacit Knowledge
When I discuss this issue with practitioners who conduct RCTs in developing countries,
they agree that the process of figuring out how to actually act in the world generates
a lot of knowledge about the relevant social processes. However, this knowledge tends
to be qualitative and thus dismissed as auxiliary to the focal output of the experiment,
knowledge that can be encoded in a pdf. Boutique experiments like these might help
the experimenter gain tacit knowledge, but the experimenter is not a long-term agent
in the social organism, so this knowledge is not put to the best use.
The fact that it takes so long to train social scientists makes the practical irrelevance of our knowledge self-evident. Our highly technical language makes possible the
practice of academic social science: it creates a linguistic world within which we can
operate, a low-dimensional space into which we can project the complex social phenomena we study. Again, different schools have more or less formal languages, from the platonic world of game theorists to the “empirical” world of government statistics and development indices. But even the most informal school defines some jargon, if only for the purpose of internal communicative clarity.
In contrast, consider the speed with which humans can acquire embodied, tacit
knowledge. By acting in the world, we can acquire a skill—how to catch a ball, say—
without any formal training in the physics of parabolic motion. We are able to use
our entire sensory apparatus and the low-latency feedback that comes from moving
our arms and legs rather than being restricted to knowledge taken in through our eyes
by reading a physics textbook. The tacit knowledge we gain is stored throughout our
bodies—in our brains, yes, but also in the arms and legs that do the acting.
Smaller social organisms can use language they develop for themselves, adapted
to their goals and capacities. Even under ideal circumstances, the language developed by social scientists cannot apply equally well to each of the social contexts they
seek to explain. This translation is only a problem if the goal is to generate knowledge that transcends context. This is a bad goal. Social organisms (groups, societies, communities) can develop knowledge that inheres in their context: local, tacit knowledge.
Even this approach, though, requires decisions about how knowledge and action
should be distributed across all of the members. Stafford Beer, the cyberneticist and early founder of operations research, called this “variety engineering”: human society is extremely high in complexity (variety), and it can only be managed through structures that enable many humans to deploy their full capacities to respond to issues that arise locally (Beer, 1993). Crucially, this requires that social structures be able to re-arrange themselves, to act and respond to the feedback generated by that action.
This re-organization is irrelevant to the epistemological approach, the spectator
theory of knowledge. Through action, the social organism touches its local context
directly, in the high-variety world in which it acted before and will act again.
The most valuable knowledge generated through experimentation, then, is stored
within the social organism itself, the way it re-organizes itself. It is stored within
the embodied minds of the humans that comprise that organism and in the relations
between those humans.
Path Dependence, Scope of the World
We know that human societies are incredibly plastic thanks to the work of anthropologists and archaeologists summarized in Graeber and Wengrow (2021). Premodern humans “experimented” with (that is, implemented) a large variety of economic, social and governmental forms. They observed what was done by societies they came into contact with and either rejected or adopted novel aspects. Graeber and Wengrow (2021) make the case that this was done with a high degree of knowledge. Pierson (2011) gives us reason to doubt the strong form of this account: the inherent complexity of human behavior makes it impossible to predict what kind of human society will result from adopting new institutions, technologies or practices.
Over the past few centuries, the scope of possibility for human organization has
expanded dramatically. Explosive growth in productive capacity and revolutionary
communication technology enable social arrangements that have never before been attempted, and could also give us reason to revisit previous attempts that failed.
Very little of this scope of possibility has been explored. One school of thought
suggests that this lack of institutional diversity comes from systems of control that
aim to restrict the scope of human freedom in order to render society more legible and
manageable from the center (Scott, 2008). Formal institutions like schools, prisons and
hospitals capture more and more of human behavior. Capitalism itself renders every
human activity a commodity, forcing our messy human excesses to conform to the logic of exchange.
Setting this contentious view aside, we can simply observe that we haven’t had the
time to explore more than a tiny percentage of the space of possibility. The absolute
number of humans has grown tenfold in the past three hundred years, and we are only
thirty years into the internet revolution. There are so many of us, with so much more
freedom and capacity than ever before.
[Figure: Society if we experimentally explore the possibility space.]
The internet set the stage for an explosion of humanity: the scope of communication
continues to boggle the mind. But almost immediately after this space of possibilities
opened up, the major platforms enclosed this protean commons, rationalizing its use and greatly constraining the structures of communication. The goal was of course to make social media that were optimized for collecting consumption-related data about users in order to better target those users with ads, and to constrain those users’
attention in order to show them those ads.
This is unlikely to be the best structure of the internet for all of the
disparate groups who would like to use the internet to accomplish their goals.
Online behavior is thus an ideal case for the social science approach that I describe.
Our intuitions and traditions about how to structure online society are weak, the ontology of online activity is far more malleable than that of the physical world, and the tools for both communication and action are distributed among a much larger number of people.
Another advantage provided by contemporary information technology is the ability
to unobtrusively measure the outcomes of experimentation. Campbell was concerned
about both the costs of measurement and the effects of measurement on the value of
the experiment itself: “almost universally...there is a conflict between the action personnel...and the research staff. This should be regarded both as a practical problem and as philosophy of science issue....the rearrangements in the action program required to make an experimental evaluation possible. Most of these changes are appropriately regarded as distherapeutic. There may be a fundamental social science indeterminacy issue here. In a broader framework, the problem becomes one of the compatibility between psychological health and continuous measurement.”
The ubiquity of digital tracking and measurement has effected a shift in both the
reality of and our intuitions about its intrusiveness. Although this process has not been remotely democratic or deliberative, the bargain we have struck with tech companies has inured a large majority of citizens to the fact of continuous measurement. Unless there is a significant and assertive effort to reverse this trend, accompanied by vigorous government action to counteract corporate data power, social scientists and citizens should use the same technology to construct measurements for our own ends.
We must be careful not to let these easily quantified measures define the entirety
of our values, however. At a minimum, it is important to be explicit about how the
measures are constructed. More broadly, citizens should be involved in the process of
evaluating the measures themselves, to avoid taking them as direct evidence of reality.
Ethics
Ethical considerations about experiments are generally more salient in the social sciences than in, say, physics. Indeed, much of the current debate about experiments in social science is about ethics. Ethical standards like informed consent, inherited from medical trials, are often directly in conflict with the goals of RCT practitioners: the knowledge extracted from a field experiment is usually purer, higher-grade, when the human participants are unaware that they are taking part in an experiment.
Additional concern arises when Western academics or NGOs travel to developing
countries to conduct field experiments, often at much lower expense than an analogous field experiment in their home country. In these cases, the analogy of knowledge extraction is clear: Western scientists (even granting the best intentions) use their economic power to finance an epistemological theater that gratifies our aesthetic sensibilities.
Although there are reasonable arguments about where to draw the ethical line with
social science field experiments, the current ethical paradigm is fundamentally incompatible with the framework for RCTs I advocate in this paper. Don Campbell understood this as well, so his essays on The Experimenting Society included a radical
ethical re-orientation. He argued that “participation in policy experiments is more
akin to participating in democratic political decision making than to participating in
the psychology laboratory” (Campbell, 1998).