I still don't think you need to vacuum every week

Jun 16, 2026

Editor’s note: this post is primarily of interest to my academic colleagues; if that’s not you, don’t worry, we’ll be back with more media theory next week. A few quick announcements: I’ve co-authored a book chapter (forthcoming with Cambridge University Press) on “Public Opinion in the Age of AI,” preprint here. I’m going to be at APSA this year for the first time in a few years (Boston is easier to get to from Europe), so please reach out if you want to hang! Also at APSA, I’m participating in a panel resulting from my work on Peer Review 2027, “Generative AI and Journal Publishing: Challenges, Opportunities, and Policies (APSR Panel),” on Friday at 10am. This is an incredibly pressing topic, and I’m glad to have this forum to discuss it.

Today I’m presenting two new papers: one an extremely dense working paper in the philosophy of science, the other a just-today-published article using AI-powered experiments to study gender and housework. I’m putting the fun one first, but please scroll down to the philosophy of science one if that’s your thing.

Hot off the presses at Sociological Science is my article (first-authored by Léa Pessin), “Beyond Text: Using AI-Generated Visual Conjoints to Study Gender and Housework Attribution.” This was a perfect combination of my previous work on visual conjoints and Léa’s substantive expertise on gender and housework.

The project began when I was explaining to Léa that we simply have different preferences for the tidiness of domestic spaces: what she thinks of as messy, I think is fine. This is just a question of diverging personal preferences. So really, we’d be better off with a compromise: if she wants it super-duper clean, she has to do it herself.

In a normal relationship, this would’ve just led to a fight, and not a very productive one. Thankfully, Léa is a sociologist of gender and housework, and had both the theory and data to explain what was wrong with my reasoning (her words). What she argued is that my allegedly personal preference about tidiness was in fact the product of societal gendered expectations about how men and women should behave at home. Women are expected to be responsible for keeping their homes tidy and pay greater social costs for not doing so. Then, she did the one thing that would end the discussion: she served me with a visual survey experiment. This is how this paper started.

Sociology is a breath of fresh air compared to political science because of the range of outcome variables sociologists consider interesting. I’ve long chafed against the imperative that my work be immediately relevant to electoral politics, especially because many of the computational methods I use don’t have many direct applications there. We can study voter preferences for hypothetical politicians a million different ways, but I feel we’re starting to hit diminishing returns. Meanwhile, getting to mock up hypothetical living rooms and think about how to balance the number and diversity of children’s toys strewn about was novel and exciting.

The substantive findings are really cool. Just like the 2019 conjoint experiment paper by Sarah Thébaud, Sabino Kornrich and Leah Ruppanner that directly inspired our design, we find that the myth that “men just don’t see mess” is busted: men were, if anything, slightly (though not significantly) more likely than women to rate rooms as messy. We also find the expected large effects on housework responsibility: on average, female occupants are rated as more responsible for cleaning up the rooms. And, most importantly, we find that Léa was right and I still have to do housework.

On the other hand, we didn’t find support for the original finding that female occupants suffer greater social consequences for messy rooms than male occupants do. There are a number of possible explanations for why our result differs, and we don’t intend to “overturn” the original finding, but it does suggest that this finding is less robust than some of the other results.

Novel to our setup is the differential effect of children’s versus adults’ mess. Here, we see that respondents rate children’s mess as messier than adults’ mess, but that occupants suffer lower social consequences for children’s mess. And finally, Léa’s favorite result (because it’s consistent with earlier qualitative evidence) has to do with the tradeoff between paid work and housework responsibility. When one person works full time and the other is either unemployed or a homemaker, the full-time worker is rated as less responsible for cleaning, regardless of either party’s gender. It’s when both partners work full time that we see an additional, significant allocation of responsibility to women. This is an example of what sociologists call “doing gender”: people hold women more accountable for the home even when both partners face similar time constraints.

So, we’re very excited about this project, and especially that it’s coming out in Sociological Science. For my non-sociologist colleagues who aren’t familiar with Sociological Science, let me briefly shower the journal in praise. Subjectively, of course, I think the current editors have great taste, since they just published my article; ymmv. But objectively, the structure of the journal is a magnificent metascientific intervention. The editors view their role as primarily curatorial rather than developmental; as such, they don’t do R&Rs. The editorial team has complete discretion: they can accept or reject manuscripts on their own, or send them out for advisory reviews, but even with reviews, the decision is simply accept or reject, and it comes in 30 days.

We can argue about how much peer review actually improves manuscripts (I think meaningful improvement is rare in practice today), but either way, I think we should move to a system with much greater diversity in journal types, where some explicitly focus on improving manuscripts and others focus on this curatorial role. Sociological Science has already taken the latter step, and I think other “top” journals should move in this direction as well.

We have to stop pretending that every nice-to-have element of academic publishing can be achieved for free: most of them, like mandatory computational reproducibility, cost money, and all of them cost time. And time is progress; time is validity. Sociological Science takes well-reasoned stances on these tradeoffs, promising speed in exchange for a modest ($45) submission fee and a sliding-scale publication fee that goes up the longer the manuscript is (an incentive for concision!).

We think there’s a ton more work that can be done with the AI-powered visual conjoint across the social sciences. The upside of visuals is obvious from an ecological validity perspective, and now the cost of producing them has dropped dramatically. Methodological issues remain: it’s very difficult to precisely balance the images on the dimensions you don’t want to vary. We ended up using very explicit, lengthy image generation prompts.
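For readers wondering what “explicit, lengthy prompts” might look like in a fully randomized visual conjoint, here is a minimal Python sketch of one possible setup: everything meant to stay constant is written once into a fixed scene description, and only the randomized attribute slots change from prompt to prompt. The attribute names, levels, scene text, and the split between image and caption attributes are all hypothetical illustrations, not the design or prompts used in the article, and the sketch stops at producing prompt text (it does not call any image-generation service).

```python
import random

# HYPOTHETICAL attributes and levels, for illustration only;
# they are not the design used in the article.
# Visual attributes get written into the image-generation prompt.
IMAGE_ATTRIBUTES = {
    "mess_source": ["adult belongings", "children's toys"],
    "mess_amount": ["a little", "a lot of"],
}
# Non-visual attributes are shown as text next to the image.
TEXT_ATTRIBUTES = {
    "occupant_gender": ["woman", "man"],
    "work_status": ["works full time", "is a homemaker"],
}

# Everything that should NOT vary is spelled out once, explicitly,
# so that only the randomized slots differ between prompts.
FIXED_SCENE = (
    "Photorealistic living room seen from the doorway at eye level; "
    "grey sofa against the left wall, wooden coffee table in the center, "
    "window with white curtains on the right, soft daylight, no people."
)


def draw_profile(rng: random.Random) -> dict:
    """Fully randomized design: each attribute is drawn independently and uniformly."""
    all_attributes = {**IMAGE_ATTRIBUTES, **TEXT_ATTRIBUTES}
    return {name: rng.choice(levels) for name, levels in all_attributes.items()}


def image_prompt(profile: dict) -> str:
    """Fixed scene text plus the randomized visual attributes only."""
    return (
        f"{FIXED_SCENE} The room contains {profile['mess_amount']} clutter "
        f"made up of {profile['mess_source']}, scattered on the floor and table."
    )


def vignette(profile: dict) -> str:
    """Caption shown beside the image, carrying the non-visual attributes."""
    return (
        f"This room belongs to a {profile['occupant_gender']} "
        f"who {profile['work_status']}."
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the draws are reproducible
    for _ in range(3):
        profile = draw_profile(rng)
        print(vignette(profile))
        print(image_prompt(profile))
        print()
```

Holding the scene text fixed controls only what the prompt says; it cannot by itself guarantee that the generator renders two different kinds of clutter as equally messy.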

But even so, it’s very difficult to say when two different kinds of “mess” amount to the same degree of “messy.” Experimentalists are used to thinking about how to operationalize concepts using text, but operationalizing them using images poses novel theoretical issues. Were the rooms with children’s mess rated “messier” because children’s mess is per se messier than adult mess, or was there a latent confound in how the images were generated? These methodological issues remain for future research. But possibly the biggest challenge ahead for gender equality in our household is that Léa has now concluded that every one of our disagreements should end in a publication proving her right.

Read the full, open-access article here.

and now for the philosophy of science one

The biggest open question in the methodology of social science involves generalizability, aka external validity or knowledge synthesis. There has been a raft of statistical and institutional interventions on this problem, but it’s particularly difficult to tackle because it’s an intrinsically metascientific problem. Methodology is the task of improving the validity of the methods undertaken as part of scientific research, and the received ontology of scientific research is that science is composed of studies. Generalizability, however, is a question about what holds across studies, so when we take a step back, we’re immediately confronted with deeper questions about what exactly the scientific enterprise is aiming to accomplish, how well it’s achieving those goals, and how it might do it better. We need to think more broadly; it doesn’t make sense to simply apply the same rigorous positivistic scientific methods one level up.

But the absurdity of meta-positivism hasn’t prevented it from becoming the dominant impulse within metascience. The replication crisis and concomitant issues about generalizability will be solved with more rigor! Come up with a metric (say, the p-curve) and apply it across wildly disparate literatures that all happen to use numbers to represent their outcomes: this will diagnose our flaws, producing the only true and good science, a machine for producing a uniform distribution of p-values.

I’ve called this movement metascience manqué, a degenerate case of the much wider possibility space opened up by a more critical metascience. This can take a variety of forms, but my focus is on the ontology of science. As an experimentalist, my ontology is premised on treatments and contexts: the actual physical actions that the researcher causes to take place in the world, as well as the precise relationship between those actions and the rest of the world. The other crucial components of research (measurement, samples, populations) are fairly well understood. But the biggest issue for the field is a lack of attention to the experimental procedure as such.

I believe this essential engineering step has received less attention because it is not immediately relevant to the theories that social scientists consider their true aim. Slough and Tyson’s (extremely compelling) framework treats the “technology of intervention” as something of a footnote to the meatier theoretical constructs of “mechanisms” and “contrasts.” I think higher-level work can only succeed when the foundations are more solid, and the technology of intervention is a pivotal primitive. My particular point is that this technology of intervention is far more heterogeneous than theory-focused practitioners believe (or would like to believe), and that unless we understand the implications of this heterogeneity, our eventual efforts at generalizing knowledge will be confounded.

To that end, I introduce a typology of political science experiments based on their “containedness.” Contained experiments are those for which it is plausible to “perform the same procedure” and expect “the same result.” Uncontained experiments are those for which practitioners expect intrinsic variation to dominate; even when performing the same procedure, the goal is to understand how results differ across time and context. I propose three conceptions of containedness: pragmatic (community expectations about replication), complexity (information required to specify the procedure), and temporal (speed of experimentation relative to drift in the phenomenon). Lab experiments are generally the most contained, survey experiments occupy a middle position, and field experiments are the least contained. I provide a diagnostic rubric for evaluating containedness and argue that distinct evidentiary strategies are appropriate at different levels.

This is the most difficult paper I’ve ever written, my first serious attempt at doing philosophy of science aimed at social science methodologists. It’s just incredibly hard to write rigorously; as my readers here will recognize, I’m fairly good at writing impressionistically, but that shit doesn’t fly when you need to be precise. I’ve already submitted this paper and had it rejected; one particularly thoughtful review suggested formalizing the framework, which is certainly the standard way to communicate ideas with precision. But I’m even worse at that. So I’m sharing the working paper now in the hope that some readers can make use of the framework well enough that I don’t have to go down that road.

Comments on this would be extremely welcome:

http://kmunger.github.io/pdfs/metascience_67.pdf

Never Met a Science
