In Today's US, Age is Always a Confounder

What makes for good quantitative description?

Jun 23, 2022

A recent statistical and substantive controversy: a New York Times piece whose subtitle claimed that “the death rate for white Americans has recently exceeded the rates for Black, Latino and Asian Americans.”

This is, literally, true. It’s quantitative description! A first cut at understanding the social world, the step in the scientific process that helps us figure out what to study and where to begin theorizing. And yet the newsletter drew serious criticism, with some calling it “dangerous” or “misinformation.”

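The problem is that the article failed to sufficiently emphasize the fact that in the US today, age and race are correlated: older Americans are more likely to be white than younger Americans are. Because COVID mortality rises steeply with age, a group that skews older will have a higher overall death rate even if it fares no worse within each age group.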

This is a common mistake: age and race were not strongly correlated until recently, and there is no single reason why they are now. It’s a combination of Boomer Ballast (compounded by differential mortality rates among Black and white Boomers) and youthful immigration. A central argument of my recent book is that age is correlated with basically every other politically relevant characteristic. This is straightforwardly important as a driver of the breakdown in intergenerational understanding and the rise of intergenerational conflict, but it also poses a problem when we try to talk about those other characteristics on their own terms.

As the COVID example shows, in the United States of the 2020s we can’t talk about race without also talking about age. This is a tricky point: social scientists usually want to isolate “one thing” by controlling for other variables. That is a worthy scientific goal, but when two variables are intrinsically correlated in the context we want to understand, controlling for one of them can further confuse the issue.

In this case, a revised analysis of COVID and race that controlled for age would be ideal for answering the question “What would the COVID death rate by race be if age and race were uncorrelated?” We might want to answer this question, but it refers to a counterfactual world that will never exist: there is no policy that will cause age and race to become uncorrelated anytime soon in the US.
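A standard epidemiological way to answer that counterfactual question is direct age standardization: apply each group’s age-specific death rates to one shared reference population, so that differences in age structure drop out. The sketch below is a minimal illustration, not a reanalysis of the NYT data. All numbers are entirely hypothetical (two made-up groups, three made-up age bands, and a reference population that is simply the average of the two groups’ age distributions).

```python
# Direct age standardization: a minimal sketch with HYPOTHETICAL numbers.
# None of these figures are real COVID data. They only illustrate how a
# group that skews older can have a higher crude death rate even though
# its death rate is lower within every age band.

# Hypothetical populations by age band for two made-up groups.
population = {
    "group_a": {"0-39": 40_000, "40-69": 35_000, "70+": 25_000},  # skews older
    "group_b": {"0-39": 60_000, "40-69": 30_000, "70+": 10_000},  # skews younger
}

# Hypothetical deaths by age band.
deaths = {
    "group_a": {"0-39": 4, "40-69": 70, "70+": 500},
    "group_b": {"0-39": 12, "40-69": 90, "70+": 300},
}

# Hypothetical shared reference population: the average of the two groups.
reference = {"0-39": 50_000, "40-69": 32_500, "70+": 17_500}

PER = 100_000  # express rates per 100,000 people


def crude_rate(group):
    """Total deaths over total population, ignoring age structure."""
    return sum(deaths[group].values()) / sum(population[group].values()) * PER


def age_specific_rates(group):
    """Death rate within each age band (deaths / population)."""
    return {band: deaths[group][band] / population[group][band]
            for band in population[group]}


def age_standardized_rate(group):
    """Apply the group's age-specific rates to the shared reference population."""
    rates = age_specific_rates(group)
    expected_deaths = sum(rates[band] * reference[band] for band in reference)
    return expected_deaths / sum(reference.values()) * PER


for g in ("group_a", "group_b"):
    print(f"{g}: crude {crude_rate(g):.1f}, "
          f"age-standardized {age_standardized_rate(g):.1f} per 100,000")
```

With these invented numbers, group A has the higher crude rate but the lower age-standardized rate, because group B’s rate is higher in every age band. The choice of reference population changes the standardized values, which is one reason the result answers a counterfactual question rather than describing the world as it is.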

I want to emphasize that I’m not sure about the answers here, but this is a useful case for thinking through what we are trying to accomplish with quantitative description, both in science communication and as part of the scientific process.

But first, the substantive argument: age is unusually correlated with basically every other politically relevant characteristic.

First, race. This graph from Pew (a few years old at this point) makes the argument very clearly. The Baby Boom is easily visible among whites: there are almost 40% more 55-year-old whites than 37-year-old whites. In contrast, there are fewer than half as many 60-year-old Hispanics as there are 24-year-old Hispanics.

Put another way, to emphasize median age and to show that whites are the distinctive group here:

U.S. racial and ethnic minorities tend to be younger than whites

To a counter-intuitive degree, then, when we talk about older people in the US today, we are mostly talking about white older people. And the converse: when we talk about racial minorities, we are mostly talking about younger racial minorities.

Second only to race as an organizing identity of American society is religion. [Bracketing gender. Despite differential lifespans and thus a skewed gender ratio at the very top of the age distribution, biology tends to make age and gender uncorrelated.] Or at least, it was, for over two hundred years. The fact that this strikes me (at least) as somewhat quaint is a testament to the vertiginous shift in the American religious landscape.

The most recent data from Pew is from 2014, but it shows a stark generational divide on religious lines. Younger generations are far less likely to be religious.

More recent data from Gallup lacks the generational breakdown but shows that the topline decline has in fact accelerated. After holding nearly constant around 70% for decades, the share of Americans belonging to a church, synagogue or mosque has fallen over the past twenty years to just 47%.

There is some evidence of older people abandoning religion; that same Gallup poll shows that membership among the Silent Generation, Baby Boomers and Gen Xers fell by 5 to 7 percentage points in the previous decade. But the drop for Millennials was more than double that (from 51% in 2010 to 36% in 2020), suggesting that a larger factor is generational replacement. Regardless of whether this is primarily a generational story, though, age is today highly correlated with religion.

When we talk about religious people in the US today, we are mostly talking about older religious people.

One more: veterans. “Millennial myth debunker” economist Gary Kimbrough has done great work visualizing generational trends based on IPUMS data. A dramatically higher percentage of Baby Boomers are veterans than are Millennials. Looking at the chronology, only a small amount of this gap can be explained by the fact that many Millennials are still on active duty and so have not yet become veterans.

So, one more time: when we talk about veterans in the US today, we are mostly talking about older veterans. This one might be more intuitive, but the longer time series demonstrates that veteran status used to be spread much more evenly throughout society, a unifying source of identity that transcended generational identity.

Educational attainment, sexual orientation, gender identity, union membership: all of these drivers of political attitudes and sources of identity are, today, correlated with age and thus generation. Insofar as identity alignment enhances the role of overlapping identities, per Liliana Mason’s argument about partisanship, we should expect age/generation to be a defining cleavage point in mass political behavior in the current period. My book argues this at length, and I hope I’ve convinced you that age-alignment is substantively important in the US today.

But the initial conversation about age, race and COVID serves as a useful point of departure for an open question, one in which I’m deeply invested. In the build-up to founding the Journal of Quantitative Description: Digital Media, I argued that quantitative description is necessary for setting the academic agenda, establishing null hypotheses (or, better, informing priors), and providing the covariates that are necessary for generalizability. But what, exactly, is quantitative descriptive research? And how can we do quantitative descriptive research in a way that provides the most utility to other scholars and our fellow citizens? Over the past year of editing the JQD, I have begun to arrive inductively at two helpful criteria.

Let's explore the intuitions revealed by the criticism of the age/race/COVID issue. At first blush, Leonhardt's description in the NYT is reasonable: quantitative description based on race is common in both scholarly research and public-facing communication, and it often reveals racial disparities that we can acknowledge and (optimistically) rectify.

Criterion 1: We should prefer quantitative description where the categories (race) and measures (death from COVID) are more “natural”: they are commonly understood and used to divide up our social world.

The issue is the now well-established correlation between age and race in the US today. The words “white Americans” and “Black Americans” refer to approximately 200 million and 45 million people, respectively. The descriptive fact is that a higher percentage of the former group than the latter group died from COVID.

The source of the criticism is that news headlines (or even personal tweets) reporting a quantitative descriptive difference between white and Black Americans invoke a causal model: the cause of the disparity is racist discrimination, be it active, implicit or inherited. This wasn't always the case. In the early 20th century, an analogous descriptive claim might have invoked a causal model based on scientific racism; in the 1970s, it might have invoked a causal model based on cultural deficiencies.

Each of us has a cognitive map of the social world, constructed from a lifetime of lived experience, media consumption, and conversation. This map is not “purely descriptive” in the sense of atheoretical: it includes relationships between categories and measures, both correlational and causal. It's useful to think of a “culture” as the overlap between these maps.

It's *not* useful, though, to think about descriptive facts as existing outside of our cultural maps. Communication is most usefully defined with respect to the recipient. So defending the headline about age/race/COVID as "true" is beside the point: given our cultural map, this "true" description seems likely to cause people to infer that racist discrimination is less important than they had previously believed. The endogeneity between age, race, and COVID lethality limits our ability to learn from this descriptive fact.

Criterion 2: We should prefer quantitative description that causes other scholars and citizens to update their cognitive maps correctly.

This is...not exactly satisfying! A subjective criterion, mixing positive and normative claims, based on a *causal* relationship, to evaluate the quality of descriptive research? I would love to hear any counter-proposals or refinements (please feel free to comment), but I am convinced that eliminating subjectivity from social science is impossible.

Thankfully, I believe that there are enough free lunches to keep us fed, especially when it comes to the quantitative description of digital media. This is the classic Pragmatist move. Rather than get bogged down trying to solve fundamental epistemological problems, simply dis-solve them. We get to decide what social science is, and I think our decision should give far more weight to our fellow democratic citizens than to, like, Descartes.

The free lunches here come from the dynamism and scope of digital media. There's just so much of it that no one can develop a cognitive map of it through personal experience or narrative media (including narrative journalism). And yet it is massively important, reshaping the social and political world. Just as the previous century saw the development of public opinion polling as a new way for democracies to understand their citizens on an unprecedented scale, computational social scientists are able, and I believe *obliged*, to make this new digital world legible as a necessary precondition to bringing it under democratic control. Causal theorizing is unlikely to produce the right questions, let alone the right answers, in the absence of justified beliefs about the digital media territory.

Our evaluation of Criterion 2 is more likely to be positive in contexts where the cultural cognitive map is sketchier, less filled-in. This produces a strong preference for research about digital media, but also for research outside of Twitter and Facebook use in the US and Europe. At the JQD:DM, we’d love to see more research about digital media in Asia, Latin America and Africa, and about emerging platforms like TikTok, Twitch and Roblox.

What’s the deal with Roblox? Do you have any idea? No one over 25 does! But by some estimates, more than two-thirds of US kids between 9 and 12 are playing it. What’s that doing to them? What are they learning?

It’s so exciting to learn about new digital media; it’s like there are new islands rising up from the ocean every day, new territory to explore. And no one knows what they don’t know. “Is Roblox going to have a misinformation problem? Racist harassment? Political polarization?” How could you even begin to answer these questions without some baseline knowledge of what’s going on?

Never Met a Science
