4 Comments
Andy Hall

Lots of great thoughts here, Kevin! In a world of AI slop, it’s possible that the journals become more important rather than less, if they can become the trusted curators, but I agree they’ll need to change dramatically to pull that off.

Having research controlled by corporate AI systems—and more generally having thought controlled this way—is one of the biggest problems coming for the world. It doesn’t seem for now like open source AI is keeping up, so we’re going to have to come up with other ideas for how to either keep AI decentralized when it comes to knowledge production (so we aren’t too reliant on a single model) or else make sure our preferences and values rule over the AI.

I’ve been thinking about how to do this and would love to discuss.

(Btw, on my piece: there really wasn’t a lot of prompt engineering and I didn’t iterate at all. Claude wrote the initial prompt, which is why it’s so detailed.)

Deanna Hoffmann

The old adage "publish or perish" takes on a new meaning. With the ability to curate a portfolio, 'self-appointed scholars' may more easily gain entry into positions that were traditionally earned through time invested in producing reliable, quality work. A whole new way to 'fake it till you make it' has come into play. Without intentional efforts to direct an alternative course, all the slop will erode and obscure the purity of the institution.

Paul Staniland

I am pretty blown away by recent progress. I recently introduced Claude to a new concept I'm trying to develop (about civil-military relations) with some definitions and example cases of each. After some back-and-forth it created excellent coding rules for RAs to measure the concept. But then, out of curiosity, I asked it to just go ahead and code a bunch of cases itself using those rules. And it did an awfully good job, all things considered, and was able to produce a plausible-looking (based on the cases I know well) cross-national dataset with some qualitative justification for various codings, while also highlighting cases it wasn't sure about, ambiguities of measurement, etc. The big problem is that it can't point to specific sources, which is obviously a huge issue. But I wonder what norms will develop around this - will it become acceptable to start with an LLM "first draft" that humans then check and source?
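
A minimal sketch of this kind of rules-then-coding workflow, assuming the Anthropic Python SDK; the coding rules, case list, and model ID below are placeholders for illustration, not the actual setup described above:

```python
# Hypothetical sketch: apply written coding rules to a list of cases via the
# Anthropic API, asking for a code, a short justification, and an uncertainty
# flag. The rules, cases, and model ID are placeholders.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CODING_RULES = """
You are applying a codebook for a concept in civil-military relations.
Code the case 1 if <inclusion criteria>, 0 otherwise.
Respond with JSON only: {"code": 0 or 1, "justification": "...", "uncertain": true or false}
Set "uncertain" to true when the evidence is ambiguous.
"""

cases = ["Country A, 1999", "Country B, 2016"]  # placeholder case list

coded = []
for case in cases:
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: pin an exact, dated model ID
        max_tokens=500,
        system=CODING_RULES,
        messages=[{"role": "user", "content": f"Code this case: {case}"}],
    )
    # The model may still wrap the JSON in prose; a real pipeline would validate this.
    coded.append({"case": case, **json.loads(msg.content[0].text)})

# Human checking and sourcing would follow; the model's justifications are not citations.
print(json.dumps(coded, indent=2))
```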

Pia Deshpande

Hi Kevin! Great stuff here as always. I had some similar thoughts to Andy, and some new ones.

(1) You're right that LLM-assisted research is here. It's increasingly popular, and frankly, I've begun to use proprietary LLMs for code assistance pretty regularly. Conditional on my already knowing how to code relatively well, it's improved my productivity and taught me some new things.

(2) I, like you and Andy, am worried about the widespread use of corporate LLMs influencing the quality of the work we produce and the results we find --- particularly when leveraged to write. These companies have incentives to be simpatico with political regimes in power --- especially in the U.S. case --- and I do wonder how that impacts the text they produce (e.g., there are really easy examples of political LLM outputs from models like Grok, but I'm worried about more subtle versions of it showing up in GPT and Claude too).

(3) I also worry about replicability. When researchers use LLMs for replication tasks, and the LLMs are proprietary, with version changes that drastically impact how the models behave, how do I replicate what you did with Claude two years ago? Six months ago? If there has been a version update in between, it doesn't seem easy to do (see the sketch after point (4) for what documenting such a run even involves). I worry about the next crisis in replicability in the social sciences being driven by a combination of ambitious and lazy work that relies on proprietary LLMs that we just can't replicate.

(4) Finally, I think we should take more seriously the threat of propaganda through proprietary LLMs. I think of research led by Hannah Waight (with Eddie Yang, Yin Yuan, Solomon Messing, Margaret Roberts, Brandon Stewart, and Joshua Tucker) showing that propaganda produced by Chinese state media is influencing LLMs, with discrepancies in LLM outputs when prompted in Cantonese versus Mandarin. This should worry us even in non-authoritarian contexts. In the U.S. context --- a context that is trending more authoritarian by the day --- this means we should be spending more time auditing the text that LLMs produce when asked to do tasks like telling users whom they should vote for or even just summarizing the news. And when researchers ask LLMs to write or to classify political content, we should be very aware that the classifications we receive could be influenced by political actors (in addition to being potentially inaccurate and hard to replicate).
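
On point (3): a minimal sketch of what documenting an LLM-dependent step for later replication might involve, assuming the Anthropic Python SDK; the prompt, parameters, and model ID are placeholders, and even a pinned, dated model ID only helps until the provider retires that version:

```python
# Hypothetical logging of everything a replicator would need to re-run an
# LLM-assisted classification step. Even with this record, the original run
# cannot be reproduced once the pinned model version is deprecated.
import json
import datetime
import anthropic

client = anthropic.Anthropic()

prompt = "Classify this statement as pro- or anti-incumbent: <statement text>"
params = {
    "model": "claude-3-opus-20240229",  # placeholder: pin an exact, dated model ID
    "max_tokens": 200,
    "temperature": 0,  # reduces (but does not eliminate) run-to-run variation
}

msg = client.messages.create(messages=[{"role": "user", "content": prompt}], **params)

# Archive the prompt, parameters, timestamp, and raw output with the replication files.
with open("llm_run_log.jsonl", "a") as f:
    f.write(json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "parameters": params,
        "output": msg.content[0].text,
    }) + "\n")
```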

Thanks for writing your piece! Would love to talk more, and will be thinking about this problem for a long time --- it's not going away any time soon.