How to Train Your Stochastic Parrot: Large Language Models for Political Texts
Published: 2022
Large language models pre-trained on massive corpora of text from the Internet have transformed the way that computer scientists approach natural language processing over the past five years. But these “foundation models” have yet to see widespread adoption in the social sciences, partly due to their novelty and upfront costs. In this paper, we demonstrate that such models can be effectively applied to a wide variety of text-as-data tasks in political science, including sentiment analysis, ideological scaling, and topic modeling. In a series of pre-registered analyses, this approach outperforms conventional supervised learning methods without the need for extensive data pre-processing or large sets of labeled training data, and its performance is comparable to that of expert and crowd-coding methods at a fraction of the cost. We explore the accuracy-cost tradeoff associated with adding more model parameters, and discuss how best to adapt and validate the models for particular applications.
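The abstract describes prompting pre-trained language models to perform text-as-data tasks such as sentiment analysis without labeled training data. Below is a minimal sketch of what zero-shot sentiment classification with an instruction-following model can look like, assuming the OpenAI Python client; the model name, prompt wording, and example text are illustrative assumptions and not the prompts or models used in the paper.

```python
# Minimal sketch of zero-shot sentiment classification with an LLM.
# Assumes the OpenAI Python client and an API key in OPENAI_API_KEY;
# the model, prompt, and example text are illustrative, not from the paper.
from openai import OpenAI

client = OpenAI()


def classify_sentiment(text: str) -> str:
    """Ask the model for a one-word sentiment label; no labeled training data required."""
    prompt = (
        "Classify the sentiment of the following political text as "
        "Positive, Negative, or Neutral. Reply with a single word.\n\n"
        f"Text: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any instruction-tuned model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output aids reproducibility and validation
        max_tokens=3,
    )
    return response.choices[0].message.content.strip()


print(classify_sentiment("This bill is a disaster for working families."))
# Likely output: "Negative"
```

In practice, a workflow like this would be validated against a hand-coded sample before being applied at scale, as the paper's emphasis on validation suggests.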
Recommended citation: Ornstein, J. T., Blasingame, E. N., & Truscott, J. S. (2022). How to Train Your Stochastic Parrot: Large Language Models for Political Texts.