stm

Download the Vignette

Authors: Molly Roberts, Brandon Stewart and Dustin Tingley

Please email all comments/questions to bms4 [AT] princeton.edu

News

July 31, 2018

We were honored to win the Political Methodology Society's Statistical Software Award for 2018.
Added Mikael Johannesson's stmprinter package to Supporting Packages.
More updates coming soon...

Jan 27, 2018

Added a news section.
See a fun demo of stm from Julia Silge.
Updated supporting packages and papers.
New version is up on CRAN.

Summary

The Structural Topic Model is a general framework for topic modeling with document-level covariate information. The covariates can improve inference and qualitative interpretability and are allowed to affect topical prevalence, topical content or both. The software package implements the estimation algorithms for the model and also includes tools for every stage of a standard workflow, from reading in and processing raw text through making publication-quality figures.

The package currently includes functionality to:

ingest and manipulate text data
estimate Structural Topic Models
calculate covariate effects on latent topics with uncertainty
estimate a graph of topic correlations
compute model diagnostics and summary measures
create all the plots used in our various papers
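
As a rough illustration of how these pieces fit together, here is a minimal sketch of one pass through the workflow. It assumes the gadarian example data that ships with stm, and that the tm and SnowballC packages are installed for text processing. The choice of K = 3, the prevalence formula and the iteration cap are illustrative assumptions rather than recommendations; see the vignette for the authors' own worked examples.

```r
library(stm)

# Example data shipped with stm: open-ended survey responses plus metadata.
data(gadarian)

# Ingest and manipulate text: clean the raw responses, then align the
# documents with their metadata and drop very rare terms.
processed <- textProcessor(gadarian$open.ended.response, metadata = gadarian)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

# Estimate a Structural Topic Model in which topical prevalence depends on
# covariates. K = 3 and the formula are illustrative choices only.
fit <- stm(documents = out$documents, vocab = out$vocab, K = 3,
           prevalence = ~ treatment + s(pid_rep), data = out$meta,
           init.type = "Spectral", max.em.its = 75, verbose = FALSE)

# Covariate effects on topic prevalence, with uncertainty.
effects <- estimateEffect(1:3 ~ treatment, fit, metadata = out$meta)
summary(effects)

# Graph of topic correlations.
corr <- topicCorr(fit)

# Summaries: top words per topic and expected topic proportions.
labelTopics(fit)
plot(fit, type = "summary")
```

Each call maps onto one item in the list above, so individual steps can be swapped out (for example, supplying your own document-term representation instead of using textProcessor) without changing the rest of the pipeline.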

Methods Papers

Egami, Fong, Grimmer, Roberts and Stewart. "How to make causal inferences using texts."
Roberts, Stewart and Nielsen. "Adjusting for Confounding with Text Matching."
Roberts, Stewart and Airoldi. "A model of text for experimentation in the social sciences" Journal of the American Statistical Association. 2016.
Roberts, Stewart and Tingley. "Navigating the Local Modes of Big Data: The Case of Topic Models" In Data Analytics in Social Science, Government, and Industry. New York: Cambridge University Press. 2016.
Lucas, Nielsen, Roberts, Stewart, Storer, and Tingley. "Computer assisted text analysis for comparative politics." Political Analysis. 2015.
Roberts, Stewart, Tingley, Lucas, Leder-Luis, Gadarian, Albertson, and Rand. "Structural topic models for open-ended survey responses." American Journal of Political Science. 2014.
Roberts, Stewart, Tingley, and Airoldi. "The Structural Topic Model and Applied Social Science." Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation. 2013.

Supporting Packages

Johannesson. stmprinter: Print multiple stm model dashboards to a pdf file for inspection. Beautiful automated reports from multiple stm runs.
Schwemmer. stminsights: A Shiny Application for Inspecting Structural Topic Models. A shiny GUI with beautiful graphics.
Woloszynek. themetagenomics: Exploring Thematic Structure and Predicted Functionality of 16S rRNA Amplicon Data. STM for rRNA data.
Johannesson. "tidystm: Extract (tidy) effects from estimateEffect." Makes it easy to make ggplot2 graphics for STM.
Zangri, Tingley, Stewart. "stmgui: Shiny Application for Creating STM Models." This is a Shiny GUI for running basic STM models.
Freeman, Chuang, Roberts, Stewart and Tingley. "stmBrowser: An R Package for the Structural Topic Model Browser." This D3 visualization allows users to interactively explore the relationships between topics and the covariates estimated from the stm package in R. See an example here.
Coppola, Roberts, Stewart and Tingley. "stmCorrViz: A Tool for Structural Topic Model Visualizations." This package uses D3 to generate an interactive hierarchical topic explorer.

Published Applications

If you have published a paper using stm that you would like to see included here please email us.

Curry and Fix. "May it please the twitterverse: The use of Twitter by state high court judges" Journal of Information Technology & Politics. 2019.
Schwemmer and Jungkunz. "Whose ideas are worth spreading? The representation of women and ethnic groups in TED talks" Political Research Exchange. 2019.
Bittermann and Klos. "Does Psychological Research Address Current Social Issues? A Scientometric Analysis of the Example of Refugees and Migration Using Topic Modeling" Psychologische Rundschau. 2019.
Fischer-Preßler, Schwemmer and Fischbach. "Collective sense-making in times of crisis: Connecting terror management theory with twitter reactions to the Berlin terrorist attack" Computers in Human Behavior. 2019.
Mourtgos and Adams. "The rhetoric of de-policing: Evaluating open-ended survey responses from police officers with machine learning-based structural topic modeling" Journal of Criminal Justice. 2019.
Rodriguez and Storer. "A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data" Journal of Technology in Human Services. 2019.
Shirokanova and Silyutina. "Internet Regulation Media Coverage in Russia: Topics and Countries" WebSci '18 Proceedings of the 10th ACM Conference on Web Science. 2019.
Shirokanova and Silyutina. "Internet Regulation: A Text-Based Approach to Media Coverage" International Conference on Digital Transformation and Global Society. 2019.
Anzoise, Slanzi and Poli. "Local stakeholders’ narratives about large-scale urban development: The Zhejiang Hangzhou Future Sci-Tech City" Urban Studies. 2019.
Grajzl and Murrell. "Toward Understanding 17th Century English Culture: A Structural Topic Model of Francis Bacon's Ideas" Journal of Comparative Economics. 2019.
Grajzl and Irby. "Reflections on Study Abroad: A Computational Linguistics Approach" Journal of Computational Social Science. 2019.
Geese. "Immigration-related Speechmaking in a Party-constrained Parliament: Evidence from the ‘Refugee Crisis’ of the 18th German Bundestag (2013–2017)" German Politics. 2019.
Zafari and Ekin. "Topic modelling for medical prescription fraud and abuse detection" Journal of the Royal Statistical Society Series C. 2018.
Kim. "Media Bias against Foreign Firms as a Veiled Trade Barrier: Evidence from Chinese Newspapers" American Political Science Review. 2018.
Schwemmer and Ziewiecki. "Social media Sellout: The Increasing Role of Product Promotion on YouTube." Social Media + Society. 2018.
Dybowski and Adämmer. "The economic effects of U.S. presidential tax communication: Evidence from a correlated topic model" European Journal of Political Economy 2018.
Rothschild, Howat, Shafranek, Busby. "Pigeonholing Partisans: Stereotypes of Party Supporters and Partisan Polarization." Political Behavior 2018.
Chandelier, Steuckardt, Mathevet, Diwersy, Gimenez. "Content analysis of newspaper coverage of wolf recolonization in France using structural topic modeling." Biological Conservation 2018.
Cerchiello and Nicola. "Assessing News Contagion in Finance" Econometrics 2018.
Nelson, Laura K. "Computational Grounded Theory: A Methodological Framework" Sociological Methods & Research 2018.
Bohr and Dunlap. "Key Topics in environmental sociology, 1990–2014: results from a computational text analysis" Environmental Sociology 2018.
Banks, Woznyj, Wesslen and Ross. "A Review of Best Practice Recommendations for Text Analysis in R (and a User-Friendly App)" Journal of Business and Psychology 2018.
Hagen, Harrison and Dumas. "Data Analytics for Policy Informatics: The Case of E-Petitioning" Policy Analytics, Modelling, and Informatics 2018.
Kuhn, Kenneth D. "Using structural topic modeling to identify latent topics and trends in aviation incident reports" Transportation Research Part C: Emerging Technologies 2018.
Tvinnereim, Flottum, Gjerstad, Johannesson and Nordo. "Citizens’ preferences for tackling climate change. Quantitative and qualitative analyses of their freely formulated solutions" Global Environmental Change 2017.
Mildenberger and Tingley. "Beliefs about Climate Beliefs: The Importance of Second-Order Opinions for Climate Politics" British Journal of Political Science 2017.
Terman. "Islamophobia and Media Portrayals of Muslim Women: A Computational Text Analysis of US News Coverage" International Studies Quarterly 2017.
Chakrabarti and Frye. "A mixed-methods framework for analyzing text data: Integrating computational techniques with qualitative methods in demography" Demographic Research 2017.
Bail, Brown and Mann. "Channeling Hearts and Minds: Advocacy Organizations, Cognitive-Emotional Currents, and Public Conversation" American Sociological Review 2017.
McInerney, Doherty, Bindoff, Robinson and Vickers. "How is palliative care understood in the context of dementia? Results from a massive open online course" Palliative Medicine 2017.
Gupta, Wang, Lin, Hong, Sun, Liebman, Stern, Dasgupta and Roberts. "Toward Building a Legal Knowledge-Base of Chinese Judicial Documents for Large-Scale Analytics" Legal Knowledge and Information Systems 2017.
Zhang, Qiang and Jiang. "Finding Academic Concerns on Real Estate of U.S. and China: A Topic Modeling Based Exploration" Proceedings of the 21st International Symposium on Advancement of Construction Management and Real Estate 2017.
Chow, Kumar, Ouyang, Zhong, Lee and Inverso. "What can Physicians learn from Social Forums: Insights from an on-line Self Help and Support Group" Computational Advances in Bio and Medical Sciences (ICCABS), 2017 IEEE 7th International Conference 2017.
Moeller, Munksgaard and Demant. "Flow My FE the Vendor Said: Exploring Violent and Fraudulent Resource Exchanges on Cryptomarkets for Illicit Drugs" American Behavioral Scientist 2017.
Gwak and Sohn. "Identifying the trends in wound-healing patents for successful investment strategies" PLOS One 2017.
Light, Ryan and Colin Odden. "Managing the Boundaries of Taste: Culture, Valuation, and Computational Social Science" Social Forces 2017.
Kuhn, Kenneth D. "Topics and Trends in Incident Reports: Using Structural Topic Modeling to Explore Aviation Safety Reporting System Data" Twelfth USA/Europe Air Traffic Management Research and Development Seminar (ATM2017) 2017: 1-10.
Kim, In Song. "Political Cleavages within Industry: Firm-level Lobbying for Trade Liberalization" American Political Science Review 2017.
Tingley, Dustin. "Rising Power on the Mind." International Organization. 2017.
Lynam, Timothy. "Exploring social representations of adapting to climate change using topic modeling and Bayesian networks" Ecology and Society. 2016.
Tvinnereim, Endre, Xiaozi Liu, and Eric M. Jamelske. "Public perceptions of air pollution and climate change: different manifestations, similar causes, and concerns." Climatic Change 2016: 1-14.
Truex, Rory. Making Autocracy Work. Cambridge University Press. 2016.
Kolar, Mladen and Matt Taddy. "Discussion of 'Coauthorship and Citation Networks for Statisticians'" The Annals of Applied Statistics 2016.
Bauer, Paul C., Pablo Barberá, Kathrin Ackermann, Aaron Venetz. "Is the Left-Right Scale a Valid Measure of Ideology? Individual-Level Variation in Associations with 'Left' and 'Right' and Left-Right Self-Placement" Political Behavior 2016.
Sachdeva, Sonya, Sarah McCaffrey and Dexter Locke. "Social media approaches to modeling wildfire smoke dispersion: spatiotemporal and social scientific investigations." Information, Communication & Society. 2016.
Munksgaard, Rasmus and Jakob Demant. "Mixing politics and crime: the prevalence and decline of political discourse on the cryptomarket." International Journal of Drug Policy. 2016.
Huff, Connor and Dominika Kruszewska. "Banners, Barricades, and Bombs: The Tactical Choices of Social Movements and Public Opinion" Comparative Political Studies. 2016.
Bail, Christopher A. "Cultural carrying capacity: Organ donation advocacy, discursive framing, and social media engagement." Social Science & Medicine. 2016.
Law, David S. "Constitutional Archetypes" Texas Law Review. 2016.
Farrell, Justin. "Corporate funding and ideological polarization about climate change" Proceedings of the National Academy of Sciences. 2016.
Wang, Baiyang and Diego Klabjan. "Temporal Topic Analysis with Endogenous and Exogenous Processes." Thirtieth AAAI Conference on Artificial Intelligence. 2016.
Reich, Stewart, Mavon and Tingley. "The Civic Mission of MOOCs: Measuring Engagement across Political Differences in Forums." Association for Computing Machinery: Learning at Scale. 2016.
Tvinnereim, Endre and Kjersti Flottum. "Explaining topic prevalence in answers to open-ended survey questions about climate change" Nature Climate Change. 2015.
Mishler, Alan, Erin Smith Crabb, Susannah Paletz, Brook Hefright, Ewa Golonka. "Using Structural Topic Modeling to Detect Events and Cluster Twitter Users in the Ukrainian Crisis." International Conference on Human-Computer Interaction. 2015.
Milner, Helen and Dustin Tingley. Sailing the Water's Edge: The Domestic Politics of American Foreign Policy. Princeton University Press. 2015.
Romney, David, Brandon Stewart and Dustin Tingley. "Plain Text: Transparency in the Acquisition, Analysis, and Access Stages of the Computer-assisted Analysis of Texts." Qualitative and Multi-Method Research. 2015.
Genovese, Federica. "Politics ex cathedra: Religious authority and the Pope in modern international relations" Research & Politics 2015.
Reich, Tingley, Leder-Luis, Roberts and Stewart. "Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses" Journal of Learning Analytics. 2015.

Installation Instructions

The package is available on CRAN and can be installed using:

install.packages("stm")

You can always get the most stable development release from the GitHub repository. Assuming you already have R installed (if not, see http://www.r-project.org/), the easiest way to install from the GitHub repository is to use the devtools package. First install devtools using the following code (you only have to do this once):

if(!require(devtools)) install.packages("devtools")

Then you can load devtools and use its install_github function:

library(devtools)
install_github("bstewart/stm",dependencies=TRUE)

Note that this will install all the packages suggested and required to run our package. It may take a few minutes the first time, but this only needs to be done on the first use. In the future you can update to the most recent development version using the same code.

Getting Started

See the vignette for several example analyses.

Funding Sources

This material is based upon work supported by the National Science Foundation under Grant Number 1738288. We are also grateful to have received supporting funds from the Spencer Foundation, the Hewlett Foundation and Princeton's Center for Statistics and Machine Learning. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.