Please email all comments/questions to bstewart [AT] fas.harvard.edu
The Structural Topic Model is a general framework for topic modeling with document-level covariate information. The covariates can improve inference and qualitative interpretability and are allowed to affect topical prevalence, topical content, or both. The software package implements the estimation algorithms for the model and also includes tools for every stage of a standard workflow, from reading in and processing raw text through making publication-quality figures.
The package currently includes functionality to:
- ingest and manipulate text data
- estimate Structural Topic Models
- calculate covariate effects on latent topics with uncertainty
- estimate a graph of topic correlations
- compute model diagnostics and summary measures
- create all the plots used in our various papers
Other packages:
- Coppola, Roberts, Stewart and Tingley. "stmCorrViz: A Tool for Structural Topic Model Visualizations." This package uses D3 to generate an interactive hierarchical topic explorer.
- More coming soon...
Papers on the Structural Topic Model:
- Roberts, Stewart and Tingley. "stm: R Package for Structural Topic Models."
- Roberts, Stewart and Tingley. "Navigating the Local Modes of Big Data: The Case of Topic Models." In Data Analytics in Social Science, Government, and Industry. New York: Cambridge University Press. Forthcoming.
- Roberts, Stewart, Tingley, and Airoldi. "The Structural Topic Model and Applied Social Science." Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation. 2013.
- Roberts, Stewart, Tingley, Lucas, Leder-Luis, Gadarian, Albertson, and Rand. "Structural topic models for open-ended survey responses." American Journal of Political Science. 2014.
- Lucas, Nielsen, Roberts, Stewart, Storer, and Tingley. "Computer assisted text analysis for comparative politics." Political Analysis. 2015.
- Roberts, Stewart and Airoldi. "A model of text for experimentation in the social sciences."
- Reich, Tingley, Leder-Luis, Roberts and Stewart. "Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses." Journal of Learning Analytics. Forthcoming.
The package is available on CRAN and can be installed using:
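A minimal sketch of that CRAN install, assuming the package is published under the name `stm`:

```r
# Install the released version from CRAN (package name "stm" is assumed here).
install.packages("stm")
```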
You can always get the most stable development release from the GitHub repository. Assuming you already have R installed (if not, see http://www.r-project.org/), the easiest way to install from the GitHub repository is to use the devtools package. First you have to install devtools using the following code. Note that you only have to do this once.
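A minimal sketch of the one-time devtools setup, assuming devtools is installed from CRAN in the usual way:

```r
# One-time setup: install devtools from CRAN.
# Skip this if devtools is already installed.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
```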
Then you can load the package and use the function
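A sketch of this step under stated assumptions: the function is taken to be devtools' `install_github()`, and the repository path `bstewart/stm` is inferred from the releases URL given in this document. The `dependencies = TRUE` argument is an assumption about how suggested packages get pulled in.

```r
# Load devtools and install the development version from GitHub.
# Assumptions: install_github() is the intended function, and
# "bstewart/stm" is the repository (inferred from the releases URL).
library(devtools)
install_github("bstewart/stm", dependencies = TRUE)
```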
Note that this will install all the packages suggested and required to run our package. It may take a few minutes the first time, but this only needs to be done on the first use. In the future you can update to the most recent development version using the same code.
You can also grab the binaries or source files for the latest release here: https://github.com/bstewart/stm/releases. Then call install.packages() on the downloaded file with repos = NULL, where filepath is the path to that file:
install.packages(filepath, repos = NULL)
See the vignette for several example analyses. The main function to estimate the model is stm(), but there are a host of other useful functions. If you have your documents already converted to term-document matrices you can ingest them using readCorpus(). If you just have raw texts you will want to start with
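For raw texts, a minimal sketch of the workflow might look like the following. It assumes the package's `textProcessor()` and `prepDocuments()` helpers for preprocessing and the `gadarian` example data that ships with the package; the choice of `K = 3` and the prevalence formula are purely illustrative, not a recommendation.

```r
library(stm)

# Example survey data bundled with the package (assumed for illustration).
data(gadarian)

# Tokenize, stem, and remove stopwords from the raw open-ended responses.
# textProcessor() relies on the suggested tm package being installed.
processed <- textProcessor(documents = gadarian$open.ended.response,
                           metadata = gadarian)

# Drop rare terms and align documents, vocabulary, and metadata.
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

# Estimate a 3-topic model in which topical prevalence depends on
# the treatment indicator and a smooth function of party identification.
fit <- stm(documents = out$documents, vocab = out$vocab, K = 3,
           prevalence = ~ treatment + s(pid_rep),
           data = out$meta, seed = 1234, verbose = FALSE)

# Show the highest-probability words for each topic.
labelTopics(fit)
```

From here, the vignette covers covariate effects, topic correlations, and plotting in more depth.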
Have a large text corpus or need a language we don't provide support for? See our sister project txtorg.