stm

An R Package for the Structural Topic Model

Download the Vignette

Authors: Molly Roberts, Brandon Stewart and Dustin Tingley

Please email all comments/questions to bstewart [AT] fas.harvard.edu

Summary

This page contains information on the stm package for R. It implements variational EM algorithms for estimating topic models with covariates in a framework we call the Structural Topic Model (stm).

The package currently includes functionality to:

Other Resources

Have a large text corpus or need a language we don't support? See our sister project txtorg.

Papers on the Structural Topic Model:

Installation Instructions

The package is available on CRAN and can be installed using:

install.packages("stm")

You can always get the most stable development release from the GitHub repository. Assuming you already have R installed (if not, see http://www.r-project.org/), the easiest way to install from the GitHub repository is to use the devtools package. First install devtools using the following code. Note that you only have to do this once.

if(!require(devtools)) install.packages("devtools")

Then load the package and use the function install_github:

library(devtools)
install_github("bstewart/stm", dependencies = TRUE)

Note that this will install all the packages suggested and required to run our package. It may take a few minutes the first time, but this only needs to be done on the first use. In the future you can update to the most recent development version using the same code.

You can also grab the binaries or source files for the latest release here: https://github.com/bstewart/stm/releases. Then call install.packages with repos = NULL, where filepath is the path to the file you downloaded:

install.packages(filepath, repos = NULL)
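Whichever route you take, a quick way to confirm that the install worked is to ask R which version it can find. The sketch below uses only base R; the helper name installed_version is made up for illustration and is not part of stm.

```r
# Hypothetical helper (not part of stm): return the installed version of a
# package as a string, or NA if R cannot find it in the library paths.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# NA here means the installation did not succeed
installed_version("stm")
```

If this returns NA, rerun one of the installation commands above and check the console output for errors.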

Getting Started

See the vignette for several example analyses. The main function to estimate the model is stm() but there are a host of other useful functions. If you have your documents already converted to term-document matrices you can ingest them using readCorpus(). If you just have raw texts you will want to start with textProcessor().
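As a rough illustration of the raw-text route, the sketch below runs textProcessor() and then stm() on gadarian, a small survey data set that ships with the package. It assumes a recent CRAN version of stm with the tm and SnowballC packages installed (textProcessor() relies on them). The intermediate prepDocuments() step, the choice of three topics, the covariates in the prevalence formula, and the iteration cap are illustrative assumptions rather than recommendations; the vignette remains the authoritative walkthrough.

```r
library(stm)

# gadarian ships with stm; the open.ended.response column holds the raw texts
data(gadarian)

# Raw texts -> processed documents, a vocabulary, and the matching metadata
processed <- textProcessor(gadarian$open.ended.response,
                           metadata = gadarian,
                           verbose = FALSE)

# Drop rare terms and keep documents, vocabulary, and metadata aligned
prepped <- prepDocuments(processed$documents,
                         processed$vocab,
                         processed$meta,
                         verbose = FALSE)

# Fit a small model; K = 3 and the prevalence formula are arbitrary choices
fit <- stm(documents = prepped$documents,
           vocab = prepped$vocab,
           K = 3,
           prevalence = ~ treatment + pid_rep,
           data = prepped$meta,
           max.em.its = 25,   # low cap to keep the example quick
           seed = 1234,
           verbose = FALSE)

# Show the highest-probability words for each topic
labelTopics(fit)
```

The seed is fixed only so that repeated runs give the same result. For real analyses you would normally let the model run to convergence rather than capping the iterations this low.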