Topic models are widely used statistical models for reducing the dimensionality of textual data. Although the approach is quantitative in nature, model selection and validation of topic model results can be quite labor intensive, as they require qualitative inspection of many documents and terms. This is where stminsights comes in: the package enables interactive validation, interpretation and visualization of one or several Structural Topic Models (stm). In case you are not familiar with structural topic models, the stm package vignette is an excellent starting point.
Stminsights can be installed from CRAN by running:
install.packages('stminsights')
You can also install the latest development version of the app from GitHub.
For Windows users, installing from GitHub requires a working Rtools setup.
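The main part of stminsights is an interactive shiny application, which requires an .RData file as input. This file should include one or several stm models, one or several effect estimates created with estimateEffect(), and an object out, which was used to fit your stm models and contains documents, vocabulary and metadata.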
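Because the app depends on these objects being present, it can help to inspect a saved file before loading it. The sketch below uses a hypothetical helper, check_stminsights_file(), which is not part of stminsights. It relies on base R only and assumes that models can be recognized by the class "STM" (assigned by stm()) and effect estimates by the class "estimateEffect" (assigned by estimateEffect()); how the app itself detects objects is an assumption here.

```r
# Hypothetical helper (not part of stminsights): report whether an .RData
# file contains an stm model, an effect estimate and an `out` object.
check_stminsights_file <- function(path) {
  env <- new.env()
  load(path, envir = env)                 # load objects into a private environment
  objs <- mget(ls(env), envir = env)      # named list of all loaded objects

  # TRUE if at least one loaded object inherits from the given class
  has_class <- function(cls) any(vapply(objs, inherits, logical(1), what = cls))

  # `out` must exist and hold documents, vocabulary and metadata
  out_ok <- exists('out', envir = env, inherits = FALSE) &&
    all(c('documents', 'vocab', 'meta') %in% names(env$out))

  c(stm_model = has_class('STM'),
    effect_estimate = has_class('estimateEffect'),
    out_object = out_ok)
}
```

Once the file has been saved, calling check_stminsights_file('stm_gadarian.RData') returns a named logical vector; any FALSE entry points to a missing object.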
As an example, the following code uses the quanteda package to prepare the gadarian corpus for structural topic modeling. Afterwards, two models and their effect estimates are computed, and all objects required for stminsights are stored in the file stm_gadarian.RData:
library(stm)
library(quanteda)

# prepare data
data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- as.character(data)

data <- tokens(data, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(stopwords('english')) |>
  dfm() |>
  dfm_trim(min_termfreq = 2)

out <- convert(data, to = 'stm')

# fit models and effect estimates
gadarian_3 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 3, verbose = FALSE)

prep_3 <- estimateEffect(1:3 ~ treatment + s(pid_rep), gadarian_3,
                         meta = out$meta)

gadarian_5 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 5, verbose = FALSE)

prep_5 <- estimateEffect(1:5 ~ treatment + s(pid_rep), gadarian_5,
                         meta = out$meta)

# save objects in .RData file
save.image('stm_gadarian.RData')
After preparing the .RData file, the shiny application can be launched by loading the package with library(stminsights) and then running run_stminsights().
Hovering over UI elements displays tooltips that assist users in navigating through the application. Stminsights is organized as a dashboard with multiple columns that serve different purposes:
Info & Topics: load your .RData file and select models and effect estimates
Although the shiny application includes several options for exporting and visualizing the output from structural topic models, users may wish to create their own plots in different formats. For such cases, stminsights offers three utility functions that can be used outside of the shiny application:
get_effects(): create a dataframe including prevalence effects for one stm model
get_network(): create a tidygraph for a correlation network of stm topics
get_diag(): create a dataframe including statistical diagnostics for several models
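The utility functions can be combined with ggplot2 to build custom plots. The sketch below assumes that stm_gadarian.RData was created with the code above, that ggplot2 is installed, and that get_effects() accepts the arguments estimates, variable and type and returns a data frame with the columns value, topic, proportion, lower and upper. This reflects our reading of the package documentation rather than a verified interface, so please check ?get_effects before relying on it.

```r
# Sketch: plot prevalence effects outside the shiny application.
# Assumption: get_effects() returns columns value, topic, proportion,
# lower and upper (check ?get_effects).
library(stm)
library(stminsights)
library(ggplot2)

# restores gadarian_3, prep_3, gadarian_5, prep_5 and out
load('stm_gadarian.RData')

# point estimates of topic prevalence for each treatment condition
effects <- get_effects(estimates = prep_3,
                       variable = 'treatment',
                       type = 'pointestimate')

# one panel per topic, with confidence intervals drawn as error bars
p <- ggplot(effects, aes(x = value, y = proportion)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1) +
  geom_point(size = 3) +
  facet_wrap(~ topic) +
  labs(x = 'Treatment', y = 'Topic proportion') +
  theme_light()

print(p)
```

Because the result is an ordinary data frame, the same effects can also be filtered, relabeled or exported before plotting.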