In this vignette, you will learn how to specify, compile, validate, and fit Hidden Markov Models with BayesHMM.
BayesHMM is an R package for full Bayesian inference on Hidden Markov Models (HMM) using the probabilistic programming language Stan (Carpenter et al. 2017). By providing an intuitive, expressive, and flexible input interface, we enable researchers to get the most out of the modern Bayesian workflow. Users can mix and match a wide array of options for the observation and latent models, including ample choices of densities, priors, and link functions whenever covariates are present. The software fits HMMs with time-homogeneous as well as time-varying transition probabilities, and priors can be set for every model parameter. Implemented inference algorithms include the forward algorithm (filtering), the forward-backward algorithm (smoothing), the Viterbi algorithm (most likely hidden path), prior predictive sampling, and posterior predictive sampling. Graphs, tables, and other convenience methods for convergence diagnosis, goodness of fit, and data analysis are provided.
We aspire to produce high-quality open-source software.
We designed the software to have a unified naming convention across programming code (both R and Stan) and documentation (vignette, manual, and function help). Minor, and hopefully obvious, variants may be found due to different syntax restrictions in R and Stan.
We selected snake case for methods and functions (e.g. `validate_calibration`), lower camel case for function arguments and variables (e.g. `optimizing(nRuns = 1, nCores = 1, ...)`), and upper camel case for density functions (e.g. `Gaussian`, `NegativeBinomial`). We use verbs to reflect actions, such as `compile`, `explain`, and `run`. Exceptions include `sampling` and `optimizing`, named to avoid masking the base R functions `sample` and `optimize`.
Most of the model quantities keep their mathematical notation (e.g. `classify_alpha`, `extract_K`, `extract_zstar`). These are listed in the table below; a brief usage sketch follows.
| Symbol | Description |
|---|---|
| **Constants** | |
| R | Observation dimension |
| K | Number of hidden states |
| M | Number of covariates for the observation model |
| P | Number of covariates for the transition model |
| Q | Number of covariates for the initial model |
| **Covariates** | |
| x_t | Time-varying covariates for the observation model |
| u_t | Time-varying covariates for the transition model |
| v | Covariates for the initial model |
| **Known-stochastic quantities** | |
| y_t | Observation vector |
| **Model parameters** | |
| *_kr | e.g. mu_11 or sigma_11 (suffixes k and r are optional) |
| A | Transition model parameters (if no covariates) |
| pi | Initial distribution parameters (if no covariates) |
| xBeta | Regression parameters for the observation model |
| uBeta | Regression parameters for the transition model |
| vBeta | Regression parameters for the initial model |
| **Estimated quantities** | |
| z_t | Hidden state |
| alpha_t | Filtered probability |
| gamma_t | Smoothed probability |
| zStar | Jointly most likely path (Viterbi) |
| **(Prior/Posterior) predictive quantities** | |
| yPred | Sample of observations drawn from the predictive density |
| zPred | Sample of latent path drawn from the predictive density |
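As a quick illustration of how these names surface in the interface, the snippet below sketches calls to the extractor functions mentioned above. Only `classify_alpha`, `extract_K`, and `extract_zstar` are named in the text; `myFit` and the exact return values are assumptions for illustration.

```r
# Hypothetical fitted object returned by sampling() (see the workflow below).
K     <- extract_K(myFit)       # constant K: number of hidden states
zStar <- extract_zstar(myFit)   # zStar: jointly most likely path (Viterbi)
zHat  <- classify_alpha(myFit)  # classification based on the filtered probabilities alpha_t
```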
The typical data analysis workflow includes the following steps:
At this early stage, all the steps except for 6 and 8 are implemented in our software.
This is the stage where most modeling decisions have to be made. An HMM is completely specified when we define the structure of the observation, transition, and initial distribution components. Before delving into each of these submodels, it is important to remark why model specification is a key step in the modeling workflow: the fact that a model can be specified does not by itself guarantee that we can make inference given reasonable time and resource constraints. BayesHMM is purposely flexible and allows an extremely wide variety of hidden Markov models, yet the user has to take care of issues such as label switching, for example by using priors to center parameters and to break symmetry between states.
Programmatically, HMMs are specified by calling the `hmm` function:
```r
hmm(
  K = 3, R = 2,
  observation = {...},
  initial = {...},
  transition = {...},
  name = "Model name"
)
```
where \(K\) is the number of hidden states and \(R\) is the number of dimensions in the observation variable. The `name` argument is a string used for model printouts.
The `observation`, `initial`, and `transition` arguments rely on S3 objects called `Density` that specify the density form, parameter priors, and fixed values for parameters. These allow for bounds on the parameter space as well as truncation in prior densities. For instance:

* `Gaussian(mu = 0, sigma = 1)` specifies a Gaussian density with fixed parameters.
* `Gaussian(mu = Gaussian(0, 10), sigma = Cauchy(0, 10, bounds = list(0, NA)))` specifies a Gaussian density with a Gaussian prior on the location parameter and a Cauchy prior on the scale parameter, whose parameter space is bounded on \([0, \infty)\).

The structure of the observation model comprises the density function for the observation random variable (e.g. continuous versus discrete, bounded versus unbounded, symmetrical versus skewed) as well as the prior density for the observation model parameters.
Most `Density` objects relate directly to well-established probability density functions, in some cases providing more than one parametrization of the same distribution. For example, `MVGaussian`, `MVGaussianCov`, and `MVGaussianCor` specify a Multivariate Gaussian distribution with the location-scale parametrization, the parametrization based on the Cholesky factor of the covariance matrix, and the parametrization based on the Cholesky factor of the correlation matrix, respectively. In practice, the latter proved to be the most performant parametrization in many use cases.
Other `Density` objects relate to regression models, and thus represent the density function of the observed random variable conditional on fixed covariates. These include `RegGaussian`, `RegBernoulliLogit` (Bernoulli regression for \(y_i \in \{0, 1\}\)), `RegBinomialLogit` (Binomial regression with logit link and \(N\) trials for \(y_i \in \{0, 1, \dots, N\}\)), `RegBinomialProbit` (Binomial regression with probit link and \(N\) trials for \(y_i \in \{0, 1, \dots, N\}\)), and `RegCategoricalSoftmax` (Multinomial regression with \(N\) categories for \(y_i \in \{0, 1, \dots, N\}\)).
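As a hedged illustration of how a regression density might be written, the snippet below combines `RegBernoulliLogit` with the naming conventions introduced earlier; the argument names `xBeta` and `M` are taken from the quantity table above and are assumptions about the actual signature, which should be checked with `?RegBernoulliLogit`.

```r
# Bernoulli regression observation density with M = 3 covariates and a
# Gaussian prior on the regression coefficients (argument names assumed
# from the naming conventions above, not verified against the package).
RegBernoulliLogit(xBeta = Gaussian(mu = 0, sigma = 5), M = 3)
```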
Currently available densities are listed below. Information about each density parameterization can be accessed via the question mark operator (e.g. `?Gaussian`).
More details on how to specify a model are available in the corresponding help page (see `?specify`).
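Putting the pieces together, the sketch below specifies a univariate two-state Gaussian HMM using the constructs introduced above. The `hmm`, `Gaussian`, and `Cauchy` calls follow the examples already shown; the use of `Dirichlet` priors for the `initial` and `transition` arguments is an assumption made for illustration and should be checked against `?specify`.

```r
# A hypothetical specification of a univariate (R = 1), two-state (K = 2)
# Gaussian HMM with weakly informative priors on the observation parameters.
mySpec <- hmm(
  K = 2, R = 1,
  observation = Gaussian(
    mu    = Gaussian(0, 10),
    sigma = Cauchy(0, 10, bounds = list(0, NA))
  ),
  initial    = Dirichlet(alpha = c(1, 1)),  # assumed density for the initial distribution
  transition = Dirichlet(alpha = c(1, 1)),  # assumed density for the transition rows
  name = "Univariate Gaussian HMM"
)
```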
A model specification is simply a named nested list designed for the purposes of BayesHMM. The next step is to translate this list into Stan code, which is in turn translated to C++ and compiled into a dynamic shared object that can be loaded and sampled from.
The reader is advised to read Stan Development Team (2017) to learn the technical details behind compilation. As this process may be time consuming (e.g. a model may take up to a minute to compile), it is desirable to compile the model once, store the object in memory, and reuse it when calling other related methods (namely `sampling`, `optimizing`, `sim`, and `validate_calibration`) one or more times. If any of these methods is given only a Specification object when called, it will compile the model, run the analysis, discard the compiled object, and return the results of the analysis.
```r
# Assume a specification object called mySpec

# Most efficient: only compiles the model once.
myModel <- compile(mySpec)
myVal   <- validate_calibration(myModel, N = 100)
myData  <- sim(myModel, T = 500)
myFit   <- sampling(myModel, y = myData$y)

# Least efficient: the model is compiled under the hood three times.
myVal  <- validate_calibration(mySpec, N = 100)
myData <- sim(mySpec, T = 500)
myFit  <- sampling(mySpec, y = myData$y)
```
After a model is compiled, it is always best to validate the correctness of the software. This step is directly related to the concepts of computational correctness and computational faithfulness as defined by Cook, Gelman, and Rubin (2006) and Betancourt (2018), respectively. The built-in tools for automatically diagnosing a model are based on these ideas as well as on Talts et al. (2018).
The implementations of the core algorithms for HMM inference (namely the forward, forward-backward, and Viterbi algorithms) are based on publicly available software (Damiano, Peterson, and Weylandt 2018). Nonetheless, it is wise to verify that the specific combination of submodels and priors chosen in the specification step functions properly. Besides detecting errors in the software, the nature of any deviances may be informative about the location and cause of such errors.
The goal is to verify that the software accurately recovers the unknown quantities. If we know the true values of the unknown quantities that generated a dataset, we can compute the estimates and verify that they provide a good approximation. The main challenge is that these results are inherently stochastic, and thus we need to account for variability even when running the software under ideal conditions.
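A minimal sketch of this recovery check is shown below, assuming a compiled model `myModel` as in the previous section. The calls to `sim`, `sampling`, and `myData$y` appear in the compilation example above; treating the remaining elements of `myData` as the generating parameter values is an assumption about the simulated object.

```r
# Simulate a dataset from the model (parameters drawn from the prior) and
# refit it; posterior summaries should bracket the generating values.
myData <- sim(myModel, T = 500)
myFit  <- sampling(myModel, y = myData$y)
print(myFit)  # compare posterior summaries against the values stored in myData
```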
In the specification stage, we defined the prior distribution of the parameter vector \(p({\mathbf{\Theta}})\) and the sampling distribution of the data \(p({\mathbf{y}} \mid {\mathbf{\Theta}})\). Together, these specify the model, that is, the joint distribution \(p({\mathbf{y}}, {\mathbf{\Theta}}) = p({\mathbf{y}} \mid {\mathbf{\Theta}}) \, p({\mathbf{\Theta}})\). Our validation protocol is as follows:
The prior predictive model is defined as the model where the posterior density equals the prior density, i.e. the posterior density is not informed by the sampling distribution of the data.
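For reference, and keeping the notation used throughout this vignette, the standard definitions read as follows (a sketch added here; \(\mathbf{y}^{\text{pred}}\) denotes a new draw of observations, corresponding to `yPred` in the quantity table):

\[
p(\mathbf{\Theta} \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \mathbf{\Theta}) \, p(\mathbf{\Theta})}{p(\mathbf{y})},
\qquad
p(\mathbf{y}^{\text{pred}}) = \int p(\mathbf{y}^{\text{pred}} \mid \mathbf{\Theta}) \, p(\mathbf{\Theta}) \, d\mathbf{\Theta},
\qquad
p(\mathbf{y}^{\text{pred}} \mid \mathbf{y}) = \int p(\mathbf{y}^{\text{pred}} \mid \mathbf{\Theta}) \, p(\mathbf{\Theta} \mid \mathbf{y}) \, d\mathbf{\Theta}.
\]

Under the prior predictive model, \(p(\mathbf{\Theta} \mid \mathbf{y})\) is replaced by \(p(\mathbf{\Theta})\), so predictive draws are not informed by the data.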
Betancourt, Michael. 2018. “Towards a Principled Bayesian Workflow.” https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html.
Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). Foundation for Open Access Statistic. doi:10.18637/jss.v076.i01.
Cook, Samantha R, Andrew Gelman, and Donald B Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” Journal of Computational and Graphical Statistics 15 (3). Informa UK Limited: 675–92. doi:10.1198/106186006x136976.
Damiano, Luis, Brian Peterson, and Michael Weylandt. 2018. “A Tutorial on Hidden Markov Models Using Stan.” Zenodo. doi:10.5281/zenodo.1284341.
Gabry, Jonah, Tristan Mahr, Paul-Christian Bürkner, Martin Modrák, and Malcolm Barrett. 2018. “bayesplot: Plotting for Bayesian Models.” https://CRAN.R-project.org/package=bayesplot.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2017. “Visualization in Bayesian Workflow,” September. http://arxiv.org/abs/1709.01449v5.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. Taylor & Francis Ltd.
Stan Development Team. 2018. “RStan: The R Interface to Stan.” http://mc-stan.org/.
Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2018. “Validating Bayesian Inference Algorithms with Simulation-Based Calibration,” April. http://arxiv.org/abs/1804.06788v1.
Stan Development Team. 2017. Stan Modeling Language: User's Guide and Reference Manual. Version 2.17.0. http://mc-stan.org/users/documentation/.
Although Cook, Gelman, and Rubin (2006) state that the samples do not need to be independent nor jointly exchangeable, in our implementation they are generated independently. Additionally, although the second step may be more naturally thought of as a substep of step 4, it is more computationally efficient to produce the \(N\) independent samples in one run.