- 1 The Input-Output Hidden Markov Model
- 2 Learning by simulation
- 3 Stock Market Forecasting Using Hidden Markov Model
- 4 Original Computing Environment
- References

This work aims at replicating the Input-Output Hidden Markov Model (IOHMM) originally proposed by Hassan and Nath (2005) to forecast stock prices. The main goal is to produce public programming code in Stan (Carpenter et al. 2016) for a fully Bayesian estimation of the model parameters and inference on hidden quantities, namely filtered state belief, smoothed state belief, jointly most probable state path and fitted output. The model is introduced only briefly, a more detailed mathematical treatment can be found in our literature review.

The authors acknowledge Google for financial support via the Google Summer of Code 2017 program.

Model complexity and limited software availability requires that we implement our sampler from scratch. Following a very simplified version of the methodology proposed by Cook, Gelman, and Rubin (2006), we first create a simulation routine that generates data complying with the model assumptions and we then write a MCMC sampler in Stan to recover the parameters and other hidden quantities. Only after the correctness of our software is tested, we feed our model with real data and analyse the results.

We believe that learning by simulation has many benefits, including:

- Confirmation whether the software developed to implement the algorithms works properly, i.e. it retrieves the parameters used to generate the data.
- Enhanced interpretation of the results and deeper understanding of the model behaviour, especially because the generated data meets all the underlying assumptions and the estimates are free of the effects of data contamination originated by unmodeled phenomena.
- The possibility of calibrating different combinations of values for the parameters, the inputs and the outputs help the development of intuition and insight. This may prove valuable for the data analysis stage.

We write an auxiliary R function to compute the softmax transformation of a given vector. The calculations are run in log scale for greater numerical stability, i.e. to avoid any overflow.

`source('../common/R/math.R')`

As plots are extensively used, we arrange the code in an auxiliary file.

`source('../common/R/plots.R')`

We arrange the code to generate simulated data, as explained below, in an auxiliary file.

`source('../iohmm-reg/R/iohmm-sim.R')`

We first write an R function for our generative model. The arguments are the sequence length \(T\), the number of discrete hidden states \(K\), the input matrix \(\mat{u}\), the initial state distribution vector \(\mat{\pi}_1\), a matrix with the parameters of the multinomial regression that rules the hidden states dynamics \(\mat{w}\), the name of a function drawing samples from the observation distribution and its arguments.

```
iohmm_sim <- function(T, K, u, w, p.init, obs.model, obs.pars) {
m <- ncol(u)
if (dim(u)[1] != T)
stop("The input matrix must have T rows.")
if (any(dim(w) != c(K, m)))
stop("The transition weight matrix must be of size Kxm, where m is the size of the input vector.")
if (length(p.init) != K)
stop("The vector p.init must have length K.")
p.mat <- matrix(0, nrow = T, ncol = K)
p.mat[1, ] <- p.init
z <- vector("numeric", T)
z[1] <- sample(x = 1:K, size = 1, replace = FALSE, prob = p.init)
for (t in 2:T) {
p.mat[t, ] <- softmax(sapply(1:K, function(j) {u[t, ] %*% w[j, ]}))
z[t] <- sample(x = 1:K, size = 1, replace = FALSE, prob = p.mat[t, ])
}
x <- do.call(obs.model, c(list(u = u, z = z), obs.pars))
list(
u = u,
z = z,
x = x,
p.mat = p.mat
)
}
```

The initial hidden state is drawn from a multinomial distribution with one trial and event probabilities given by the initial state probability vector \(\mat{\pi}_1\). The transition probabilities for each of the following steps \(t \in \{2, \dots, T\}\) are generated from a multinomial regression with vector parameters \(\mat{w}_k\), one set per possible hidden state \(k \in \{1, \dots, K\}\), and covariates \(\mat{u}_t\). The hidden states are subsequently sampled based on these transition probabilities.

The observation at each step may generate from a linear regressions with parameters \(\mat{b}_k\) and \(\mat{\Sigma}_k\), one set per possible hidden state.

```
obsmodel_reg <- function(...) {
args <- list(...)
u <- args$u
z <- args$z
b <- args$b
s <- args$s
K <- length(unique(z))
m <- ncol(u)
if (any(dim(b) != c(K, m)))
stop("The regressors matrix must be of size Kxm, where m is the size of the input vector.")
T.length <- nrow(u)
x <- vector("numeric", T.length)
for (t in 1:T.length) {
x[t] <- rnorm(1, u[t, ] %*% b[z[t], ], s[z[t]])
}
return(x)
}
```

Alternatively, the observation could originate in a Gaussian mixture density where \(\lambda_{kl}\) is the component weight for the \(l\)-th component within the \(k\)-th state, \(0 \le \lambda_{kl} \le 1 \ \forall \ l \in \{1, \dots, L\}, k \in \{1, \dots, K\}\) and \(\sum_{l=1}^{L}{\lambda_{kl}} = 1 \ \forall \ k\).

```
obsmodel_mix <- function(...) {
args <- list(...)
z <- args$z
lambda <- args$lambda
mu <- args$mu
s <- args$s
if (!all.equal(length(unique(z)), length(lambda), length(mu), length(s)))
stop("The size of the vector parameters lambda, mu and s must equal to the
number of different states.")
T.length <- length(z)
L <- ncol(lambda)
x <- vector("numeric", T.length)
for (t in 1:T.length) {
l <- sample(1:L, 1, FALSE, prob = lambda[z[t], ])
x[t] <- rnorm(1, mu[z[t], l], s[z[t], l])
}
return(x)
}
```

We set up our simulated experiments for a regression observation model with arbitrary values for all the involved parameters. Additionally, we define the settings for the Markov Chain Monte Carlo sampler.

```
# Data
T.length = 300
K = 3
M = 4
R = 1
u.intercept = FALSE
w = matrix(
c(1.2, 0.5, 0.3, 0.1, 0.5, 1.2, 0.3, 0.1, 0.5, 0.1, 1.2, 0.1),
nrow = K, ncol = M, byrow = TRUE)
b = matrix(
c(5.0, 6.0, 7.0, 0.5, 1.0, 5.0, 0.1, -0.5, 0.1, -1.0, -5.0, 0.2),
nrow = K, ncol = M, byrow = TRUE)
s = c(0.2, 1.0, 2.5)
p1 = c(0.4, 0.2, 0.4)
# Markov Chain Monte Carlo
n.iter = 400
n.warmup = 200
n.chains = 1
n.cores = 1
n.thin = 1
n.seed = 9000
```

We anticipate identification issues in our sampler given the nature of the model. As an arguable simplification with obvious drawbacks, we decide to rely only on one chain to avoid between-chain label switching of regression parameters. We refer the reader to Betancourt (2017) for an in-depth treatment of the diagnostics, causes and possible solutions for label switching in Bayesian Mixture Models.

We draw random inputs from a standard Gaussian distribution and generate the dataset accordingly.

```
set.seed(n.seed)
u <- matrix(rnorm(T.length*M, 0, 1), nrow = T.length, ncol = M)
if (u.intercept)
u[, 1] = 1
dataset <- iohmm_sim(T.length, K, u, w, p1, obsmodel_reg, list(b = b, s = s))
```

`plot_inputoutput(x = dataset$x, u = dataset$u, z = dataset$z)`

We observe how the chosen values for the parameters affect the generated data. For example, the relationship between the third input \(\mat{u}_3\) and the output \(\mat{x}\) is positive, indifferent and negative for the hidden states 1 through 3, the true slopes being 7, 0.1 and -5 respectively.

`plot_inputprob(u = dataset$u, p.mat = dataset$p.mat, z = dataset$z)`

We then analyse the relationship between the input and the state probabilities, which are usually hidden in applications with real data. The pairs \(\{ \mat{u}_1, p(z_t = 1) \}\), \(\{ \mat{u}_2, p(z_t = 2) \}\) and \(\{ \mat{u}_3, p(z_t = 3) \}\) are most related due to the choice of values for the true regression parameters: those inputs take the largest weight in each state, namely \(w_{11} = 1.2\), \(w_{22} = 1.2\) and \(w_{33} = 1.2\).

Adopting a fully Bayesian approach, we run our software to draw samples from the posterior density of model parameters and other hidden quantities.

`library(rstan)`

`## Loading required package: ggplot2`

`## Loading required package: StanHeaders`

`## rstan (Version 2.14.2, packaged: 2017-03-19 00:42:29 UTC, GitRev: 5fa1e80eb817)`

```
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
```

`library(shinystan)`

`## Loading required package: shiny`

```
##
## This is shinystan version 2.3.0
```

```
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
stan.model = '../iohmm-reg/stan/iohmm-reg.stan'
stan.data = list(
T = T.length,
K = K,
M = M,
u_tm = as.array(u),
x_t = dataset$x
)
stan.fit <- stan(file = stan.model,
data = stan.data, verbose = T,
iter = n.iter, warmup = n.warmup,
thin = n.thin, chains = n.chains,
cores = n.cores, seed = n.seed)
```

```
##
## TRANSLATING MODEL 'iohmm-reg' FROM Stan CODE TO C++ CODE NOW.
## successful in parsing the Stan model 'iohmm-reg'.
##
## CHECKING DATA AND PREPROCESSING FOR MODEL 'iohmm-reg' NOW.
##
## COMPILING MODEL 'iohmm-reg' NOW.
##
## STARTING SAMPLER FOR MODEL 'iohmm-reg' NOW.
##
## SAMPLING FOR MODEL 'iohmm-reg' NOW (CHAIN 1).
##
## Chain 1, Iteration: 1 / 400 [ 0%] (Warmup)
## Chain 1, Iteration: 40 / 400 [ 10%] (Warmup)
## Chain 1, Iteration: 80 / 400 [ 20%] (Warmup)
## Chain 1, Iteration: 120 / 400 [ 30%] (Warmup)
## Chain 1, Iteration: 160 / 400 [ 40%] (Warmup)
## Chain 1, Iteration: 200 / 400 [ 50%] (Warmup)
## Chain 1, Iteration: 201 / 400 [ 50%] (Sampling)
## Chain 1, Iteration: 240 / 400 [ 60%] (Sampling)
## Chain 1, Iteration: 280 / 400 [ 70%] (Sampling)
## Chain 1, Iteration: 320 / 400 [ 80%] (Sampling)
## Chain 1, Iteration: 360 / 400 [ 90%] (Sampling)
## Chain 1, Iteration: 400 / 400 [100%] (Sampling)
## Elapsed Time: 99.975 seconds (Warm-up)
## 47.487 seconds (Sampling)
## 147.462 seconds (Total)
```

`n.samples = (n.iter - n.warmup) * n.chains`

We rely on several diagnostic statistics and plots provided by rstan (Stan Development Team 2017a) and shinystan (Stan Development Team 2017b) to assess mixing, convergence and the inexistence of divergences. We then extract the samples for some quantities of interest, namely the filtered probabilities vector \(\mat{\alpha}_t\), the smoothed probability vector \(\mat{\gamma}_t\), the most probable hidden path \(\mat{z}^*\) and the fitted output \(\hat{x}\).

```
# MCMC Diagnostics --------------------------------------------------------
summary(stan.fit,
pars = c('p_1k', 'w_km', 'b_km', 's_k'),
probs = c(0.50))$summary
# launch_shinystan(stan.fit)
# Extraction --------------------------------------------------------------
alpha_tk <- extract(stan.fit, pars = 'alpha_tk')[[1]]
gamma_tk <- extract(stan.fit, pars = 'gamma_tk')[[1]]
zstar_t <- extract(stan.fit, pars = 'zstar_t')[[1]]
hatx_t <- extract(stan.fit, pars = 'hatx_t')[[1]]
```

Mean | MCSE | SE | Med | ESS | \(\hat{R}\) | |
---|---|---|---|---|---|---|

p_1k[1] | 0.24 | 0.01 | 0.20 | 0.20 | 200.00 | 1.00 |

p_1k[2] | 0.27 | 0.02 | 0.21 | 0.23 | 200.00 | 1.00 |

p_1k[3] | 0.49 | 0.02 | 0.24 | 0.47 | 200.00 | 1.00 |

w_km[1,1] | -0.23 | 0.30 | 2.84 | -0.33 | 89.03 | 1.00 |

w_km[1,2] | 0.10 | 0.27 | 2.95 | 0.47 | 122.00 | 1.00 |

w_km[1,3] | 0.39 | 0.29 | 2.66 | 0.46 | 84.81 | 1.02 |

w_km[1,4] | -0.18 | 0.53 | 3.30 | -0.17 | 38.12 | 1.01 |

w_km[2,1] | -0.09 | 0.30 | 2.85 | -0.19 | 88.67 | 1.00 |

w_km[2,2] | 0.27 | 0.27 | 2.95 | 0.64 | 122.88 | 1.00 |

w_km[2,3] | 0.32 | 0.29 | 2.65 | 0.36 | 84.84 | 1.02 |

w_km[2,4] | -0.53 | 0.53 | 3.30 | -0.54 | 38.28 | 1.01 |

w_km[3,1] | -0.28 | 0.30 | 2.86 | -0.38 | 90.44 | 1.00 |

w_km[3,2] | 0.14 | 0.27 | 2.96 | 0.56 | 122.08 | 1.00 |

w_km[3,3] | 0.17 | 0.29 | 2.65 | 0.15 | 86.48 | 1.02 |

w_km[3,4] | -0.37 | 0.53 | 3.30 | -0.43 | 38.08 | 1.01 |

b_km[1,1] | 0.01 | 0.03 | 0.39 | 0.00 | 181.98 | 1.02 |

b_km[1,2] | -1.05 | 0.02 | 0.29 | -1.06 | 200.00 | 1.00 |

b_km[1,3] | -4.91 | 0.03 | 0.36 | -4.90 | 200.00 | 1.00 |

b_km[1,4] | -0.09 | 0.02 | 0.33 | -0.09 | 200.00 | 1.01 |

b_km[2,1] | 0.86 | 0.01 | 0.13 | 0.86 | 200.00 | 1.00 |

b_km[2,2] | 5.00 | 0.01 | 0.08 | 5.00 | 200.00 | 1.00 |

b_km[2,3] | 0.15 | 0.01 | 0.10 | 0.15 | 200.00 | 1.00 |

b_km[2,4] | -0.28 | 0.01 | 0.13 | -0.28 | 200.00 | 1.00 |

b_km[3,1] | 5.04 | 0.00 | 0.03 | 5.04 | 200.00 | 1.01 |

b_km[3,2] | 6.01 | 0.00 | 0.02 | 6.01 | 200.00 | 1.01 |

b_km[3,3] | 6.99 | 0.00 | 0.02 | 6.99 | 200.00 | 1.00 |

b_km[3,4] | 0.46 | 0.00 | 0.03 | 0.47 | 200.00 | 1.00 |

s_k[1] | 2.74 | 0.01 | 0.21 | 2.74 | 200.00 | 1.00 |

s_k[2] | 1.00 | 0.01 | 0.09 | 0.99 | 200.00 | 1.00 |

s_k[3] | 0.21 | 0.00 | 0.02 | 0.21 | 200.00 | 1.00 |

While mixing and convergence is extremely efficient, as expected when dealing with generated data, we note that the regression parameters for the latent states are the worst performers. The small effective size translates into a high Monte Carlo standard error and broader credibility intervals. One possible reason is that softmax is invariant to change in location, thus the parameters of a multinomial regression do not have a natural center and become harder to estimate.

We assess that our software recover the hidden states correctly. Due to label switching, the states generated under the labels 1 through 3 were recovered in inverse order. In consequence, we decide to relabel the observations based on the best fit. This would not prove to be a problem with real data considering that the hidden states are never observed.

```
# Relabelling (ugly hack edition) -----------------------------------------
dataset$zrelab <- rep(0, T)
hard <- sapply(1:T.length, function(t, med) {
which.max(med[t, ])
}, med = apply(alpha_tk, c(2, 3),
function(x) {
quantile(x, c(0.50)) }))
tab <- table(hard = hard, original = dataset$z)
for (k in 1:K) {
dataset$zrelab[which(dataset$z == k)] <- which.max(tab[, k])
}
print("Label re-imputation (relabeling due to switching labels)")
```

`## [1] "Label re-imputation (relabeling due to switching labels)"`

`table(new = dataset$zrelab, original = dataset$z)`

```
## original
## new 1 2 3
## 1 0 0 102
## 2 0 108 0
## 3 90 0 0
```

Point estimates and credibility intervals are provided by rstan’s `{r}summary`

function.

```
print("Estimated initial state probabilities")
summary(stan.fit,
pars = c('p_1k'),
probs = c(0.10, 0.50, 0.90))$summary[, c(1, 3, 4, 5, 6)]
print("Estimated probabilities in the transition matrix")
head(summary(stan.fit,
pars = c('A_ij'),
probs = c(0.10, 0.50, 0.90))$summary[, c(1, 3, 4, 5, 6)])
print("Estimated regressors of hidden states")
summary(stan.fit,
pars = c('w_km'),
probs = c(0.10, 0.50, 0.90))$summary[, c(1, 3, 4, 5, 6)]
print("Estimated regressors and standard deviation of observations in each state")
summary(stan.fit,
pars = c('b_km', 's_k'),
probs = c(0.10, 0.50, 0.90))$summary[, c(1, 3, 4, 5, 6)]
```

Mean | SE | 10% | Med | 90% | |
---|---|---|---|---|---|

p_1k[1] | 0.24 | 0.20 | 0.03 | 0.20 | 0.52 |

p_1k[2] | 0.27 | 0.21 | 0.03 | 0.23 | 0.55 |

p_1k[3] | 0.49 | 0.24 | 0.16 | 0.47 | 0.85 |

Mean | SE | 10% | Med | 90% | |
---|---|---|---|---|---|

A_ij[1,1] | 0.24 | 0.20 | 0.03 | 0.20 | 0.52 |

A_ij[1,2] | 0.27 | 0.21 | 0.03 | 0.23 | 0.55 |

A_ij[1,3] | 0.49 | 0.24 | 0.16 | 0.47 | 0.85 |

A_ij[2,1] | 0.33 | 0.02 | 0.31 | 0.33 | 0.36 |

A_ij[2,2] | 0.36 | 0.02 | 0.33 | 0.36 | 0.39 |

A_ij[2,3] | 0.31 | 0.02 | 0.29 | 0.31 | 0.33 |

Mean | SE | 10% | Med | 90% | |
---|---|---|---|---|---|

w_km[1,1] | -0.23 | 2.84 | -3.50 | -0.33 | 3.30 |

w_km[1,2] | 0.10 | 2.95 | -3.56 | 0.47 | 3.64 |

w_km[1,3] | 0.39 | 2.66 | -3.27 | 0.46 | 3.51 |

w_km[1,4] | -0.18 | 3.30 | -4.51 | -0.17 | 4.22 |

w_km[2,1] | -0.09 | 2.85 | -3.57 | -0.19 | 3.43 |

w_km[2,2] | 0.27 | 2.95 | -3.49 | 0.64 | 3.84 |

w_km[2,3] | 0.32 | 2.65 | -3.23 | 0.36 | 3.52 |

w_km[2,4] | -0.53 | 3.30 | -4.89 | -0.54 | 3.83 |

w_km[3,1] | -0.28 | 2.86 | -3.69 | -0.38 | 3.36 |

w_km[3,2] | 0.14 | 2.96 | -3.65 | 0.56 | 3.68 |

w_km[3,3] | 0.17 | 2.65 | -3.46 | 0.15 | 3.37 |

w_km[3,4] | -0.37 | 3.30 | -4.72 | -0.43 | 4.01 |

Mean | SE | 10% | Med | 90% | |
---|---|---|---|---|---|

b_km[1,1] | 0.01 | 0.39 | -0.44 | 0.00 | 0.54 |

b_km[1,2] | -1.05 | 0.29 | -1.41 | -1.06 | -0.65 |

b_km[1,3] | -4.91 | 0.36 | -5.33 | -4.90 | -4.48 |

b_km[1,4] | -0.09 | 0.33 | -0.52 | -0.09 | 0.31 |

b_km[2,1] | 0.86 | 0.13 | 0.68 | 0.86 | 1.02 |

b_km[2,2] | 5.00 | 0.08 | 4.90 | 5.00 | 5.10 |

b_km[2,3] | 0.15 | 0.10 | 0.03 | 0.15 | 0.28 |

b_km[2,4] | -0.28 | 0.13 | -0.44 | -0.28 | -0.11 |

b_km[3,1] | 5.04 | 0.03 | 5.00 | 5.04 | 5.07 |

b_km[3,2] | 6.01 | 0.02 | 5.98 | 6.01 | 6.04 |

b_km[3,3] | 6.99 | 0.02 | 6.97 | 6.99 | 7.02 |

b_km[3,4] | 0.46 | 0.03 | 0.43 | 0.47 | 0.50 |

s_k[1] | 2.74 | 0.21 | 2.50 | 2.74 | 3.02 |

s_k[2] | 1.00 | 0.09 | 0.90 | 0.99 | 1.13 |

s_k[3] | 0.21 | 0.02 | 0.19 | 0.21 | 0.23 |

The samples drawn from the posterior density are reasonably close to the true values. We additionally check the estimated hidden quantities.

`plot_stateprobability(alpha_tk, gamma_tk, 0.8, dataset$zrelab)`

```
print("Estimated hidden states (hard naive classification using filtered probabilities)")
print(table(
estimated = apply(round(apply(alpha_tk, c(2, 3),
function(x) {
quantile(x, c(0.50)) })), 1, which.max),
real = dataset$zrelab))
```

Real 1 | Real 2 | Real 3 | |
---|---|---|---|

1 | 89 | 4 | 1 |

2 | 10 | 100 | 1 |

3 | 3 | 4 | 88 |

The filtered probability is overly accurate, as expected, making the additional information contained in the smoothed seem of little value. The confusion matrix makes evident that, under ideal conditions, the sampler works as intended. Similarly, the Viterbi algorithm recovers the expected most probably hidden state.

`plot_statepath(zstar_t, dataset$zrelab)`

```
print("Estimated hidden states for the jointly most likely path (Viterbi decoding)")
round(table(
actual = rep(dataset$zrelab, each = n.samples),
fit = zstar_t) / n.samples, 0)
```

1 | 2 | 3 |
---|---|---|

89 | 4 | 1 |

10 | 100 | 1 |

3 | 4 | 88 |

`plot_outputfit(dataset$x, hatx_t, z = dataset$zrelab, TRUE)`

All in all, we are confident that the Stan code works as specified and we find model calibration satisfactory.

`writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))`

```
##
## CXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function
```

`devtools::session_info("rstan")`

`## Session info -------------------------------------------------------------`

```
## setting value
## version R version 3.3.3 (2017-03-06)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate Spanish_Argentina.1252
## tz America/Buenos_Aires
## date 2017-07-02
```

`## Packages -----------------------------------------------------------------`

```
## package * version date source
## BH 1.62.0-1 2016-11-19 CRAN (R 3.3.2)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.3.3)
## dichromat 2.0-0 2013-01-24 CRAN (R 3.3.2)
## digest 0.6.12 2017-01-27 CRAN (R 3.3.3)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.3.3)
## graphics * 3.3.3 2017-03-06 local
## grDevices * 3.3.3 2017-03-06 local
## grid 3.3.3 2017-03-06 local
## gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.3)
## gtable 0.2.0 2016-02-26 CRAN (R 3.3.3)
## inline 0.3.14 2015-04-13 CRAN (R 3.3.3)
## labeling 0.3 2014-08-23 CRAN (R 3.3.2)
## lattice 0.20-34 2016-09-06 CRAN (R 3.3.3)
## lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.3.3)
## MASS 7.3-45 2016-04-21 CRAN (R 3.3.3)
## Matrix 1.2-8 2017-01-20 CRAN (R 3.3.3)
## methods * 3.3.3 2017-03-06 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.3.3)
## plyr 1.8.4 2016-06-08 CRAN (R 3.3.3)
## RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.2)
## Rcpp 0.12.10 2017-03-19 CRAN (R 3.3.3)
## RcppEigen 0.3.2.9.1 2017-03-15 CRAN (R 3.3.3)
## reshape2 1.4.2 2016-10-22 CRAN (R 3.3.3)
## rstan * 2.14.2 2017-03-19 CRAN (R 3.3.3)
## scales 0.4.1 2016-11-09 CRAN (R 3.3.3)
## StanHeaders * 2.14.0-1 2017-01-09 CRAN (R 3.3.3)
## stats * 3.3.3 2017-03-06 local
## stats4 3.3.3 2017-03-06 local
## stringi 1.1.3 2017-03-21 CRAN (R 3.3.3)
## stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
## tibble 1.3.0 2017-04-01 CRAN (R 3.3.3)
## tools 3.3.3 2017-03-06 local
## utils * 3.3.3 2017-03-06 local
```

Baum, Leonard E., and J. A. Eagon. 1967. “An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology.” *Bulletin of the American Mathematical Society* 73 (3). American Mathematical Society (AMS): 360–64. doi:10.1090/s0002-9904-1967-11751-8.

Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” *The Annals of Mathematical Statistics* 41 (1). Institute of Mathematical Statistics: 164–71. doi:10.1214/aoms/1177697196.

Bengio, Yoshua, and Paolo Frasconi. 1995. “An Input Output Hmm Architecture.”

Betancourt, Michael. 2017. “Identifying Bayesian Mixture Models.” *Stan Case Studies*, no. Volume 4. http://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html.

Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2016. “Stan: A Probabilistic Programming Language.” *Journal of Statistical Software* 20.

Cook, Samantha R, Andrew Gelman, and Donald B Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” *Journal of Computational and Graphical Statistics* 15 (3). Taylor & Francis: 675–92.

Gelman, Andrew, and Donald B Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” *Statistical Science*. JSTOR, 457–72.

Gelman, Andrew, Kenneth Shirley, and others. 2011. “Inference from Simulations and Monitoring Convergence.” *Handbook of Markov Chain Monte Carlo*, 163–74.

Hassan, Md Rafiul, and Baikunth Nath. 2005. “Stock Market Forecasting Using Hidden Markov Model: A New Approach.” In *Intelligent Systems Design and Applications, 2005. Isda’05. Proceedings. 5th International Conference on*, 192–96. IEEE.

Stan Development Team. 2017a. “RStan: The R Interface to Stan.” http://mc-stan.org/.

———. 2017b. “Shinystan: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models.” http://mc-stan.org/.

Team, Stan Development. 2017. *Stan Modeling Language: User’s Guide and Reference Manual: Version 2.15.0.*