Task 1: Installing Stan

Note: Windows users (whether working in R or Python) might need RTools installed. Martin has a flash drive with RTools installers so that you don’t have to wait for the large download.

R

We will use the cmdstanr package; follow the instructions at https://mc-stan.org/cmdstanr/articles/cmdstanr.html

Python

We will use the CmdStanPy package; follow the instructions at https://mc-stan.org/cmdstanpy/installation.html

Task 2: Hello World model

Create a file called initial.stan and paste in the following simple model:

data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
}

model {
  // likelihood
  target += normal_lpdf(y | mu, 1);
  // prior
  target += normal_lpdf(mu | 0, 1);
}

Note that we express all contributions to the target density explicitly with target +=.

The corresponding statistical model is

\[ y_i \sim N(\mu, 1) \\ \mu \sim N(0,1) \]

i.e. we are estimating the mean of a normal distribution with known standard deviation and a somewhat informative prior on the mean.

Note: You’ll notice many Stan models use statements like y ~ normal(mu, 1) instead of target += normal_lpdf(y | mu, 1). We will discuss this alternative syntax and the differences in the next lesson.

Simulate data for the model with 10 data points, using the same structure the model assumes, i.e. draw \(\mu\) from its prior, then draw \(y\) (using rnorm in R or numpy.random.Generator.normal in Python).
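For instance, in Python the simulation could look roughly like this (a sketch; the seed and the variable names other than N and y are arbitrary choices, not part of the task):

import numpy as np

rng = np.random.default_rng(seed=1234)          # fixing the seed makes the run reproducible
mu_true = rng.normal(loc=0, scale=1)            # draw mu from its N(0, 1) prior
y = rng.normal(loc=mu_true, scale=1, size=10)   # draw 10 observations with known sd = 1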

Compile the model, prepare the data in the format the model expects, fit the model, and display a summary.
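With CmdStanPy this could be a minimal sketch along these lines (assuming initial.stan sits in the working directory and y comes from the simulation above):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="initial.stan")  # compiles the model on first use
data = {"N": len(y), "y": y}                    # names must match the data block
fit = model.sample(data=data)                   # run the default 4 chains
print(fit.summary())                            # posterior summary table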

Inspect the summary: how big is the uncertainty? Are the “true” values we used in the simulation recovered?

Task 3: Estimate sigma

Add a sigma parameter of type real to model the standard deviation, with a half-normal prior, i.e. the new mathematical model is:

\[ y_i \sim N(\mu, \sigma) \\ \mu \sim N(0,1) \\ \sigma \sim HalfN(0, 2) \]

Since the standard deviation cannot be negative, add a <lower=0> constraint to sigma.

HMC needs the parameter space to be Euclidean (i.e. \(\mathbb{R}^K\) for some \(K\)), so whenever a parameter is constrained, Stan applies an implicit transformation; with the <lower=0> constraint, Stan will work with log(sigma) internally.

There is no built-in half-normal distribution in Stan, but it turns out that placing a normal prior on a variable constrained to be positive is equivalent to using a half-normal distribution. We will go into the reasons for this and some nuances next time.

Generate data according to this model, i.e. first draw \(\mu\) and \(\sigma\) from their priors, then draw \(y\). You can draw from the half-normal distribution simply by taking the absolute value of a draw from the corresponding normal distribution.
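In Python, one way to do this (a sketch reusing the rng generator from before; the variable names are arbitrary) is:

mu_true = rng.normal(loc=0, scale=1)                      # prior N(0, 1)
sigma_true = abs(rng.normal(loc=0, scale=2))              # half-normal via |N(0, 2)| draw
y = rng.normal(loc=mu_true, scale=sigma_true, size=10)    # 10 observations from the model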

Explore the summary: do you recover the true parameters?

Task 4: Our first convergence problems

Use the model from the previous task with a dataset of size 1. What happens?

Is there a reason the model may not work well with just a single data point?

Does the model from Task 2 also have issues with a dataset of size 1? Why?

Bonus: Try this if you feel it won’t take you too much time. Assume that the single data point is \(y = 1\) and make a contour and/or heatmap plot of the posterior density as a function of \(\mu\) and \(\sigma\) (in R use dnorm(..., log = TRUE), in Python use scipy.stats.norm and its logpdf method).
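A rough Python sketch of such a plot (the grid limits and resolution are arbitrary; the constant factor from folding the normal into a half-normal is dropped because it does not affect the contours):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

y_obs = 1.0
mu, sigma = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(0.01, 5, 200))

# Unnormalized log posterior: likelihood + prior on mu + (half-)normal prior on sigma
log_post = (
    stats.norm.logpdf(y_obs, loc=mu, scale=sigma)
    + stats.norm.logpdf(mu, loc=0, scale=1)
    + stats.norm.logpdf(sigma, loc=0, scale=2)
)

plt.contourf(mu, sigma, log_post, levels=30)
plt.xlabel("mu")
plt.ylabel("sigma")
plt.colorbar(label="unnormalized log posterior")
plt.show()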

Generally, to get a basic understanding of Stan’s warnings about convergence, the guide at https://mc-stan.org/misc/warnings is IMHO quite good (I wrote most of it :-D).

Task 5: Prior as data

Add two new data elements of type real, mu_prior_mean and mu_prior_sd, to define the location and scale of the normal prior for \(\mu\).

Should those data have some constraints attached? If yes, add them as well.

Test that the model now works with some of the datasets you generated previously.
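On the Python side, the only change is adding the two new entries to the data dict, e.g. (a sketch, reusing y from an earlier simulation and model for the recompiled Task 5 model):

data = {
    "N": len(y),
    "y": y,
    "mu_prior_mean": 0.0,  # these values recover the original N(0, 1) prior
    "mu_prior_sd": 1.0,
}
fit = model.sample(data=data)
print(fit.summary())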

Task 6: Prior influence

Generate a dataset of size 10 with the prior \(\mu \sim N(0,1), \sigma \sim HalfN(0, 2)\). Find the least extreme values of mu_prior_mean and mu_prior_sd that make the posterior computed by Stan exclude the true value of \(\mu\) used in your simulation.
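One way to check whether the posterior excludes the true value is to look at a central posterior interval of the mu draws, e.g. in Python (a sketch; fit is a fit with candidate prior values, mu_true the value used in the simulation, and the 95% interval is just one possible choice):

mu_draws = fit.stan_variable("mu")                     # all posterior draws of mu
lower, upper = np.quantile(mu_draws, [0.025, 0.975])   # central 95% interval
print(lower, upper, not (lower <= mu_true <= upper))   # True if the interval excludes the true mu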

Try the same task with a dataset of size 3 and a dataset of size 100.

Task 7: Bayesian “t-test”

Expand the model so that you now have two sets of data (possibly of unequal size), each belonging to a different group, and estimate the mean in the first group as well as the difference in means between the groups. More specifically, the model should look like this:

\[ y_A \sim N(\mu, \sigma_A) \\ y_B \sim N(\mu + \delta, \sigma_B) \\ \mu \sim N(0, 1)\\ \delta \sim N\left(0, \frac{1}{2}\right) \\ \sigma_A, \sigma_B \sim HalfN(0, 1) \]

(you can keep the locations and scales of the priors as data, but you can also hardcode them)

Repeatedly simulate data according to the model and fit it. How big do the groups need to be so that your posterior interval for \(\delta\) excludes 0 most of the time?
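A possible Python sketch of the simulate-and-fit loop (the file name ttest.stan and the data variable names N_A, N_B, y_A, y_B are assumptions here; use whatever names your model actually declares):

import numpy as np
from cmdstanpy import CmdStanModel

rng = np.random.default_rng(seed=5678)
model = CmdStanModel(stan_file="ttest.stan")  # assumed file name for the two-group model

def delta_excludes_zero(n_per_group):
    # Simulate one dataset from the model's own priors
    mu = rng.normal(0, 1)
    delta = rng.normal(0, 0.5)
    sigma_a = abs(rng.normal(0, 1))
    sigma_b = abs(rng.normal(0, 1))
    y_a = rng.normal(mu, sigma_a, size=n_per_group)
    y_b = rng.normal(mu + delta, sigma_b, size=n_per_group)
    data = {"N_A": n_per_group, "N_B": n_per_group, "y_A": y_a, "y_B": y_b}
    fit = model.sample(data=data, show_progress=False)
    lower, upper = np.quantile(fit.stan_variable("delta"), [0.025, 0.975])
    return not (lower <= 0 <= upper)

# Fraction of 20 simulations where the central 95% interval for delta excludes 0
print(np.mean([delta_excludes_zero(20) for _ in range(20)]))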