Simulate 30 values from a negative binomial distribution with \(\phi = \frac{1}{2}\) (`size` in R's `rnbinom`) and fit them with the model from Task 2. Do you recover the simulated mean of the negative binomial?
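For Python users, the simulation step might look like the sketch below. NumPy parameterizes the negative binomial by `(n, p)` rather than by mean and overdispersion, so the conversion is spelled out. The mean `mu = 10` and the seed are assumptions made for illustration; the exercise does not fix them.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

phi = 0.5    # overdispersion; plays the role of `size` in R's rnbinom
mu = 10.0    # assumed mean, not specified by the exercise
n_obs = 30

# With n = phi and p = phi / (phi + mu), the distribution has
# mean mu and variance mu + mu^2 / phi.
p = phi / (phi + mu)
y = rng.negative_binomial(n=phi, p=p, size=n_obs)

print("sample mean:", y.mean())
print("sample variance:", y.var(ddof=1))
print("theoretical variance:", mu + mu**2 / phi)
```

Comparing the sample variance with the sample mean already hints at why a Poisson model, whose variance equals its mean, will struggle with these data.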
Try to detect the model-data mismatch with a posterior predictive check.
R:
- Use `fit$draws(format = "draws_matrix")` to get a matrix of draws, then use `rpois` to generate the predictions as the Stan model assumes.
- Use either `bayesplot::ppc_stat` with a stat measuring the dispersion (e.g. variance, sd, ...) or `bayesplot::ppc_dens_overlay`.

Python: use either `arviz.plot_bpv()` with `kind="t_stat"` and a stat measuring the dispersion (e.g. variance, sd, ...) or `arviz.plot_ppc`.
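The logic behind a dispersion-based check can also be sketched by hand with NumPy alone, which may help when interpreting the plots. The helper `ppc_dispersion` is a hypothetical name, and the Gamma draws below are only a stand-in for the posterior draws of the Poisson rate that you would extract from your own fit; the mean of 10 is likewise an assumption.

```python
import numpy as np

def ppc_dispersion(y, lambda_draws, rng):
    """Compare the observed sd with the sd of Poisson predictive datasets."""
    y = np.asarray(y)
    lambda_draws = np.asarray(lambda_draws)
    # One replicated dataset per posterior draw: shape (n_draws, n_obs)
    y_rep = rng.poisson(lam=lambda_draws[:, None],
                        size=(lambda_draws.size, y.size))
    sd_rep = y_rep.std(axis=1, ddof=1)
    sd_obs = y.std(ddof=1)
    # Fraction of replicated datasets at least as dispersed as the data
    p_value = np.mean(sd_rep >= sd_obs)
    return sd_obs, sd_rep, p_value

rng = np.random.default_rng(seed=2)
phi, mu = 0.5, 10.0  # mu is an assumed value
y = rng.negative_binomial(n=phi, p=phi / (phi + mu), size=30)

# Stand-in for draws from the Stan fit (assumption: replace with your own)
lambda_draws = rng.gamma(shape=y.sum() + 1.0,
                         scale=1.0 / (y.size + 1.0), size=1000)

sd_obs, sd_rep, p_value = ppc_dispersion(y, lambda_draws, rng)
print(f"observed sd = {sd_obs:.2f}, predictive p-value = {p_value:.3f}")
```

A p-value close to 0 means that almost no replicated dataset is as dispersed as the observed one, which is the same signal the histogram from `ppc_stat` shows graphically.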
What is the smallest number of observations we need to reliably detect the problem?
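One way to approach this question is a small simulation study: repeat the simulate-and-check cycle many times for several sample sizes and record how often the check flags the mismatch. The sketch below makes several assumptions that you should adjust: "detected" is taken to mean a predictive p-value below 0.05, the mean is 10, and the Gamma draws again stand in for the posterior of the Poisson rate from a real fit. `detects_mismatch` is a hypothetical helper.

```python
import numpy as np

def detects_mismatch(n_obs, rng, phi=0.5, mu=10.0, n_draws=500, alpha=0.05):
    """Simulate one dataset and report whether the sd-based check flags it."""
    y = rng.negative_binomial(n=phi, p=phi / (phi + mu), size=n_obs)
    # Stand-in for the posterior of the Poisson rate (assumption)
    lam = rng.gamma(shape=y.sum() + 1.0, scale=1.0 / (n_obs + 1.0),
                    size=n_draws)
    y_rep = rng.poisson(lam=lam[:, None], size=(n_draws, n_obs))
    p_value = np.mean(y_rep.std(axis=1, ddof=1) >= y.std(ddof=1))
    return p_value < alpha

rng = np.random.default_rng(seed=3)
detection_rate = {}
for n_obs in (3, 5, 10, 20, 30):
    hits = [detects_mismatch(n_obs, rng) for _ in range(200)]
    detection_rate[n_obs] = float(np.mean(hits))
    print(n_obs, detection_rate[n_obs])
```

What counts as "reliably" is a judgment call; with a full Stan fit in the loop the same structure applies, only each repetition is slower.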
Implement a negative binomial model (see `neg_binomial_2_lpmf`) and see how the check behaves now. Use an `exponential(1)` prior on the overdispersion parameter.

Given the following data:
y : 2, 0, 11, 25, 9, 4, 17, 11, 8, 4, 5, 2, 6, 4, 8, 24, 0, 3, 6, 4, 12, 9, 5, 6, 2, 10, 4, 15, 0, 1, 87, 19, 2, 1, 38, 16, 5, 7, 18, 11, 1, 0, 7, 15, 5, 0, 6, 1, 0, 6, 34, 29, 7, 11, 5, 10, 8, 3, 12, 18, 7, 1, 18, 18, 6, 3, 37, 5, 1, 0, 22, 13, 1, 0, 26, 19, 6, 7, 21, 45
group : "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B"
type : "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y"
Try to fit the negative binomial model from Task 3 with a shared mean and overdispersion parameter to the `y` column of the data.
The model does not represent the data well. Try to figure out what is wrong and fix it!
Hint: many PPCs can be performed per group; in R these are the `ppc_xxx_grouped` functions.
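In Python, a grouped check can be sketched by computing the same statistic within each group for both the observed data and the replicated datasets. The helper `grouped_stat` is a hypothetical name, the six data points are made-up toy numbers (not the exercise data), and the replicated draws are a stand-in for the posterior predictive draws of a shared-mean model with an assumed overdispersion of 2.

```python
import numpy as np

def grouped_stat(values, groups, stat=np.mean):
    """Apply `stat` within each group; `values` is 1-D or (n_draws, n_obs)."""
    values = np.asarray(values)
    groups = np.asarray(groups)
    return {g: stat(values[..., groups == g], axis=-1)
            for g in np.unique(groups)}

# Toy illustration with made-up numbers
y = np.array([1, 9, 2, 11, 0, 10])
group = np.array(["A", "B", "A", "B", "A", "B"])

rng = np.random.default_rng(seed=4)
phi, mu = 2.0, y.mean()  # assumed values standing in for posterior draws
y_rep = rng.negative_binomial(n=phi, p=phi / (phi + mu),
                              size=(1000, y.size))

obs = grouped_stat(y, group)
rep = grouped_stat(y_rep, group)
for g in obs:
    # Fraction of replicated group means at or above the observed group mean
    print(g, obs[g], np.mean(rep[g] >= obs[g]))
```

If the fractions for different groups sit at opposite extremes, the shared-mean assumption is a likely culprit. The same helper works with any other grouping column.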
Given the following data:
y : 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1
group : "B", "C", "D", "B", "C", "B", "D", "C", "A", "D", "C", "C", "C", "D", "C", "C", "B", "B", "D", "C", "D", "C", "C", "A", "B", "D", "C", "D", "D", "D", "C", "D", "A", "A", "C", "D", "D", "C", "C", "B", "B", "C", "D", "D", "A", "D", "A", "B", "C", "A"
Once again use the model from Task 3. Use a posterior predictive check to determine what is wrong. Can you fix it?