Basics of Bayesian Inference
- Bayes theorem for events
- Exercise - which terms should go into nominator/denumerator?
- Bayes theorem for models
\[\begin{gather*}
\pi_\text{joint}(y, \theta) = \pi_\text{obs}(y | \theta)
\pi_\text{prior}(\theta)\\
\pi_\text{marg}\left(y \right) = \int_\Theta \mathrm{d} \theta \:
\pi_{\text{obs}}(y | \theta) \pi_\text{prior}(\theta)\\
\pi_\text{post}(\theta | y) = \frac{\pi_\text{obs}(y | \theta)
\pi_\text{prior}(\theta)}{\pi_\text{marg}\left(y \right)}.
\end{gather*}\]
- Bayesian statistics \(\neq\)
Bayesian epistemology
- Problems with catchall hypothesis
- Problems in computation of Bayes factors
- Problems with selective inference
- Compare to modern falsificationist ideas, notably Mayo (e.g. Mayo
2018, SIST — I can lend you a copy, or Mayo &
Spanos 2011)
- Modelling notation, e.g.
\[
y \sim N(\mu,\sigma) \\
\mu \sim N(0, 1) \\
\sigma \sim HalfN(0, 2)
\]
Modelled vs. unmodelled data
Great intro to the underlying philosophical ideas: Chapters 1 - 2
of McElreath’s “Statistical Rethinking” - available freely online at https://xcelab.net/rmpubs/sr2/statisticalrethinking2_chapters1and2.pdf
Extracting information from probability distributions via
expectation values (see Mike
Betancourt’s writing for a very detailed coverage)
MCMC algorithms allow us to sample posterior
- What is a chain
- Convergence, Rhat, ESS
HMC - show example, requires gradient
- Conceptually two components — random initial momentum for each
trajectory makes the chain explore, preference for high density regions
makes the chain spend more time there.
- Wax poetic how HMC is great
- HMC will work great, if gradient is useful
- HMC has good diagnostic - the “divergence”
What is needed to be proficient at probabilistic programming?
- Mathematical probability/statistics
- Numerics
- A little software engineering
Basics of Stan
- A decent tutorial covering similar ground as we did (and some more)
is at https://betanalpha.github.io/assets/case_studies/stan_intro.html
Note that the example R code there uses the
rstan
package,
while we will use the newer cmdstanr
package.
- Stan documentation:
- Stan has very helpful community at https://discourse.mc-stan.org
- Fundamentally a Stan program computes logarithm of the density (and
its gradient, Hessian via autodiff)
- Stan is statically typed (and generally quite restrictive). Discuss
pros and cons.
- Stan architecture:
- Math library + autodiff
- The language
- The interfaces
- compiler to C++, compile C++
- Go through the “hello world”
model for the lesson.
- Main program blocks
data
(both modelled and non-modelled)
parameters
model
target +=
statements
_lpdf
The tasks
- Start a project to contain your coursework
- In general: we expect you to use online documentation etc. a lot,
just like in “real life”
- Task description - separate
page
Summary after tasks
- Constraints
- Divergences as helpful diagnostics
- Discuss project proposals