Intro VMC - modeling and computing

Stat 221, Lecture 3


Lecture plan

  1. Intro.
  2. Modeling and computing material.
    • Computing classification of models.
    • Intro to computing with Odyssey.
    • Thinking like a programmer - pseudocode, class languages.
  3. Critiquing a visualization, pset 1 information.


Announcements

  • Final projects out.
  • Ranking survey out soon.
  • Piazza is good; use it.

Lecture 2 recap

  • Distinctive features of parametric stat modeling.
  • Ingredients of a stat model.
  • Finding the estimand.
  • Telling a story with the model.
  • Agile movement setup \\( \longleftrightarrow \\) modeling.
  • Parsimony in modeling.
  • Visualization for model design.

Lecture 3 focus

  • Problem setup.
  • Method design.
  • Implementation.
  • Reporting.

Specifically, we will look at the connection and tradeoffs between modeling and computing.


Running example: RDS

  • Survey done with RDS (respondent-driven sampling); we need to estimate the population mean.
  • The researchers are using a naive mean model, which suffers from a non-ignorable sampling problem.
  • We need to decide on a better method.

Approaches we will use

  • Classify models in terms of computing intensity.
  • Switch easily between workflow stages, and between the available options within a stage.
  • Think a little like a programmer.

Why this is important

                   Easy computing       Hard computing
  Simple model     Linear regression
  Complex model                         Complex HMM, HMC

Being able to navigate the tradeoffs in concrete applications is necessary to make the right choices.

Statistical models classified

Simple models:

  • Regressions (logistic, linear), mixed effects regressions (somewhat).
  • Autoregressive models.

Simple-to-complex models:

  • Generalized linear models (GLM), spline/smooth regressions.

Complex models:

  • Models with missing data/latent variables, HMMs.
  • GLMs without the L (nonlinear models), combinations of the models above.

Need new models and faster algorithms

  • Standard models often don't work.
  • The amount of available data keeps growing: 2.8 trillion GB in 2012.
  • Fewer and fewer cookie-cutter settings and challenges.
  • Personal computers cannot handle large datasets; we need distributed computing such as Odyssey.

Parallel computing

Decision sequence

  1. Design a model.
  2. Select a good computing technique for it and evaluate the findings.
  3. If the results are not satisfactory, go back and either look for another computing technique or change the model.

Example: naive model for RDS

  • Simple model: $$Y_i \overset{\text{iid}}{\sim} N\left( \mu, \sigma^2 \right)$$
  • Can be estimated on a personal computer easily.
  • But CI coverage rates are bad (see the sketch after this list): the model is not a good representation of the underlying mechanism.
  • Go back to the design phase.
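
To see the coverage problem concretely, here is a minimal simulation sketch. It assumes, purely for illustration, that respondents form a single referral chain with correlation rho; the chain structure, sample sizes, and parameter values are made up, not from the lecture.

## Minimal sketch: coverage of the naive iid-normal 95% CI when the data
## are actually correlated along a referral chain (illustrative AR(1) chain).
set.seed(1)
coverage <- function(n = 50, mu = 0, sigma = 1, rho = 0.6, nsim = 2000) {
  hits <- replicate(nsim, {
    y <- numeric(n)
    y[1] <- rnorm(1, mu, sigma)
    for (i in 2:n)  # each respondent refers the next one
      y[i] <- rnorm(1, rho * y[i - 1] + (1 - rho) * mu,
                    sqrt(1 - rho^2) * sigma)
    ci <- mean(y) + c(-1, 1) * qt(0.975, n - 1) * sd(y) / sqrt(n)
    ci[1] <= mu && mu <= ci[2]  # does the naive 95% CI cover mu?
  })
  mean(hits)
}
coverage()  # typically well below the nominal 0.95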

Recap: HMM

  • \\(\{h_1, h_2, \ldots \} \\) live on a discrete space and follow a Markov process with transition matrix \\( T_\theta \\).
  • At time \\( t \\), state \\( h_t \\) emits observation \\( y_t \\) according to some specified model (a small simulation sketch follows).
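
As a concrete refresher, a minimal R sketch that simulates a 2-state HMM with Gaussian emissions; the transition matrix and emission parameters are illustrative values, not from the lecture.

## Minimal sketch: simulate a 2-state HMM with Gaussian emissions.
set.seed(2)
Tmat <- matrix(c(0.9, 0.1,
                 0.2, 0.8), nrow = 2, byrow = TRUE)  # transition matrix T_theta
n <- 100
h <- numeric(n)
h[1] <- 1
for (t in 2:n)
  h[t] <- sample(1:2, 1, prob = Tmat[h[t - 1], ])    # Markov step for h_t
means <- c(-1, 1)                                    # emission means per state
y <- rnorm(n, mean = means[h], sd = 0.5)             # y_t emitted given h_t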

AP: first candidate

[Figure: first candidate example.]

  • Why would this be a bad candidate?

AP: second candidate

$$\begin{align} Y_i \mid Y_{r(i)} & \sim N\left( \rho Y_{r(i)} + (1 - \rho) \mu, \, (1 - \rho^2) \sigma^2 \right), \\ Y_i & \sim N\left( \mu, \sigma^2 \right) \text{ if } r(i) \text{ doesn't exist.} \end{align}$$

  • Needs Gibbs sampler to fit.
  • Mostly analytic updates.
  • Needs distributed computing for sensitivity analysis and coverage rate analysis.
  • Provides good interpretation and interval coverage properties.

Benchmark: coverage


Decision

  • Go with the second candidate.
  • Numerically stable (not too many parameters/missing data).
  • Other favorable features.

What drives the decision?

  • The estimand.
  • Model performance metrics.
  • What models are available/can be adapted?
  • Work towards parsimony.

Choice of algorithm

  • Numeric stability.
  • Cost of implementation and execution.
  • Point estimation vs. full sampling approaches (contrasted in the sketch after this list).
  • Point estimation: MLE (via LP, QP approximations, EM, Fisher scoring, simulated annealing), MAP.
  • Full sampling: MCMC, HMC, Gibbs, exact sampling.
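
To make the point-estimation vs. full-sampling contrast concrete, here is a minimal sketch for the iid normal model. The simulated data, the flat prior, and the random-walk proposal scale are illustrative assumptions, not the lecture's algorithm.

## Point estimation: MLE of (mu, log sigma) via numerical optimization.
set.seed(3)
y <- rnorm(100, mean = 2, sd = 1)
negll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
mle <- optim(c(0, 0), negll)$par       # returns (mu-hat, log sigma-hat)

## Full sampling: random-walk Metropolis for mu, sigma fixed at its MLE,
## flat prior -- yields draws from the posterior, not just a point.
sigma <- exp(mle[2])
mu <- numeric(5000)
mu[1] <- mle[1]
for (i in 2:5000) {
  prop <- mu[i - 1] + rnorm(1, 0, 0.3)  # propose a move
  logr <- sum(dnorm(y, prop, sigma, log = TRUE)) -
    sum(dnorm(y, mu[i - 1], sigma, log = TRUE))
  mu[i] <- if (log(runif(1)) < logr) prop else mu[i - 1]
}
quantile(mu[-(1:500)], c(0.025, 0.975))  # posterior interval for mu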

Thinking like a programmer


Simulate from the simple model:

for i in 1:N
  generate and store Y_i from Normal(mu, sigma-sq)
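
In R this pseudocode collapses to a single vectorized call (assuming N, mu, and sigma_sq are already defined):

## rnorm is vectorized, so no explicit loop is needed
Y <- rnorm(N, mean = mu, sd = sqrt(sigma_sq))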

Simulate from the AP model:

for i in 1:N
  referrer = r(i)
  if referrer exists
    Yr = Y_referrer
    generate and store Y_i from Normal(rho * Yr + (1 - rho) * mu, (1 - rho^2) * sigma-sq)
  else
    generate and store Y_i from Normal(mu, sigma-sq)
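
A runnable R version of this sketch. It assumes the referral structure is given as a vector r where r[i] is the index of i's referrer (NA for seeds) and referrers appear before the people they refer; the function name and interface are illustrative.

## Illustrative simulator for the AP model; r[i] is i's referrer, NA for seeds.
simulateAP <- function(r, mu, sigma_sq, rho) {
  N <- length(r)
  Y <- numeric(N)
  for (i in 1:N) {
    if (is.na(r[i])) {
      Y[i] <- rnorm(1, mu, sqrt(sigma_sq))            # seed node
    } else {
      Yr <- Y[r[i]]
      Y[i] <- rnorm(1, rho * Yr + (1 - rho) * mu,
                    sqrt((1 - rho^2) * sigma_sq))     # referred node
    }
  }
  Y
}

## e.g. a single chain of 5 respondents seeded at person 1
simulateAP(r = c(NA, 1, 2, 3, 4), mu = 0, sigma_sq = 1, rho = 0.5)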

Pseudocode for main fitting function

initialize parameter values
for i in 1:ndraws
  ## this is Gibbs sampler
  draw mu analytically given current rho and sigma-sq
  draw rho via Metropolis proposals on transformed space given current mu and sigma-sq
  draw sigma-sq analytically given current mu and rho
discard first nburnin draws of parameters

Pseudocode becomes R code

tr = init(data)
for (ii in 1:settings$ndraws) {
  ## this is Gibbs sampler
  pars = lastParams(tr)
  tr$mu = combine(tr$mu, drawMu(data, pars))
  pars = lastParams(tr)
  tr$rho = combine(tr$rho, drawRho(data, pars))
  pars = lastParams(tr)
  tr$ss = combine(tr$ss, drawSigmaS(data, pars))
}
tr = discard(tr, settings$burnin)

Pseudocode becomes good R code

## initialize parameter values
tr = init(data, settings)
## perform the main Gibbs sampler loop
for (ii in 1:settings$ndraws) {
  ## lastParams gets the most recent parameter draws
  pars = lastParams(tr)
  tr$mu = combine(old = tr$mu, add = drawMu(data, pars))
  pars = lastParams(tr)
  tr$rho = combine(old = tr$rho, add = drawRho(data, pars))
  pars = lastParams(tr)
  tr$ss = combine(old = tr$ss, add = drawSigmaS(data, pars))
}
## Burn in - throw away first burnin draws set in settings
tr = discard(allparameters = tr, nthrow = settings$burnin)

Good code

  • Saves time.
  • Makes it easier to pick up new languages.

Class computing languages


R:

for (ii in 1:10) {
  print("R is awesome")
}

Bash:

for ii in $(seq 1 10); do
  echo "R is awesome"
done

  • Many code snippets provided in homeworks.

Class visualization languages

Javascript (optional), snippets provided

for (var ii = 0; ii < 10; ii++) {
  console.log("R is awesome");
}

CSS (optional), snippets provided

svg line {
  stroke : #000000;
  opacity : 0.3;
}

HTML (optional), snippets provided

<body><p>R is awesome</p></body>

Why we need fast code in R

  • The leap from personal to distributed computing.
  • short_serial, short_parallel queues on Odyssey.
  • Computing resources are usually constrained.

How to write fast R code

  • Vectorization.
  • Think in terms of matrix operations.
  • Vector indexing (see the sketch after this list).
  • Built-in functions and package functions; many are implemented in lower-level languages and so are faster.
  • (Hint for pset1: may want to consider xpnd in MCMCpack.)
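
A minimal sketch of the vector-indexing and matrix-operation points; the vector and matrix here are illustrative.

## Vector indexing instead of a filtering loop
x <- rnorm(1e6)
pos <- x[x > 0]                 # one indexing operation, no explicit loop

## Matrix operation instead of nested loops: row sums of a matrix
A <- matrix(rnorm(1e6), nrow = 1000)
rs1 <- A %*% rep(1, ncol(A))    # matrix multiply computes all row sums at once
rs2 <- rowSums(A)               # or the fast built-in, implemented in C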

Example of fast code in R

time1 <- proc.time()
res1 <- sapply(1:1000000, exp)
time2 <- proc.time()
time2 - time1
## user system elapsed
## 6.321 0.146 6.429
time1 <- proc.time()
res2 <- exp(1:1000000)
time2 <- proc.time()
time2 - time1
## user system elapsed
## 0.022 0.016 0.051
all(res1 == res2)
## [1] TRUE

Problem set 1 visualization


Reminders

  • Submit your final project rankings by February 10 at noon!
  • Odyssey access: send your name, email, and phone # to your TF.


Final slide

  • Next lecture: guest lecture by Rachel Schutt.