The final lecture

Stat 221, Lecture 25



  • What the class was about
  • Class material recap
    • Theoretical material
    • Computing
    • Visualization
    • What's your favorite?
  • Student creations
  • Where can we go from here?

The class was about the full data analysis workflow

What we went through

Theoretical material

Topics we have explored:

  1. Classic point estimation: MLE, Fisher scoring
  2. Point estimation with missing data: EM+
  3. Full-posterior inference: MCMC, HMC, Parallel Tempering, Equi-Energy Sampler
  4. Decisions based on inferred system parameters
  5. A structured approach to interactive visualization

Observed Fisher Information puzzle

Mysteries of classic MLE estimation

  • Reparametrization brings more information from the same data
  • Perfect models have a unimodal likelihood
    • By consequence, imperfect models misbehave: the likelihood is not unimodal, log-likelihood does not look Normal

Modeling with missing data and latent variables - replication is crucial!

Much better with replication

EM: mystery of the \\( Q \\) function

$$Q\left( \theta | \theta^{(t)} \right) = \int l(\theta | y) f( Y_{\rr{mis}}| Y_{\rr{obs}}, \theta = \theta^{(t)}) dY_{\rr{mis}}$$

By Jensen's $$\begin{align} l( \theta^{(t+1)} | & Y_{\rr{obs}}) - l( \theta^{(t)} | Y_{\rr{obs}}) = \\ & [Q( \theta^{(t+1)} | Y_{\rr{obs}}) - Q( \theta^{(t)} | Y_{\rr{obs}})] \\ & - [H( \theta^{(t+1)} | Y_{\rr{obs}}) - H( \theta^{(t)} | Y_{\rr{obs}})] \geq 0\end{align}$$

Puzzle: can the \\( Q \\) function be used to stop bad EM early?

Intro hacker level with EM: data augmentation

  • Introducing missing data when it's not there: simple \\( \rr{Normal} + \chi^2_n =t_n \\) mixture example.
  • Down the rabbit hole: PX-EM
    • Why did we stop there?
    • Data augmentation/parameter space expansion = improved numerical properties of algorithms at the expense of harder implementation

MCMC for when log-likelihood isn't Normal

Going deeper: imperfect models, multimodal posteriors

"Strictly better than Metropolis" - HMC

  • Hamiltonian (Hybrid) Monte Carlo, a so-called "derivative-based" MCMC technique.
  • Performs very well for differentiable energy functions (densities).
  • Generic implementation, but with a twist
    • The gradient needs to be evaluated up to 10-20 times per single MCMC iteration. Its implementation should be fast enough.

HMC mechanics

HMC is great for fast convergence, but only for something like this

Going beyond HMC: Parallel Tempering

  • Running chains in parallel, swapping draws between them in order to explore the space better.
  • Can be used with any MCMC strategy inside each chain: Metropolis, Metropolis-Hastings, Hamiltonian Monte Carlo, Slice Sampler etc.
  • Works better than any single chain with the same local jump, but takes time and effort to implement, computing infrastructure to execute (MPI, cloud).

Parallel Tempering mechanics

Further questions with PT

  • What are the optimal setting of PT?
  • Do non-blocking MPI designs for PT really have a positive affect?
  • Are PT-like algorithms really the future of statistical computing?

Making PT obsolete - the EE sampler

  • The Equi-Energy Sampler makes uniform equi-energy proposals from the pools of previously visited states with similar energy values. This allows to directly make MCMC jumps between the modes that are potentially far apart.
  • Superior performance of the EE sampler comes at the price of harder technical implementation, large required memory storage.

EE Sampler illustrated

EE sampler vs PT sampler

Open research with the EE sampler

  • Can we define continuous energies for each temperature values so that we can use HMC with EE?
  • The EE sampler takes us to the limit: is it more an optimization algorithm or more an inference algorithm. Specifically, highly multidimensional statistical models don't make sense (or do they?).

Having inferred system parameters, need to decide

  • Desicion Theory is not well adapted to real world decision problems that actual movers and shakers need to make based on data-driven systems. $$ \int C(\theta, \delta(X)) p(\theta \given R, X) d \theta$$
  • Defining the right cost function, connecting to Response Surface methods, RCM - open research areas.


  • Can't live with it, can't live without it
  • Several themes in R programming
    • Vectorization
    • Code abstraction
    • Cloud computing
    • Parallel computing with MPI+friends
  • Are we expert R programmers now?


time1 <- proc.time()
res1 <- sapply(1:1000000, exp)
time2 <- proc.time()
time2 - time1
## user system elapsed
## 6.321 0.146 6.429
time1 <- proc.time()
res2 <- exp(1:1000000)
time2 <- proc.time()
time2 - time1
## user system elapsed
## 0.022 0.016 0.051
all(res1 == res2)
## [1] TRUE



What we've learned

  • The impact of interactive experiences
  • Technical interactive visualization
  • Tools how to make it with d3js
  • Structuring the perception, evaluation of interactive experiences
  • Technical features of visualization

Classifying visualizations

  • By goal
    • Engage
    • Provoke thinking
    • Make them pay you
    • Other
  • By audience
    • General audience
    • Scientific audience
    • Other
  • By type

Visualization type

Comparison is important

Another one

Building analogy (Steven Wittens)

Creating interactive experiences has become easier than ever

  • SVG standard development
  • HTML5
  • WebGL
  • Cinder
  • Not so bad to learn and use!

Class visualization tools

  • R
    ## or
    p <- ggplot(data.frame(x = 1, y = 1), aes(x, y))
    p + geom_point()
  • d3js
    // initialize the SVG container
    var con ="#" + container)
                .insert("div", ":first-child")
                .attr('class', 'ivcont');
    // add visualization title

Student work - great!

Where can we go from here?

  • Anywhere*

*Class material is universal and can help be effective when dealing with data at the highest level of complexity both technically and conceptually.


  • Final project write-up/code submissions due tomorrow 11:59pm
  • Final project presentations Wednesday May 8 at 1-4:12pm - everyone is expected to attend


Final slide

Thank you guys, you are awesome!

See you around...