Statistical models and likelihood
Stat 221, Lecture 6
@snesterko
Lecture plan
- Design phase: what matters.
- R- and design-friendly ways to write the model.
- Actors in the model.
- Critiquing the modeling process.
Statistical models
- What do they mean to us as data analysts?
Doing things with models
- Formulating intuition.
- Answering a question.
- Checking goodness.
- Predicting.
- Generalizing.
Practical steps
- Formulate questions \\( \rightarrow \\) benchmarks to check against.
- Model "mean response".
- Model noise around it.
- Incorporate features of the system.
$$Y \sim f_{\theta}(X, \epsilon)$$
Classic example - regression
$$\vec{Y} \sim \mathbf{X}\vec{\beta} + \vec{\epsilon}, \rr{ } \vec{\epsilon} \sim N\left(0, \sigma^2\mathbf{I}\right)$$
$$\begin{align}Y_i & \sim \beta_0 + X_{i1}\beta_1 + \ldots + X_{ip}\beta_p + \epsilon_i \\ \epsilon_i & \sim N\left(0, \sigma^2\right)\end{align}$$
$$\vec{Y} \sim N\left(\mathbf{X}\vec{\beta}, \sigma^2\mathbf{I}\right)$$
Data density (likelihood!)
$$f(y \given \theta, X)\, = \frac{1}{(2\pi \sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}(y -X\beta)^T(y- X\beta)}$$
Alternatively, $$f(y \given \theta, X) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2} \left( y_i - (\beta_0 + x_{i1} \beta_1 + \ldots + x_{ip} \beta_p)\right)^2}$$
Useful: matrix calculus.
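A minimal sketch, assuming a tiny made-up data set, of how the matrix form and the product form of the regression density give the same log-likelihood. The function names and the toy numbers are illustrative, not from the lecture.

```python
import math

def loglik_product(y, X, beta, sigma2):
    """Product form: sum of per-observation Gaussian log-densities."""
    total = 0.0
    for yi, xi in zip(y, X):
        mean_i = sum(xij * bj for xij, bj in zip(xi, beta))
        total += -0.5 * math.log(2 * math.pi * sigma2) - (yi - mean_i) ** 2 / (2 * sigma2)
    return total

def loglik_matrix(y, X, beta, sigma2):
    """Matrix form: depends on the data only through (y - X beta)^T (y - X beta)."""
    resid = [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for yi, xi in zip(y, X)]
    rss = sum(r * r for r in resid)
    n = len(y)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

# Toy data (made up). The first column of X is the intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
y = [1.1, 2.9, 5.2]
beta = [0.0, 2.0]
sigma2 = 0.25

print(loglik_product(y, X, beta, sigma2))
print(loglik_matrix(y, X, beta, sigma2))
```

The matrix version needs one pass to form the residual vector and then a single inner product, which is the shape that vectorized code exploits.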
Actors in the model
- Response.
- Covariates.
- Free parameters.
- Constants (numbers).
- Latent variables/missing data.
Example: AP model
$$\begin{align} Y_i \given Y_{r(i)} & \sim N \left(\rho Y_{r(i)} + (1 - \rho) \mu, (1 - \rho^2) \sigma^2 \right), \\Y_i & \sim N\left( \mu, \sigma^2 \right) \rr{ if } r(i)\rr{ doesn't exist}\end{align}$$
- You don't necessarily need covariates.
- Autoregressive flavor.
- Homophily intuition.
Data density
$$\begin{align}f(y \given \theta) = \prod_{i=1}^n & \frac{1}{\sqrt{2\pi\sigma^2(1 - \rho_i^2)}} \\ & e^{-\frac{1}{2\sigma^2(1-\rho_i^2)}\left( y_i - \left( \rho_i y_{r(i)} + (1 - \rho_i) \mu\right)\right)^2}\end{align}$$
- \\(\rho_i = \begin{cases} \rho & \rr{if } i \rr{ is referred} \\ 0 & \rr{if } i \rr{ is a seed}\end{cases}\\)
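A minimal sketch of the AP-model log-density, assuming referrals are stored as a list `referrer` where `referrer[i]` is the index of the unit that referred `i`, or `None` for a seed. That encoding, the function name, and the toy data are assumptions for illustration.

```python
import math

def ap_loglik(y, referrer, mu, sigma2, rho):
    """Log-density of the AP model: each unit is Gaussian given its referrer."""
    total = 0.0
    for i, yi in enumerate(y):
        r = referrer[i]
        rho_i = rho if r is not None else 0.0      # seeds get rho_i = 0
        y_ref = y[r] if r is not None else 0.0     # irrelevant when rho_i = 0
        mean_i = rho_i * y_ref + (1 - rho_i) * mu
        var_i = sigma2 * (1 - rho_i ** 2)
        total += -0.5 * math.log(2 * math.pi * var_i) - (yi - mean_i) ** 2 / (2 * var_i)
    return total

# Toy referral chain (made up): unit 0 is a seed, 1 and 2 were referred by 0, 3 by 1.
y = [0.3, 0.8, -0.1, 1.2]
referrer = [None, 0, 0, 1]
print(ap_loglik(y, referrer, mu=0.5, sigma2=1.0, rho=0.6))
```

Setting `rho=0` reduces every term to the seed density, so the model collapses to independent \\( N(\mu, \sigma^2) \\) draws.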
R-friendly math
- Reduce the number of math symbols in your formulas.
- Use matrices and vectors.
- More on that later.
Generalizing the model
$$\begin{align}f(y \given \theta) = \prod_{i=1}^n & \frac{1}{\sqrt{2\pi\sigma^2(1 - \rho_i^2)}} \\ & e^{-\frac{1}{2\sigma^2(1-\rho_i^2)}\left( y_i - \left( \rho_i y_{r(i)} + (1 - \rho_i) \mu\right)\right)^2}\end{align}$$
- \\(\rho_i = \begin{cases} \rho^{(1)} & \rr{if } i \rr{ is referred by close friend} \\ \rho^{(2)} & \rr{if } i \rr{ is referred by distant friend}\\ 0 & \rr{if } i \rr{ is a seed}\end{cases}\\)
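A sketch of why this generalization is cheap: only the lookup that produces \\( \rho_i \\) changes, while the density keeps its shape. The `tie` labels (`"close"`, `"distant"`), the `referrer` encoding (`None` for a seed), and the function name are hypothetical.

```python
import math

def ap_loglik_two_rhos(y, referrer, tie, mu, sigma2, rho_close, rho_distant):
    """AP log-density with a separate correlation for each type of referral tie."""
    rho_by_tie = {"close": rho_close, "distant": rho_distant}
    total = 0.0
    for i, yi in enumerate(y):
        r = referrer[i]
        if r is None:                      # seed: rho_i = 0
            mean_i, var_i = mu, sigma2
        else:                              # referred: rho_i depends on the tie type
            rho_i = rho_by_tie[tie[i]]
            mean_i = rho_i * y[r] + (1 - rho_i) * mu
            var_i = sigma2 * (1 - rho_i ** 2)
        total += -0.5 * math.log(2 * math.pi * var_i) - (yi - mean_i) ** 2 / (2 * var_i)
    return total

# Toy data (made up): unit 1 was referred by a close friend, unit 2 by a distant one.
y = [0.0, 1.0, -0.4]
referrer = [None, 0, 0]
tie = [None, "close", "distant"]
print(ap_loglik_two_rhos(y, referrer, tie, mu=0.0, sigma2=1.0, rho_close=0.5, rho_distant=0.2))
```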
Being creative
Switching actors around:
- Data augmentation - introducing latent variables.
- Free parameters can become latent variables.
- Constants can morph into parameters.
Constraining
- Domains and more involved constraints.
- Priors.
Beware of possible consequences:
- Effect on the estimand.
- Computing.
Other models
Probit regression $$Y_i \given Z_i = \begin{cases}1 \rr{ if } Z_i < X_i \beta \\ 0 \rr{ otherwise}\end{cases} , \rr{ } Z_i \sim N(0,1)$$
or $$Y_i \sim \rr{Bernoulli}\left(\Phi(X_i\beta) \right) $$
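A quick Monte Carlo check, offered as a sketch, that the latent-variable statement and the Bernoulli statement describe the same distribution for \\( Y_i \\). The value of the linear predictor `xb`, the sample size, and the seed are arbitrary choices.

```python
import math
import random

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def draw_latent(xb, rng):
    """Latent-variable form: Y = 1 if Z < x*beta, with Z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    return 1 if z < xb else 0

def draw_bernoulli(xb, rng):
    """Direct form: Y ~ Bernoulli(Phi(x*beta))."""
    return 1 if rng.random() < Phi(xb) else 0

rng = random.Random(1)
xb = 0.7          # arbitrary value of the linear predictor
n = 20000
freq_latent = sum(draw_latent(xb, rng) for _ in range(n)) / n
freq_bern = sum(draw_bernoulli(xb, rng) for _ in range(n)) / n
print(freq_latent, freq_bern, Phi(xb))
```

Both frequencies should land close to \\( \Phi(X_i\beta) \\), up to simulation noise.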
Data pmf
$$p(y \given \beta, X) \propto \prod_{i=1}^n \Phi(X_i \beta)^{y_i} \left( 1 - \Phi(X_i\beta)\right)^{1 - y_i}$$
Hidden Markov Model
- \\(\{h_1, h_2, \ldots \} \\) live on a discrete space and follow a Markov process with transition matrix \\( T_\theta \\).
- At time \\( t\\), state \\( h_t \\) emits observation \\( y_t \\) according to some specified model.
Data generation process
for i in 1:n
  if i is 1
    generate h_i using distribution pi()
  else
    p = h_{i-1}
    generate h_i using transition probabilities based on p
  generate y_i from emission pdf f(h_i, params)
- Joint distribution of \\( h \\) and \\( y\\)?
Example: HMM
- Two hidden states, transition matrix $$T = \left( \begin{array}{cc} 0.5 & 0.5 \\ 0.1 & 0.9\end{array}\right)$$
- Generate response $$y_i \given h_i \sim \begin{cases} \rr{N} (0,1) &\rr{if } h_i=0 \\ \rr{N}(0, 3) & \rr{if } h_i=1 \end{cases} $$
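A minimal simulation sketch for this two-state example. Two things are assumptions, not stated on the slide: the initial distribution is taken to be uniform over the two states, and \\( \rr{N}(0, 3) \\) is read as variance 3, matching the \\( N(\mu, \sigma^2) \\) convention used elsewhere in the lecture.

```python
import math
import random

T = [[0.5, 0.5], [0.1, 0.9]]             # transition matrix from the example
EMISSION_SD = [1.0, math.sqrt(3.0)]      # assumes N(0, 3) means variance 3

def simulate_hmm(n, init=(0.5, 0.5), seed=0):
    """Generate hidden states h and observations y of length n."""
    rng = random.Random(seed)
    h, y = [], []
    for i in range(n):
        # The first state comes from the (assumed) initial distribution,
        # later states from the row of T indexed by the previous state.
        probs = init if i == 0 else T[h[-1]]
        state = 0 if rng.random() < probs[0] else 1
        h.append(state)
        y.append(rng.gauss(0.0, EMISSION_SD[state]))
    return h, y

h, y = simulate_hmm(10)
print(h)
print([round(v, 2) for v in y])
```

With this \\( T \\), the chain spends most of its time in state 1, so long runs of the higher-variance emissions are expected.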
Simulation versus inference
- Simulation: set model parameters to some values and generate the response through its mechanism.
- Inference: run an optimization algorithm to infer the best possible set of parameters (or their distributions) that can yield the observed response.
- Inference is optimization.
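A small sketch of the two directions on a deliberately simple model, \\( y_i \sim N(\mu, \sigma^2) \\) with \\( \sigma \\) known. This model, the golden-section search, and all names and constants are illustrative choices, not the lecture's.

```python
import math
import random

def simulate(n, mu, sigma, seed=0):
    """Simulation: fix the parameters and generate the response."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def neg_loglik(mu, y, sigma):
    """Negative Gaussian log-likelihood in mu, dropping additive constants."""
    return sum((yi - mu) ** 2 for yi in y) / (2 * sigma ** 2)

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d      # the minimum lies in [a, d]
        else:
            a = c      # the minimum lies in [c, b]
    return (a + b) / 2

# Simulate with a known mu, then recover it by optimization.
y = simulate(500, mu=2.0, sigma=1.0)
mu_hat = golden_section_min(lambda m: neg_loglik(m, y, 1.0), -10.0, 10.0)
print(mu_hat, sum(y) / len(y))
```

For this model the optimizer should agree with the sample mean, which is the closed-form maximum likelihood estimate of \\( \mu \\).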
Inference: HMM
- Transition matrix $$T = \left( \begin{array}{cc} p_{11} & p_{12} \\ p_{21} & p_{22}\end{array}\right)$$
- Generate response $$y_i \given h_i \sim \begin{cases} \rr{N} (\mu,\sigma_1^2) &\rr{if } h_i=0 \\ \rr{N}(\mu, \sigma_2^2) & \rr{if } h_i=1 \end{cases} $$
Ways to do inference
- Maximize likelihood function \\( L(\theta \given data) = f_{\theta}(data) \\).
- M-estimators.
- Method of moments.
Likelihood for probit
$$L(\beta \given Y ) \propto \prod_{i=1}^n \Phi(\vec{X}'_i \beta)^{y_i}\left(1 - \Phi(\vec{X}'_i \beta)\right)^{1 - y_i}$$
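A sketch of maximizing the probit likelihood by brute force, assuming a single covariate with no intercept so that \\( \beta \\) is a scalar. The data, the grid, and the function names are made up for illustration; a real fit would use a proper optimizer.

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(beta, y, x):
    """Probit log-likelihood for a scalar coefficient beta."""
    total = 0.0
    for yi, xi in zip(y, x):
        p = Phi(xi * beta)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Toy data (made up), not perfectly separable, so the maximizer is finite.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 1]

# Grid search over beta in [-2, 2] with step 0.01.
grid = [k / 100.0 for k in range(-200, 201)]
beta_hat = max(grid, key=lambda b: probit_loglik(b, y, x))
print(beta_hat, probit_loglik(beta_hat, y, x))
```

Working on the log scale turns the product into a sum, which avoids underflow once \\( n \\) is large.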
Likelihood for HMM
- Observed-data $$L(\theta \given Y ) = \sum_{H} P_{\theta}(Y \given H)P_\theta(H)$$
- Complete-data $$\begin{align}L(\theta \given Y, H ) = & f_{\theta}(y_1 \given h_1)g_\theta(h_1) \cdot \\ & \prod_{i=2}^n f_{\theta}(y_i \given h_i)g_\theta(h_i \given h_{i-1})\end{align}$$
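A sketch connecting the two likelihoods: the observed-data likelihood is the complete-data likelihood summed over every hidden path. The brute-force sum is feasible only for tiny \\( n \\). The forward recursion used for comparison is a standard technique that these slides do not show. The initial distribution `init` and the toy observations are assumptions.

```python
import itertools
import math

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def complete_data_lik(y, h, T, init, means, variances):
    """f(y_1 | h_1) g(h_1) * prod_{i>=2} f(y_i | h_i) g(h_i | h_{i-1})."""
    lik = init[h[0]] * normal_pdf(y[0], means[h[0]], variances[h[0]])
    for i in range(1, len(y)):
        lik *= T[h[i - 1]][h[i]] * normal_pdf(y[i], means[h[i]], variances[h[i]])
    return lik

def observed_lik_bruteforce(y, T, init, means, variances):
    """Sum the complete-data likelihood over all k^n hidden paths."""
    k = len(init)
    return sum(complete_data_lik(y, h, T, init, means, variances)
               for h in itertools.product(range(k), repeat=len(y)))

def observed_lik_forward(y, T, init, means, variances):
    """Same quantity via the forward recursion, in O(n k^2) operations."""
    k = len(init)
    alpha = [init[s] * normal_pdf(y[0], means[s], variances[s]) for s in range(k)]
    for yi in y[1:]:
        alpha = [normal_pdf(yi, means[s], variances[s]) *
                 sum(alpha[r] * T[r][s] for r in range(k)) for s in range(k)]
    return sum(alpha)

T = [[0.5, 0.5], [0.1, 0.9]]
init = [0.5, 0.5]                 # assumed initial distribution
means, variances = [0.0, 0.0], [1.0, 3.0]
y = [0.2, -1.3, 2.5, 0.1]         # made-up observations
print(observed_lik_bruteforce(y, T, init, means, variances))
print(observed_lik_forward(y, T, init, means, variances))
```

The brute-force version visits \\( 2^n \\) paths here, which is why the sum over \\( H \\) is rarely computed directly.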
Important to know
- How to write the (log-)likelihood function from a model statement. A conditioning/telescoping approach helps.
- How to write it in a simple matrix form to help fast computation.
Announcements
- Problem Set 2 is out.
- Final project assignment over email by February 17.
Semi-final slide
- No lecture next Monday -- President's Day.
- Lecture Wednesday: Likelihood principle, ways to get the MLE, intro to Odyssey.
Guest-contributed model
- Arman Sabbaghi, 3D printing problem.
- Short presentation, class critique.