# Statistical models and likelihood

@snesterko

## Lecture plan

• Design phase: what matters.
• R- and design-friendly ways to write the model.
• Actors in the model.
• Critiquing the modeling process.

## Statistical models

• What do they mean to us as data analysts?

## Doing things with models

• Formulating intuition.
• Checking goodness.
• Predicting.
• Generalizing.

## Practical steps

• Formulate questions $\rightarrow$ benchmarks to check against.
• Model "mean response".
• Model noise around it.
• Incorporate features of the system.

$$Y \sim f_{\theta}(X, \epsilon)$$

## Classic example - regression

$$\vec{Y} \sim \mathbf{X}\vec{\beta} + \vec{\epsilon}, \rr{ } \vec{\epsilon} \sim N\left(0, \sigma^2\mathbf{I}\right)$$

• or

\begin{align}Y_i & \sim \beta_0 + X_{i1}\beta_1 + \ldots + X_{ip}\beta_p + \epsilon_i \\ \epsilon_i & \sim N\left(0, \sigma^2\right)\end{align}

• or

$$\vec{Y} \sim N\left(\mathbf{X}\vec{\beta}, \sigma^2\mathbf{I}\right)$$

## Data density (likelihood!)

$$f(y \given \theta, X)\, = \frac{1}{(2\pi \sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}(y -X\beta)^T(y- X\beta)}$$

Alternatively, $$f(y \given \theta, X) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2} \left( y_i - (\beta_0 + x_{i1} \beta_1 + \ldots + x_{ip} \beta_p)\right)^2}$$

Useful: matrix calculus.
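The matrix form and the product form are the same density, which can be checked numerically on the log scale. The sketch below uses made-up toy data and hypothetical function names (`loglik_matrix`, `loglik_product`); it sticks to plain Python lists so it runs without extra packages.

```python
import math

def loglik_matrix(y, X, beta, sigma2):
    """Log of the matrix-form density: -(n/2) log(2 pi sigma^2) - RSS / (2 sigma^2)."""
    n = len(y)
    # Residual vector y - X beta, computed row by row.
    resid = [y_i - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
             for y_i, row in zip(y, X)]
    rss = sum(r * r for r in resid)  # (y - X beta)^T (y - X beta)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def loglik_product(y, X, beta, sigma2):
    """Log of the product form: a sum of n univariate normal log-densities."""
    total = 0.0
    for y_i, row in zip(y, X):
        mean_i = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        total += (-0.5 * math.log(2 * math.pi * sigma2)
                  - (y_i - mean_i) ** 2 / (2 * sigma2))
    return total

# Toy data (made up for illustration); the first column of X is the intercept.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [1.1, 2.9, 5.2, 6.8]
beta = [0.0, 2.0]
sigma2 = 0.25

ll_matrix = loglik_matrix(y, X, beta, sigma2)
ll_product = loglik_product(y, X, beta, sigma2)
print(ll_matrix, ll_product)
```

Both numbers should agree up to floating-point error. The matrix form needs a single pass to get the residual sum of squares, which is the version that vectorizes well.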

## Actors in the model

• Response.
• Covariates.
• Free parameters.
• Constants (numbers).
• Latent variables/missing data.

## Example: AP model

\begin{align} Y_i \given Y_{r(i)} & \sim N \left(\rho Y_{r(i)} + (1 - \rho) \mu, (1 - \rho^2) \sigma^2 \right), \\Y_i & \sim N\left( \mu, \sigma^2 \right) \rr{ if } r(i)\rr{ doesn't exist}\end{align}

• You don't necessarily need covariates.
• Autoregressive flavor.
• Homophily intuition.
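One way to build intuition for this model is to simulate from it. The sketch below is illustrative only: the referral structure (2000 chains of three units each), the parameter values, and the name `simulate_ap` are assumptions, and units are assumed to be ordered so that a referrer always comes before the unit it refers.

```python
import math
import random

def simulate_ap(referrer, mu, sigma2, rho, rng):
    """Draw Y from the model; referrer[i] is r(i), or None when i is a seed.

    Assumes every referrer index is smaller than the index it refers,
    so values can be generated in a single forward pass.
    """
    y = []
    for i, r in enumerate(referrer):
        if r is None:
            # Seed: Y_i ~ N(mu, sigma^2)
            mean, var = mu, sigma2
        else:
            # Referred: Y_i | Y_r ~ N(rho * Y_r + (1 - rho) * mu, (1 - rho^2) * sigma^2)
            mean = rho * y[r] + (1 - rho) * mu
            var = (1 - rho ** 2) * sigma2
        y.append(rng.gauss(mean, math.sqrt(var)))
    return y

rng = random.Random(0)
# Hypothetical referral chains: 2000 seeds, each starting a chain of 3 units.
referrer = []
for _ in range(2000):
    seed = len(referrer)
    referrer += [None, seed, seed + 1]

y = simulate_ap(referrer, mu=10.0, sigma2=4.0, rho=0.6, rng=rng)
sample_mean = sum(y) / len(y)
sample_var = sum((v - sample_mean) ** 2 for v in y) / len(y)
print(sample_mean, sample_var)
```

The sample mean and variance should land near $\mu$ and $\sigma^2$: if $Y_{r(i)} \sim N(\mu, \sigma^2)$, then $Y_i$ has mean $\rho\mu + (1-\rho)\mu = \mu$ and variance $\rho^2\sigma^2 + (1-\rho^2)\sigma^2 = \sigma^2$, so every unit has the same marginal distribution.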

## Data density

\begin{align}f(y \given \theta) = \prod_{i=1}^n & \frac{1}{\sqrt{2\pi\sigma^2(1 - \rho_i^2)}} \\ & e^{-\frac{1}{2\sigma^2(1-\rho_i^2)}\left( y_i - \left( \rho_i y_{r(i)} + (1 - \rho_i) \mu\right)\right)^2}\end{align}

• $\rho_i = \begin{cases} \rho & \rr{if } i \rr{ is referred} \\ 0 & \rr{if } i \rr{ is a seed}\end{cases}$
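On the log scale the density becomes a sum of normal log-densities, one per unit, with mean $\rho_i y_{r(i)} + (1-\rho_i)\mu$ and variance $(1-\rho_i^2)\sigma^2$. A minimal sketch, with made-up data and a hypothetical function name `ap_loglik`:

```python
import math

def ap_loglik(y, referrer, mu, sigma2, rho):
    """Log of the AP data density; referrer[i] is r(i), or None for a seed."""
    total = 0.0
    for i, r in enumerate(referrer):
        rho_i = 0.0 if r is None else rho      # rho_i = 0 for seeds
        y_ref = 0.0 if r is None else y[r]     # irrelevant when rho_i = 0
        mean = rho_i * y_ref + (1 - rho_i) * mu
        var = sigma2 * (1 - rho_i ** 2)
        total += -0.5 * math.log(2 * math.pi * var) - (y[i] - mean) ** 2 / (2 * var)
    return total

# Toy data (made up): unit 0 is a seed, unit 1 is referred by 0, unit 2 by 1.
y = [9.0, 10.5, 11.0]
referrer = [None, 0, 1]

ll_dependent = ap_loglik(y, referrer, mu=10.0, sigma2=4.0, rho=0.6)
ll_independent = ap_loglik(y, referrer, mu=10.0, sigma2=4.0, rho=0.0)
print(ll_dependent, ll_independent)
```

Setting `rho=0.0` turns every unit into a seed, so the second value is just the i.i.d. normal log-likelihood. That makes a convenient sanity check for the implementation.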

## R-friendly math

• Reduce the number of math symbols in your formulas.
• Use matrices and vectors.
• More on that later.

## Generalizing the model

\begin{align}f(y \given \theta) = \prod_{i=1}^n & \frac{1}{\sqrt{2\pi\sigma^2(1 - \rho_i^2)}} \\ & e^{-\frac{1}{2\sigma^2(1-\rho_i^2)}\left( y_i - \left( \rho_i y_{r(i)} + (1 - \rho_i) \mu\right)\right)^2}\end{align}

• $\rho_i = \begin{cases} \rho^{(1)} & \rr{if } i \rr{ is referred by close friend} \\ \rho^{(2)} & \rr{if } i \rr{ is referred by distant friend}\\ 0 & \rr{if } i \rr{ is a seed}\end{cases}$

### Being creative

Switching actors around:

• Data augmentation - introducing latent variables.
• Free parameters can become latent variables.
• Constants can morph into parameters.

Constraining:

• Domains and more involved constraints.
• Priors.

Beware of possible consequences:

• Effect on the estimand.
• Computing.

## Other models

Probit regression $$Y_i \given Z_i = \begin{cases}1 \rr{ if } Z_i < X_i \beta \\ 0 \rr{ otherwise}\end{cases} , \rr{ } Z_i \sim N(0,1)$$

or $$Y_i \sim \rr{Bernoulli}\left(\Phi(X_i\beta) \right)$$

## Data pmf

$$p(y \given \beta, X) \propto \prod_{i=1}^n \Phi(X_i \beta)^{y_i} \left( 1 - \Phi(X_i\beta)\right)^{1 - y_i}$$
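Because each $Y_i$ is Bernoulli with success probability $\Phi(X_i\beta)$, the log of the pmf is a sum of $y_i \log \Phi(X_i\beta) + (1 - y_i)\log\left(1 - \Phi(X_i\beta)\right)$ terms. A minimal sketch with made-up data; the function name `probit_loglik` and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def probit_loglik(y, X, beta):
    """Sum over i of y_i log Phi(x_i beta) + (1 - y_i) log(1 - Phi(x_i beta))."""
    phi = NormalDist().cdf  # standard normal CDF
    total = 0.0
    for y_i, row in zip(y, X):
        p_i = phi(sum(x_ij * b_j for x_ij, b_j in zip(row, beta)))
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total

# Toy data (made up); the first column of X is the intercept.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

ll_zero = probit_loglik(y, X, beta=[0.0, 0.0])   # every p_i equals 0.5
ll_alt = probit_loglik(y, X, beta=[-0.5, 1.0])   # slope matches the pattern in y
print(ll_zero, ll_alt)
```

With `beta = [0, 0]` every probability is 0.5, so the log-likelihood is $n \log 0.5$. A coefficient vector that tracks the data gives a higher value.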

## Hidden Markov Model

• $\{h_1, h_2, \ldots \}$ live on a discrete space and follow a Markov process with transition matrix $T_\theta$.
• At time $t$, state $h_t$ emits observation $y_t$ according to some specified model.

## Data generation process

    for i in 1:n
        if i is 1
            generate h_i using distribution pi()
        else
            p = h_{i-1}
            generate h_i using transition probabilities based on p
        generate y_i from emission pdf f(h_i, params)

• Joint distribution of $h$ and $y$?

## Example: HMM

• Two hidden states, transition matrix $$T = \left( \begin{array}{cc} 0.5 & 0.5 \\ 0.1 & 0.9\end{array}\right)$$
• Generate response $$y_i \given h_i \sim \begin{cases} \rr{N} (0,1) &\rr{if } h_i=0 \\ \rr{N}(0, 3) & \rr{if } h_i=1 \end{cases}$$
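The data generation process for this two-state example can be sketched in a few lines. Two things are assumptions here, since the slides do not pin them down: the initial distribution is taken to be uniform over the two states, and the second argument of $\rr{N}(0, 3)$ is read as a variance. The name `simulate_hmm` is hypothetical.

```python
import math
import random

def simulate_hmm(n, init, T, emit_var, rng):
    """Generate hidden states h and observations y, each of length n."""
    h, y = [], []
    for i in range(n):
        # First state comes from init; later states from the row of T
        # that corresponds to the previous state.
        probs = init if i == 0 else T[h[-1]]
        state = 0 if rng.random() < probs[0] else 1
        h.append(state)
        # Emission: mean-zero normal with a state-specific variance.
        y.append(rng.gauss(0.0, math.sqrt(emit_var[state])))
    return h, y

T = [[0.5, 0.5], [0.1, 0.9]]
init = [0.5, 0.5]        # assumption: initial distribution is not given
emit_var = [1.0, 3.0]    # assumption: N(0, 3) is read as variance 3

rng = random.Random(1)
h, y = simulate_hmm(20000, init, T, emit_var, rng)
frac_state1 = sum(h) / len(h)
var_y = sum(v * v for v in y) / len(y)   # emissions have mean zero
print(frac_state1, var_y)
```

For a long run, the fraction of time spent in state 1 should settle near 5/6, the stationary probability of that state under this $T$, and the overall variance of $y$ near $\frac{1}{6}\cdot 1 + \frac{5}{6}\cdot 3 = \frac{8}{3}$.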

## Simulation versus inference

• Simulation: set model parameters to some values and generate the response through its mechanism.
• Inference: run an optimization algorithm to infer the best possible set of parameters (or their distributions) that can yield the observed response.
• Inference is optimization.

## Inference: HMM

• Transition matrix $$T = \left( \begin{array}{cc} p_{11} & p_{12} \\ p_{21} & p_{22}\end{array}\right)$$
• Generate response $$y_i \given h_i \sim \begin{cases} \rr{N} (\mu,\sigma_1^2) &\rr{if } h_i=0 \\ \rr{N}(\mu, \sigma_2^2) & \rr{if } h_i=1 \end{cases}$$

## Ways to do inference

• Maximize the likelihood function $L(\theta \given \rr{data}) = f_{\theta}(\rr{data})$.
• M-estimators.
• Method of moments.
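As a toy illustration of maximum likelihood as optimization, the sketch below maximizes a normal log-likelihood over the mean, with the variance held fixed at 1, and compares the result with the sample mean, which is the known closed-form MLE in this case. The data are made up, and the ternary search is just one simple optimizer chosen for this sketch; it is not a method prescribed by the lecture.

```python
import math

def normal_loglik(mu, data, sigma2=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma2) data as a function of mu."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

def maximize_1d(f, lo, hi, tol=1e-6):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the maximizer cannot lie to the left of m1
        else:
            hi = m2   # the maximizer cannot lie to the right of m2
    return (lo + hi) / 2

data = [2.1, 1.7, 3.4, 2.9, 2.4]   # made-up observations
mu_hat = maximize_1d(lambda mu: normal_loglik(mu, data), lo=-10.0, hi=10.0)
sample_mean = sum(data) / len(data)
print(mu_hat, sample_mean)
```

The numerical maximizer and the sample mean should agree to within the search tolerance. For models without a closed-form answer, the same pattern applies with a more capable optimizer.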

## Likelihood for probit

$$L(\beta \given Y ) \propto \prod_{i=1}^n \Phi(\vec{X}'_i \beta)^{y_i} \left(1 - \Phi(\vec{X}'_i \beta)\right)^{1 - y_i}$$

## Likelihood for HMM

• Observed-data $$L(\theta \given Y ) = \sum_{H} P_{\theta}(Y \given H)P_\theta(H)$$
• Complete-data \begin{align}L(\theta \given Y, H ) = & f_{\theta}(y_1 \given h_1)g_\theta(h_1) \cdot \\ & \prod_{i=2}^n f_{\theta}(y_i \given h_i)g_\theta(h_i \given h_{i-1})\end{align}
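The two likelihoods are linked: summing the complete-data likelihood over every possible hidden path $H$ gives the observed-data likelihood. The sketch below checks this by brute force for a short series and compares it with the forward recursion, a standard way to compute the same sum in linear time (the recursion is an addition here, not something these slides derive). It reuses the two-state example values, with a uniform initial distribution and variance-3 emissions as assumptions; all function names are hypothetical.

```python
import math
from itertools import product

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def complete_lik(y, h, init, T, emit_var):
    """f(y_1 | h_1) g(h_1) * prod_{i>=2} f(y_i | h_i) g(h_i | h_{i-1})."""
    lik = normal_pdf(y[0], 0.0, emit_var[h[0]]) * init[h[0]]
    for i in range(1, len(y)):
        lik *= normal_pdf(y[i], 0.0, emit_var[h[i]]) * T[h[i - 1]][h[i]]
    return lik

def observed_lik_bruteforce(y, init, T, emit_var):
    """Sum the complete-data likelihood over all 2^n hidden paths."""
    return sum(complete_lik(y, h, init, T, emit_var)
               for h in product([0, 1], repeat=len(y)))

def observed_lik_forward(y, init, T, emit_var):
    """Same quantity via the forward recursion, O(n) instead of O(2^n)."""
    # alpha[s] = joint density of the observations so far and current state s
    alpha = [init[s] * normal_pdf(y[0], 0.0, emit_var[s]) for s in (0, 1)]
    for y_i in y[1:]:
        alpha = [sum(alpha[p] * T[p][s] for p in (0, 1))
                 * normal_pdf(y_i, 0.0, emit_var[s]) for s in (0, 1)]
    return sum(alpha)

T = [[0.5, 0.5], [0.1, 0.9]]
init = [0.5, 0.5]       # assumption: initial distribution is not given
emit_var = [1.0, 3.0]   # assumption: N(0, 3) is read as variance 3
y = [0.3, -1.2, 2.5, 0.1, -0.4, 1.9]   # made-up observations

lik_brute = observed_lik_bruteforce(y, init, T, emit_var)
lik_forward = observed_lik_forward(y, init, T, emit_var)
print(lik_brute, lik_forward)
```

The brute-force sum has $2^n$ terms, so it is only usable for tiny $n$. The agreement between the two numbers is the point of the check.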

## Important to know

• How to write the (log-)likelihood function from a model statement. A conditioning/telescoping approach helps.
• How to write it in a simple, matrix form to help fast computation.

## Announcements

• Problem Set 2 is out.
• Final project assignment over email by February 17.

## Semi-final slide

• No lecture next Monday -- President's Day.
• Lecture Wednesday: Likelihood principle, ways to get MLE, intro Odyssey.

## Guest-contributed model

• Arman Sabbaghi, 3D printing problem.
• Short presentation, class critique.