Generalized Linear Models - Model fitting, Assessing assumptions

Review “theory”

Have a response variable whose distribution is in the exponential family:

\[f ( y | \theta , \phi ) = \exp \left[ \frac { y \theta - b ( \theta ) } { a ( \phi ) } + c ( y , \phi ) \right]\]

\(\theta\) is the natural parameter, \(\phi\) is dispersion parameter (if present)

Includes Gaussian, Binomial, Poisson, etc.

Mean and variance

\[\begin{array} { r l } { E Y } & { = \mu = b ^ { \prime } ( \theta ) } \\ { \operatorname { var } Y } & { = b ^ { \prime \prime } ( \theta ) a ( \phi ) } \end{array}\]

Variance function: \(b''(\theta)\)
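
As a concrete check, take the Poisson with mean \(\lambda\):

\[f ( y \mid \lambda ) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\left[ y \log \lambda - \lambda - \log y! \right]\]

so \(\theta = \log \lambda\), \(b(\theta) = e^{\theta}\), \(a(\phi) = 1\), and \(c(y, \phi) = -\log y!\). Then \(EY = b'(\theta) = e^{\theta} = \lambda\) and \(\operatorname{var} Y = b''(\theta) a(\phi) = \lambda\), so the variance function is \(\mu\), matching the table below.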

Link functions: \(g(\mu) = \eta = \beta _ { 0 } + \beta _ { 1 } x _ { 1 } + \cdots + \beta _ { p } x _ { p } = x ^ { T } \beta\)

Examples

\[\begin{array} { l l l } { \text { Family } } & { \text { Link } } & { \text { Variance Function } } \\ { \text { Normal } } & { \eta = \mu } & { 1 ~ ;~ a(\phi) = \sigma^2 } \\ { \text { Poisson } } & { \eta = \log \mu } & { \mu } \\ { \text { Binary } } & { \eta = \log ( \mu / ( 1 - \mu ) ) } & { \mu ( 1 - \mu ) } \\ { \text { Gamma } } & { \eta = \mu ^ { - 1 } } & { \mu ^ { 2 } } \\ { \text { Inverse Gaussian } } & { \eta = \mu ^ { - 2 } } & { \mu ^ { 3 } } \end{array}\]
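In R, each of these link/variance pairs is the default for the corresponding `family` object, which can be checked directly:

```r
## each family's default link in R matches the table above
gaussian()$link           # "identity"
poisson()$link            # "log"
binomial()$link           # "logit"
Gamma()$link              # "inverse"
inverse.gaussian()$link   # "1/mu^2"
poisson()$variance        # the variance function: function(mu) mu
```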

Iteratively re-weighted least squares

Consider the following first-order Taylor approximation, remembering that \(g\) is the link function:

\[\begin{array} { r l } { g ( y ) } & { \approx g ( \mu ) + ( y - \mu ) g ^ { \prime } ( \mu ) } \\ { } & { = \eta + ( y - \mu ) \frac { d \eta } { d \mu } } \\ { } & { \equiv z } \end{array}\]

Letting \(V(\hat{\mu})\) denote the variance function evaluated at \(\hat{\mu}\), define the weight \(w\) by:

\[\widehat { \operatorname { var } } z = \left( \frac { d \eta } { d \mu } \right) ^ { 2 } V ( \hat { \mu } ) = \frac { 1 } { w }\]

Together with \(E(z) \approx \mathbf{X} \boldsymbol{\beta}\), this suggests the following iteratively re-weighted least squares (IRWLS) algorithm:

  1. Set initial estimates \(\hat{\eta}_0\) and \(\hat{\mu}_0\).

  2. Form the “adjusted dependent variable” \(z_0 = \hat{\eta}_0 + (y - \hat{\mu}_0) \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}_0}\).

  3. Form the estimated weights \(w_0^{-1} = \left( \frac{d\eta}{d\mu} \right)^2 V(\hat{\mu}_0)\).

  4. Perform a weighted regression of \(z\) on the \(x\)’s with weights \(w\); this yields \(\hat{\boldsymbol{\beta}}_{(1)}\).

  5. Use \(\hat{\boldsymbol{\beta}}_{(1)}\) to calculate \(\hat{\eta}_1\) (and hence \(\hat{\mu}_1\)).

  6. Iterate steps 2 to 5 until convergence.

At convergence this yields the m.l.e. \(\hat{\boldsymbol{\beta}}\).
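A minimal sketch of the algorithm for the logit link, where \(d\eta/d\mu = 1/(\mu(1-\mu))\) and \(V(\mu) = \mu(1-\mu)\). The function name `irwls_logit` is ours, and `glm` organizes the same Fisher-scoring computation differently in detail:

```r
## IRWLS for a binary logistic model; y is 0/1, X includes an intercept column
irwls_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  mu   <- rep(mean(y), length(y))          # step 1: initial mu-hat
  eta  <- log(mu / (1 - mu))               #         initial eta-hat
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    detadmu <- 1 / (mu * (1 - mu))         # d eta / d mu for the logit link
    z <- eta + (y - mu) * detadmu          # step 2: adjusted dependent variable
    w <- 1 / (detadmu^2 * mu * (1 - mu))   # step 3: weights, = mu(1 - mu) here
    beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # step 4: WLS fit
    eta <- drop(X %*% beta_new)            # step 5: update eta-hat and mu-hat
    mu  <- 1 / (1 + exp(-eta))
    if (max(abs(beta_new - beta)) < tol) break  # step 6: check convergence
    beta <- beta_new
  }
  drop(beta_new)
}

## check against glm on simulated data
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + x))
cbind(irwls = irwls_logit(cbind(1, x), y),
      glm   = coef(glm(y ~ x, family = binomial)))
```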

\[\operatorname { var } ( \hat { \boldsymbol{\beta} } ) = \hat{\phi} \left( X ^ { T } W X \right) ^ { - 1 }\] where \(W\) is the diagonal matrix of IRWLS weights \(w_i\) from the final iteration, and \(\hat{\phi} = 1\) for the binomial and Poisson families.

This yields standard errors for the \(\hat{\beta}_i\) (Wald test of \(H_0 : \beta_i = 0\): \(z = \hat{\beta}_i / \mathrm{s.e.}(\hat{\beta}_i)\)).
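For example (simulated Poisson data, not from the notes), the Wald z statistics can be read off `summary`:

```r
set.seed(1)
x <- rnorm(100)
y <- rpois(100, exp(0.5 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)
coefs <- summary(fit)$coefficients           # Estimate, Std. Error, z value, Pr(>|z|)
coefs[, "Estimate"] / coefs[, "Std. Error"]  # reproduces the "z value" column
```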

\[\text{Deviance} = -2 \log(\text{likelihood ratio}),\] where the denominator is the likelihood of the saturated model.
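Continuing with `fit` from the sketch above, deviance differences between nested models give likelihood-ratio tests:

```r
## deviance-based (likelihood ratio) test of `fit` against the null model
pchisq(fit$null.deviance - fit$deviance,
       df = fit$df.null - fit$df.residual, lower.tail = FALSE)
anova(fit, test = "Chisq")   # the same comparison, term by term
```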

GLM Diagnostics

Pearson residual: \(r_i^P = \frac { y_i - \hat { \mu }_i } { \sqrt { V ( \hat { \mu }_i ) } }\), where \(V(\hat{\mu}_i)\) is the variance function evaluated at \(\hat{\mu}_i\)

Note: for binary data, \(r_i^P\) remains essentially binary (given \(\hat{\mu}_i\) it takes only two possible values), so it should not be assessed against a normal reference.

Deviance residual: \(r_i^{D} = \operatorname{sgn} ( y_i - \hat{\mu}_i ) \sqrt{d_i}\), where \(\text{Deviance} = \sum_{i=1}^{n} d_i\)
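Both residual types are available from `residuals.glm`; again using `fit` from above:

```r
rP <- residuals(fit, type = "pearson")    # (y_i - muhat_i) / sqrt(V(muhat_i))
rD <- residuals(fit, type = "deviance")   # sgn(y_i - muhat_i) * sqrt(d_i)
all.equal(sum(rD^2), deviance(fit))       # deviance residuals square-sum to the deviance
```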

Hat values: \(H = W ^ { 1 / 2 } X \left( X ^ { T } W X \right) ^ { - 1 } X ^ { T } W ^ { 1 / 2 }\), then \(h_i\) is the \(i^{th}\) diagonal value.

  • Terminology originates in the linear model, where \(\hat{\mathbf{y}} = X ( X^T X )^{-1} X^T \mathbf{y}\). In that case the mean of the \(h_i\)’s is \(p/n\), where \(p\) is the number of parameters. \(h_i\) close to one indicates that the fitted value for case \(i\) is mainly determined by \(y_i\).
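In R, `hatvalues` returns the \(h_i\) for a fitted glm:

```r
h <- hatvalues(fit)   # diagonal of H, computed from the final IRWLS weights
mean(h)               # = p/n; cases with h_i well above this deserve a look
```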

Cook’s distance: \(D_i = \frac { \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) ^ { T } \left( X ^ { T } W X \right) \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) } { p \hat { \phi } }\)

  • \(\hat{\beta}_{(i)}\) is the estimate of \(\beta\) after deleting case \(i\).
  • a measure of the influence of case \(i\) on the overall estimates.
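`cooks.distance` computes a one-step approximation to \(D_i\); the screening cutoff below is a common heuristic, not part of the notes:

```r
D <- cooks.distance(fit)   # one-step approximation to the deletion measure
which(D > 4 / length(D))   # rough screening cutoff for influential cases
```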

Diagnostic plots

Example: binary logistic regression, WCGS data

## 
## Call:
## glm(formula = chd69 ~ age + chol + bmi + smoke + typeA, family = binomial, 
##     data = wcgs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2419  -0.4452  -0.3282  -0.2310   2.9352  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -10.450174   0.945501 -11.053  < 2e-16 ***
## age           0.067912   0.011744   5.783 7.35e-09 ***
## chol          0.011135   0.001489   7.476 7.65e-14 ***
## bmi           0.086813   0.025631   3.387 0.000707 ***
## smokeYes      0.614705   0.140115   4.387 1.15e-05 ***
## typeAB       -0.730325   0.143669  -5.083 3.71e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1779.2  on 3141  degrees of freedom
## Residual deviance: 1608.3  on 3136  degrees of freedom
##   (12 observations deleted due to missingness)
## AIC: 1620.3
## 
## Number of Fisher Scoring iterations: 6
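
A sketch of the call that produces this output, assuming a data frame `wcgs` with the variables shown (the object name `wcgs_fit` is ours), followed by the usual first residual plot:

```r
wcgs_fit <- glm(chd69 ~ age + chol + bmi + smoke + typeA,
                family = binomial, data = wcgs)
summary(wcgs_fit)
## deviance residuals against the linear predictor
plot(predict(wcgs_fit, type = "link"), residuals(wcgs_fit, type = "deviance"),
     xlab = expression(hat(eta)), ylab = "Deviance residuals")
```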

For generalized linear models, especially with unreplicated binary responses, small Poisson counts, or cross-tabulated data, these plots aren’t much help.

  • in the multinomial and ordinal logistic cases, the plots aren’t even well defined.

  • need to tailor the examination to the data’s structure, e.g. look at various stratified estimates in the case of aggregated data; one such tactic is sketched below.
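
One minimal version of such a tailored examination for binary data is a binned residual plot: group cases by fitted linear predictor and plot the mean residual per bin (the choice of 20 bins is arbitrary):

```r
eta <- predict(wcgs_fit, type = "link")
r   <- residuals(wcgs_fit, type = "response")   # y - fitted probability
bin <- cut(eta, quantile(eta, probs = seq(0, 1, length.out = 21)),
           include.lowest = TRUE)
plot(tapply(eta, bin, mean), tapply(r, bin, mean),
     xlab = "Mean linear predictor", ylab = "Mean residual")
abline(h = 0, lty = 2)   # bins far from zero suggest lack of fit
```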