Generalized Linear Models - Model fitting, Assessing assumptions

Review “theory”

Have a response variable whose distribution is in the exponential family:

\[f ( y | \theta , \phi ) = \exp \left[ \frac { y \theta - b ( \theta ) } { a ( \phi ) } + c ( y , \phi ) \right]\]

\(\theta\) is the natural parameter, \(\phi\) is dispersion parameter (if present)

Includes Gaussian, Binomial, Poisson, etc.

Mean and variance

\[\begin{array} { r l } { E Y } & { = \mu = b ^ { \prime } ( \theta ) } \\ { \operatorname { var } Y } & { = b ^ { \prime \prime } ( \theta ) a ( \phi ) } \end{array}\]

Variance function: \(b''(\theta)\)
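
As a concrete check, take the Poisson with mean \(\lambda\):

\[f ( y \mid \lambda ) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\left[ y \log \lambda - \lambda - \log y! \right]\]

so \(\theta = \log \lambda\), \(b(\theta) = e^{\theta}\), \(a(\phi) = 1\), and \(c(y, \phi) = -\log y!\). Then \(EY = b'(\theta) = e^{\theta} = \lambda\) and \(\operatorname{var} Y = b''(\theta) a(\phi) = \lambda\), so the variance function is \(\mu\), matching the table below.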

Link functions: \(g(\mu) = \eta = \beta _ { 0 } + \beta _ { 1 } x _ { 1 } + \cdots + \beta _ { p } x _ { p } = x ^ { T } \beta\)

Examples

\[\begin{array} { l l l } { \text { Family } } & { \text { Link } } & { \text { Variance Function } } \\ { \text { Normal } } & { \eta = \mu } & { 1 ~ ;~ a(\phi) = \sigma^2 } \\ { \text { Poisson } } & { \eta = \log \mu } & { \mu } \\ { \text { Binary } } & { \eta = \log ( \mu / ( 1 - \mu ) ) } & { \mu ( 1 - \mu ) } \\ { \text { Gamma } } & { \eta = \mu ^ { - 1 } } & { \mu ^ { 2 } } \\ { \text { Inverse Gaussian } } & { \eta = \mu ^ { - 2 } } & { \mu ^ { 3 } } \end{array}\]
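In R, each of these link/variance pairs is the default for the corresponding `family` object, which can be checked directly:

```r
## each family's default link in R matches the table above
gaussian()$link           # "identity"
poisson()$link            # "log"
binomial()$link           # "logit"
Gamma()$link              # "inverse"
inverse.gaussian()$link   # "1/mu^2"
poisson()$variance        # the variance function: function(mu) mu
```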

Iteratively re-weighted least squares

Consider the following first-order Taylor approximation, remembering that \(g\) is the link function:

\[\begin{array} { r l } { g ( y ) } & { \approx g ( \mu ) + ( y - \mu ) g ^ { \prime } ( \mu ) } \\ { } & { = \eta + ( y - \mu ) \frac { d \eta } { d \mu } } \\ { } & { \equiv z } \end{array}\]

Letting \(V(\hat{\mu})\) denote the variance function evaluated at \(\hat{\mu}\), define the weight \(w\) by:

\[\widehat { \operatorname { var } } z = \left( \frac { d \eta } { d \mu } \right) ^ { 2 } V ( \hat { \mu } ) = \frac { 1 } { w }\]

Together with \(E(z) \approx \mathbf{X} \boldsymbol{\beta}\), this suggests the following iteratively re-weighted least squares (IRWLS) algorithm:

  1. Set initial estimates \(\hat{\eta}_0\) and \(\hat{\mu}_0\).

  2. Form the “adjusted dependent variable” \(z_0 = \hat{\eta}_0 + (y - \hat{\mu}_0) \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}_0}\).

  3. Form the estimated weights \(w_0^{-1} = \left( \frac{d\eta}{d\mu} \right)^2 V(\hat{\mu}_0)\).

  4. Perform a weighted regression of \(z\) on the \(x\)’s with weights \(w\); this yields \(\hat{\boldsymbol{\beta}}_{(1)}\).

  5. Use \(\hat{\boldsymbol{\beta}}_{(1)}\) to calculate \(\hat{\eta}_1\) (and hence \(\hat{\mu}_1\)).

  6. Iterate steps 2 to 5 until convergence.

At convergence this yields the m.l.e. \(\hat{\boldsymbol{\beta}}\).
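A minimal sketch of the algorithm for the logit link, where \(d\eta/d\mu = 1/(\mu(1-\mu))\) and \(V(\mu) = \mu(1-\mu)\). The function name `irwls_logit` is ours, and `glm` organizes the same Fisher-scoring computation differently in detail:

```r
## IRWLS for a binary logistic model; y is 0/1, X includes an intercept column
irwls_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  mu   <- rep(mean(y), length(y))          # step 1: initial mu-hat
  eta  <- log(mu / (1 - mu))               #         initial eta-hat
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    detadmu <- 1 / (mu * (1 - mu))         # d eta / d mu for the logit link
    z <- eta + (y - mu) * detadmu          # step 2: adjusted dependent variable
    w <- 1 / (detadmu^2 * mu * (1 - mu))   # step 3: weights, = mu(1 - mu) here
    beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # step 4: WLS fit
    eta <- drop(X %*% beta_new)            # step 5: update eta-hat and mu-hat
    mu  <- 1 / (1 + exp(-eta))
    if (max(abs(beta_new - beta)) < tol) break  # step 6: check convergence
    beta <- beta_new
  }
  drop(beta_new)
}

## check against glm on simulated data
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + x))
cbind(irwls = irwls_logit(cbind(1, x), y),
      glm   = coef(glm(y ~ x, family = binomial)))
```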

\[\operatorname { var } ( \hat { \boldsymbol{\beta} } ) = \hat{\phi} \left( X ^ { T } W X \right) ^ { - 1 }\] where \(W\) is the diagonal matrix of IRWLS weights \(w_i\) from the final iteration, and \(\hat{\phi} = 1\) for the binomial and Poisson families.

This yields standard errors for the \(\hat{\beta}_i\) (Wald test of \(H_0 : \beta_i = 0\): \(z = \hat{\beta}_i / \mathrm{s.e.}(\hat{\beta}_i)\)).
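For example (simulated Poisson data, not from the notes), the Wald z statistics can be read off `summary`:

```r
set.seed(1)
x <- rnorm(100)
y <- rpois(100, exp(0.5 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)
coefs <- summary(fit)$coefficients           # Estimate, Std. Error, z value, Pr(>|z|)
coefs[, "Estimate"] / coefs[, "Std. Error"]  # reproduces the "z value" column
```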

\[\text{Deviance} = -2 \log(\text{likelihood ratio}),\] where the denominator is the likelihood of the saturated model.
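Continuing with `fit` from the sketch above, deviance differences between nested models give likelihood-ratio tests:

```r
## deviance-based (likelihood ratio) test of `fit` against the null model
pchisq(fit$null.deviance - fit$deviance,
       df = fit$df.null - fit$df.residual, lower.tail = FALSE)
anova(fit, test = "Chisq")   # the same comparison, term by term
```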

GLM Diagnostics

Pearson residual: \(r_i^P = \frac { y_i - \hat { \mu }_i } { \sqrt { V ( \hat { \mu }_i ) } }\), where \(V(\hat{\mu}_i)\) is the variance function evaluated at \(\hat{\mu}_i\)

Note: for binary data, \(r_i^P\) remains essentially binary (given \(\hat{\mu}_i\) it takes only two possible values), so it should not be assessed against a normal reference.

Deviance residual: \(r_i^{D} = \operatorname{sgn} ( y_i - \hat{\mu}_i ) \sqrt{d_i}\), where \(\text{Deviance} = \sum_{i=1}^{n} d_i\)
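Both residual types are available from `residuals.glm`; again using `fit` from above:

```r
rP <- residuals(fit, type = "pearson")    # (y_i - muhat_i) / sqrt(V(muhat_i))
rD <- residuals(fit, type = "deviance")   # sgn(y_i - muhat_i) * sqrt(d_i)
all.equal(sum(rD^2), deviance(fit))       # deviance residuals square-sum to the deviance
```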

Hat values: \(H = W ^ { 1 / 2 } X \left( X ^ { T } W X \right) ^ { - 1 } X ^ { T } W ^ { 1 / 2 }\), then \(h_i\) is the \(i^{th}\) diagonal value.

  • Terminology originates in the linear model, where \(\hat{\mathbf{y}} = X ( X^T X )^{-1} X^T \mathbf{y}\). In that case the mean of the \(h_i\)’s is \(p/n\), where \(p\) is the number of parameters. \(h_i\) close to one indicates that the fitted value for case \(i\) is mainly determined by \(y_i\).
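In R, `hatvalues` returns the \(h_i\) for a fitted glm:

```r
h <- hatvalues(fit)   # diagonal of H, computed from the final IRWLS weights
mean(h)               # = p/n; cases with h_i well above this deserve a look
```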

Cook’s distance: \(D_i = \frac { \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) ^ { T } \left( X ^ { T } W X \right) \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) } { p \hat { \phi } }\)

  • \(\hat{\beta}_{(i)}\) is the estimate of \(\beta\) after deleting case \(i\).
  • a measure of the influence of case \(i\) on the overall estimates.
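`cooks.distance` computes a one-step approximation to \(D_i\); the screening cutoff below is a common heuristic, not part of the notes:

```r
D <- cooks.distance(fit)   # one-step approximation to the deletion measure
which(D > 4 / length(D))   # rough screening cutoff for influential cases
```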

Diagnostic plots

Example: binary logistic regression, WCGS data

## 
## Call:
## glm(formula = chd69 ~ age + chol + bmi + smoke + typeA, family = binomial, 
##     data = wcgs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2419  -0.4452  -0.3282  -0.2310   2.9352  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -10.450174   0.945501 -11.053  < 2e-16 ***
## age           0.067912   0.011744   5.783 7.35e-09 ***
## chol          0.011135   0.001489   7.476 7.65e-14 ***
## bmi           0.086813   0.025631   3.387 0.000707 ***
## smokeYes      0.614705   0.140115   4.387 1.15e-05 ***
## typeAB       -0.730325   0.143669  -5.083 3.71e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1779.2  on 3141  degrees of freedom
## Residual deviance: 1608.3  on 3136  degrees of freedom
##   (12 observations deleted due to missingness)
## AIC: 1620.3
## 
## Number of Fisher Scoring iterations: 6
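
A sketch of the call that produces this output, assuming a data frame `wcgs` with the variables shown (the object name `wcgs_fit` is ours), followed by the usual first residual plot:

```r
wcgs_fit <- glm(chd69 ~ age + chol + bmi + smoke + typeA,
                family = binomial, data = wcgs)
summary(wcgs_fit)
## deviance residuals against the linear predictor
plot(predict(wcgs_fit, type = "link"), residuals(wcgs_fit, type = "deviance"),
     xlab = expression(hat(eta)), ylab = "Deviance residuals")
```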

For generalized linear models, especially with unreplicated binary responses, small Poisson counts, or cross-tabulated data, these plots aren’t much help.

  • in the multinomial and ordinal logistic cases, the plots aren’t even well defined.

  • need to tailor the examination to the data’s structure, e.g. look at various stratified estimates in the case of aggregated data; one such tactic is sketched below.
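
One minimal version of such a tailored examination for binary data is a binned residual plot: group cases by fitted linear predictor and plot the mean residual per bin (the choice of 20 bins is arbitrary):

```r
eta <- predict(wcgs_fit, type = "link")
r   <- residuals(wcgs_fit, type = "response")   # y - fitted probability
bin <- cut(eta, quantile(eta, probs = seq(0, 1, length.out = 21)),
           include.lowest = TRUE)
plot(tapply(eta, bin, mean), tapply(r, bin, mean),
     xlab = "Mean linear predictor", ylab = "Mean residual")
abline(h = 0, lty = 2)   # bins far from zero suggest lack of fit
```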