Review “theory”
Have a response variable with distribution in exponential family
\[f ( y | \theta , \phi ) = \exp \left[ \frac { y \theta - b ( \theta ) } { a ( \phi ) } + c ( y , \phi ) \right]\]
\(\theta\) is the natural parameter, \(\phi\) is dispersion parameter (if present)
Includes Gaussian, Binomial, Poisson, etc.
\[\begin{array} { r l } { E Y } & { = \mu = b ^ { \prime } ( \theta ) } \\ { \operatorname { var } Y } & { = b ^ { \prime \prime } ( \theta ) a ( \phi ) } \end{array}\]
Variance function: \(V(\mu) = b''(\theta)\), regarded as a function of \(\mu\)
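A standard worked example (the Poisson case): with \(\theta = \log \mu\), \(b(\theta) = e^{\theta}\), \(a(\phi) = 1\) and \(c(y, \phi) = -\log y!\),
\[f ( y | \theta ) = \exp \left[ y \log \mu - \mu - \log y! \right], \qquad E Y = b'(\theta) = e^{\theta} = \mu, \qquad \operatorname{var} Y = b''(\theta) a(\phi) = \mu,\]
so the Poisson variance function is \(V(\mu) = \mu\), as in the table below.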
Link functions: \(g(\mu) = \eta = \beta _ { 0 } + \beta _ { 1 } x _ { 1 } + \cdots + \beta _ { p } x _ { p } = x ^ { T } \beta\)
Examples
\[\begin{array} { l l l } { \text { Family } } & { \text { Link } } & { \text { Variance Function } } \\ { \text { Normal } } & { \eta = \mu } & { 1 ~ ;~ a(\phi) = \sigma^2 } \\ { \text { Poisson } } & { \eta = \log \mu } & { \mu } \\ { \text { Binary } } & { \eta = \log ( \mu / ( 1 - \mu ) ) } & { \mu ( 1 - \mu ) } \\ { \text { Gamma } } & { \eta = \mu ^ { - 1 } } & { \mu ^ { 2 } } \\ { \text { Inverse Gaussian } } & { \eta = \mu ^ { - 2 } } & { \mu ^ { 3 } } \end{array}\]
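In R, these link and variance functions are stored in the family objects passed to glm(); a quick way to inspect them (illustrative calls, not tied to any particular data set):

```r
## each family object carries its link and variance function
binomial()$link              # "logit"
binomial()$variance          # function (mu) mu * (1 - mu)
poisson()$linkfun(2)         # log(2), since eta = log(mu)
Gamma()$link                 # "inverse", i.e. eta = 1/mu
inverse.gaussian()$link      # "1/mu^2"
```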
Consider the following approximation (a first-order Taylor expansion of \(g(y)\) about \(\mu\)), remembering that \(g\) is the link function.
\[\begin{array} { r l } { g ( y ) } & { \approx g ( \mu ) + ( y - \mu ) g ^ { \prime } ( \mu ) } \\ { } & { = \eta + ( y - \mu ) \frac { d \eta } { d \mu } } \\ { } & { \equiv z } \end{array}\]
Letting \(V(\hat{\mu}_i)\) denote the variance function evaluated at \(\hat{\mu}_i\), define \(w\) by:
\[\widehat { \operatorname { var } } z = \left( \frac { d \eta } { d \mu } \right) ^ { 2 } V ( \hat { \mu } ) = \frac { 1 } { w }\]
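For the binary (logit) case, for example, \(\eta = \log(\mu/(1-\mu))\) gives \(d\eta/d\mu = 1/\{\mu(1-\mu)\}\), so
\[z = \eta + \frac{y - \mu}{\mu(1-\mu)}, \qquad w = \left[ \left( \frac{d\eta}{d\mu} \right)^2 V(\mu) \right]^{-1} = \mu(1-\mu).\]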
Together with \(E(z) \approx \mathbf{X} \mathbf{\beta}\), this suggests the following iteratively reweighted least squares (IRLS) algorithm (an R sketch follows below):
1. Set initial estimates \(\hat{\eta}_0\) and \(\hat{\mu}_0\).
2. Form the “adjusted dependent variable” \(z_0 = \hat{\eta}_0 + (y - \hat{\mu}_0) \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}_0}\).
3. Form the weights \(w_0^{-1} = \left( \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}_0} \right)^2 V(\hat{\mu}_0)\).
4. Perform a weighted regression of \(z\) on the x’s with weights \(w\), yielding \(\hat{\mathbf{\beta}}_{(1)}\).
5. Use \(\hat{\mathbf{\beta}}_{(1)}\) to calculate \(\hat{\eta}_1\) and \(\hat{\mu}_1\).
6. Iterate steps 2 to 5 until convergence.
At convergence yields m.l.e. \(\hat{\mathbf{\beta}}\)
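A minimal R sketch of this algorithm for the logit link, assuming a design matrix X (including an intercept column) and a 0/1 response y; irls_logit is an illustrative name, not a built-in function:

```r
## minimal IRLS sketch for binary logistic regression (logit link)
irls_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                 # step 1: crude starting values
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)               # current linear predictor
    mu  <- 1 / (1 + exp(-eta))            # inverse link
    w   <- mu * (1 - mu)                  # weights 1 / [(d eta / d mu)^2 V(mu)]
    z   <- eta + (y - mu) / w             # step 2: adjusted dependent variable
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))  # steps 3-4
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new                      # step 5: update and iterate
  }
  beta
}
## at convergence this should agree with coef(glm(y ~ X - 1, family = binomial))
```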
\[\operatorname { var } ( \hat { \beta } ) = \left( X ^ { T } W X \right) ^ { - 1 }\] where \(W\) is the diagonal matrix of the weights \(w_i\) at convergence; for binary and Poisson data (canonical link, \(a(\phi) = 1\)) this coincides with the diagonal variance matrix of \(Y\).
yields standard errors for \(\hat{\beta}\) (Wald test for \(H_0 : \beta_i = 0\): \(z = \hat{\beta}_i / \mathrm{s.e.}(\hat{\beta}_i)\))
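After fitting with glm(), these quantities can be pulled out directly (fit here is a placeholder for any fitted glm object):

```r
## (X^T W X)^{-1}, standard errors, and Wald z statistics
V    <- vcov(fit)                           # estimated var(beta-hat)
se   <- sqrt(diag(V))                       # standard errors
zval <- coef(fit) / se                      # Wald z for H0: beta_i = 0
pval <- 2 * pnorm(abs(zval), lower.tail = FALSE)
cbind(estimate = coef(fit), se, zval, pval) # matches summary(fit)$coefficients
```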
\[\text{Deviance} = -2 \log(\text{likelihood ratio}),\] where the denominator likelihood is that of the saturated model.
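In R, the residual and null deviances reported by summary() support likelihood-ratio (deviance) comparisons; a brief sketch, with fit again a placeholder:

```r
deviance(fit)                         # residual deviance of the fitted model
fit$null.deviance - deviance(fit)     # deviance explained relative to the intercept-only model
anova(fit, test = "Chisq")            # sequential likelihood-ratio (deviance) tests
```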
Pearson residual: \(r_i^P = \frac { y_i - \hat { \mu_i } } { \sqrt { V ( \hat { \mu_i } ) } }\) where \(V(\hat{\mu_i})\) is variance function evaluated at \(\hat{\mu_i}\)
note - for binary data, \(r^P_i\) remains essentially binary even asymptotically (only two possible values given \(\hat{\mu}_i\)), so it is not approximately normal.
Deviance residual: \(r_i^{D} = \operatorname{sgn}(y_i - \hat{\mu}_i) \sqrt{d_i}\), where \(\text{Deviance} = \sum_{i=1}^{n} d_i\)
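Both kinds of residuals are available from residuals() for a fitted glm:

```r
rP <- residuals(fit, type = "pearson")    # (y_i - mu-hat_i) / sqrt(V(mu-hat_i))
rD <- residuals(fit, type = "deviance")   # sgn(y_i - mu-hat_i) * sqrt(d_i)
sum(rD^2)                                 # equals deviance(fit)
```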
Hat values: \(H = W ^ { 1 / 2 } X \left( X ^ { T } W X \right) ^ { - 1 } X ^ { T } W ^ { 1 / 2 }\), then \(h_i\) is the \(i^{th}\) diagonal value.
Cook’s distance: \(D_{ i } = \frac { \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) ^ { T } \left( X ^ { T } W X \right) \left( \hat { \beta } _ { ( i ) } - \hat { \beta } \right) } { p \hat { \phi } }\)
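Both diagnostics are available directly from a fitted glm:

```r
h <- hatvalues(fit)        # diagonal of W^(1/2) X (X^T W X)^(-1) X^T W^(1/2)
D <- cooks.distance(fit)   # Cook's distances D_i
which(D > 4 / nobs(fit))   # a common rough screening rule for influential points
```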
Example: Binary logistic regression (WCGS data)
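The output below can be reproduced with a call like the following, assuming a data frame wcgs containing the listed variables is already loaded:

```r
fit <- glm(chd69 ~ age + chol + bmi + smoke + typeA,
           family = binomial, data = wcgs)
summary(fit)
```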
##
## Call:
## glm(formula = chd69 ~ age + chol + bmi + smoke + typeA, family = binomial,
## data = wcgs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2419 -0.4452 -0.3282 -0.2310 2.9352
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.450174 0.945501 -11.053 < 2e-16 ***
## age 0.067912 0.011744 5.783 7.35e-09 ***
## chol 0.011135 0.001489 7.476 7.65e-14 ***
## bmi 0.086813 0.025631 3.387 0.000707 ***
## smokeYes 0.614705 0.140115 4.387 1.15e-05 ***
## typeAB -0.730325 0.143669 -5.083 3.71e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1779.2 on 3141 degrees of freedom
## Residual deviance: 1608.3 on 3136 degrees of freedom
## (12 observations deleted due to missingness)
## AIC: 1620.3
##
## Number of Fisher Scoring iterations: 6
For generalized linear models, especially with unreplicated binary data, small Poisson counts, or cross-tabulated data, these plots aren’t much help.
In the multinomial and ordinal logistic cases, such plots aren’t even well defined.
One needs to tailor the examination to the structure of the data, e.g. look at various stratified estimates in the case of aggregated data (one such check is sketched below).
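One simple tailored check for binary data, offered as a rough sketch rather than a prescribed method: group observations by deciles of the fitted probability and compare observed and fitted event rates within each group.

```r
## rough calibration check: observed vs. fitted event rates by decile of fitted p
p   <- fitted(fit)                                 # fitted probabilities
grp <- cut(p, quantile(p, probs = 0:10 / 10), include.lowest = TRUE)
data.frame(observed = tapply(fit$y, grp, mean),    # fit$y is the 0/1 response used in the fit
           fitted   = tapply(p, grp, mean),
           n        = as.vector(table(grp)))
```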