## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Coeff. | p-value | 95% Limits | ||
---|---|---|---|---|
(Intercept) | 0.000000e+00 | 0.9995046 | 0 | Inf |
estrogen | 0.000000e+00 | 0.9990517 | 0 | Inf |
androgen | 6.657128e+43 | 0.9991320 | 0 | Inf |
## 1 | 2: represents 0.12
## leaf unit: 0.01
## n: 26
## (15) 0 | 000000000000000
## 1 |
## 2 |
## 3 |
## 4 |
## 5 |
## 6 |
## 7 |
## 8 |
## 11 9 | 9
## 10 10 | 10101010101010101010
## 1 | 2: represents 0.12
## leaf unit: 0.01
## n: 26
## 12 0 | 000000112336
## (1) 1 | 4
## (2) 2 | 56
## 3 |
## 4 |
## 5 |
## 11 6 | 6
## 7 |
## 10 8 | 7
## 9 9 | 234667789
Coeff. | p-value | 95% Limits | ||
---|---|---|---|---|
(Intercept) | 0.0259852 | 0.2097630 | 0.0000866 | 7.7988764 |
estrogen | 0.0277284 | 0.0167279 | 0.0014704 | 0.5229104 |
androgen | 58.7449870 | 0.0119570 | 2.4520149 | 1407.4031647 |
General Case
\[ X ^ { 2 } = \sum _ { i = 1 } ^ { n } \frac { \left( O _ { i } - E _ { i } \right) ^ { 2 } } { E _ { i } } \]
Replicated binary case ( \(Y_i \sim \bf{Binom}\left( n_i, p_i \right)\) )
\[ X ^ { 2 } = \sum _ { i = 1 } ^ { n } \frac { \left( y _ { i } - n _ { i } \hat { p } _ { i } \right) ^ { 2 } } { n _ { i } \hat { p } _ { i } \left( 1 - \hat { p } _ { i } \right) } \]
approximates the Residual Deviance
\[ r _ { i } ^ { P } = \left( y _ { i } - n _ { i } \hat { p } _ { i } \right) / \sqrt { \operatorname { var } } \hat { y } _ { i } \]
\[ X ^ { 2 } = \sum _ { i = 1 } ^ { n } \left( r _ { i } ^ { P } \right) ^ { 2 } \]
\[ r_{ i }^{D} = \sqrt { 2 \left[ y _ { i } \log \left( \frac { y _ { i } } { n \hat { p }_ { i } } \right) + \left( n _ { i } - y _ { i } \right) \log \left( \frac { n _ { i } - y _ { i } } { n _ { i } - n_i \hat {p} _ { i } } \right) \right] } \]
\[ Residual~~Deviance = \sum _ { i = 1 } ^ { n } \left( r _ { i } ^ { D } \right) ^ { 2 } \]
Nagelkirke’s statistic
\[ R ^ { 2 } = \frac { 1 - \left( \hat { L } _ { 0 } / \hat { L } \right) ^ { 2 / n } } { 1 - \hat { L } _ { o } ^ { 2 / n } } = \frac { 1 - \exp \left( \left( D - D _ { \text {null} } \right) / n \right) } { 1 - \exp \left( - D _ { \text { null } } / n \right) } \]
\[\downarrow\]
When outcomes are not common, but not so uncommon as to justify a relative risk interpretation for odds ratios, and alternative is to use the log link, \(\eta_i = log(p_i)\).
##
## Call:
## glm(formula = cbind(survive, total - survive) ~ location + period,
## family = binomial, data = troutegg)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.8305 -0.3650 -0.0303 0.6191 3.2434
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.6358 0.2813 16.479 < 2e-16 ***
## location2 -0.4168 0.2461 -1.694 0.0903 .
## location3 -1.2421 0.2194 -5.660 1.51e-08 ***
## location4 -0.9509 0.2288 -4.157 3.23e-05 ***
## location5 -4.6138 0.2502 -18.439 < 2e-16 ***
## period7 -2.1702 0.2384 -9.103 < 2e-16 ***
## period8 -2.3256 0.2429 -9.573 < 2e-16 ***
## period11 -2.4500 0.2341 -10.466 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1021.469 on 19 degrees of freedom
## Residual deviance: 64.495 on 12 degrees of freedom
## AIC: 157.03
##
## Number of Fisher Scoring iterations: 5
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(survive, total - survive)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 19 1021.47
## location 4 792.90 15 228.57 < 2.2e-16 ***
## period 3 164.08 12 64.50 < 2.2e-16 ***
## location:period 12 64.50 0 0.00 3.379e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The validity of the test for interaction depends on the assumption of independence
This assumption is questionable given the study design
Suppose that survival outcomes (0/1) of eggs within box have positive correlation \(\rho\) Can write variance of \(Y_i\) (the number of surviving eggs) as
\[ Var(Y_i) = m_i p_i ( 1-p_i ) \left\{ 1 + ( m_i - 1) \rho \right\} \]
where \(m_i\) is the number of eggs in box \(i\). The expression \(1 + (m_i -1 ) \rho\) gives the proportional increase in variance due to correlation. This expression is used as a sample size inflation factor in cluster-randomized trials.
A clever and easily implemented method to account for overdispersion is to extend the generalized linear model in an ad hoc fashion with a dispersion coefficient, which is analogous to the residual error variance \(\sigma_{\epsilon}^2\) in a linear model.
The overdispersion parameter is derived by dividing the Pearson goodness of fit statistic by its degrees of freedeom, i.e. \(\hat{\sigma}^2 = \frac{X^2}{residual~ d.f.}\) This estimate can be used to inflate the standard errors of the \(\beta\) estimates to allow for overdispersion.
sigma2 <- sum(residuals(lgtfit,type="pearson")^2)/12
lgtfit <- glm(cbind(survive,total-survive) ~ location+period, family=binomial,troutegg)
summary(lgtfit,dispersion=sigma2)
##
## Call:
## glm(formula = cbind(survive, total - survive) ~ location + period,
## family = binomial, data = troutegg)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.8305 -0.3650 -0.0303 0.6191 3.2434
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.6358 0.6495 7.138 9.49e-13 ***
## location2 -0.4168 0.5682 -0.734 0.4632
## location3 -1.2421 0.5066 -2.452 0.0142 *
## location4 -0.9509 0.5281 -1.800 0.0718 .
## location5 -4.6138 0.5777 -7.987 1.39e-15 ***
## period7 -2.1702 0.5504 -3.943 8.05e-05 ***
## period8 -2.3256 0.5609 -4.146 3.38e-05 ***
## period11 -2.4500 0.5405 -4.533 5.82e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 5.330322)
##
## Null deviance: 1021.469 on 19 degrees of freedom
## Residual deviance: 64.495 on 12 degrees of freedom
## AIC: 157.03
##
## Number of Fisher Scoring iterations: 5
The same result can be obtained automatically by using the quasibinomial link.
##
## Call:
## glm(formula = cbind(survive, total - survive) ~ location + period,
## family = quasibinomial, data = troutegg)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.8305 -0.3650 -0.0303 0.6191 3.2434
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6358 0.6495 7.138 1.18e-05 ***
## location2 -0.4168 0.5682 -0.734 0.477315
## location3 -1.2421 0.5066 -2.452 0.030501 *
## location4 -0.9509 0.5281 -1.800 0.096970 .
## location5 -4.6138 0.5777 -7.987 3.82e-06 ***
## period7 -2.1702 0.5504 -3.943 0.001953 **
## period8 -2.3256 0.5609 -4.146 0.001356 **
## period11 -2.4500 0.5405 -4.533 0.000686 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 5.330358)
##
## Null deviance: 1021.469 on 19 degrees of freedom
## Residual deviance: 64.495 on 12 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
Note that the p-values in the above analysis are derived from the t distribution
While the quasi-likelihood provides a convenient adjustment for overdispersion, modern computing facilities permit a more theoretically satisfactory analysis using a *random effects$ approach.
The fixed effect model takes \(\eta_i\) to be the non-random quantity \(\mathbf{x}_i \mathbf{\beta}\). Adding an additional random component \(\epsilon_i \sim N(0,\sigma^2)\), i.e. \(\eta_i = \mathbf{x}_i \mathbf{\beta} + \epsilon_i\) also gives rise to an \(overdispersion\) model, although there is no closed form expression for the variance due to the non-linear relation between \(\eta_i\) and \(p_i\).
Such models are implemented in the \(lme4\) package, as follows
troutegg$box <- with(troutegg, factor( paste(location,period,sep=":")))
mefit <- glmer(cbind(survive, total-survive) ~ location + period + (1 | box),
family=binomial, data=troutegg)
summary(mefit,correlation=FALSE)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(survive, total - survive) ~ location + period + (1 | box)
## Data: troutegg
##
## AIC BIC logLik deviance df.resid
## 148 157 -65 130 11
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.07855 -0.16858 0.01644 0.26816 1.31293
##
## Random effects:
## Groups Name Variance Std.Dev.
## box (Intercept) 0.2195 0.4685
## Number of obs: 20, groups: box, 20
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.4483 0.4385 10.143 < 2e-16 ***
## location2 -0.2708 0.4366 -0.620 0.53505
## location3 -1.1040 0.4174 -2.645 0.00817 **
## location4 -0.7207 0.4368 -1.650 0.09896 .
## location5 -4.6067 0.4296 -10.724 < 2e-16 ***
## period7 -2.0500 0.4087 -5.016 5.27e-07 ***
## period8 -2.1519 0.4058 -5.303 1.14e-07 ***
## period11 -2.5133 0.4145 -6.063 1.33e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1