Logistic Regression (part II)

Interpreting Odds

Baby feeding practice example

## 
## Call:
## glm(formula = cbind(disease, n - disease) ~ sex + feeding, family = binomial, 
##     data = babies)
## 
## Deviance Residuals: 
##       1        2        3        4        5        6  
##  0.1096  -0.5052   0.1922  -0.1342   0.5896  -0.2284  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -1.6127     0.1124 -14.347  < 2e-16 ***
## sexGirl        -0.3126     0.1410  -2.216   0.0267 *  
## feedingBoth    -0.1725     0.2056  -0.839   0.4013    
## feedingBreast  -0.6693     0.1530  -4.374 1.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.37529  on 5  degrees of freedom
## Residual deviance:  0.72192  on 2  degrees of freedom
## AIC: 40.24
## 
## Number of Fisher Scoring iterations: 4

estimate	std.error	statistic	p.value	conf.low	conf.high
0.199	0.112	-14.347	0.000	0.159	0.247
0.732	0.141	-2.216	0.027	0.554	0.963
0.842	0.206	-0.839	0.401	0.556	1.246
0.512	0.153	-4.374	0.000	0.378	0.690

The likelihood ratio test

Logistic regression model = mathematical formula for probability of observing \(\bf{Y} = \bf{y}\) (particular set of outcomes), given values for \(\bf{X}\) and \(\bf{\beta}\))
- \(\bf{X}\) = \((n \times q)~~\) “X matrix”, \(\bf{\beta}\) = parameter vector (\(q\) dimensional ).
Likelihood is simply same function where we consider \(\bf{\beta}\) as the variable of interest, \(\bf{X}\) and \(\bf y\) fixed at their observed values.
- write \(L (\bf{\beta})\)
- \(\hat{ \bf{\beta}}\), the maximum likelihood is the particular set of \(q\) values that maximizes \(L\)
\(L\) evaluated at \(\hat{ \bf{\beta}}\) (i.e. \(L(\hat{\bf{\beta}})\)), is called the observed likelihood.
Consider alternate models, a bigger model \(M\) with parameter vector \(\bf{\beta}_M\), and a smaller model, \(R\) with “same” set of parameters, only with restrictions placed on the some or all of the values of \(\bf{\beta}_M\) by a particular hypothesis, generally of the form of \(H_0\) fixing values of some of the parameters, usually at \(0\).
can formally test \(H_0\): Model \(R\) is correct, vs. \(H_A\): Model \(M\) is correct by calculating the likelihood ratio statistic:

\[\Delta = 2 \log \frac { L (\hat{\bf{\beta}}_M) } { L (\hat{\bf{\beta}_R})} = 2 \left[ log L(\hat{\bf{\beta}}_M) - log L( \hat{\bf{\beta}_R}) \right] \]

where \(\hat{\bf{\beta}}_R\) is called the restricted maximum likelihood estimate.

If \(H_0\) is correct, the sampling distribution of \(\Delta\) will be approximately \({\chi}^2\) h \(r\) degrees of freedom, where \(r\) is the number of mathematically independent restrictions.

The Deviance

a variant of the above statistic, called the Deviance, is derived by setting
- \(M\) = model which provides a perfect fit to the data (the Saturated model) and
- \(R\) = the particular model of interest.
In logistic regression based on the binomial data this takes the form

\[ D = 2 \sum _ { i = 1 } ^ { n } \left\{ y _ { i } \log y _ { i } / \hat { y } _ { i } + \left( n _ { i } - y _ { i } \right) \log \left( n _ { i } - y _ { i } \right) / \left( n _ { i } - \hat { y } _ { i } \right) \right\} \]

If we have binary data this equals

\[ D = - 2 \sum _ { i = 1 } ^ { n } \left\{ \hat { p } _ { i } \operatorname { logit } \left( \hat { p } _ { i } \right) + \log \left( 1 - \hat { p } _ { i } \right) \right\} \]

Note if we wish to perform a comparison of nested models in general (i.e. M not the saturated model) we write the log of the likelihood ratio as

\[ \Delta = Deviance_R - Deviance_M \] which will tend (asymptotically) to follow a \(\chi^2\) distribution on \(q_M - q_R\) degrees of freedeom, \(q_M\) is the number of parameters in model \(M\) and \(q_R\) is the number of parameters in model \(R\).

M.I. Example

The following variables were recorded for case series of 58 patients who underwent coronary bypass surgery.

sex
age at surgery in years
pre-operative use of beta blockers (yes or no)
pre-operative NYHA functional class (II, III, IV)
occurrence of peri-operative M.I.(myocardial infarct) (yes or no)

The investigators wished to determine predictors of M.I.(heart attack)

##  MI                    sex            age                  beta          
##  N              :200   M:183(91.5%)   Mean   : 57.475000   No : 44(22%)  
##  Complete cases :200   F: 17( 8.5%)   S.D.   :  8.275979   Yes:156(78%)  
##  % missing items:  0                  Min.   : 35.000000                 
##  Variables      :  5                  1st Qu.: 52.000000                 
##  Factors        :  4                  Median : 57.000000                 
##                                       3rd Qu.: 62.000000                 
##                                       Max.   : 73.000000                 
##                                       n      :200.000000                 
##                                                                          
##  nyha             mi              
##  II : 53(26.5%)   No :163(81.5%)  
##  III:118(59.0%)   Yes: 37(18.5%)  
##  IV : 29(14.5%)                   
##                                   
##                                   
##                                   
##                                   
##                                   
##

##  mi                sex                               
##  No :163(81.5%)            No             Yes        
##  Yes: 37(18.5%)    M 154/183(84.2%)  29/183(15.8%)   
##                    F    9/17(52.9%)    8/17(47.1%)   
##                                                      
##                                                      
##  age                   beta                                
##      Mean( SD , N )              No             Yes        
##  No  58.4(8.03,163)    No    33/44(75.0%)   11/44(25.0%)   
##  Yes 53.2(8.06, 37)    Yes 130/156(83.3%)  26/156(16.7%)   
##        s=8,r2=0.061                                        
##                                                            
##  nyha                              
##            No           Yes        
##  II   46/53(86.8%)   7/53(13.2%)   
##  III 98/118(83.1%) 20/118(16.9%)   
##  IV   19/29(65.5%)  10/29(34.5%)   
##

## # A tibble: 6 x 7
##   term        estimate std.error statistic    p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
## 1 (Intercept)  336.       1.68        3.46 0.000535     13.5   10355.   
## 2 sexF           2.71     0.596       1.67 0.0941        0.831     8.81 
## 3 age            0.874    0.0303     -4.45 0.00000848    0.821     0.925
## 4 betaYes        0.358    0.476      -2.16 0.0308        0.140     0.921
## 5 nyhaIII        1.87     0.527       1.19 0.234         0.696     5.60 
## 6 nyhaIV        15.4      0.758       3.61 0.000307      3.71     74.4

Investigators wished to see if NYHA functional class had different significance depending on sex.

## # A tibble: 8 x 5
##   term         estimate std.error statistic    p.value
##   <chr>           <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept)     6.70     1.91      3.51   0.000453  
## 2 age            -0.169    0.0358   -4.74   0.00000217
## 3 betaYes        -1.17     0.524    -2.23   0.0261    
## 4 sexF            5.08     1.39      3.66   0.000250  
## 5 nyhaIII         1.65     0.746     2.21   0.0273    
## 6 nyhaIV          4.91     1.07      4.61   0.00000409
## 7 sexF:nyhaIII   -4.10     1.69     -2.42   0.0153    
## 8 sexF:nyhaIV   -21.9   1073.       -0.0204 0.984

## Analysis of Deviance Table
## 
## Model 1: mi ~ sex + age + beta + nyha
## Model 2: mi ~ age + beta + sex * nyha
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       194     153.64                          
## 2       192     130.65  2    22.99 1.018e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note the very large estimate with large standard error for the second component of the sex*nyha interaction.

Can also (if lazy) use sequential ANOVA

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: mi
## 
## Terms added sequentially (first to last)
## 
## 
##          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                       199     191.56              
## age       1  12.6537       198     178.90 0.0003748 ***
## beta      1   3.6754       197     175.23 0.0552219 .  
## sex       1   6.2093       196     169.02 0.0127082 *  
## nyha      2  15.3782       194     153.64 0.0004578 ***
## sex:nyha  2  22.9897       192     130.65 1.018e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting the interaction

When trying to explain a complicated model, fitted values are always helpful.

Prediction probabilities

I’ve chosen a new set of typical \(x\)-values to examine the interaction. How? - Age 60 is close to average age in sample
- Most patients were on beta-blockers

sex	age	beta	nyha
M	60	Yes	II
M	60	Yes	III
M	60	Yes	IV
F	60	Yes	II
F	60	Yes	III
F	60	Yes	IV

Predicted values derived from \(\hat{\eta} = \bf{X_{new}}\hat{\beta}\)

preds <- predict(lfit2,newdata=MIpred,type="link",se.fit=TRUE)
predMat <- round(
  1/(1+(exp(-1 *cbind(preds$fit,sweep(outer(1.96*preds$se.fit,c(-1,1),"*"),1,preds$fit,"+"))))),2)
dimnames(predMat)[[2]] <- c("p.hat","2.5%","97.5%") 
kableN(cbind(MIpred,predMat))

sex	age	beta	nyha	p.hat	2.5%	97.5%
M	60	Yes	II	0.01	0.00	0.05
M	60	Yes	III	0.05	0.02	0.11
M	60	Yes	IV	0.57	0.32	0.79
F	60	Yes	II	0.61	0.14	0.94
F	60	Yes	III	0.12	0.02	0.48
F	60	Yes	IV	0.00	0.00	1.00

Applying bias reduced logistic regression

Last coefficient of the “interaction” model large, large standard error
Result of small sample sizes in cross tabulation of \(sex\), \(nyha\)

	II	III	IV
M	48	111	24
F	5	7	5

Indicates we have unstable estimates,
Asymptotic likelihood base procedures may not be mis-behaving

NB -

Possible solution is to “regularize” estimates
Shrink estimates towards zero
- possible approaches, penalized likelihood, pseudo-Bayesian priors,
More classical approach - introducing correction to remove \(O(1/n)\) bias in estimates
- Firth’s method, available in packages brglm, brglm2

## 
## Call:
## brglm(formula = mi ~ age + beta + sex * nyha, family = binomial, 
##     data = MI)
## 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   6.37229    1.84742   3.449 0.000562 ***
## age          -0.15944    0.03427  -4.652 3.29e-06 ***
## betaYes      -1.12446    0.51042  -2.203 0.027595 *  
## sexF          4.54841    1.29510   3.512 0.000445 ***
## nyhaIII       1.47037    0.70376   2.089 0.036680 *  
## nyhaIV        4.57496    1.00519   4.551 5.33e-06 ***
## sexF:nyhaIII -3.62777    1.60837  -2.256 0.024098 *  
## sexF:nyhaIV  -7.20278    2.14939  -3.351 0.000805 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 171.26  on 199  degrees of freedom
## Residual deviance: 131.75  on 192  degrees of freedom
## Penalized deviance: 119.339 
## AIC:  147.75

## Analysis of Deviance Table
## 
## Model 1: mi ~ age + beta + sex + nyha
## Model 2: mi ~ age + beta + sex * nyha
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       194     153.75                          
## 2       192     131.75  2   22.003 1.668e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Fitted proportions

	II	III	IV
M	0.07	0.15	0.42
F	0.76	0.56	0.08

Analyzing Rare Events

Convergence problems in likelihood estimation

Graphical inspection

Choice of Link Function

NB -