Logistic Regression (part II)

Interpreting Odds

Baby feeding practice example

NB -

## 
## Call:
## glm(formula = cbind(disease, n - disease) ~ sex + feeding, family = binomial, 
##     data = babies)
## 
## Deviance Residuals: 
##       1        2        3        4        5        6  
##  0.1096  -0.5052   0.1922  -0.1342   0.5896  -0.2284  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -1.6127     0.1124 -14.347  < 2e-16 ***
## sexGirl        -0.3126     0.1410  -2.216   0.0267 *  
## feedingBoth    -0.1725     0.2056  -0.839   0.4013    
## feedingBreast  -0.6693     0.1530  -4.374 1.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.37529  on 5  degrees of freedom
## Residual deviance:  0.72192  on 2  degrees of freedom
## AIC: 40.24
## 
## Number of Fisher Scoring iterations: 4
estimate std.error statistic p.value conf.low conf.high
0.199 0.112 -14.347 0.000 0.159 0.247
0.732 0.141 -2.216 0.027 0.554 0.963
0.842 0.206 -0.839 0.401 0.556 1.246
0.512 0.153 -4.374 0.000 0.378 0.690

The likelihood ratio test

  • Logistic regression model = mathematical formula for probability of observing \(\bf{Y} = \bf{y}\) (particular set of outcomes), given values for \(\bf{X}\) and \(\bf{\beta}\))

    • \(\bf{X}\) = \((n \times q)~~\) “X matrix”, \(\bf{\beta}\) = parameter vector (\(q\) dimensional ).
  • Likelihood is simply same function where we consider \(\bf{\beta}\) as the variable of interest, \(\bf{X}\) and \(\bf y\) fixed at their observed values.

    • write \(L (\bf{\beta})\)

    • \(\hat{ \bf{\beta}}\), the maximum likelihood is the particular set of \(q\) values that maximizes \(L\)

  • \(L\) evaluated at \(\hat{ \bf{\beta}}\) (i.e. \(L(\hat{\bf{\beta}})\)), is called the observed likelihood.

  • Consider alternate models, a bigger model \(M\) with parameter vector \(\bf{\beta}_M\), and a smaller model, \(R\) with “same” set of parameters, only with restrictions placed on the some or all of the values of \(\bf{\beta}_M\) by a particular hypothesis, generally of the form of \(H_0\) fixing values of some of the parameters, usually at \(0\).

  • can formally test \(H_0\): Model \(R\) is correct, vs. \(H_A\): Model \(M\) is correct by calculating the likelihood ratio statistic:

\[\Delta = 2 \log \frac { L (\hat{\bf{\beta}}_M) } { L (\hat{\bf{\beta}_R})} = 2 \left[ log L(\hat{\bf{\beta}}_M) - log L( \hat{\bf{\beta}_R}) \right] \]

where \(\hat{\bf{\beta}}_R\) is called the restricted maximum likelihood estimate.

If \(H_0\) is correct, the sampling distribution of \(\Delta\) will be approximately \({\chi}^2\) h \(r\) degrees of freedom, where \(r\) is the number of mathematically independent restrictions.

The Deviance

  • a variant of the above statistic, called the Deviance, is derived by setting
    • \(M\) = model which provides a perfect fit to the data (the Saturated model) and
    • \(R\) = the particular model of interest.
  • In logistic regression based on the binomial data this takes the form

\[ D = 2 \sum _ { i = 1 } ^ { n } \left\{ y _ { i } \log y _ { i } / \hat { y } _ { i } + \left( n _ { i } - y _ { i } \right) \log \left( n _ { i } - y _ { i } \right) / \left( n _ { i } - \hat { y } _ { i } \right) \right\} \]

If we have binary data this equals

\[ D = - 2 \sum _ { i = 1 } ^ { n } \left\{ \hat { p } _ { i } \operatorname { logit } \left( \hat { p } _ { i } \right) + \log \left( 1 - \hat { p } _ { i } \right) \right\} \]

Note if we wish to perform a comparison of nested models in general (i.e. M not the saturated model) we write the log of the likelihood ratio as

\[ \Delta = Deviance_R - Deviance_M \] which will tend (asymptotically) to follow a \(\chi^2\) distribution on \(q_M - q_R\) degrees of freedeom, \(q_M\) is the number of parameters in model \(M\) and \(q_R\) is the number of parameters in model \(R\).

M.I. Example

The following variables were recorded for case series of 58 patients who underwent coronary bypass surgery.

  • sex
  • age at surgery in years
  • pre-operative use of beta blockers (yes or no)
  • pre-operative NYHA functional class (II, III, IV)
  • occurrence of peri-operative M.I.(myocardial infarct) (yes or no)

The investigators wished to determine predictors of M.I.(heart attack)

##  MI                    sex            age                  beta          
##  N              :200   M:183(91.5%)   Mean   : 57.475000   No : 44(22%)  
##  Complete cases :200   F: 17( 8.5%)   S.D.   :  8.275979   Yes:156(78%)  
##  % missing items:  0                  Min.   : 35.000000                 
##  Variables      :  5                  1st Qu.: 52.000000                 
##  Factors        :  4                  Median : 57.000000                 
##                                       3rd Qu.: 62.000000                 
##                                       Max.   : 73.000000                 
##                                       n      :200.000000                 
##                                                                          
##  nyha             mi              
##  II : 53(26.5%)   No :163(81.5%)  
##  III:118(59.0%)   Yes: 37(18.5%)  
##  IV : 29(14.5%)                   
##                                   
##                                   
##                                   
##                                   
##                                   
## 
##  mi                sex                               
##  No :163(81.5%)            No             Yes        
##  Yes: 37(18.5%)    M 154/183(84.2%)  29/183(15.8%)   
##                    F    9/17(52.9%)    8/17(47.1%)   
##                                                      
##                                                      
##  age                   beta                                
##      Mean( SD , N )              No             Yes        
##  No  58.4(8.03,163)    No    33/44(75.0%)   11/44(25.0%)   
##  Yes 53.2(8.06, 37)    Yes 130/156(83.3%)  26/156(16.7%)   
##        s=8,r2=0.061                                        
##                                                            
##  nyha                              
##            No           Yes        
##  II   46/53(86.8%)   7/53(13.2%)   
##  III 98/118(83.1%) 20/118(16.9%)   
##  IV   19/29(65.5%)  10/29(34.5%)   
## 
## # A tibble: 6 x 7
##   term        estimate std.error statistic    p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
## 1 (Intercept)  336.       1.68        3.46 0.000535     13.5   10355.   
## 2 sexF           2.71     0.596       1.67 0.0941        0.831     8.81 
## 3 age            0.874    0.0303     -4.45 0.00000848    0.821     0.925
## 4 betaYes        0.358    0.476      -2.16 0.0308        0.140     0.921
## 5 nyhaIII        1.87     0.527       1.19 0.234         0.696     5.60 
## 6 nyhaIV        15.4      0.758       3.61 0.000307      3.71     74.4

Investigators wished to see if NYHA functional class had different significance depending on sex.

## # A tibble: 8 x 5
##   term         estimate std.error statistic    p.value
##   <chr>           <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept)     6.70     1.91      3.51   0.000453  
## 2 age            -0.169    0.0358   -4.74   0.00000217
## 3 betaYes        -1.17     0.524    -2.23   0.0261    
## 4 sexF            5.08     1.39      3.66   0.000250  
## 5 nyhaIII         1.65     0.746     2.21   0.0273    
## 6 nyhaIV          4.91     1.07      4.61   0.00000409
## 7 sexF:nyhaIII   -4.10     1.69     -2.42   0.0153    
## 8 sexF:nyhaIV   -21.9   1073.       -0.0204 0.984
## Analysis of Deviance Table
## 
## Model 1: mi ~ sex + age + beta + nyha
## Model 2: mi ~ age + beta + sex * nyha
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       194     153.64                          
## 2       192     130.65  2    22.99 1.018e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note the very large estimate with large standard error for the second component of the sex*nyha interaction.

Can also (if lazy) use sequential ANOVA

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: mi
## 
## Terms added sequentially (first to last)
## 
## 
##          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                       199     191.56              
## age       1  12.6537       198     178.90 0.0003748 ***
## beta      1   3.6754       197     175.23 0.0552219 .  
## sex       1   6.2093       196     169.02 0.0127082 *  
## nyha      2  15.3782       194     153.64 0.0004578 ***
## sex:nyha  2  22.9897       192     130.65 1.018e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting the interaction

When trying to explain a complicated model, fitted values are always helpful.

Prediction probabilities

I’ve chosen a new set of typical \(x\)-values to examine the interaction. How? - Age 60 is close to average age in sample
- Most patients were on beta-blockers

sex age beta nyha
M 60 Yes II
M 60 Yes III
M 60 Yes IV
F 60 Yes II
F 60 Yes III
F 60 Yes IV

Predicted values derived from \(\hat{\eta} = \bf{X_{new}}\hat{\beta}\)

preds <- predict(lfit2,newdata=MIpred,type="link",se.fit=TRUE)
predMat <- round(
  1/(1+(exp(-1 *cbind(preds$fit,sweep(outer(1.96*preds$se.fit,c(-1,1),"*"),1,preds$fit,"+"))))),2)
dimnames(predMat)[[2]] <- c("p.hat","2.5%","97.5%") 
kableN(cbind(MIpred,predMat))
sex age beta nyha p.hat 2.5% 97.5%
M 60 Yes II 0.01 0.00 0.05
M 60 Yes III 0.05 0.02 0.11
M 60 Yes IV 0.57 0.32 0.79
F 60 Yes II 0.61 0.14 0.94
F 60 Yes III 0.12 0.02 0.48
F 60 Yes IV 0.00 0.00 1.00

Applying bias reduced logistic regression

  • Last coefficient of the “interaction” model large, large standard error
  • Result of small sample sizes in cross tabulation of \(sex\), \(nyha\)
II III IV
M 48 111 24
F 5 7 5
  • Indicates we have unstable estimates,
  • Asymptotic likelihood base procedures may not be mis-behaving

NB -

  • Possible solution is to “regularize” estimates
  • Shrink estimates towards zero
    • possible approaches, penalized likelihood, pseudo-Bayesian priors,
  • More classical approach - introducing correction to remove \(O(1/n)\) bias in estimates
    • Firth’s method, available in packages brglm, brglm2
## 
## Call:
## brglm(formula = mi ~ age + beta + sex * nyha, family = binomial, 
##     data = MI)
## 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   6.37229    1.84742   3.449 0.000562 ***
## age          -0.15944    0.03427  -4.652 3.29e-06 ***
## betaYes      -1.12446    0.51042  -2.203 0.027595 *  
## sexF          4.54841    1.29510   3.512 0.000445 ***
## nyhaIII       1.47037    0.70376   2.089 0.036680 *  
## nyhaIV        4.57496    1.00519   4.551 5.33e-06 ***
## sexF:nyhaIII -3.62777    1.60837  -2.256 0.024098 *  
## sexF:nyhaIV  -7.20278    2.14939  -3.351 0.000805 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 171.26  on 199  degrees of freedom
## Residual deviance: 131.75  on 192  degrees of freedom
## Penalized deviance: 119.339 
## AIC:  147.75
## Analysis of Deviance Table
## 
## Model 1: mi ~ age + beta + sex + nyha
## Model 2: mi ~ age + beta + sex * nyha
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       194     153.75                          
## 2       192     131.75  2   22.003 1.668e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Fitted proportions
II III IV
M 0.07 0.15 0.42
F 0.76 0.56 0.08

Analyzing Rare Events

Convergence problems in likelihood estimation

Graphical inspection