date: “2019-01-02” output: html_document: toc: true toc_depth: 3 toc_float: true fig_caption: false —

Assignment 4 “Model” Analyses

As always, there are multiple analysis approaches that can be defended.
I put these forward as possiblities.

Q1

The toenail data comes from a multicenter study comparing two oral treatments for toenail infection. Patients were evaluated for the degree of separation of the nail. Patients were randomized into two treatments and were followed over seven visits: four in the first year and yearly thereafter. The patients have not been treated prior to the first visit so this should be regarded as the baseline.

Analyze the data to determine the difference between the two treatments and the progression of the infection over time.

Grading:

Model construction: 2
Examination of treatment differences: 2
Examination of progression over time: 2

One can consider applying either a mixed model or GEE approaches. However, since the data is serial in structure, I prefer to follow a GEE approach which allows for serial correlation, either by explicit use of a serial correlation structure like AR(1), or through the use of the robust standard errors.

As noted in my comments, the observation taking at “month=0” needs to be incorporated as a baseline measure. This can be done by including the baseline value as a covariate or by constraining the equality of linear prediction values for the two treatments at month 0. The latter is especially advantageous in serial correlation models.

The first step is to select a model upon which to base our conclusions. It’s my belief that model selection should be done using AIC or BIC, rather than p-values. AIC and BIC values are not readily obtained from GEE models, since there is no underlying likelihood function to base these on. Instead we can use QIC (Quasilikelihood Information Criterion).

data(toenail)
toenail$ID <- factor(toenail$ID)
### set up constrained treatment indicator
toenail$trtInd <- with(toenail, treatment * I(month > 0))
qics <- NULL
for (kni in 1:4) {
gfit <- geeglm(outcome ~ trtInd *ns(month,kni), family=binomial , id = ID, data=toenail, corstr="ar1")
qics <- c(qics,QIC(gfit))
}
names(qics) <- paste("#Knots =",1:4)
qics

## #Knots = 1 #Knots = 2 #Knots = 3 #Knots = 4 
##   1831.999   1826.265   1825.926   1826.557

As there’s no practical difference between having 2 and 3 knots, we’ll use 2, as below. Now we’ll examine the model with interaction, without interaction, and then without treatment effect.

gfit <- geeglm(outcome ~ trtInd * ns(month,2), family=binomial , id = ID, data=toenail, corstr="ar1")
gfitR <- geeglm(outcome ~ trtInd  + ns(month,2), family=binomial , id = ID, data=toenail, corstr="ar1")
gfitR2 <- geeglm(outcome ~ ns(month,2), family=binomial , id = ID, data=toenail, corstr="ar1")
gfits <- list(gfit,gfitR,gfitR2)
QIC(gfit,gfitR,gfitR2)

##             QIC
## gfit   1826.265
## gfitR  1825.539
## gfitR2 1825.167

The above indicates that the treatment term is not helpful, i.e. theres no clear evidence for an interaction or treatment effect. Thus we can set that aside and plot an overall curve for prevalence of nail separation. Unfortunately, there doesn’t seem to be any way to get predictions from geeglm objects, except from using fitted values, so the below is somewhat kludgy (look this technical term up if you need to).

kp <- with(toenail, !duplicated(round(month,1)))
toe2 <- toenail[kp,]
reord <- with(toe2,order(month))
fits <- fitted(gfitR2)[kp][reord]
toe2 <- toe2[reord,]
plot(toe2$month,fits,type="l",xlab="Month",ylab="Proportion",main="Prevalence of Nail Separation")

Q2

The dataset melanoma gives data on a sample of patients suffering from melanoma (skin cancer) cross-classified by the type of cancer and the location on the body. Determine whether the type and location are independent. Examine the residuals to determine whether any dependence can be ascribed to particular type/location combinations.

Grading:

Model and test for indepenence: 2
Residual examination and description of depenendence pattern: 2

First we fit the independence model for the cross-tabulation of tumor and site using the log-linear modeling approach based on the Poisson distribution

data(melanoma)
llfit <- glm(count ~  tumor + site, family=poisson,data=melanoma)
summary(llfit)

## 
## Call:
## glm(formula = count ~ tumor + site, family = poisson, data = melanoma)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0453  -1.0741   0.1297   0.5857   5.1354  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          2.9554     0.1770  16.696  < 2e-16 ***
## tumorindeterminate   0.4990     0.2174   2.295   0.0217 *  
## tumornodular         1.3020     0.1934   6.731 1.68e-11 ***
## tumorsuperficial     1.6940     0.1866   9.079  < 2e-16 ***
## sitehead            -1.2010     0.1383  -8.683  < 2e-16 ***
## sitetrunk           -0.7571     0.1177  -6.431 1.27e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 295.203  on 11  degrees of freedom
## Residual deviance:  51.795  on  6  degrees of freedom
## AIC: 122.91
## 
## Number of Fisher Scoring iterations: 5

The residual deviance provides a chi-squared test of the independence assumption, with \(P=Prob{\chi^{2}_{6} > 51.8} \approx\) 0

We can use either Pearson or Deviance residuals to look for lack of fit of the independence model.

melanoma$resid <- residuals(llfit,type="pearson")
xtabs(resid ~ site + tumor , data=melanoma)

##            tumor
## site            freckle indeterminate     nodular superficial
##   extremity -2.10133816   -0.64711750  0.28260796  1.02457545
##   head       6.74663058    0.47967076 -0.48809353 -2.75497815
##   trunk     -2.33536960    0.56070806 -0.02171861  0.71053305

Examination of the Pearson residuals indicates that freckle tumors are somewhat rare on an extremity or trunk, but quite common on the head, and that superficial tumors are less common on the head cojmpared to extremity and trunk.

The following set of row percentages corroborate these results.

with(melanoma,percents(xtabs( count ~  site + tumor),denom=1))

##            tumor
## site        freckle        indeterminate  nodular        superficial   
##   extremity  10/226( 4.4%)  28/226(12.4%)  73/226(32.3%) 115/226(50.9%)
##   head        22/68(32.4%)   11/68(16.2%)   19/68(27.9%)   16/68(23.5%)
##   trunk       2/106( 1.9%)  17/106(16.0%)  33/106(31.1%)  54/106(50.9%)