Contingency tables

Data from a semiconductor factory (Faraway)

  • sample of chip wafers classified by
      - whether a particle was found on the die that produced the wafer
      - whether the wafer was good (satisfactory) or bad (defective)

Representations in R

Matrix Layout

           particles
  quality   None  Some
    Good     320    14
    Bad       80    36

Binomial Layout

  Good  Bad  particles
   320   80  None
    14   36  Some

Frequency Layout

  count  quality  particles
    320  Good     None
     80  Bad      None
     14  Good     Some
     36  Bad      Some
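These three layouts can be constructed directly in R; a sketch (the object names `cts`, `binomial.data`, and `count.data` are assumed to match those appearing in the model calls below):

```r
# Matrix layout: a 2 x 2 table of counts
cts <- matrix(c(320, 80, 14, 36), nrow = 2,
              dimnames = list(quality   = c("Good", "Bad"),
                              particles = c("None", "Some")))

# Binomial layout: one row per particle group, successes and failures
binomial.data <- data.frame(Good = c(320, 14), Bad = c(80, 36),
                            particles = factor(c("None", "Some")))

# Frequency layout: one row per cell of the table
count.data <- data.frame(count     = c(320, 80, 14, 36),
                         quality   = factor(c("Good", "Bad", "Good", "Bad")),
                         particles = factor(c("None", "None", "Some", "Some")))
```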

Types of Analysis

Classical

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  cts
## X-squared = 60.124, df = 1, p-value = 8.907e-15
## 
##  Fisher's Exact Test for Count Data
## 
## data:  cts
## p-value = 2.955e-13
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   5.090628 21.544071
## sample estimates:
## odds ratio 
##   10.21331
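The two tests above can be reproduced with calls like the following (assuming `cts` holds the 2×2 matrix, quality in rows and particles in columns):

```r
cts <- matrix(c(320, 80, 14, 36), nrow = 2,
              dimnames = list(quality   = c("Good", "Bad"),
                              particles = c("None", "Some")))
chisq.test(cts)   # Pearson chi-squared; Yates-corrected by default for 2 x 2
fisher.test(cts)  # exact test; also reports the conditional MLE odds ratio
```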

Logistic regression

## 
## Call:
## glm(formula = cbind(Good, Bad) ~ particles, family = binomial, 
##     data = binomial.data)
## 
## Deviance Residuals: 
## [1]  0  0
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     1.3863     0.1250  11.090  < 2e-16 ***
## particlesSome  -2.3308     0.3389  -6.878 6.07e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 54.03  on 1  degrees of freedom
## Residual deviance:  0.00  on 0  degrees of freedom
## AIC: 14.161
## 
## Number of Fisher Scoring iterations: 3
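A sketch of the call that produced this fit, starting from the binomial-layout data frame:

```r
binomial.data <- data.frame(Good = c(320, 14), Bad = c(80, 36),
                            particles = factor(c("None", "Some")))
lfit <- glm(cbind(Good, Bad) ~ particles, family = binomial,
            data = binomial.data)
coef(lfit)
# The intercept is the log odds of a good wafer with no particles:
# log(320/80) = 1.3863.  exp(2.3308) is about 10.3, in line with the
# odds ratio reported by Fisher's exact test.
```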

Log-linear representation

llfit <- glm(count ~ quality * particles, family=poisson, data=count.data)
summary(llfit)
## 
## Call:
## glm(formula = count ~ quality * particles, family = poisson, 
##     data = count.data)
## 
## Deviance Residuals: 
## [1]  0  0  0  0
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                5.7683     0.0559 103.187  < 2e-16 ***
## qualityBad                -1.3863     0.1250 -11.090  < 2e-16 ***
## particlesSome             -3.1293     0.2730 -11.461  < 2e-16 ***
## qualityBad:particlesSome   2.3308     0.3389   6.878 6.07e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 4.7410e+02  on 3  degrees of freedom
## Residual deviance: 4.4409e-14  on 0  degrees of freedom
## AIC: 31.744
## 
## Number of Fisher Scoring iterations: 3
llfit <- glm(count ~ quality + particles, family=poisson, data=count.data)
summary(llfit)
## 
## Call:
## glm(formula = count ~ quality + particles, family = poisson, 
##     data = count.data)
## 
## Deviance Residuals: 
##      1       2       3       4  
##  1.324  -2.370  -4.350   5.266  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     5.6934     0.0572  99.535   <2e-16 ***
## qualityBad     -1.0575     0.1078  -9.813   <2e-16 ***
## particlesSome  -2.0794     0.1500 -13.863   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 474.10  on 3  degrees of freedom
## Residual deviance:  54.03  on 1  degrees of freedom
## AIC: 83.774
## 
## Number of Fisher Scoring iterations: 5
cat("Deviance based chi-squared (1 d.f.) test: P=", 1-pchisq(llfit$deviance,1))
## Deviance based chi-squared (1 d.f.) test: P= 1.973977e-13
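The two parameterizations tie together exactly: the Poisson interaction coefficient (2.3308) is minus the logistic slope, and the additive model's residual deviance (54.03) equals the logistic model's null deviance. A quick check from the raw counts:

```r
# Logistic parameters recovered directly from the cell counts:
log(320 / 80)                 # 1.3863, the logistic intercept
log(14 / 36) - log(320 / 80)  # -2.3308, the logistic slope
# (the Poisson interaction coefficient is +2.3308, its negative)
```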

Multinomial data (two variables)

Snee (1974) presents data on 592 students cross-classified by hair and eye color.

##    count   eye  hair
## 1      5 green BLACK
## 2     29 green BROWN
## 3     14 green   RED
## 16     7 brown BLOND

Cross-tabulation; Row percents

##        eye
## hair    green      hazel      blue       brown     
##   BLACK   5( 4.6%)  15(13.9%)  20(18.5%)  68(63.0%)
##   BROWN  29(10.1%)  54(18.9%)  84(29.4%) 119(41.6%)
##   RED    14(19.7%)  14(19.7%)  17(23.9%)  26(36.6%)
##   BLOND  16(12.6%)  10( 7.9%)  94(74.0%)   7( 5.5%)
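Assuming `haireye` holds the frequency-layout data shown above, the cross-tabulation and row percents can be obtained with:

```r
haireye <- expand.grid(eye  = c("green", "hazel", "blue", "brown"),
                       hair = c("BLACK", "BROWN", "RED", "BLOND"))
haireye$count <- c( 5, 15, 20,  68,   # BLACK
                   29, 54, 84, 119,   # BROWN
                   14, 14, 17,  26,   # RED
                   16, 10, 94,   7)   # BLOND
ct <- xtabs(count ~ hair + eye, haireye)
round(100 * prop.table(ct, margin = 1), 1)  # row percents
```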

Classical tests

## 
##  Pearson's Chi-squared test
## 
## data:  ct
## X-squared = 138.29, df = 9, p-value < 2.2e-16

Fisher’s exact test, based on the multivariate hypergeometric distribution, is possible in principle but becomes computationally demanding for larger tables.

Test for independence based on log-linear model

## 
## Call:  glm(formula = count ~ hair + eye, family = poisson, data = haireye)
## 
## Coefficients:
## (Intercept)    hairBROWN      hairRED    hairBLOND     eyehazel  
##      2.4575       0.9739      -0.4195       0.1621       0.3737  
##     eyeblue     eyebrown  
##      1.2118       1.2347  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  9 Residual
## Null Deviance:       453.3 
## Residual Deviance: 146.4     AIC: 241

The residual deviance of 146.4 on 9 degrees of freedom provides a test of independence between hair and eye colour: \(\Pr(\chi^2_{(9)} \ge 146.4) \approx 0\).

Test for independence based on multinomial logit model

## # weights:  20 (12 variable)
## initial  value 820.686262 
## iter  10 value 681.465249
## iter  20 value 676.828633
## iter  20 value 676.828632
## iter  20 value 676.828632
## final  value 676.828632 
## converged
## # weights:  8 (3 variable)
## initial  value 820.686262 
## final  value 750.050421 
## converged
## Likelihood ratio tests of Multinomial Models
## 
## Response: cbind(green, hazel, blue, brown)
##   Model Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
## 1     1         9   1500.101                              
## 2  hair         0   1353.657 1 vs 2     9 146.4436       0
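The fits above can be reproduced with `nnet::multinom`; a sketch assuming a hypothetical wide-format data frame `haireye.wide` with one row per hair colour:

```r
library(nnet)
# One row per hair colour; one column of counts per eye colour
haireye.wide <- data.frame(hair  = factor(c("BLACK", "BROWN", "RED", "BLOND")),
                           green = c( 5, 29, 14, 16),
                           hazel = c(15, 54, 14, 10),
                           blue  = c(20, 84, 17, 94),
                           brown = c(68, 119, 26, 7))
fit1 <- multinom(cbind(green, hazel, blue, brown) ~ hair, data = haireye.wide)
fit0 <- multinom(cbind(green, hazel, blue, brown) ~ 1,    data = haireye.wide)
anova(fit0, fit1)  # LR statistic 146.44 on 9 d.f., matching the log-linear test
```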

Examining the margins

In Stuart (1955), data on the vision of a sample of women is presented. The left and right eye performance is graded into four categories:

##         left
## right    best second third worst
##   best   1520    266   124    66
##   second  234   1512   432    78
##   third   117    362  1772   205
##   worst    36     82   179   492
  • Large counts on the diagonal of the cross-tabulation
  • Suggests strong association between left and right eyes
chisq.test(ct)
## 
##  Pearson's Chi-squared test
## 
## data:  ct
## X-squared = 8096.9, df = 9, p-value < 2.2e-16
llfit  <- glm(count ~ right+left, family=poisson, eyegrade)
xtabs(llfit$residuals ~ right + left, eyegrade)
##         left
## right          best     second      third      worst
##   best    2.0160139 -0.5470210 -0.8128423 -0.7030465
##   second -0.5933194  1.2552569 -0.4288927 -0.6926120
##   third  -0.8132184 -0.5040207  1.1518322 -0.2579095
##   worst  -0.8211034 -0.6502802 -0.3233734  4.5439523
  • \(\chi^2\) test rejects independence
  • cross-tabulation of residuals confirms structure of dependence

The real question of interest: is there a systematic difference between left and right eyes?

  • suggests we examine marginal tables
## Right eyes
##      best        second      third       worst      
## [1,] 1976(26.4%) 2256(30.2%) 2456(32.8%)  789(10.6%)
## Left eyes
##      best        second      third       worst      
## [1,] 1907(25.5%) 2222(29.7%) 2507(33.5%)  841(11.2%)
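Assuming `eyegrade` is the frequency-layout data frame behind the cross-tabulation, the margins above can be computed as:

```r
grades <- c("best", "second", "third", "worst")
eyegrade <- expand.grid(left = grades, right = grades)
eyegrade$count <- c(1520,  266,  124,  66,   # right = best
                     234, 1512,  432,  78,   # right = second
                     117,  362, 1772, 205,   # right = third
                      36,   82,  179, 492)   # right = worst
ct <- xtabs(count ~ right + left, eyegrade)
margin.table(ct, 1)  # right-eye margin
margin.table(ct, 2)  # left-eye margin
```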

The margins appear similar, but we can apply a formal test.

  • but the hypothesis of marginal homogeneity can’t be represented in the log-linear parameterization
  • use the “ad hoc” Stuart–Maxwell test
library(nonpar)
#stuart.maxwell(ct)
cat("Error in stuart.maxwell(ct) : Data must be a 3x3 matrix\n")
## Error in stuart.maxwell(ct) : Data must be a 3x3 matrix
stewart.test(ct)
## $chisq
## [1] 11.95657
## 
## $df
## [1] 3
## 
## $p.value
## [1] 0.007533425

Other ad hoc tests can be derived from contrasts, e.g. are right eyes better or worse than left eyes?

rightBetter <- with(eyegrade,sum(count[as.integer(right)>as.integer(left)]))
leftBetter <- with(eyegrade,sum(count[as.integer(right)<as.integer(left)]))

We observe that 1010 women have the right eye better than the left eye, and 1171 vice versa.

ctr.coefs <- with(eyegrade, sign(as.integer(right) - as.integer(left)))
xtabs(ctr.coefs ~ right + left, eyegrade)
##         left
## right    best second third worst
##   best      0     -1    -1    -1
##   second    1      0    -1    -1
##   third     1      1     0    -1
##   worst     1      1     1     0
cntr <- sum(ctr.coefs*eyegrade$count)
var.cntr <- t(matrix(ctr.coefs))%*%diag(eyegrade$count)%*%ctr.coefs
cat("P-value=", 1-pchisq(cntr^2/var.cntr,1))
## P-value= 0.000565904