Comparing Risk using 2x2 Tables

Example 1: A randomized clinical trial (RCT) was conducted to compare the efficacy of subcutaneous heparin versus Warfarin in preventing new episodes of deep vein thrombosis (DVT) in patients admitted to hospital for treatment of an initial episode. Patients were followed for 90 days following initial therapy. Of 101 patients assigned to Warfarin, 22 experienced new episodes compared to 10 episodes in 98 patients assigned to heparin.

	DVT	noDVT	Total
Warfarin	a=22	b=79	a+b=101
Heparin	c=10	d=88	c+d=98
Total	a+c=32	b+d=167	n=199

Estimating a proportion

\(Y\) = number of “successes” in a sample of \(n\) independent observations: \[ Y = \sum_{i= 1}^{n} y_{ i } \]
\(\hat{p} = \frac{ Y }{ n }\) estimate of true proportion, \(p\), in population

\[\mathrm {S.E} ( \hat { p } ) = \sqrt { \frac { \hat { p } ( 1 - \hat { p } ) } { n } }\]

Based on the normal approximation to the binomial distribution, the large-sample 95% confidence interval is:

\[\hat{p} \pm 1.96 \, \mathrm{S.E}(\hat{p})\]
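As a rough illustration of the estimate, its standard error, and the large-sample interval, the sketch below applies them to the Warfarin arm of Example 1 (22 events in 101 patients). The function name `proportion_ci` is hypothetical and chosen only for this example; the multiplier 1.96 assumes a 95% interval.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample (normal approximation) interval for one proportion."""
    p_hat = successes / n                      # estimate of the true proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # S.E.(p_hat)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Warfarin arm of Example 1: 22 new DVT episodes among 101 patients
p_hat, se, (lower, upper) = proportion_ci(22, 101)
print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval relies on the normal approximation, so it is only a reasonable guide when the sample is large.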

Small sample confidence interval?
Significance tests?

Comparing two proportions

Let \(p_1\) and \(p_2\) represent the risk of DVT for subjects receiving warfarin and heparin respectively.

Two independent estimates \(\hat{p_1}\) and \(\hat {p_2}\), based on sample sizes \(n_1\) and \(n_2\), with corresponding population proportions \(p_1\) and \(p_2\) from different populations.

Estimated difference (“risk difference”) = \(\hat { p } _ { 1 } - \hat { p } _ { 2 }\)
Standard error: \(S.E. \left( \hat { p } _ { 1 } - \hat { p } _ { 2 } \right) = \sqrt { \frac { \hat { p } _ { 1 } \left( 1 - \hat { p } _ { 1 } \right) } { n _ { 1 } } + \frac { \hat { p } _ { 2 } \left( 1 - \hat { p } _ { 2 } \right) } { n _ { 2 } } }\)
Confidence interval?
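One way to answer the confidence interval question is the usual large-sample interval, estimate ± 1.96 × S.E. The sketch below applies it to Example 1; the function name `risk_difference_ci` is hypothetical, and the 1.96 multiplier assumes a 95% level.

```python
import math

def risk_difference_ci(events1, n1, events2, n2, z=1.96):
    """Large-sample interval for p1 - p2 from two independent samples."""
    p1 = events1 / n1
    p2 = events2 / n2
    diff = p1 - p2
    # Unpooled standard error of the difference
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - z * se, diff + z * se)

# Example 1: Warfarin 22/101 versus heparin 10/98
diff, se, (lower, upper) = risk_difference_ci(22, 101, 10, 98)
print(f"risk difference = {diff:.4f}, SE = {se:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

For these data the estimated difference is about 11.6 percentage points, and the interval excludes zero.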

Testing: Large sample Z and Chi-square tests

Small sample (exact) test?
Inference for paired binary data?

The Relative Risk - \(RR\)

\[RR = \frac{p_1 }{ p_2}\]

most relevant for cohort type studies (e.g. RCT)
\(RR = 1\) implies equal risks (i.e. \(p_1 = p_2\))

Inference for \(RR\)

\[\hat{RR} = \frac{\hat{p_1}}{\hat{p_2}}\] where \(\hat{p_1} = \frac{a }{ a + b}\) and \(\hat{p_2} = \frac{c }{ c + d}\).

\[SE( ln \hat{RR} ) = \sqrt{ \frac{1}{a} - \frac{ 1}{a + b } + \frac{1}{c} - \frac{ 1 }{ c + d }}\]

95% CI (large sample): \(\hat{RR}e^{\pm1.96 SE( ln \hat{RR} ) }\)
Test of \(RR=1\): the usual \({\chi}^2\) test on 1 d.f (or Fisher’s test in small samples)
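The formulas for \(\hat{RR}\), its log-scale standard error, and the large-sample interval can be checked numerically on Example 1. This is a minimal sketch; the function name `relative_risk_ci` is hypothetical, and the cell labels a, b, c, d follow the 2x2 table for Example 1.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a large-sample CI computed on the log scale."""
    p1 = a / (a + b)
    p2 = c / (c + d)
    rr = p1 / p2
    # SE of ln(RR) from the cell counts
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = rr * math.exp(-z * se_log)
    upper = rr * math.exp(z * se_log)
    return rr, se_log, (lower, upper)

# Example 1: a=22, b=79 (Warfarin); c=10, d=88 (heparin)
rr, se_log, (lower, upper) = relative_risk_ci(22, 79, 10, 88)
print(f"RR = {rr:.2f}, SE(ln RR) = {se_log:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around \(\hat{RR}\).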

Output from epi.2by2, cohort.count method

##              Outcome +    Outcome -      Total        Inc risk *
## Exposed +           22           79        101              21.8
## Exposed -           10           88         98              10.2
## Total               32          167        199              16.1
##                  Odds
## Exposed +       0.278
## Exposed -       0.114
## Total           0.192
## 
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Inc risk ratio                               2.13 (1.07, 4.27)
## Odds ratio                                   2.45 (1.09, 5.49)
## Attrib risk *                                11.58 (1.54, 21.61)
## Attrib risk in population *                  5.88 (-2.00, 13.75)
## Attrib fraction in exposed (%)               53.15 (6.24, 76.60)
## Attrib fraction in population (%)            36.54 (-2.52, 60.72)
## -------------------------------------------------------------------
##  X2 test statistic: 4.941 p-value: 0.026
##  Wald confidence limits
##  * Outcomes per 100 population units

The Odds Ratio

Example 2: In a landmark study published in 1950 by Richard Doll and Austin Bradford Hill in the U.K., 60 female lung cancer patients were compared with 60 similarly aged healthy patients. Of the cancer patients, 41 were established smokers, compared with 28 in the control group.

The odds corresponding to a probability \(p\) are \(\frac{p }{ 1-p}\).
The odds ratio for comparing two probabilities \(p_1\) and \(p_2\) is \(OR = \frac{ p_1 / ( 1 - p_1 ) }{ p_2 / ( 1 - p_2 ) }\)

Motivation?

All-purpose measure of association which applies to all designs, esp. case control, cross-sectional
OR=1 implies no association, \(p_1 = p_2\).
Approximation to RR under Rare Disease Assumption
Mathematically “elegant”: easy to work with
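The rare disease approximation can be seen with a small numerical comparison. The sketch below uses the standard definition of odds, \(p/(1-p)\). The first pair of risks (2% and 1%) is made up purely for illustration; the second pair is the estimated risks from Example 1.

```python
def relative_risk(p1, p2):
    """Ratio of two risks."""
    return p1 / p2

def odds_ratio(p1, p2):
    """Ratio of the odds p/(1-p) for two risks."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

scenarios = {
    "rare outcome (illustrative 2% vs 1%)": (0.02, 0.01),
    "Example 1 (22/101 vs 10/98)": (22 / 101, 10 / 98),
}
for label, (p1, p2) in scenarios.items():
    print(f"{label}: RR = {relative_risk(p1, p2):.3f}, OR = {odds_ratio(p1, p2):.3f}")
```

When both risks are small, \(1-p_1\) and \(1-p_2\) are both close to 1, so the OR stays close to the RR. With the more common outcome in Example 1, the OR is noticeably further from 1 than the RR.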

Inference for \(OR\)

\[\hat{OR} = \frac{ \hat{p_1} / ( 1 - \hat{p_1} ) }{ \hat{p_2} / ( 1 - \hat{p_2} ) } = \frac{ad}{bc}\]

\[SE(ln \hat{OR}) = \sqrt{ \frac{1}{a} + \frac{ 1}{ b } + \frac{1}{c} + \frac{ 1 }{ d }}\]

95% CI (large sample): \(\hat{OR}e^{\pm1.96 SE( ln \hat{OR} ) }\)
Test of \(OR=1\): same as for \(RR\)
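The same log-scale construction can be sketched for the odds ratio using the counts from Example 2, with smokers as the exposed row and lung cancer cases as the outcome column. The function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a large-sample CI computed on the log scale."""
    or_hat = (a * d) / (b * c)
    # SE of ln(OR): square root of the sum of reciprocal cell counts
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = or_hat * math.exp(-z * se_log)
    upper = or_hat * math.exp(z * se_log)
    return or_hat, se_log, (lower, upper)

# Example 2: smokers 41 cases / 28 controls; non-smokers 19 cases / 32 controls
or_hat, se_log, (lower, upper) = odds_ratio_ci(41, 28, 19, 32)
print(f"OR = {or_hat:.2f}, SE(ln OR) = {se_log:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The formula fails if any cell count is zero, which is one reason it is described as a large-sample method.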

Output from epi.2by2, case.control method

##              Outcome +    Outcome -      Total        Prevalence *
## Exposed +           41           28         69                59.4
## Exposed -           19           32         51                37.3
## Total               60           60        120                50.0
##                  Odds
## Exposed +       1.464
## Exposed -       0.594
## Total           1.000
## 
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Odds ratio (W)                               2.47 (1.17, 5.19)
## Attrib prevalence *                          22.17 (4.55, 39.78)
## Attrib prevalence in population *            12.75 (-3.26, 28.75)
## Attrib fraction (est) in exposed  (%)        59.14 (9.19, 82.02)
## Attrib fraction (est) in population (%)      40.62 (7.75, 61.79)
## -------------------------------------------------------------------
##  X2 test statistic: 5.763 p-value: 0.016
##  Wald confidence limits
##  * Outcomes per 100 population units

Testing Association using the Chi-square test

Can express the null hypothesis of no association in three ways

\(p_1 - p_2=0\) or \(RR=1\) or \(OR=1\)
Can test using Fisher’s exact test or Normal Theory test (Large sample)

\[X ^ { 2 } = \sum _ { i = 1,2 } \sum _ { j = 1,2 } \frac { \left( O _ { i j } - E _ { i j } \right) ^ { 2 } } { E _ { i j } }\]

\(O_{ij}\) = observed frequency in row i column j
\(E_{ij}\) = expected count under hypothesis of no association
\(X^2\) is exactly equal to the square of \(Z\) for testing \(p_1 = p_2\) when the SE is computed under the null hypothesis (using the pooled proportion)
Yates' corrected chi-square = modification of the above to work better for small samples

\[X ^ { 2 } = \sum _ { i = 1,2 } \sum _ { j = 1,2 } \frac { \left( \left| O _ { i j } - E _ { i j } \right| - 0.5 \right) ^ { 2 } } { E _ { i j } }\]
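The chi-square statistic, its Yates-corrected version, and the link to the \(Z\) test can be sketched on the Example 1 counts. The function names `chi_square_2x2` and `pooled_z` are hypothetical. The p-value uses the identity that, for 1 d.f., the upper tail probability of a chi-square variable at \(x\) equals `erfc(sqrt(x / 2))`, which avoids needing a statistics library.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for a 2x2 table, optionally Yates-corrected."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under no association
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff -= 0.5  # continuity correction
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 d.f.
    return stat, p_value

def pooled_z(events1, n1, events2, n2):
    """Z statistic for p1 = p2 with the SE computed under the null."""
    p_pool = (events1 + events2) / (n1 + n2)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (events1 / n1 - events2 / n2) / se0

stat, p_value = chi_square_2x2(22, 79, 10, 88)
stat_yates, p_yates = chi_square_2x2(22, 79, 10, 88, yates=True)
z = pooled_z(22, 101, 10, 98)
print(f"X2 = {stat:.3f}, p = {p_value:.3f}")
print(f"Yates X2 = {stat_yates:.3f}, p = {p_yates:.3f}")
print(f"Z^2 = {z ** 2:.3f}")
```

The uncorrected statistic agrees with the squared \(Z\) value, and the Yates correction gives a smaller statistic and hence a larger p-value.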