Example 1: A randomized clinical trial (RCT) was conducted to compare the efficacy of subcutaneous heparin versus warfarin in preventing new episodes of deep vein thrombosis (DVT) in patients admitted to hospital for treatment of an initial episode. Patients were followed for 90 days after initial therapy. Of 101 patients assigned to warfarin, 22 experienced new episodes, compared with 10 of 98 patients assigned to heparin.
| | DVT | No DVT | Total |
|---|---|---|---|
| Warfarin | a=22 | b=79 | a+b=101 |
| Heparin | c=10 | d=88 | c+d=98 |
| Total | a+c=32 | b+d=167 | n=199 |
\(Y\) = number of "successes" in a sample of \(n\) independent observations, where \(y_i = 1\) for a success and \(0\) otherwise: \[ Y = \sum_{i=1}^{n} y_i \]
\(\hat{p} = \frac{Y}{n}\) is the estimate of the true proportion, \(p\), in the population
\[\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\]
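Based on the normal approximation to the binomial distribution, the large-sample confidence interval is \[\hat{p} \pm z_{1-\alpha/2}\, \mathrm{S.E.}(\hat{p})\] with \(z_{0.975} = 1.96\) for a 95% interval.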
Small sample confidence interval?
Significance tests?
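A minimal Python sketch of the large-sample (Wald) interval \(\hat{p} \pm 1.96\,\mathrm{S.E.}(\hat{p})\) for a single proportion. The function name `wald_ci` is hypothetical, and using the warfarin arm of Example 1 (22 events in 101 patients) as input is an illustrative choice rather than something the notes prescribe.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Large-sample (Wald) confidence interval for a single proportion."""
    p_hat = successes / n                      # point estimate Y / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # S.E.(p_hat)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Illustration (assumption): DVT risk in the warfarin arm of Example 1.
p_hat, se, (lower, upper) = wald_ci(22, 101)
print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The normal approximation can behave poorly when \(n\) is small or \(\hat{p}\) is close to 0 or 1, which is what motivates the small-sample question above.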
Let \(p_1\) and \(p_2\) represent the risk of DVT for subjects receiving warfarin and heparin respectively.
Two independent estimates \(\hat{p}_1\) and \(\hat{p}_2\), based on sample sizes \(n_1\) and \(n_2\), with corresponding population proportions \(p_1\) and \(p_2\) from different populations.
Estimated difference (“risk difference”) = \(\hat { p } _ { 1 } - \hat { p } _ { 2 }\)
Standard error: \(\mathrm{S.E.}\left( \hat{p}_1 - \hat{p}_2 \right) = \sqrt{ \frac{ \hat{p}_1 \left( 1 - \hat{p}_1 \right) }{ n_1 } + \frac{ \hat{p}_2 \left( 1 - \hat{p}_2 \right) }{ n_2 } }\)
Confidence interval?
Small sample (exact) test?
Inference for paired binary data?
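As a sketch of the large-sample confidence interval for the risk difference, the code below applies \((\hat{p}_1 - \hat{p}_2) \pm 1.96\,\mathrm{S.E.}\) to the Example 1 counts. The function name `risk_difference_ci` and its argument order (cells a, b, c, d of the 2x2 table) are assumptions made for illustration.

```python
import math

def risk_difference_ci(a, b, c, d, z=1.96):
    """Wald CI for p1 - p2 from a 2x2 table with rows (a, b) and (c, d)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    rd = p1 - p2
    # Unpooled standard error: each arm contributes its own variance.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, se, (rd - z * se, rd + z * se)

# Example 1: warfarin (22 DVT, 79 no DVT) vs heparin (10 DVT, 88 no DVT).
rd, se_rd, (rd_lo, rd_hi) = risk_difference_ci(22, 79, 10, 88)
print(f"RD = {100 * rd:.2f}, 95% CI = ({100 * rd_lo:.2f}, {100 * rd_hi:.2f}) per 100")
```

Expressed per 100 patients, the result should agree with the "Attrib risk" line of the software output for Example 1, 11.58 (1.54, 21.61).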
\[RR = \frac{p_1 }{ p_2}\]
most relevant for cohort type studies (e.g. RCT)
\(RR = 1\) implies equal risks (i.e. \(p_1 = p_2\))
\[\hat{RR} = \frac{\hat{p_1}}{\hat{p_2}}\] where \(\hat{p_1} = \frac{a }{ a + b}\) and \(\hat{p_2} = \frac{c }{ c + d}\).
\[SE( \ln \hat{RR} ) = \sqrt{ \frac{1}{a} - \frac{1}{a + b} + \frac{1}{c} - \frac{1}{c + d} }\]
95% CI (large sample): \(\hat{RR}\, e^{\pm 1.96\, SE( \ln \hat{RR} ) }\)
Test of \(RR=1\): the usual \({\chi}^2\) test on 1 d.f (or Fisher’s test in small samples)
## Outcome + Outcome - Total Inc risk *
## Exposed + 22 79 101 21.8
## Exposed - 10 88 98 10.2
## Total 32 167 199 16.1
## Odds
## Exposed + 0.278
## Exposed - 0.114
## Total 0.192
##
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Inc risk ratio 2.13 (1.07, 4.27)
## Odds ratio 2.45 (1.09, 5.49)
## Attrib risk * 11.58 (1.54, 21.61)
## Attrib risk in population * 5.88 (-2.00, 13.75)
## Attrib fraction in exposed (%) 53.15 (6.24, 76.60)
## Attrib fraction in population (%) 36.54 (-2.52, 60.72)
## -------------------------------------------------------------------
## X2 test statistic: 4.941 p-value: 0.026
## Wald confidence limits
## * Outcomes per 100 population units
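The relative risk and its Wald limits in the output above can be checked by hand from the formulas for \(\hat{RR}\) and \(SE(\ln \hat{RR})\). The following Python sketch does so. The function name `relative_risk_ci` is hypothetical, and this is not a claim about how the software that produced the output computes its limits internally.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a large-sample CI built on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    # Exponentiate the symmetric interval for ln(RR).
    return rr, (rr * math.exp(-z * se_log), rr * math.exp(z * se_log))

# Example 1 counts: a=22, b=79, c=10, d=88.
rr, (rr_lo, rr_hi) = relative_risk_ci(22, 79, 10, 88)
print(f"RR = {rr:.2f}, 95% CI = ({rr_lo:.2f}, {rr_hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around \(\hat{RR}\) on the original scale, as in the reported limits (1.07, 4.27).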
Example 2: In a landmark study published in 1950 by Richard Doll and Austin Bradford Hill in the U.K., 60 female lung cancer patients were compared with 60 similarly aged healthy controls. 41 of the cancer patients were established smokers, compared with 28 in the control group.
The odds corresponding to a probability \(p\) are \(\frac{p}{1-p}\).
The odds ratio for comparing two probabilities \(p_1\) and \(p_2\) is \(OR = \frac{ p_1 / ( 1 - p_1 ) }{ p_2 / ( 1 - p_2 ) }\)
Motivation?
All-purpose measure of association which applies to all designs, esp. case control, cross-sectional
OR=1 implies no association, \(p_1 = p_2\).
Approximation to RR under Rare Disease Assumption
Mathematically “elegant”: easy to work with
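A short worked step shows why the odds ratio approximates the relative risk when the outcome is rare. Rearranging the definition of \(OR\):

\[OR = \frac{ p_1 / ( 1 - p_1 ) }{ p_2 / ( 1 - p_2 ) } = \frac{p_1}{p_2} \times \frac{1 - p_2}{1 - p_1} = RR \times \frac{1 - p_2}{1 - p_1}\]

When \(p_1\) and \(p_2\) are both small, the factor \(\frac{1 - p_2}{1 - p_1}\) is close to 1, so \(OR \approx RR\). In Example 1 the risks are not especially small (\(\hat{p}_1 = 0.218\), \(\hat{p}_2 = 0.102\)), the factor is about \(0.898 / 0.782 \approx 1.15\), and the odds ratio (2.45) is correspondingly further from 1 than the relative risk (2.13).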
Inference for \(OR\): \[\hat{OR} = \frac{ \hat{p}_1 / ( 1 - \hat{p}_1 ) }{ \hat{p}_2 / ( 1 - \hat{p}_2 ) } = \frac{ad}{bc}\]
\[SE( \ln \hat{OR} ) = \sqrt{ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} }\]
95% CI (large sample): \(\hat{OR}\, e^{\pm 1.96\, SE( \ln \hat{OR} ) }\)
Test of \(OR=1\): same as for \(RR\)
## Outcome + Outcome - Total Prevalence *
## Exposed + 41 28 69 59.4
## Exposed - 19 32 51 37.3
## Total 60 60 120 50.0
## Odds
## Exposed + 1.464
## Exposed - 0.594
## Total 1.000
##
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Odds ratio (W) 2.47 (1.17, 5.19)
## Attrib prevalence * 22.17 (4.55, 39.78)
## Attrib prevalence in population * 12.75 (-3.26, 28.75)
## Attrib fraction (est) in exposed (%) 59.14 (9.19, 82.02)
## Attrib fraction (est) in population (%) 40.62 (7.75, 61.79)
## -------------------------------------------------------------------
## X2 test statistic: 5.763 p-value: 0.016
## Wald confidence limits
## * Outcomes per 100 population units
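The odds ratio line in the output above can be reproduced from \(\hat{OR} = ad/bc\) and \(SE(\ln \hat{OR})\). The Python sketch below is an illustrative hand calculation. The function name `odds_ratio_ci` is an assumption, and the cell labels follow the table layout used in the output (rows: exposed, unexposed; columns: outcome present, absent).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a large-sample (Wald) CI on the log scale."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, (or_hat * math.exp(-z * se_log), or_hat * math.exp(z * se_log))

# Example 2: smokers (41 cases, 28 controls), non-smokers (19 cases, 32 controls).
or2, (or2_lo, or2_hi) = odds_ratio_ci(41, 28, 19, 32)
print(f"Example 2: OR = {or2:.2f}, 95% CI = ({or2_lo:.2f}, {or2_hi:.2f})")

# Example 1 for comparison: a=22, b=79, c=10, d=88.
or1, (or1_lo, or1_hi) = odds_ratio_ci(22, 79, 10, 88)
print(f"Example 1: OR = {or1:.2f}, 95% CI = ({or1_lo:.2f}, {or1_hi:.2f})")
```

In a case-control design such as Example 2 the row proportions are not risks, since the numbers of cases and controls are fixed by the investigator, which is why the odds ratio rather than the relative risk is the quantity reported.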
The null hypothesis of no association can be expressed in three equivalent ways:
\(p_1 - p_2 = 0\) or \(RR = 1\) or \(OR = 1\)
Can test using Fisher’s exact test or Normal Theory test (Large sample)
\[X ^ { 2 } = \sum _ { i = 1,2 } \sum _ { j = 1,2 } \frac { \left( O _ { i j } - E _ { i j } \right) ^ { 2 } } { E _ { i j } }\]
\(O_{ij}\) = observed frequency in row i column j
\(E_{ij}\) = expected count under hypothesis of no association
\(X^2\) is exactly equal to the square of the \(Z\) statistic for testing \(p_1 = p_2\) when the SE is computed under the null hypothesis (i.e. using the pooled proportion)
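To make the link between \(X^2\) and \(Z\) concrete, the sketch below computes both for the Example 1 table and checks that \(Z^2 = X^2\). The function names `pearson_chi2` and `z_pooled` are hypothetical. The p-value uses the identity \(P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})\), which holds for 1 d.f. only.

```python
import math

def pearson_chi2(table):
    """Pearson X^2 for a 2x2 table given as [[a, b], [c, d]]."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # E_ij under no association
            x2 += (table[i][j] - expected) ** 2 / expected
    return x2

def z_pooled(a, b, c, d):
    """Z statistic for p1 = p2 with the SE evaluated under the null."""
    n1, n2 = a + b, c + d
    p_pool = (a + c) / (n1 + n2)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (a / n1 - c / n2) / se0

x2 = pearson_chi2([[22, 79], [10, 88]])
z = z_pooled(22, 79, 10, 88)
p_value = math.erfc(math.sqrt(x2 / 2))       # upper tail, chi-square on 1 d.f.
print(f"X2 = {x2:.3f}, Z^2 = {z ** 2:.3f}, p = {p_value:.3f}")
```

For Example 1 this should match the reported test statistic of 4.941 with p-value 0.026.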
Yates' corrected chi-square = modification of the above that works better in small samples
\[X^2 = \sum_{i = 1,2} \sum_{j = 1,2} \frac{ \left( | O_{ij} - E_{ij} | - 0.5 \right)^2 }{ E_{ij} }\]
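A minimal sketch of the continuity-corrected statistic, applied to the Example 1 table. The function name `yates_chi2` is hypothetical, and the code assumes \(|O_{ij} - E_{ij}| \ge 0.5\) in every cell, which holds for this table.

```python
def yates_chi2(table):
    """Yates' continuity-corrected X^2 for a 2x2 table [[a, b], [c, d]]."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            # Shrink each |O - E| by 0.5 before squaring.
            x2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return x2

x2_yates = yates_chi2([[22, 79], [10, 88]])
print(f"Yates-corrected X2 = {x2_yates:.3f}")
```

The correction always reduces the statistic relative to the uncorrected \(X^2\) (here from about 4.94 to about 4.12), so the corrected test is the more conservative of the two.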