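One well-known difficulty in interpreting aggregate data is ecological
inflation of association: a relationship estimated from group averages can
appear much stronger than the same relationship among the individuals who
make up the groups. To provide a simple example, below I present regression
analyses relating heparin sulfate levels to weight (untransformed), based on
the data from Chapter 2: first for the 148 individual observations (hsulf on
weight), then for the 6 group averages (avhep on avwt).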
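As the plots and the R-squared values show, the results based on the
aggregate data are quite misleading: R-squared rises from 0.15 in the
individual-level regression to 0.84 in the regression on group averages.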
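The inflation can be reproduced with synthetic data. The sketch below is a minimal Python simulation, not the Chapter 2 data: it assumes six hypothetical groups whose mean weights differ, draws individual weights and outcome levels around a weak linear relationship, and compares R-squared from the individual-level fit with R-squared from the fit to the group means. For a simple linear regression, R-squared equals the squared Pearson correlation, which is what the helper `r_squared` computes. All names, group sizes, and parameter values here are illustrative assumptions.

```python
import random


def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs (squared Pearson r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate(n_groups=6, per_group=25, seed=1):
    """Hypothetical data: group mean weights differ, individual noise is large."""
    rng = random.Random(seed)
    weights, levels, groups = [], [], []
    for g in range(n_groups):
        group_mean_weight = 50 + 10 * g          # assumed spread between groups
        for _ in range(per_group):
            w = rng.gauss(group_mean_weight, 5)  # within-group variation in weight
            y = 0.01 * w + rng.gauss(0, 0.25)    # weak individual-level relationship
            weights.append(w)
            levels.append(y)
            groups.append(g)
    return weights, levels, groups


def group_means(values, groups):
    """Average `values` within each group label, returned in label order."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [totals[g] / counts[g] for g in sorted(totals)]


weights, levels, groups = simulate()
r2_indiv = r_squared(weights, levels)
r2_group = r_squared(group_means(weights, groups), group_means(levels, groups))
print(f"individual R^2 = {r2_indiv:.3f} (n = {len(weights)})")
print(f"group-mean R^2 = {r2_group:.3f} (n = {len(set(groups))})")
```

Averaging within groups removes most of the individual-level noise while keeping the between-group spread in weight, so the six group means line up far more tightly than the 150 individuals do. The exact values depend on the seed and the assumed parameters.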
. regress hsulf weight

      Source |       SS       df       MS              Number of obs =     148
-------------+------------------------------           F(  1,   146) =   26.15
       Model |  1.75793155     1  1.75793155           Prob > F      =  0.0000
    Residual |  9.81634188   146  .067235218           R-squared     =  0.1519
-------------+------------------------------           Adj R-squared =  0.1461
       Total |  11.5742734   147  .078736554           Root MSE      =   .2593

. regress avhep avwt

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  1,     4) =   21.74
       Model |  .063427068     1  .063427068           Prob > F      =  0.0096
    Residual |  .011669653     4  .002917413           R-squared     =  0.8446
-------------+------------------------------           Adj R-squared =  0.8058
       Total |  .075096721     5  .015019344           Root MSE      =  .05401
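The fit statistics in both tables follow from the sums of squares and degrees of freedom alone. The short Python sketch below recomputes them from the two ANOVA tables to make the contrast explicit: with only 6 aggregate observations and 4 residual degrees of freedom, the group-mean regression reports an R-squared of about 0.84, against about 0.15 for the 148 individuals. The helper `fit_stats` is a hypothetical name, not a Stata command, and the p-value (Prob > F) is left out because it requires the F distribution.

```python
import math


def fit_stats(ss_model, ss_resid, df_model, df_resid):
    """Summary statistics reported by Stata's regress, derived from the ANOVA table."""
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return {
        "F": ms_model / ms_resid,                        # MS(model) / MS(residual)
        "R2": ss_model / ss_total,                       # share of total SS explained
        "adj_R2": 1 - ms_resid / (ss_total / df_total),  # penalizes residual df
        "root_MSE": math.sqrt(ms_resid),                 # residual standard deviation
    }


individual = fit_stats(1.75793155, 9.81634188, 1, 146)  # regress hsulf weight
aggregate = fit_stats(.063427068, .011669653, 1, 4)     # regress avhep avwt

for name, stats in [("individual", individual), ("aggregate", aggregate)]:
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Nothing in the aggregate table signals that its R-squared describes how well six averages line up, not how well weight predicts heparin sulfate for an individual.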