Measuring Agreement

Below are three examples to be considered in class Thursday, Feb. 13.

Inter-Observer Comparison

In a study of inter-observer reliability for hand and wrist X-rays in the assessment of rheumatoid arthritis, 4 clinicians made independent evaluations of overall disease severity on a 0 to 100 scale (bigger is better) for each of 42 X-rays. The scores are plotted below by "rater".

\scalebox{.7}{\includegraphics{/home/rollin/teach/UofC/bioI/f01/parts/agree.f2}}

The agreement can also be assessed by looking at scatter-plots.

\scalebox{.7}{\includegraphics{/home/rollin/teach/UofC/bioI/f01/parts/agree.f1}}

What is the best way to describe the "agreement" between these observers?

Inter-method Comparison

Thermo-dilution Cardiac Output is measured by infusing cold water into the circulation and monitoring the down-stream drop in temperature. The standard protocol for doing so involves using iced water. It was suggested that cold water from the refrigerator would do just as well and be more convenient. Forty patients were assessed by both methods.

\scalebox{.7}{\includegraphics{/home/rollin/teach/UofC/bioI/f01/parts/agree.f4}}

What is the best way to describe the "agreement" between these measurement methods?
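One candidate summary for paired measurements like these is the mean difference between the two methods together with approximate 95% "limits of agreement" (mean difference plus or minus 1.96 standard deviations of the differences). The sketch below is only an illustration of that idea, not necessarily the approach intended for class. The function name `limits_of_agreement` is hypothetical, and the four pairs of values are made up to show the arithmetic; they are not the study data.

```python
from statistics import mean, stdev


def limits_of_agreement(x, y):
    """Return (bias, lower, upper) for paired measurements x and y.

    bias  = mean of the paired differences x - y
    lower = bias - 1.96 * SD of the differences
    upper = bias + 1.96 * SD of the differences
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of measurements")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up illustrative values (NOT the 40 patients from the study).
iced = [4.0, 5.0, 6.0, 7.0]
fridge = [4.2, 4.9, 6.3, 7.0]

bias, lower, upper = limits_of_agreement(iced, fridge)
print(f"mean difference = {bias:.3f}")
print(f"limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Whether limits this wide are acceptable is a clinical judgment rather than a statistical one; the calculation only describes how far apart the two methods tend to be.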

Categorical Data

Here is a table which represents the results of assessments of 100 liver-spleen scans as read by two physicians:

           |         phys2
     phys1 |  Abnormal     Normal |     Total
-----------+----------------------+----------
  Abnormal |        21         14 |        35 
    Normal |         8         57 |        65 
-----------+----------------------+----------
     Total |        29         71 |       100

How can we describe the agreement between the two physicians?
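Two common candidate summaries for a table like this are the observed proportion of agreement (the diagonal cells divided by the total) and Cohen's kappa, which discounts the agreement expected by chance from the marginal totals. The sketch below applies both to the counts in the table above. It is offered as one possible answer, not necessarily the one intended for class, and the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(table):
    """Return (p_obs, p_exp, kappa) for a square table of counts.

    table[i][j] = number of subjects placed in category i by rater 1
                  and category j by rater 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: proportion of subjects on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance-expected agreement, computed from the marginal totals.
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2

    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


# Counts from the liver-spleen scan table (rows: phys1, columns: phys2).
scans = [[21, 14],
         [8, 57]]

p_obs, p_exp, kappa = cohen_kappa(scans)
print(f"observed agreement = {p_obs:.3f}")   # (21 + 57) / 100
print(f"expected agreement = {p_exp:.3f}")   # (35*29 + 65*71) / 100^2
print(f"kappa              = {kappa:.3f}")
```

For these counts the observed agreement is 0.78 and the chance-expected agreement is 0.563, giving a kappa of about 0.50.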