Two-by-two (2 × 2) tables arise in clinical research in several ways. The different types of 2 × 2 tables must be identified, because each requires its own statistical analysis and interpretation. This paper presents a brief overview of such tables.

In statistics, 2 × 2 tables are generally obtained by cross-classifying data from two binary variables; one variable will represent the rows of the table and the other the columns. For example, if gender (male, female) and smoking (no, yes) are being recorded for n subjects, data will best be summarized by a 2 × 2 table displaying gender against smoking. Numbers in cells of the table are counts, not measurements. Thus, in the example, the cells will contain the number of male smokers and nonsmokers and the number of female smokers and nonsmokers, respectively. Contingency (or count) 2 × 2 tables are among the most basic concepts taught in any elementary course in statistics, along with the mean, the standard deviation, and the correlation [

For clarity, we denote

General Representation of a 2 × 2 Contingency Table.

| Variable (rows) | Variable (columns): 0 | Variable (columns): 1 | Total |
|---|---|---|---|
| 0 | | | |
| 1 | | | |
| Total | | | |

From a statistical sampling standpoint, there are only three ways to establish a 2 × 2 contingency table: (i) the row margins (

This is the most familiar case. Smoking (No/Yes) was assessed in a sample of 1,262 high school boys and in a separate sample of 1,132 high school girls of the province of Luxembourg (data not published). Data are displayed in Table

Smoking in High School Boys and Girls in the Province of Luxembourg (Belgium).

| Smoking | Gender: Boys | Gender: Girls | Total |
|---|---|---|---|
| No | 873 | 760 | 1,633 |
| Yes | 389 | 372 | 761 |
| Total | 1,262 | 1,132 | 2,394 |

To test the null hypothesis of equal proportions of smokers in high school boys and girls, we use the “homogeneity test” by calculating the renowned chi-squared test on 1 degree of freedom (

Thus:

Since the associated
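As a minimal sketch of the homogeneity test on the smoking data, the snippet below uses the standard shortcut formula for a 2 × 2 table, chi-squared = n(ad − bc)² / (r1 · r2 · c1 · c2), without continuity correction; its notation may differ from the formula used in the text. The function name `chi2_2x2` is hypothetical. The nonsmoker counts are derived from the column totals and the smoker counts, and the p-value relies on the identity P(χ² > x) = erfc(√(x/2)), which holds for 1 degree of freedom.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-squared variable on 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Smokers and sample sizes as reported for boys and girls.
boys_yes, girls_yes = 389, 372
boys_no = 1262 - boys_yes    # nonsmoking boys, from the column total
girls_no = 1132 - girls_yes  # nonsmoking girls, from the column total

chi2, p_value = chi2_2x2(boys_no, girls_no, boys_yes, girls_yes)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.2f}")
```

With these counts the statistic comes out near 1.14, well below the 3.84 critical value at the 5% level, so the proportions of smokers in boys (30.8%) and girls (32.9%) do not differ significantly.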

This case is often confused with the homogeneity test above. Postoperative nausea (No/Yes) and vomiting (No/Yes) were recorded in 671 surgical patients [

Postoperative Nausea and Vomiting in 671 Surgical Patients.

| Nausea | Vomiting: No | Vomiting: Yes | Total |
|---|---|---|---|
| No | 532 | 13 | 545 |
| Yes | 73 | 53 | 126 |
| Total | 605 | 66 | 671 |

Applying the formula above (see Case 1), we get:

The large chi-squared value evidenced a highly significant association between postoperative nausea and vomiting in surgical patients (
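The same statistic can also be obtained from expected counts, which makes the independence hypothesis explicit: under independence, each expected cell count is (row total × column total) / grand total. The sketch below applies this to the nausea and vomiting data; the variable names are illustrative, and the sum of (observed − expected)² / expected is assumed to be equivalent to the formula used in the text.

```python
import math

# Rows: nausea (no, yes); columns: vomiting (no, yes).
observed = [[532, 13],
            [73, 53]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if nausea and vomiting were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

# Upper-tail probability on 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-squared = {chi2:.1f}, p = {p_value:.1e}")
```

For instance, only about 12 patients would be expected to report both nausea and vomiting under independence, against 53 observed, which is what drives the very large statistic.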

In contrast to the homogeneity test, the McNemar test [

Walking Distance before and after Surgery of 156 Patients Suffering from Degenerative Lumbar Stenosis with Neurogenic Intermittent Claudication.

| Before Surgery | After Surgery: ≤500 m | After Surgery: >500 m | Total |
|---|---|---|---|
| ≤500 m | 56 | 37 | 93 |
| >500 m | 20 | 43 | 63 |
| Total | 76 | 80 | 156 |

The proportion of patients who walked more than 500 m before surgery was 63/156 (40.4%), while the proportion after surgery was 80/156 (51.3%). Are these two proportions significantly different? The homogeneity test cannot be used because the two proportions were obtained on the same 156 patients; they are correlated. The null hypothesis of equal proportions is tested by the McNemar chi-squared test on 1 degree of freedom:

Using data in Table

This shows a significant difference between the two proportions. In other words, surgery improved the walking distance of the patients.
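A minimal sketch of the McNemar computation on the walking-distance data follows. It assumes the uncorrected form of the statistic, (b − c)² / (b + c), where b and c are the two discordant cells; the text may use a continuity-corrected variant, which would give a slightly smaller value. The variable names are illustrative.

```python
import math

# Discordant pairs: patients whose category changed between the two assessments.
improved = 37  # <=500 m before surgery, >500 m after surgery
worsened = 20  # >500 m before surgery, <=500 m after surgery

# Only the discordant pairs carry information about a change in proportions.
chi2 = (improved - worsened) ** 2 / (improved + worsened)

# Upper-tail probability of a chi-squared variable on 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"McNemar chi-squared = {chi2:.2f}, p = {p_value:.3f}")
```

The 99 concordant patients (56 + 43) do not enter the statistic at all, which is why the test depends on 57 patients rather than 156.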

The degree of agreement between two raters or methods can best be measured by the Cohen kappa (κ) coefficient [

Diagnosis of 187 Suspected Tumors by 2D Mammography and 3D Tomosynthesis.

| Mammography | Tomosynthesis: Benign | Tomosynthesis: Malignant | Total |
|---|---|---|---|
| Benign | 54 | 68 | 122 |
| Malignant | 14 | 51 | 65 |
| Total | 68 | 119 | 187 |

One may think here of the McNemar test as in Case 3; indeed, the proportion of malignancy was 65/187 (34.8%) for mammography and 119/187 (63.6%) for tomosynthesis, and the chi-squared test was equal to

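Let $p_o$ denote the observed proportion of agreements, that is, the sum of the two diagonal cells divided by the grand total. In our example, $p_o = (54 + 51)/187 = 0.561$. Next, compute the expected proportion of agreements due to chance only (as if the two raters were to decide randomly and independently of each other). Denote this proportion by $p_e$; it is the sum of the products of the corresponding row and column totals, divided by the square of the grand total. In our example, $p_e = [122 \times 68 + 65 \times 119]/187^2 = 0.458$. The Cohen kappa coefficient is then given by:

$$\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.561 - 0.458}{1 - 0.458} = 0.19$$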

The closer κ is to 1, the better the agreement between the two raters. The value of 0.19 is quite low, indicating poor agreement between the two diagnostic methods, hence confirming the highly significant McNemar test.
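The kappa computation can be sketched in a few lines, using the standard definition κ = (p_o − p_e) / (1 − p_e) with p_o the observed and p_e the chance-expected proportion of agreements. The variable names are illustrative and not taken from the source.

```python
# Rows: mammography (benign, malignant); columns: tomosynthesis (benign, malignant).
table = [[54, 68],
         [14, 51]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: both methods give the same diagnosis (diagonal cells).
p_o = (table[0][0] + table[1][1]) / n

# Chance agreement: products of matching row and column totals over n squared.
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (p_o - p_e) / (1 - p_e)
print(f"p_o = {p_o:.3f}, p_e = {p_e:.3f}, kappa = {kappa:.2f}")
```

Note that more than half of the tumors receive the same diagnosis from both methods, yet κ stays low because nearly as much agreement would be expected by chance alone.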

In medical practice, assessing the diagnostic (prognostic) ability of a clinical (biological, radiological) test is often required [

Diagnostic Ability of Folin-Wu Test for Diabetes.

| Folin-Wu Test | Diabetes: Absent | Diabetes: Present | Total |
|---|---|---|---|
| Negative | 461 | 14 | 475 |
| Positive | 49 | 56 | 105 |
| Total | 510 | 70 | 580 |

As in Case 1, we could compute the proportions of positive tests in healthy and diabetic subjects and compare them by a chi-squared test, but this is clearly not the purpose here. Instead, we shall investigate how the laboratory test performs in diseased and nondiseased subjects.

We would expect the clinical test to be mostly negative in healthy individuals. This can be measured by the specificity of the test

The positive predictive value (

For the Folin-Wu study, assuming a prevalence of diabetes in the population of 6% (

In other terms, when the Folin-Wu colorimetric test is positive, the subject has a 34.7% chance of having diabetes, which is substantially higher than the expected 6% before the test was performed. Similarly, the negative predictive value (

For the Folin-Wu data, we have:

Thus, when the Folin-Wu test is negative, diabetes can almost surely be excluded.
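The predictive values follow from Bayes' theorem once sensitivity, specificity, and prevalence are known. The sketch below reproduces the Folin-Wu figures under the 6% prevalence assumed in the text; the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity are computed within the diabetes columns.
sensitivity = 56 / 70    # positive tests among diabetic subjects
specificity = 461 / 510  # negative tests among nondiabetic subjects

ppv, npv = predictive_values(sensitivity, specificity, prevalence=0.06)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Reading the positive predictive value directly off the table row (56/105 = 53.3%) would be misleading here, because diabetic subjects make up 70/580 = 12.1% of the study sample, about twice the assumed population prevalence.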

Returning to the diagnosis of suspected tumors by 2D mammography and 3D tomosynthesis (readings by a senior radiologist), the 187 tumors were also analyzed by a pathologist (gold standard). It turned out that the specificity and sensitivity were equal to 78% and 36% (

One of the main objectives of epidemiological studies is to assess the association between a risk factor and a disease by means of 2 × 2 tables. This gives rise to the renowned notions of relative risk (

As an example, consider the retrospective study of Hiller and Kahn [

Association between Diabetes and Eye Cataract in Subjects Aged 50–69 Years.

| Diabetes | Cataract: Present | Cataract: Absent | Total |
|---|---|---|---|
| Yes (Exposed) | 55 | 84 | 139 |
| No (Nonexposed) | 552 | 1,927 | 2,479 |
| Total | 607 | 2,011 | 2,618 |

In such studies, the association between the risk factor (

Data in Table

The odds ratio is significantly different from 1 as confirmed by the 95% confidence interval (95% CI: 1.6–3.3), but also by the chi-squared homogeneity test described in Case 1 (
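The odds ratio and its confidence interval can be sketched as follows. The interval is computed with the Woolf (logit) method, which is an assumption: the source does not state which method it used, although this one reproduces the reported 1.6–3.3. Variable names are illustrative.

```python
import math

# Rows: diabetes (yes, no); columns: cataract (present, absent).
a, b = 55, 84     # diabetic subjects with and without cataract
c, d = 552, 1927  # nondiabetic subjects with and without cataract

# Cross-product ratio of the four cells.
odds_ratio = (a * d) / (b * c)

# Woolf method: the log odds ratio is approximately normal with this standard error.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96  # normal quantile for a 95% interval
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI: {lower:.1f}-{upper:.1f}")
```

Because the interval excludes 1, the odds of diabetes differ significantly between subjects with and without cataract, in line with the chi-squared test.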

Odds ratios have become very popular to measure the association between a risk factor and a disease, even in a clinical environment. They are also used in cross-sectional, prospective, and cohort studies, where normally the relative risk (RR) should be preferred. They are easily derived and generalized by (multivariate) logistic regression analysis when it comes to studying the association between several risk factors for a single disease [

Clinicians and researchers are regularly faced with 2 × 2 contingency tables, particularly when analyzing small datasets or large databases containing binary data. Although simple at first glance, these tables can be difficult to interpret. We have emphasized how each 2 × 2 table was established: were the row or column margins fixed, or was the grand total fixed? This is particularly important when calculating percentages; dividing cell counts by totals must be done with caution. A notable example is the calculation of positive predictive values.

Two-by-two tables arise in various situations, as we have seen, and the data should be analyzed with care. For instance, when comparing two proportions from distinct groups (Case 1: column margins fixed), it makes no sense to calculate the correlation between the two binary variables. This can only be done when both variables have been observed together (Case 2: grand total fixed). Thus, for the comparison of smoking in male and female teenagers, we can neither conclude that smoking and gender are independent nor calculate a correlation coefficient. By contrast, when fixing the grand total

The distinction between independent proportions (Case 1) and paired proportions (Case 3) is also essential. Applying the homogeneity test where the McNemar test is required can lead to fallacious conclusions because the proportions to be compared are not the same. As an illustration, the homogeneity test applied to data in Table

We already mentioned the relationship between Cohen kappa coefficient (Case 4, agreement between raters) and the McNemar test (Case 3); in both tables, the grand total was fixed. A significant McNemar test corresponds to a κ coefficient significantly different from 0, but it does not necessarily mean that there is a high degree of agreement between the two raters, particularly when the sample size is large. In relation to the assessment of the diagnostic capacity of a clinical test (Case 5), it should be emphasized again that the

Finally, for measuring the association between a risk factor and a disease (Case 6), we only mentioned the odds ratio, a widely used indicator in epidemiological and clinical studies. In prospective or cohort studies, however, where a sample of subjects exposed to the risk factor and a separate sample of nonexposed subjects are followed up over time and the occurrence of the disease recorded (row margins are fixed rather than column margins),

In conclusion, 2 × 2 tables are commonplace in the medical literature and are among the first summary statistics taught in any basic textbook. When facing such a table, ask yourself which totals (margins) are fixed (row, column, or grand total); calculate the appropriate percentages; perform the appropriate statistical test; and provide the best interpretation of the data.

The author has no competing interests to declare.