15 Multiple Proportions

15.1 One-sample goodness-of-fit testing

We look at the Mega Millions lottery in the United States. In the current version of the game, each draw consists of five balls sampled without replacement from an urn with white balls numbered \(1, \dots, 70\), and one ball (the “MegaBall”) sampled from an urn with gold-colored balls numbered \(1, \dots, 25\). Here are the draws for the year 2018.

We focus on the MegaBall draws and ask whether they are consistent with being independently uniformly distributed in \(\{1, \dots, 25\}\) (as they are supposed to be).

The main function that R provides for one-sample goodness-of-fit testing is the following, which computes the Pearson test, obtaining the p-value using the asymptotic null distribution (default) or Monte Carlo simulations. In the former case, the test statistic is “corrected” to make the approximation more accurate, and a warning is printed if the expected counts under the null are deemed too small for the approximation to be valid (as is the case with this dataset). If left unspecified, the null distribution is taken to be the uniform distribution on sets of values appearing at least once in the sample (hence the importance of making sure all possible values appear at least once, otherwise use the function tabulate).

    Chi-squared test for given probabilities

data:  counts
X-squared = 16.673, df = 24, p-value = 0.8623

    Chi-squared test for given probabilities with simulated p-value
    (based on 10000 replicates)

data:  counts
X-squared = 16.673, df = NA, p-value = 0.8794

15.2 Association studies

Consider the paper “Outcomes of Ethnic Minority Groups with Node-Positive, Non-Metastatic Breast Cancer in Two Tertiary Referral Centers in Sydney, Australia” published in PLOS ONE in 2014 (https://doi.org/10.1371/journal.pone.0095852). The primary purpose of this (retrospective) study was to examine in what respect Asian patients differed from Western patients referred to two tertiary cancer centers in South Western Sydney. We look at the age distribution.

    Western Asian
<50     154    79
>50     293    46

This 2x2 contingency table can be visualized with a segmented bar plot.

It’s pretty clear that Asian patients tend to be younger (at least based on the available age information), but let’s perform a test. In principle, this calls for a test of independence. It can be performed as follows.

    Pearson's Chi-squared test with Yates' continuity correction

data:  tab
X-squared = 32.261, df = 1, p-value = 1.348e-08

Conditioning on the margins leads to a permutation test.

    Pearson's Chi-squared test with simulated p-value (based on 10000

data:  tab
X-squared = 33.441, df = NA, p-value = 9.999e-05

For a 2x2 contingency table, as in the present case, we can in fact compute the p-value exactly. This is Fisher’s exact test. (The two p-values differ because the number of permutations above \(B = 10000\) is not large enough to properly estimate the actual permutation p-value.)

    Fisher's Exact Test for Count Data

data:  tab
p-value = 1.061e-08
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.1978184 0.4713373
sample estimates:
odds ratio 

15.3 Matched-pair experiment

The following data comes from the paper “Evaluating diagnostic tests for bovine tuberculosis in the southern part of Germany: A latent class analysis” published in PLOS ONE in 2017 (https://doi.org/10.1371/journal.pone.0179847). The goal is to compare a new test for bovine tuberculosis, Bovigam, with the reliable but time-consuming single intra-dermal cervical tuberculin (SICT) test. For this, \(n = 175\) animals were tested with both, resulting in the following. (These are the numbers relating to the so-called “standard interpretation” way of administering the tests.)

        Bovigam[+] Bovigam[-]
SICT[+]         46          2
SICT[-]        119          8

It’s quite clear that the two tests, SICT and Bogigam, yield very different results (with Bovigam yielding many more positive results). This is confirmed by the McNemar test.

In its basic implementation, the p-value relies on the asymptotic null distribution based on a “corrected” value of the test statistic.

    McNemar's Chi-squared test with continuity correction

data:  tab
McNemar's chi-squared = 111.21, df = 1, p-value < 2.2e-16

An exact version of the test can be performed as follows.

    Exact McNemar test (with central confidence intervals)

data:  tab
b = 2, c = 119, p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.002012076 0.062059175
sample estimates:
odds ratio