20 Multiple Testing

20.1 An example from genetics

We consider a gene expression dataset available from the Gene Expression Omnibus (GEO) database ( https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7621). The corresponding paper was published with PLOS Genetics (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0030098). In the experiment, substantia nigra tissue from postmortem brain of normal patients (9 of them) and Parkinson disease patients (16 of them) were used for RNA extraction and hybridization on Affymetrix microarrays. Both cohorts included males and females. (We only look at the first 50000 locuses for the sake of simplicity. Some the remaining ones have missing values.)

dat0 = read.table("data/GSE7621_series_matrix.txt", header = TRUE)
m = 50000
dat = as.matrix(dat0[1:m, -1]) # remove the ID
dat = matrix(as.numeric(unlist(dat)), nrow = m)

We perform a Student–Welch test for each gene/locus.

pval = numeric(m)
for (i in 1:m) {
  pval[i] = t.test(dat[i, 1:9], dat[i, -(1:9)])$p.value
}

20.1.1 Adjusted p-values

Here are the corrected p-values:

pval.bon = p.adjust(pval, "bon") # Bonferroni
pval.holm = p.adjust(pval, "holm") # Holm
pval.hoch = p.adjust(pval, "hoch") # Hochberg
pval.bh = p.adjust(pval, "BH") # Benjamini-- Hochberg
pval.by = p.adjust(pval, "BY") # Benjamini--Yekutieli

20.1.2 Rejections

# rejections at the 10% level without adjustement for multiple testing.
reject = (pval <= 0.10)
R = sum(reject) # total number of rejections

# rejections at the 10% FWER level (the last one does not guaranty control since assumption of independence is dubious)
reject.bon = (pval.bon <= 0.10)
R.bon = sum(reject.bon) 
reject.holm = (pval.holm <= 0.10)
R.holm = sum(reject.holm) 
reject.hoch = (pval.hoch <= 0.10)
R.hoch = sum(reject.hoch) 

# rejections at the 10% FDR level (the first one does not guaranty control since assumption of independence is dubious)
reject.bh = (pval.bh <= 0.10)
R.bh = sum(reject.bh) # many more rejections
reject.by = (pval.by <= 0.10)
R.by = sum(reject.by)

20.2 An example from meta-analysis

We use the dataset described in Table 3 of “Controlled low protein diets in chronic renal insufficiency” published in the BMJ in 1992 (https://www.bmj.com/content/304/6821/216.short).

dat = array(c(95, 110, 15, 8, 2, 5, 7, 5, 194, 209, 32, 21, 8, 14, 17, 11, 25, 30, 13, 4, 21, 21, 11, 12), dim = c(2, 2, 6), dimnames = list(diet = c("control", "treatment"), outcome = c("survived", "died"), study = 1:6))

20.2.1 Cochran–Mantel–Haenszel test

mantelhaen.test(dat) # p-value relies asymptotic theory


    Mantel-Haenszel chi-squared test with continuity correction

data:  dat
Mantel-Haenszel X-squared = 10.407, df = 1, p-value = 0.001255
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.3619622 0.7727168
sample estimates:
common odds ratio 
        0.5288613

20.2.2 Combination tests

We can also take a multiple testing stance, applying a test to each contingency table, and then apply a combination test to the p-values. Here we use the Fisher and Simes combination tests (coded by hand, although they are available in some packages).

out = apply(dat, 3, fisher.test)
pval = lapply(out, "[[", 1)
pval = unlist(pval, use.names = FALSE)

fisher.combine = function(p) {
  stat = -2*sum(log(p))
  df = 2*length(p)
  return(pchisq(stat, df, lower.tail = FALSE))
}
fisher.combine(pval) # some significance

[1] 0.04301827

simes.combine = function(p) {
  m = length(p)
  stat = m * min(sort(p)/(1:m))
  return(punif(stat, lower.tail = FALSE))
}
simes.combine(pval) # no significance

[1] 0.8229667