# 19 Correlation Analysis

We work with the famous dataset that Pearson collected on the heights of fathers and their sons. Since both variables are numerical, a scatterplot is an appropriate plot.

require(UsingR)
plot(father.son, pch = 16, xlab = "father's height", ylab = "son's height", asp = 1)

attach(father.son)

## 19.1 Sample correlations

We compute various types of correlations between the son and father heights.

cor(sheight, fheight, method = "pearson")
[1] 0.5013383
cor(sheight, fheight, method = "spearman")
[1] 0.5058485
cor(sheight, fheight, method = "kendall")
[1] 0.3492753
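
To make the three coefficients concrete, here is a minimal sketch on a small made-up sample (the vectors `x` and `y` are hypothetical and are not taken from `father.son`). It checks that Spearman's coefficient is Pearson's coefficient applied to the ranks and that, in the absence of ties, Kendall's coefficient is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.

```r
# Hypothetical toy data without ties (not from father.son)
x <- c(65.0, 63.3, 70.1, 68.4, 66.9, 72.5)
y <- c(66.2, 64.8, 69.0, 70.3, 65.1, 71.7)
n <- length(x)

# Spearman: Pearson correlation of the ranks
rho_ranks <- cor(rank(x), rank(y))

# Kendall (no ties): (concordant - discordant) / number of pairs.
# The outer() sign products count every unordered pair twice,
# hence the division by n * (n - 1) rather than choose(n, 2).
s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
tau_pairs <- sum(s) / (n * (n - 1))

# Compare with the built-in computations
all.equal(rho_ranks, cor(x, y, method = "spearman"))
all.equal(tau_pairs, cor(x, y, method = "kendall"))
```

With ties, the pair-counting formula would need the tie correction that `cor` applies, so the second comparison is only expected to hold for tie-free data such as this toy sample.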

## 19.2 Correlation tests

Although it is pretty clear from the scatterplot that the father and son heights are positively correlated (or more generally, monotonically associated), for pedagogical reasons, we perform the corresponding tests. (See the documentation of cor.test for details on how the p-values are computed. In the present case, the first and third p-values are approximations based on asymptotic theory, while the second would be based on exact calculations, except that the presence of ties means its p-value is only approximated as well.)

cor.test(sheight, fheight, method = "pearson")

Pearson's product-moment correlation

data:  sheight and fheight
t = 19.006, df = 1076, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4552586 0.5447396
sample estimates:
cor
0.5013383

cor.test(sheight, fheight, method = "spearman")

Spearman's rank correlation rho

data:  sheight and fheight
S = 103172697, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.5058485

cor.test(sheight, fheight, method = "kendall")

Kendall's rank correlation tau

data:  sheight and fheight
z = 17.174, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.3492753 
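
As a check on the output above, the following sketch recomputes some of the reported statistics from the printed coefficients alone. It assumes the standard formulas: the t statistic r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, a confidence interval obtained from Fisher's z transformation, and S = (n^3 - n)(1 - rho) / 6 for Spearman's statistic. The sample size is inferred from df = 1076, and the coefficients are typed in by hand, so small discrepancies due to rounding are to be expected.

```r
r   <- 0.5013383   # Pearson coefficient copied from the output
rho <- 0.5058485   # Spearman coefficient copied from the output
n   <- 1076 + 2    # sample size inferred from df = n - 2

# Pearson t statistic, referred to a t distribution with n - 2 df
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via Fisher's z transformation
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Spearman's S statistic expressed through rho
S <- (n^3 - n) * (1 - rho) / 6

round(t_stat, 3)
round(ci, 4)
S
```

The three results should agree with the values of t, the confidence interval, and S printed by cor.test, up to the rounding of the coefficients.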

## 19.3 Distance covariance (and test)

We also apply the distance covariance test. (The function returns a Monte Carlo permutation p-value based on the number of replicates given by its argument R, here 1000.)

require(energy)
dcov.test(sheight, fheight, R = 1e3)

dCov independence test (permutation test)

data:  index 1, replicates 1000
nV^2 = 742.04, p-value = 0.000999
sample estimates:
dCov
0.8296702
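
To illustrate what the test computes, here is a minimal base-R sketch for univariate samples. It assumes the usual definition of the sample distance covariance (the square root of the average entrywise product of the two double-centered distance matrices) and a common convention for the permutation p-value, (1 + number of replicates at least as large as the observed statistic) / (R + 1). The helper names `double_center`, `dcov_sketch`, and `dcov_perm_pvalue` and the simulated data are hypothetical, and the code is not the implementation used in the energy package.

```r
# Double-center a distance matrix: subtract row and column means, add the grand mean
double_center <- function(D) {
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Sample distance covariance of two numeric vectors (hypothetical helper)
dcov_sketch <- function(x, y) {
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  sqrt(max(mean(A * B), 0))  # guard against tiny negative rounding errors
}

# Permutation p-value: shuffling y breaks any dependence on x (hypothetical helper)
dcov_perm_pvalue <- function(x, y, R = 199) {
  observed <- dcov_sketch(x, y)
  replicates <- replicate(R, dcov_sketch(x, sample(y)))
  (1 + sum(replicates >= observed)) / (R + 1)
}

set.seed(1)
x <- rnorm(50)
y <- x^2 + rnorm(50, sd = 0.5)  # simulated, nonlinearly dependent data
dcov_sketch(x, y)
dcov_perm_pvalue(x, y, R = 199)
```

Under this convention the smallest attainable p-value is 1 / (R + 1), which for R = 1000 is about 0.000999 and is consistent with the value reported above. Likewise, the reported statistic is consistent with nV^2 being n times the squared sample distance covariance: 1078 × 0.8296702^2 ≈ 742.04.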