10 Sampling and Simulation

10.1 Monte Carlo simulations

In the 5-card poker game, the probability of a flush is \(1/508 \approx 0.00197\). It’s a good exercise to derive this number analytically based on combinatorics. We can approximate this number using Monte Carlo simulations (based on the Law of Large Numbers) by simply repeatedly generating a poker hand and checking whether it’s a flush or not. Since the number we are estimating is quite small, the number of Monte Carlo replicates has to be quite large to result in a decent approximation. We use some code taken from a probability course at Duke University.

fraction of flush hands (out of 100000): 0.002070

10.2 Monte Carlo integration

Simulations, again via the Law of Large Numbers, allow for the computation of integrals, since integrals can be interpreted as expectations. This avenue offers some advantages over numerical integration such as ease of implementation (although code for numerical integration is readily available) and lower computational complexity in higher dimensions (in fact as low as \(3\)). We perform some numerical experiments in dimension one where a Monte Carlo approach is typically less desirable, but where we can more easily quantify its accuracy as compared to numerical integration.

[1] 7.483435e-08
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-8.196e-04 -1.283e-04  4.566e-05  3.157e-05  1.963e-04  6.626e-04 
Time units : microseconds 
           expr n.eval     min   lw.qu  median    mean   up.qu   max   total relative
 num_integral()    100    22.8    25.8    29.9    54.6    75.1   452    5460        1
  mc_integral()    100 31500.0 31900.0 33100.0 34700.0 36000.0 67300 3470000     1110

In dimension one, at least for this simple function, numerical integration is much superior in both accuracy and speed, as expected. In higher dimensions, numerical integration may not even be feasible… ## Rejection sampling Here is a simple example of rejection sampling, where the goal is to sample uniformly at random from the unit disc. The approach is based on sampling the unit square (which contains the unit disc) uniformly at random (which is easy) and only keeping those draws that fall in the disc.

10.3 Markov chain Monte Carlo

For illustration, we consider the setting of species co-occurrence, important in Ecology, where a presence-absence matrix is available, and we want to draw uniformly at random from the set of binary matrices of same size and with the same row and column totals. This is one of the random models considered in that field, meant to represent a situation where the various species have populated the various geographical sites “at random”. The Markov chain consists in swapping \(2 \times 2\) submatrices whenever possible.

                         Seymour Baltra Isabella Fernandina Santiago Rabida Pinzon Santa.Cruz Santa.Fe San.Cristobal Espanola Floreana Genovesa Marchena Pinta Darwin Wolf
Geospiza magnirostris          0      0        1          1        1      1      1          1        1             1        0        1        1        1     1      1    1
Geospiza fortis                1      1        1          1        1      1      1          1        1             1        0        1        0        1     1      0    0
Geospiza fuliginosa            1      1        1          1        1      1      1          1        1             1        1        1        0        1     1      0    0
Geospiza difficilis            0      0        1          1        1      0      0          1        0             1        0        1        1        0     1      1    1
Geospiza scandens              1      1        1          0        1      1      1          1        1             1        0        1        0        1     1      0    0
Geospiza conirostris           0      0        0          0        0      0      0          0        0             0        1        0        1        0     0      0    0
Camarhynchus psittacula        0      0        1          1        1      1      1          1        1             0        0        1        0        1     1      0    0
Camarhynchus pauper            0      0        0          0        0      0      0          0        0             0        0        1        0        0     0      0    0
Camarhynchus parvulus          0      0        1          1        1      1      1          1        1             1        0        1        0        0     1      0    0
Platyspiza crassirostris       0      0        1          1        1      1      1          1        1             1        0        1        0        1     1      0    0
Cactospiza pallida             0      0        1          1        1      0      1          1        0             1        0        0        0        0     0      0    0
Cactospiza heliobates          0      0        1          1        0      0      0          0        0             0        0        0        0        0     0      0    0
Certhidea olivacea             1      1        1          1        1      1      1          1        1             1        1        1        1        1     1      1    1
0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0
0 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1
1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
0 0 1 1 1 1 1 1 1 0 0 1 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0
0 0 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0
0 0 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 0
1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0
0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 1 1 1 1 0 0 1 0 1 1 1 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0
0 1 1 1 1 1 1 0 1 1 0 0 0 1 1 0 1
0 0 1 0 1 1 0 1 0 1 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

This code produces only {} approximately uniform random binary matrix (among those with the same row and column totals).