# 11 Data Collection

## 11.1 Survey sampling

The following package accompanies the textbook Sampling Algorithms by Yves Tillé.

require(sampling)

### 11.1.1 Sampling according to prescribed inclusion probabilities

We assume that the inclusion probabilities (i.e., the probabilities that the individuals in the population are sampled) are all equal, unless otherwise specified. These inclusion probabilities sum to the (expected) sample size.

N = 1e2 # population size
n = 20 # desired sample size
Pi = n*rep(1/N, N) # inclusion probabilities

Each sampling design that follows is in a form of a binary vector with 1 indicating that the individual is to be sampled.

Systematic sampling. This sampling design consists in sampling at a regular interval until the desired sample size is achieved.

UPsystematic(Pi)
  [1] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
[36] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
[71] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0

Poisson sampling. This sampling scheme amounts to sampling each individual according to its inclusion probability, independently of everything else. This design has the property that the desired sample size is not always achieved. (Rather, the sample size is random, having the Poisson distribution with mean $$n$$.)

UPpoisson(Pi)
  [1] 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1
[36] 1 0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
[71] 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0

Conditional Poisson sampling (CPS). This sampling scheme is Poisson sampling conditioned on the sample being of the desired size. This can be implemented by rejection sampling (repeatedly rejecting a Poisson sample until it has the desired size), but there are faster methods. (CPS is known to maximize the entropy given the inclusion probabilities, therefore the name of the function below.)

UPmaxentropy(Pi)
  [1] 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0
[36] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
[71] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0

### 11.1.2 Cluster sampling

We use the following dataset on the Swiss municipalities in 2003, where clusters can be naturally defined. In particular, we use the cantons (CT) as clustering variable. (There are 26 cantons in total so that CT takes that many different values.)

data(swissmunicipalities)
cl = cluster(swissmunicipalities, "CT", size = 5, description = TRUE) # we select 5 cantons
Number of selected clusters: 5
Number of units in the population and number of selected units: 2896 198 
getdata(swissmunicipalities, cl) # lists the cantons to be surveyed
     REG  COM                       Nom HApoly Surfacesbois Surfacescult
1560   6 1025                  Ermensee    568          201          336
605    6 1086               Grosswangen   1971          302         1541
1007   6 1066             Schwarzenberg   3925         2109          963
8      6 1061                    Luzern   1581          408          183
993    6 1135                   Luthern   3779         1795         1529
20     6 1024                     Emmen   2033          370          948
23     6 1059                    Kriens   2734         1381          808
1182   6 1134        Langnau bei Reiden    860          357          410
817    6 1085                  Geuensee    645           97          479
764    6 1023                   Ballwil    875          102          670
505    6 1003               Escholzmatt   6141         2718         2414
264    6 1093                Neuenkirch   2548          499         1812
856    6 1037                      Rain    939          158          706
2365   6 1035                     Mosen    160           24          112
1977   6 1028                   Hamikon    466          126          310
766    6 1099                  Schenkon    670          115          459
415    6 1008                Schupfheim   3826         1258         1963
1129   6 1123                Altishofen    577          223          293
2040   6 1126                Ebersecken    856          149          675
382    6 1148             Willisau Land   3771          929         2584
229    6 1040                Rothenburg   1557          257         1052
1636   6 1129                 Fischbach    805          111          649
532    6 1143                    Schotz   1089          195          756
897    6 1004                    Fluhli  10819         4096         1428
908    6 1005                Hasle (LU)   4029         1595         1100
2364   6 1105                   Wilihof    240           29          195
1786   6 1100               Schlierbach    717          224          452
298    6 1051               Adligenswil    699          170          381
385    6 1140                    Reiden   1118          448          528
468    6 1102                   Sempach    891          130          615
1519   6 1106                   Winikon    757          223          485
1434   6 1122                  Altburon    677          201          411
837    6 1009              Werthenstein   1577          508          935
1078   6 1147                     Wikon    830          427          327
489    6 1125              Dagmersellen   1402          552          660
543    6 1149            Willisau Stadt    338          210           57
852    6 1067               Udligenswil    623          142          417
960    6 1089                   Knutwil    973          196          675
2229   6 1057                     Honau    125           24           86
100    6 1054                    Ebikon    968          232          372
480    6 1002                 Entlebuch   5689         2427         2163
723    6 1032                 Hohenrain   1959          343         1511
452    6 1069                    Weggis   1180          459          425
1128   6 1053                  Dierikon    278           63          167
916    6 1146                    Wauwil    295           30          212
914    6 1088              Hildisrieden    705           83          545
849    6 1087                   Gunzwil   2324          350         1818
1605   6 1027                 Gelfingen    389           97          250
622    6 1095                 Oberkirch    910          122          675
244    6 1063                    Meggen    728          172          346
1701   6 1144                   Uffikon    518          158          314
1675   6 1001             Doppleschwand    695          254          387
1581   6 1041                  Schongau   1245          325          877
1986   6 1036                 Muswangen    451          108          317
451    6 1065                      Root    866          241          442
1517   6 1145                   Ufhusen   1224          253          901
386    6 1107                  Wolhusen   1430          466          829
1169   6 1127                  Egolzwil    416           99          259
1494   6 1131              Grossdietwil   1019          225          739
1469   6 1055                   Gisikon    109           22           55
835    6 1082                     Buron    541           89          361
83     6 1058                      Horw   1284          545          418
619    6 1083                Buttisholz   1675          285         1236
2402   6 1029                Herlisberg    262           37          208
1312   6 1064              Meierskappel    679          170          446
235    6 1062                   Malters   2863          701         1849
680    6 1139                  Pfaffnau   1763          445         1154
301    6 1052                  Buchrain    481           82          237
2419   6 1101             Schwarzenbach    326          103          213
1367   6 1021                Aesch (LU)    462           90          315
1680   6 1142                Roggliswil    621          171          411
709    6 1030                 Hitzkirch    358           51          241
226    6 1098                    Ruswil   4520          949         3294
755    6 1137                   Nebikon    373          149          142
1280   6 1092                   Neudorf   1282          430          778
577    6 1136                   Menznau   3038         1056         1777
810    6 1150                 Zell (LU)   1390          316          943
879    6 1033                     Inwil   1031          167          729
690    6 1081               Beromunster    285           26          187
165    6 1031                  Hochdorf    963           98          653
1165   6 1039                 Romerswil   1396          233         1075
801    6 1097           Rickenbach (LU)    940          246          587
1333   6 1130                   Gettnau    608          225          319
562    6 1104                  Triengen    849          180          534
2235   6 1138                   Ohmstal    445          119          304
2463   6 1090                  Kulmerau    362          122          213
1296   6 1068                   Vitznau    891          502          197
1846   6 1121                 Alberswil    354           65          264
1364   6 1091                  Mauensee    716          123          491
161    6 1103                    Sursee    586          128          180
617    6 1094                   Nottwil   1031           92          803
2521   6 1038                 Retschwil    261           64          172
2487   6 1034                     Lieli    367          102          253
1205   6 1006              Marbach (LU)   4510         2136         1151
1145   6 1084                      Eich    594          115          400
2599   6 1042                 Sulz (LU)    385          114          251
862    6 1132    Hergiswil bei Willisau   3135         1150         1819
2084   6 1133                   Kottwil    611          105          468
1534   6 1056                   Greppen    332          131          143
55     6 1060                    Littau   1325          280          695
2103   6 1124                Buchs (LU)    467           98          327
2145   6 1022                    Altwis    292           84          185
899    6 1128                  Ettiswil    646          112          454
1654   6 1141                Richenthal    723          185          484
551    6 1026           Eschenbach (LU)   1323          191          978
1578   6 1007                    Romoos   3732         2303         1144
1627   6 1096                 Pfeffikon    248          126           96
1221   3 2702                 Bettingen    223          100           74
30     3 2703                    Riehen   1086          276          283
3      3 2701                     Basel   2391           97           93
56     5 3001                   Herisau   2518          678         1428
1956   5 3003              Schonengrund    520          157          320
595    5 3022                      Gais   2123          938          943
422    5 3023                  Speicher    821          238          450
1298   5 3031                 Grub (AR)    425          144          229
895    5 3034                 Rehetobel    675          236          362
911    5 3038                Wolfhalden    695          212          404
1297   5 3002                   Hundwil   2396          751          960
695    5 3006                   Urnasch   4825         1977         1457
1438   5 3036                 Wald (AR)    682          195          451
1595   5 3035                Reute (AR)    499          260          199
1087   5 3033                Lutzenberg    226           59          126
743    5 3037              Walzenhausen    698          241          367
1068   5 3005                Stein (AR)    937          207          642
261    5 3024               Teufen (AR)   1525          456          848
1018   5 3004              Schwellbrunn   1740          579         1052
906    5 3007                 Waldstatt    674          174          417
951    5 3021                    Buhler    561          145          350
846    5 3025                    Trogen   1001          411          491
391    5 3032                    Heiden    748          229          394
824    5 3105                  Schwende   5753         1848          679
1058   5 3102                    Gonten   2471          791         1230
1228   5 3104            Schlatt-Haslen   1792          544         1110
873    5 3111                   Oberegg   1467          623          727
564    5 3103                      Rute   4083         1301         1364
270    5 3101                 Appenzell   1686          371         1079
453    2 6457            Marin-Epagnier    335           43          123
1936   2 6482                Montmollin    406          151          162
1572   2 6511             Les Verrieres   2868         1460          645
1016   2 6451                   Cornaux    472          112          229
371    2 6455               Le Landeron   1031          454          425
110    2 6436                  Le Locle   2314          701          753
2359   2 6433              Brot-Plamboz   1603          459          862
2059   2 6435        La Chaux-du-Milieu   1728          840          487
1201   2 6510                   Travers   2465         1062          830
613    2 6505                    Couvet   1641          733          401
2070   2 6486                  Villiers   1059          577          269
2536   2 6411                Montalchez    641          249          274
2147   2 6501               Les Bayards   1915          981          559
991    2 6475                Dombresson   1277          466          714
1756   2 6509        Saint-Sulpice (NE)   1309          740          262
1839   2 6504          La Cote-aux-Fees   1285          529          451
884    2 6410                   Gorgier   1398          788          382
1681   2 6432                La Brevine   4182         1803         1336
1112   2 6437       Les Ponts-de-Martel   1823          556          683
883    2 6403                      Bole    258          119           72
2051   2 6485                  Valangin    377          189          150
2520   2 6415                 Vaumarcus    178           46          110
1402   2 6479            Fontaines (NE)   1014          292          353
16     2 6458                 Neuchatel   1810          998          197
1594   2 6471             Boudevilliers   1257          410          621
946    2 6478             Fontainemelon    245           98           61
1339   2 6481        Les Hauts-Geneveys    795          362          131
1543   2 6477       Fenin-Vilars-Saules    648          322          289
411    2 6407    Corcelles-Cormondreche    486          202          163
1719   2 6474                  Coffrane    649          169          328
521    2 6459              Saint-Blaise    887          479          240
1398   2 6456                 Lignieres   1251          400          683
1468   2 6507              Motiers (NE)    642          322          168
2425   2 6483           Le Paquier (NE)    958          488          356
1687   2 6503                    Buttes   1820         1091          339
1213   2 6431               Les Brenets   1153          571          321
433    2 6506                  Fleurier    774          408          143
2751   2 6405              Brot-Dessous    495          403           58
354    2 6408                Cortaillod    369           42          195
273    2 6412                    Peseux    343          241           10
948    2 6473      Chezard-Saint-Martin   1305          502          621
2244   2 6434      Le Cerneux-Pequignot   1567          634          555
1044   2 6480 Les Geneveys-sur-Coffrane    786          312          220
829    2 6472                   Cernier    911          357          233
629    2 6454                 Hauterive    212          127           31
454    2 6402                    Bevaix   1078          516          398
307    2 6406            Colombier (NE)    452           92          185
2454   2 6409                   Fresens    158           13          129
985    2 6401                 Auvernier    168            7           82
2792   2 6476                  Engollon    262           33          214
1771   2 6460             Thielle-Wavre    209            2          165
1944   2 6508                 Noiraigue    638          445           97
2438   2 6422           Les Planchettes   1173          678          262
1326   2 6423                  La Sagne   2555          954         1130
2125   2 6502                 Boveresse   1290          587          355
1300   2 6413                 Rochefort   2089         1302          311
828    2 6452             Cressier (NE)    855          466          247
1410   2 6484                 Savagnier    853          377          430
280    2 6404                    Boudry   1677         1063          405
12     2 6421         La Chaux-de-Fonds   5566         1572         2221
672    2 6414        Saint-Aubin-Sauges    771          301          280
2289   2 6453                     Enges    959          508          353
Alp Airbat Airind P00BMTOT P00BWTOT Pop020 Pop2040 Pop4065 Pop65P
1560    0     14      3      364      382    248     194     235     69
605     0     70      6     1371     1360    844     856     698    333
1007  475     57      5      752      737    441     435     441    172
8       0    524     27    27324    32172   9533   18843   18177  12943
993   279     59      6      814      706    473     405     385    257
20      0    296    116    13315    13570   6621    7924    8608   3732
23     56    241     61    11954    12788   5656    7082    8108   3896
1182    0     39      4      603      603    352     344     368    142
817     0     40      3      966      980    625     567     599    155
764     0     46      7     1095     1017    571     658     684    199
505   654     86      8     1635     1594    969     880     838    542
264     0     98     12     2754     2763   1675    1705    1601    536
856     0     53      3      923      908    588     543     519    181
2365    0      8      1      128      126     80      74      75     25
1977    0     15      0      233      211    147     122     141     34
766     0     53      5     1059     1045    607     674     702    121
415   336    100     14     1981     1916   1142    1106     997    652
1129    0     28     14      657      622    391     446     331    111
2040    0     20      0      217      187    130     123     101     50
382     0    106     25     2074     2031   1172    1210    1142    581
229     0     91     39     3142     3097   1694    1836    2073    636
1636    0     25      0      360      318    196     211     186     85
532     0     47     13     1549     1516    911     966     796    392
897  3419    102      7      935      802    511     499     480    247
908   940     35      2      889      812    506     474     465    256
2364    0     10      0      139      115     97      78      51     28
1786    0     24      1      293      274    157     179     167     64
298     0     83      8     2502     2508   1550    1251    1873    336
385     0     74     16     2018     2072   1113    1145    1225    607
468     0     57      8     1728     1755   1030    1018    1112    323
1519    0     34      2      390      398    253     217     246     72
1434    0     26      3      471      412    261     281     234    107
837     0     46     17      974      918    512     566     560    254
1078    0     41     11      661      681    364     353     414    211
489     0     63     20     1672     1639    895     977     981    458
543     0     36     14     1460     1536    630     906     929    531
852     0     36      3      945      906    554     475     660    162
960     0     47      4      832      751    458     462     486    177
2229    0      5      0      161      159    119      94      99      8
100     0    152     17     5501     5821   2940    3183    3840   1359
480   688    106     12     1717     1649    973     921     901    571
723     0     66      3     1153     1087    816     588     610    226
452   119    107      1     1721     1895    748     999    1169    700
1128    0     17     11      661      618    389     405     387     98
916     0     32      7      863      808    481     492     498    200
914     0     41      1      845      830    480     426     565    204
849     0     83      2      974      883    558     539     556    204
1605    0     19      0      354      355    251     201     198     59
622     0     58      9     1337     1324    733     889     763    276
244     0    147      3     2851     3072   1309    1395    2151   1068
1701    0     22      1      326      308    166     209     188     71
1675    0     18      1      330      323    214     200     171     68
1581    0     25      1      390      339    217     213     193    106
1986    0     12      0      230      209    152     129     120     38
451     0     58     43     1853     1772    928    1211    1066    420
1517    0     21      1      409      380    233     243     210    103
386     0     61      8     2050     2036   1137    1211    1121    617
1169    0     29      5      629      597    385     400     329    112
1494    0     27      1      425      383    267     226     217     98
1469    0     15      3      447      400    228     291     279     49
835     0     42     12      967      928    585     582     538    190
83      0    188     13     6200     6448   2568    3828    4190   2062
619     0     67     11     1341     1339    859     786     739    296
2402    0      8      2      131      108     72      65      78     24
1312    0     39      5      518      501    284     301     338     96
235     0    135     25     3136     2991   1836    1833    1691    767
680     0     80      4     1185     1205    608     628     795    359
301     0     48     18     2526     2450   1444    1504    1587    441
2419    0      9      0      121      112     76      61      63     33
1367    0     34      3      494      456    282     264     299    105
1680    0     18      2      330      318    213     175     152    108
709     0     42      9     1073     1220    704     647     640    302
226     0    144     16     3208     3054   1885    1954    1653    770
755     0     37     18     1091     1050    577     629     655    280
1280    0     31      1      538      520    330     292     324    112
577    23     70     19     1449     1401    872     899     746    333
810     0     41     16      968     1007    566     562     555    292
879     0     53     11      910      873    503     609     496    175
690     0     34      5     1179     1179    691     645     661    361
165     0     96     41     3574     4187   2170    2242    2149   1200
1165    0     47      3      655      574    353     385     353    138
801     0     46     19     1016     1001    650     591     537    239
1333    0     18     13      545      445    289     311     274    116
562     0     61     10     1488     1438    888     855     839    344
2235    0     10      1      168      149     96     102      78     41
2463    0      6      0      107      105     67      52      62     31
1296   90     42      1      518      521    258     264     338    179
1846    0     12      2      269      254    158     168     141     56
1364    0     22      4      464      494    303     304     269     82
161     0    105     54     3809     4250   1908    2513    2509   1129
617     0     86      2     1351     1331    750     914     793    225
2521    0     10      0       99       92     65      61      47     18
2487    0      9      0      112       90     66      48      69     19
1205  891     47      3      617      558    327     310     356    182
1145    0     31      0      632      624    347     354     448    107
2599    0     12      0       80       77     49      39      56     13
862    40     54      2      932      877    535     528     447    299
2084    0     16      0      197      189    130     136      77     43
1534   25     26      1      389      381    230     227     250     63
55      0    139     57     7881     8048   3962    4998    5098   1871
2103    0      8      0      185      189    123     110      87     54
2145    0     12      2      187      167    105     101     110     38
899     0     47      3      869      866    506     538     474    217
1654    0     27      1      341      321    199     204     178     81
551     0     58     10     1500     1466    779     913     872    402
1578  146     30      0      381      351    259     166     185    122
1627    0     17      2      335      351    168     163     201    154
1221    0     27      0      553      598    235     330     385    201
30      0    302      2     9467    10903   4039    4191    7097   5043
3       0   1023    213    78736    87822  28161   50349   53734  34314
56      0    212     36     7875     8007   3804    4487    5049   2542
1956   21     14      0      233      226    141     117     142     59
595    97     70      7     1392     1378    779     651     882    458
422     0     82      5     1906     1947    965     950    1395    543
1298    0     34      2      508      530    303     250     332    153
895     0     48      0      830      912    426     436     540    340
911     0     55      3      865      824    425     419     562    283
1297  447     29      3      542      496    338     264     280    156
695  1141     69      8     1183     1153    653     591     698    394
1438    0     26      0      452      429    273     208     269    131
1595    0     22      0      367      349    186     171     220    139
1087    0     31      0      697      626    353     392     400    178
743     0     60      4     1110     1071    552     547     725    357
1068    0     47      2      674      681    388     349     414    204
261     0    131      2     2685     2850   1243    1373    1848   1071
1018   18     50      3      745      723    456     369     433    210
906     0     43      4      858      852    468     440     531    271
951     0     28      7      785      813    397     438     522    241
846    18     47      0      939      928    550     412     622    283
391     0     79      8     1971     2092   1042     983    1330    708
824  2086     57      1     1000      935    605     522     529    279
1058  295     47      6      717      662    434     334     370    241
1228   49     38      4      587      559    398     290     287    171
873    33     39      4      903      893    495     437     554    310
564   914     76      7     1518     1397    915     804     776    420
270    44    106     14     2720     2727   1451    1509    1574    913
453     0     59     24     1821     1790    838     990    1339    444
1936   65     14      1      239      234    108     150     159     56
1572  678     34      4      389      346    199     195     209    132
1016    0     19     44      730      743    377     422     509    165
371     0     70      0     2038     2189   1048    1166    1442    571
110   480    179     19     5082     5447   2369    2678    3299   2183
2359  232     21      0      134      121     70      66      95     24
2059  354     23      0      199      199    110      95     139     54
1201  441     41      6      589      591    291     289     370    230
613   334     70     14     1286     1417    627     673     849    554
2070  191     14      0      201      193    126     109     131     28
2536   93     11      1       95       89     39      57      54     34
2147  320     21      1      186      168     88      89     118     59
991    39     34      3      794      727    477     469     396    179
1756  230     21      2      289      301    162     172     177     79
1839  255     23      3      264      265     97     128     172    132
884   103     59      3      918      857    410     454     599    312
1681  878     41      0      329      318    178     162     201    106
1112  403     35      3      618      679    350     341     350    256
883     0     45      2      849      929    383     467     609    319
2051    0     10      1      207      193     86     134     115     65
2520    0     12      0      101       90     49      44      73     25
1402  289     23      3      459      453    268     241     314     89
16      7    323     26    15542    17372   6356   11204    9576   5778
1594  148     20      0      353      365    190     212     197    119
946    44     29      1      777      828    388     469     495    253
1339  254     21      0      477      506    221     306     349    107
1543    0     24      0      375      390    176     209     280    100
411     0     86      6     1869     2045    923    1035    1237    719
1719   92     20      4      306      312    184     155     196     83
521     0     49      6     1512     1605    675     869    1035    538
1398   85     33      3      452      462    245     295     283     91
1468   92     27      2      413      436    251     197     274    127
2425   85     14      0      119      112     63      65      77     26
1687  318     19      5      301      343    144     175     187    138
1213  159     38      0      555      609    272     299     392    201
433    84     67      8     1781     1980    945     920    1131    765
2751    8      8      0       46       49     17      26      29     23
354     0     72     20     2145     2228   1024    1293    1449    607
273     0     66      6     2551     2836   1179    1653    1581    974
948   115     41      1      789      812    455     412     542    192
2244  305     15      2      160      153     82      94      90     47
1044  195     31      6      717      692    339     443     454    173
829   258     41      1      933      988    471     496     647    307
629     0     31      2     1293     1344    528     815     909    385
454    13     96      8     1795     1808    902     927    1224    550
307     0    101     10     2393     2504   1184    1270    1724    719
2454    0      5      1      105      113     50      51      54     63
985     0     30      1      758      775    311     376     552    294
2792    0      3      0       40       34     13      25      21     15
1771    0     18      2      297      283    183     173     189     35
1944   35     13      5      239      227    126     121     151     68
2438  172     17      0      118      107     78      53      73     21
1326  402     37      0      476      521    237     254     300    206
2125  293     14      0      190      173     89      89     118     67
1300  383     55      0      529      507    259     296     346    135
828     0     30     59      950      973    506     566     585    266
1410    0     27      4      459      444    263     267     293     80
280     0     93     17     2587     2724   1273    1579    1772    687
12    867    467     56    17535    19481   8319   10551   11258   6888
672    99     56      2     1178     1248    565     620     828    413
2289   77      9      0      143      146     71      72     114     32
H00PTOT H00P01 H00P02 H00P03 H00P04 POPTOT CT ID_unit      Prob
1560     249     59     62     30     98    746  3    1560 0.1923077
605      905    217    234    107    347   2731  3     605 0.1923077
1007     511    106    160     72    173   1489  3    1007 0.1923077
8      30586  15452   9280   2777   3077  59496  3       8 0.1923077
993      438     84    101     63    190   1520  3     993 0.1923077
20     11165   3720   3558   1483   2404  26885  3      20 0.1923077
23     10830   3760   3659   1358   2053  24742  3      23 0.1923077
1182     410    113    118     54    125   1206  3    1182 0.1923077
817      708    198    184    103    223   1946  3     817 0.1923077
764      791    197    233    138    223   2112  3     764 0.1923077
505     1133    342    274    138    379   3229  3     505 0.1923077
264     1949    458    590    261    640   5517  3     264 0.1923077
856      637    141    179     97    220   1831  3     856 0.1923077
2365      89     20     26     11     32    254  3    2365 0.1923077
1977     148     26     48     25     49    444  3    1977 0.1923077
766      766    160    236    133    237   2104  3     766 0.1923077
415     1297    329    355    172    441   3897  3     415 0.1923077
1129     461    131    116     57    157   1279  3    1129 0.1923077
2040     122     27     24     20     51    404  3    2040 0.1923077
382     1347    298    393    192    464   4105  3     382 0.1923077
229     2330    555    734    352    689   6239  3     229 0.1923077
1636     223     37     72     35     79    678  3    1636 0.1923077
532     1130    320    306    149    355   3065  3     532 0.1923077
897      595    177    163     67    188   1737  3     897 0.1923077
908      583    132    166     84    201   1701  3     908 0.1923077
2364      77     15     12     17     33    254  3    2364 0.1923077
1786     183     34     51     24     74    567  3    1786 0.1923077
298     1783    358    533    275    617   5010  3     298 0.1923077
385     1517    436    433    217    431   4090  3     385 0.1923077
468     1237    285    361    179    412   3483  3     468 0.1923077
1519     265     53     79     34     99    788  3    1519 0.1923077
1434     301     53     98     42    108    883  3    1434 0.1923077
837      697    188    214     88    207   1892  3     837 0.1923077
1078     514    159    152     73    130   1342  3    1078 0.1923077
489     1205    328    347    165    365   3311  3     489 0.1923077
543     1272    460    393    167    252   2996  3     543 0.1923077
852      663    135    208    110    210   1851  3     852 0.1923077
960      564    129    167     88    180   1583  3     960 0.1923077
2229     104     19     28     12     45    320  3    2229 0.1923077
100     4476   1337   1393    625   1121  11322  3     100 0.1923077
480     1157    326    325    131    375   3366  3     480 0.1923077
723      671    140    170     84    277   2240  3     723 0.1923077
452     1412    458    562    155    237   3616  3     452 0.1923077
1128     482    125    140     80    137   1279  3    1128 0.1923077
916      588    144    164     94    186   1671  3     916 0.1923077
914      594    120    195     94    185   1675  3     914 0.1923077
849      615    129    170     92    224   1857  3     849 0.1923077
1605     233     53     59     41     80    709  3    1605 0.1923077
622      991    281    283    140    287   2661  3     622 0.1923077
244     2479    752    895    333    499   5923  3     244 0.1923077
1701     237     58     68     46     65    634  3    1701 0.1923077
1675     217     47     56     25     89    653  3    1675 0.1923077
1581     242     39     82     39     82    729  3    1581 0.1923077
1986     148     31     42     17     58    439  3    1986 0.1923077
451     1432    427    452    198    355   3625  3     451 0.1923077
1517     267     67     67     37     96    789  3    1517 0.1923077
386     1440    375    417    206    442   4086  3     386 0.1923077
1169     442    119    121     53    149   1226  3    1169 0.1923077
1494     262     59     73     30    100    808  3    1494 0.1923077
1469     323     74    108     53     88    847  3    1469 0.1923077
835      672    169    165    117    221   1895  3     835 0.1923077
83      5426   1951   1888    652    935  12648  3      83 0.1923077
619      896    198    241    119    338   2680  3     619 0.1923077
2402      81     17     22     16     26    239  3    2402 0.1923077
1312     385    101    118     53    113   1019  3    1312 0.1923077
235     2197    615    602    300    680   6127  3     235 0.1923077
680      780    205    216    119    240   2390  3     680 0.1923077
301     1893    469    605    264    555   4976  3     301 0.1923077
2419      78     18     22      9     29    233  3    2419 0.1923077
1367     359     96    117     48     98    950  3    1367 0.1923077
1680     214     43     63     25     83    648  3    1680 0.1923077
709      780    198    218    129    235   2293  3     709 0.1923077
226     2136    529    566    293    748   6262  3     226 0.1923077
755      800    208    238    120    234   2141  3     755 0.1923077
1280     348     75     87     54    132   1058  3    1280 0.1923077
577      929    215    243    126    345   2850  3     577 0.1923077
810      655    163    163     85    244   1975  3     810 0.1923077
879      662    163    225     77    197   1783  3     879 0.1923077
690      849    253    230    115    251   2358  3     690 0.1923077
165     2774    815    812    351    796   7761  3     165 0.1923077
1165     414     77    121     72    144   1229  3    1165 0.1923077
801      682    144    188    101    249   2017  3     801 0.1923077
1333     330     79     75     51    125    990  3    1333 0.1923077
562     1028    262    289    137    340   2926  3     562 0.1923077
2235     102     16     33     16     37    317  3    2235 0.1923077
2463      75     20     17     15     23    212  3    2463 0.1923077
1296     433    149    139     51     94   1039  3    1296 0.1923077
1846     186     43     57     23     63    523  3    1846 0.1923077
1364     325     69     88     51    117    958  3    1364 0.1923077
161     3288   1135   1002    424    727   8059  3     161 0.1923077
617      985    279    278    140    288   2682  3     617 0.1923077
2521      60     13     12     11     24    191  3    2521 0.1923077
2487      65     15     16     10     24    202  3    2487 0.1923077
1205     444    134    121     59    130   1175  3    1205 0.1923077
1145     447     76    162     72    137   1256  3    1145 0.1923077
2599      45      6     14      4     21    157  3    2599 0.1923077
862      550    102    130     92    226   1809  3     862 0.1923077
2084     128     27     37     19     45    386  3    2084 0.1923077
1534     293     82     88     38     85    770  3    1534 0.1923077
55      6636   2218   2002    905   1511  15929  3      55 0.1923077
2103     123     26     35     14     48    374  3    2103 0.1923077
2145     132     41     34     18     39    354  3    2145 0.1923077
899      612    160    160     84    208   1735  3     899 0.1923077
1654     227     52     56     39     80    662  3    1654 0.1923077
551     1070    274    331    140    325   2966  3     551 0.1923077
1578     218     45     54     28     91    732  3    1578 0.1923077
1627     289     95     97     38     59    686  3    1627 0.1923077
1221     427    125    154     60     88   1151 12    1221 0.1923077
30      9201   3248   3438   1070   1445  20370 12      30 0.1923077
3      86371  44469  24838   7890   9174 166558 12       3 0.1923077
56      6784   2540   2079    781   1384  15882 15      56 0.1923077
1956     165     43     51     22     49    459 15    1956 0.1923077
595     1097    347    354    123    273   2770 15     595 0.1923077
422     1624    521    543    206    354   3853 15     422 0.1923077
1298     384    105    122     60     97   1038 15    1298 0.1923077
895      652    185    252     69    146   1742 15     895 0.1923077
911      694    214    234    100    146   1689 15     911 0.1923077
1297     326     83     98     30    115   1038 15    1297 0.1923077
695      903    314    260    108    221   2336 15     695 0.1923077
1438     327    100     85     39    103    881 15    1438 0.1923077
1595     277     93     84     33     67    716 15    1595 0.1923077
1087     476    131    158     71    116   1323 15    1087 0.1923077
743      875    314    259    109    193   2181 15     743 0.1923077
1068     509    138    162     60    149   1355 15    1068 0.1923077
261     2354    810    801    276    467   5535 15     261 0.1923077
1018     524    149    150     64    161   1468 15    1018 0.1923077
906      636    177    195     81    183   1710 15     906 0.1923077
951      639    189    221     88    141   1598 15     951 0.1923077
846      657    196    196     85    180   1867 15     846 0.1923077
391     1669    563    525    217    364   4063 15     391 0.1923077
824      686    175    185     93    233   1935 16     824 0.1923077
1058     447    124    115     52    156   1379 16    1058 0.1923077
1228     356     77     89     38    152   1146 16    1228 0.1923077
873      680    173    235     85    187   1796 16     873 0.1923077
564     1014    254    264    144    352   2915 16     564 0.1923077
270     2107    693    611    259    544   5447 16     270 0.1923077
453     1497    470    465    246    316   3611 24     453 0.1923077
1936     201     60     67     35     39    473 24    1936 0.1923077
1572     294    117     85     31     61    735 24    1572 0.1923077
1016     603    179    187     90    147   1473 24    1016 0.1923077
371     1800    602    582    225    391   4227 24     371 0.1923077
110     4713   1787   1568    565    793  10529 24     110 0.1923077
2359      94     23     30     17     24    255 24    2359 0.1923077
2059     144     34     42     26     42    398 24    2059 0.1923077
1201     493    148    174     75     96   1180 24    1201 0.1923077
613     1187    422    380    156    229   2703 24     613 0.1923077
2070     135     26     37     23     49    394 24    2070 0.1923077
2536      72     17     27     12     16    184 24    2536 0.1923077
2147     141     49     42     17     33    354 24    2147 0.1923077
991      567    164    162     85    156   1521 24     991 0.1923077
1756     238     73     76     29     60    590 24    1756 0.1923077
1839     219     85     74     27     33    529 24    1839 0.1923077
884      716    198    272    100    146   1775 24     884 0.1923077
1681     249     66     88     30     65    647 24    1681 0.1923077
1112     497    155    173     50    119   1297 24    1112 0.1923077
883      756    231    266    109    150   1778 24     883 0.1923077
2051     167     55     52     23     37    400 24    2051 0.1923077
2520      77     21     28     10     18    191 24    2520 0.1923077
1402     331     76     88     62    105    912 24    1402 0.1923077
16     15937   7348   4763   1817   2009  32914 24      16 0.1923077
1594     252     64     82     48     58    718 24    1594 0.1923077
946      681    218    236     82    145   1605 24     946 0.1923077
1339     384    125    125     57     77    983 24    1339 0.1923077
1543     283     61    114     38     70    765 24    1543 0.1923077
411     1688    565    579    217    327   3914 24     411 0.1923077
1719     238     71     69     26     72    618 24    1719 0.1923077
521     1337    405    485    198    249   3117 24     521 0.1923077
1398     340     85    109     49     97    914 24    1398 0.1923077
1468     353    117    110     37     89    849 24    1468 0.1923077
2425      83     20     28     10     25    231 24    2425 0.1923077
1687     265    103     77     26     59    644 24    1687 0.1923077
1213     482    142    177     63    100   1164 24    1213 0.1923077
433     1609    597    492    210    310   3761 24     433 0.1923077
2751      47     20     15      7      5     95 24    2751 0.1923077
354     1851    599    617    241    394   4373 24     354 0.1923077
273     2528   1050    764    304    410   5387 24     273 0.1923077
948      608    151    208     91    158   1601 24     948 0.1923077
2244     115     21     43     17     34    313 24    2244 0.1923077
1044     558    163    175     87    133   1409 24    1044 0.1923077
829      859    310    283     98    168   1921 24     829 0.1923077
629     1227    475    381    169    202   2637 24     629 0.1923077
454     1443    401    484    217    341   3603 24     454 0.1923077
307     2032    639    660    295    438   4897 24     307 0.1923077
2454      62     10     23     11     18    218 24    2454 0.1923077
985      680    242    237     75    126   1533 24     985 0.1923077
2792      31     11     10      6      4     74 24    2792 0.1923077
1771     187     26     53     41     67    580 24    1771 0.1923077
1944     193     64     61     28     40    466 24    1944 0.1923077
2438      72     16     22     10     24    225 24    2438 0.1923077
1326     374    108    124     42    100    997 24    1326 0.1923077
2125     148     40     51     22     35    363 24    2125 0.1923077
1300     393    111    134     59     89   1036 24    1300 0.1923077
828      699    178    201    114    206   1923 24     828 0.1923077
1410     338     80    108     50    100    903 24    1410 0.1923077
280     2156    711    682    303    460   5311 24     280 0.1923077
12     17207   7087   5293   2053   2774  37016 24      12 0.1923077
672     1008    336    332    125    215   2426 24     672 0.1923077
2289     110     21     48     11     30    289 24    2289 0.1923077

###Stratified sampling We use the same dataset and this time use the region (REG) as the stratification variable. (The function, as implemented, requires the ordering of the dataset according to the values of the stratification variable. This is an implementation flaw.)

Data = swissmunicipalities[order(swissmunicipalities$REG), ] st = strata(Data, "REG", size = rep(20, 7), description = TRUE) # we select 20 municipalities in each of the 7 regions Stratum 1 Population total and number of selected units: 589 20 Stratum 2 Population total and number of selected units: 913 20 Stratum 3 Population total and number of selected units: 321 20 Stratum 4 Population total and number of selected units: 171 20 Stratum 5 Population total and number of selected units: 471 20 Stratum 6 Population total and number of selected units: 186 20 Stratum 7 Population total and number of selected units: 245 20 Number of strata 7 Total number of selected units 140  ## 11.2 Experimental design Experimental designs were perhaps first developed in the context of agricultural experiments. So it is fitting that use the following package, which gathers a very large number of statistical tools for agricultural research. require(agricolae) require(knitr) The “plots” below stand for the experimental units. (These would be individuals in clinical trials.) The “book” is short for “field book”, and specifies the treatment assignments. Completely randomized design treat = c("A", "B", "C") # treatments r = c(13, 13, 14) # number of replicates per treatment outdesign = design.crd(treat, r) book = outdesign$book # field book (treatment assignments)
subset(book, select = c("plots", "treat"))
   plots treat
1    101     C
2    102     C
3    103     B
4    104     C
5    105     B
6    106     B
7    107     B
8    108     B
9    109     C
10   110     A
11   111     C
12   112     C
13   113     B
14   114     A
15   115     A
16   116     A
17   117     A
18   118     C
19   119     C
20   120     B
21   121     A
22   122     A
23   123     C
24   124     A
25   125     B
26   126     C
27   127     A
28   128     A
29   129     C
30   130     A
31   131     B
32   132     A
33   133     B
34   134     A
35   135     B
36   136     C
37   137     C
38   138     B
39   139     C
40   140     B
# book

Randomized complete block design

treat = c("A", "B", "C", "D", "E")
r = 4 # number of blocks
outdesign = design.rcbd(treat, r)
book = outdesign$book book  plots block treat 1 101 1 A 2 102 1 E 3 103 1 C 4 104 1 B 5 105 1 D 6 201 2 C 7 202 2 E 8 203 2 B 9 204 2 A 10 205 2 D 11 301 3 C 12 302 3 B 13 303 3 E 14 304 3 D 15 305 3 A 16 401 4 A 17 402 4 E 18 403 4 D 19 404 4 B 20 405 4 C Matched-pairs design (which is a special case, with blocks of size 2 representing the pairs) treat = c("A", "B") # two treatments r = 10 # number of pairs outdesign = design.rcbd(treat, r) book = outdesign$book
names(book) = c("subject", "pair", "treat")
book
   subject pair treat
1      101    1     B
2      102    1     A
3      201    2     B
4      202    2     A
5      301    3     B
6      302    3     A
7      401    4     A
8      402    4     B
9      501    5     A
10     502    5     B
11     601    6     A
12     602    6     B
13     701    7     B
14     702    7     A
15     801    8     B
16     802    8     A
17     901    9     B
18     902    9     A
19    1001   10     B
20    1002   10     A

Balanced incomplete block design (The number of blocks is determined automatically so that the design is balanced.)

treat = c("A", "B", "C", "D", "E" )
k = 4 # number of units per block
outdesign = design.bib(treat, k)

Parameters BIB
==============
Lambda     : 3
treatmeans : 5
Block size : 4
Blocks     : 5
Replication: 4

Efficiency factor 0.9375

<<< Book >>>
book = outdesign$book book  plots block treat 1 101 1 E 2 102 1 C 3 103 1 A 4 104 1 D 5 201 2 B 6 202 2 C 7 203 2 E 8 204 2 D 9 301 3 D 10 302 3 B 11 303 3 C 12 304 3 A 13 401 4 C 14 402 4 B 15 403 4 E 16 404 4 A 17 501 5 A 18 502 5 D 19 503 5 B 20 504 5 E Split plot design (When the number of replicates is set to $$r = 1$$, the number of whole plots is ) treat1 = c("A", "B", "C", "D") # treatment at the whole plot level treat2 = c("a", "b", "c") # treatment at the subplot level outdesign = design.split(treat1, treat2, r = 1) book = outdesign$book
# There seems to be a bug in that even when r = 1 there appear to be 3 replicates. To remedy the situation, we simply extract one replication.
number_splots = length(treat1) * length(treat2)
book[1:number_splots, -3]
   plots splots treat1 treat2
1    101      1      A      a
2    101      2      A      b
3    101      3      A      c
4    102      1      B      c
5    102      2      B      a
6    102      3      B      b
7    103      1      D      c
8    103      2      D      a
9    103      3      D      b
10   104      1      C      c
11   104      2      C      a
12   104      3      C      b

Latin square design This design can be used for simple repeated measures or a crossover designs. (There are packages dedicated to crossover designs.)

treat = c("A", "B", "C", "D")
outdesign = design.lsd(treat)
book = outdesign$book[, -1] names(book) = c("subject", "period", "treat") # for a crossover design book  subject period treat 1 1 1 A 2 1 2 D 3 1 3 B 4 1 4 C 5 2 1 C 6 2 2 B 7 2 3 D 8 2 4 A 9 3 1 D 10 3 2 C 11 3 3 A 12 3 4 B 13 4 1 B 14 4 2 A 15 4 3 C 16 4 4 D # or in matrix form book_matrix = matrix(book$treat, byrow = TRUE, 4, 4)
rownames(book_matrix) = paste("subject", 1:4)
colnames(book_matrix) = paste("period", 1:4)
prmatrix(book_matrix, quote = FALSE)
          period 1 period 2 period 3 period 4
subject 1 A        D        B        C
subject 2 C        B        D        A
subject 3 D        C        A        B
subject 4 B        A        C        D       

## 11.3 Observational studies

We simply look at how to perform a matching in R. Such a matching can be used in a matched-pairs experimental design, or in the analysis of an observational study.

require(optmatch)

We look at the following dataset on women of Pima Indian heritage. (The Pima Indians are known to have among the highest rates of type 2 diabetes in the world.) We merge the training and test sets into a single dataset (with a total of 532 observations).

require(MASS)
Pima = rbind(Pima.tr, Pima.te)

In a case-control framework, we might want to understand what distinguishes women with diabetes (cases) and women without (controls).

cases = which(Pima$type == "Yes") # 177 cases controls = which(Pima$type == "No") # 355 controls

We match one-to-one according to age. (This is a particularly simple problem as we are matching on a single, one-dimensional variable.) We built the matrix of differences in age between cases and controls.

D = abs(outer(Pima$age[cases], Pima$age[controls], "-"))
rownames(D) = cases
colnames(D) = controls
M = pairmatch(D)
print(M, grouped = TRUE) # each row corresponds to a pair (case, control) given in the 2nd column
 Group  Members
1.1   10, 15
1.10 118, 496
1.100 344, 312
1.101 345, 471
1.102  35, 305
1.103 355, 448
1.104 356, 169
1.105  357, 95
1.106 362, 446
1.107 371, 366
1.108 372, 179
1.109 377, 440
1.11 120, 251
1.110 378, 296
1.111 380, 444
1.112 384, 519
1.113 385, 461
1.114 386, 425
1.115 390, 192
1.116 391, 176
1.117 392, 185
1.118 396, 133
1.119 398, 433
1.12  123, 82
1.120 399, 513
1.121    41, 9
1.122 412, 266
1.123 415, 459
1.124 421, 182
1.125  428, 22
1.126 430, 409
1.127  431, 12
1.128 438, 285
1.129 439, 432
1.13  125, 80
1.130 442, 420
1.131  443, 36
1.132 457, 404
1.133 458, 245
1.134 465, 511
1.135 468, 244
1.136 470, 521
1.137  472, 64
1.138  478, 48
1.139 481, 500
1.14  13, 411
1.140 482, 379
1.141 484, 422
1.142 487, 403
1.143 488, 351
1.144  49, 436
1.145 493, 354
1.146 497, 405
1.147 498, 226
1.148  50, 494
1.149 501, 454
1.15 130, 111
1.150 503, 112
1.151 505, 308
1.152  506, 57
1.153 512, 429
1.154 517, 274
1.155 522, 275
1.156 523, 135
1.157 526, 326
1.158 527, 323
1.159 529, 427
1.16 131, 373
1.160  53, 449
1.161   6, 220
1.162  60, 129
1.163  61, 376
1.164  66, 408
1.165  67, 381
1.166  69, 492
1.167   71, 47
1.168  72, 402
1.169  73, 530
1.17  14, 502
1.170  75, 352
1.171  76, 227
1.172  79, 447
1.173  83, 214
1.174  84, 159
1.175   87, 56
1.176  93, 417
1.177  96, 292
1.18 141, 277
1.19 142, 456
1.2 100, 324
1.20 148, 516
1.21 152, 215
1.22 153, 524
1.23 154, 528
1.24 156, 395
1.25 157, 358
1.26 160, 499
1.27 161, 103
1.28 167, 514
1.29 171, 273
1.3 102, 531
1.30 173, 406
1.31 174, 140
1.32 175, 508
1.33  18, 163
1.34  184, 63
1.35 186, 504
1.36 187, 419
1.37  188, 23
1.38  19, 261
1.39 190, 426
1.4 104, 483
1.40 193, 486
1.41 197, 423
1.42   2, 116
1.43 200, 150
1.44 201, 101
1.45 204, 518
1.46 205, 106
1.47 206, 491
1.48  207, 88
1.49 210, 474
1.5 108, 450
1.50 216, 509
1.51 217, 480
1.52 218, 132
1.53 221, 507
1.54 222, 525
1.55 231, 264
1.56 234, 479
1.57 243, 178
1.58  248, 46
1.59   253, 3
1.6  11, 124
1.60 255, 510
1.61 257, 469
1.62 258, 239
1.63 259, 208
1.64  26, 340
1.65 260, 360
1.66 268, 393
1.67 270, 230
1.68 272, 315
1.69 276, 346
1.7 113, 202
1.70 278, 110
1.71 279, 400
1.72  28, 476
1.73  280, 29
1.74  281, 43
1.75 282, 387
1.76 286, 212
1.77 289, 495
1.78 291, 284
1.79 295, 463
1.8 114, 347
1.80 298, 515
1.81 300, 490
1.82 301, 532
1.83 304, 180
1.84 306, 241
1.85 311, 475
1.86 313, 151
1.87  317, 92
1.88 320, 441
1.89 325, 413
1.9 117, 290
1.90 328, 466
1.91  33, 145
1.92 330, 333
1.93 332, 460
1.94 334, 489
1.95 335, 455
1.96 336, 359
1.97 337, 467
1.98 339, 520
1.99 341, 168