11 Data Collection
11.1 Survey sampling
The following package accompanies the textbook Sampling Algorithms by Yves Tillé.
11.1.1 Sampling according to prescribed inclusion probabilities
We assume that the inclusion probabilities (i.e., the probabilities that the individuals in the population are sampled) are all equal, unless otherwise specified. These inclusion probabilities sum to the (expected) sample size.
Each sampling design that follows is in a form of a binary vector with 1 indicating that the individual is to be sampled.
Systematic sampling. This sampling design consists in sampling at a regular interval until the desired sample size is achieved.
[1] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
[36] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
[71] 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
Poisson sampling. This sampling scheme amounts to sampling each individual according to its inclusion probability, independently of everything else. This design has the property that the desired sample size is not always achieved. (Rather, the sample size is random, having the Poisson distribution with mean \(n\).)
[1] 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1
[36] 1 0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
[71] 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0
Conditional Poisson sampling (CPS). This sampling scheme is Poisson sampling conditioned on the sample being of the desired size. This can be implemented by rejection sampling (repeatedly rejecting a Poisson sample until it has the desired size), but there are faster methods. (CPS is known to maximize the entropy given the inclusion probabilities, therefore the name of the function below.)
[1] 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0
[36] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
[71] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
11.1.2 Cluster sampling
We use the following dataset on the Swiss municipalities in 2003, where clusters can be naturally defined. In particular, we use the cantons (CT) as clustering variable. (There are 26 cantons in total so that CT takes that many different values.)
data(swissmunicipalities)
cl = cluster(swissmunicipalities, "CT", size = 5, description = TRUE) # we select 5 cantons
Number of selected clusters: 5
Number of units in the population and number of selected units: 2896 198
REG COM Nom HApoly Surfacesbois Surfacescult
1560 6 1025 Ermensee 568 201 336
605 6 1086 Grosswangen 1971 302 1541
1007 6 1066 Schwarzenberg 3925 2109 963
8 6 1061 Luzern 1581 408 183
993 6 1135 Luthern 3779 1795 1529
20 6 1024 Emmen 2033 370 948
23 6 1059 Kriens 2734 1381 808
1182 6 1134 Langnau bei Reiden 860 357 410
817 6 1085 Geuensee 645 97 479
764 6 1023 Ballwil 875 102 670
505 6 1003 Escholzmatt 6141 2718 2414
264 6 1093 Neuenkirch 2548 499 1812
856 6 1037 Rain 939 158 706
2365 6 1035 Mosen 160 24 112
1977 6 1028 Hamikon 466 126 310
766 6 1099 Schenkon 670 115 459
415 6 1008 Schupfheim 3826 1258 1963
1129 6 1123 Altishofen 577 223 293
2040 6 1126 Ebersecken 856 149 675
382 6 1148 Willisau Land 3771 929 2584
229 6 1040 Rothenburg 1557 257 1052
1636 6 1129 Fischbach 805 111 649
532 6 1143 Schotz 1089 195 756
897 6 1004 Fluhli 10819 4096 1428
908 6 1005 Hasle (LU) 4029 1595 1100
2364 6 1105 Wilihof 240 29 195
1786 6 1100 Schlierbach 717 224 452
298 6 1051 Adligenswil 699 170 381
385 6 1140 Reiden 1118 448 528
468 6 1102 Sempach 891 130 615
1519 6 1106 Winikon 757 223 485
1434 6 1122 Altburon 677 201 411
837 6 1009 Werthenstein 1577 508 935
1078 6 1147 Wikon 830 427 327
489 6 1125 Dagmersellen 1402 552 660
543 6 1149 Willisau Stadt 338 210 57
852 6 1067 Udligenswil 623 142 417
960 6 1089 Knutwil 973 196 675
2229 6 1057 Honau 125 24 86
100 6 1054 Ebikon 968 232 372
480 6 1002 Entlebuch 5689 2427 2163
723 6 1032 Hohenrain 1959 343 1511
452 6 1069 Weggis 1180 459 425
1128 6 1053 Dierikon 278 63 167
916 6 1146 Wauwil 295 30 212
914 6 1088 Hildisrieden 705 83 545
849 6 1087 Gunzwil 2324 350 1818
1605 6 1027 Gelfingen 389 97 250
622 6 1095 Oberkirch 910 122 675
244 6 1063 Meggen 728 172 346
1701 6 1144 Uffikon 518 158 314
1675 6 1001 Doppleschwand 695 254 387
1581 6 1041 Schongau 1245 325 877
1986 6 1036 Muswangen 451 108 317
451 6 1065 Root 866 241 442
1517 6 1145 Ufhusen 1224 253 901
386 6 1107 Wolhusen 1430 466 829
1169 6 1127 Egolzwil 416 99 259
1494 6 1131 Grossdietwil 1019 225 739
1469 6 1055 Gisikon 109 22 55
835 6 1082 Buron 541 89 361
83 6 1058 Horw 1284 545 418
619 6 1083 Buttisholz 1675 285 1236
2402 6 1029 Herlisberg 262 37 208
1312 6 1064 Meierskappel 679 170 446
235 6 1062 Malters 2863 701 1849
680 6 1139 Pfaffnau 1763 445 1154
301 6 1052 Buchrain 481 82 237
2419 6 1101 Schwarzenbach 326 103 213
1367 6 1021 Aesch (LU) 462 90 315
1680 6 1142 Roggliswil 621 171 411
709 6 1030 Hitzkirch 358 51 241
226 6 1098 Ruswil 4520 949 3294
755 6 1137 Nebikon 373 149 142
1280 6 1092 Neudorf 1282 430 778
577 6 1136 Menznau 3038 1056 1777
810 6 1150 Zell (LU) 1390 316 943
879 6 1033 Inwil 1031 167 729
690 6 1081 Beromunster 285 26 187
165 6 1031 Hochdorf 963 98 653
1165 6 1039 Romerswil 1396 233 1075
801 6 1097 Rickenbach (LU) 940 246 587
1333 6 1130 Gettnau 608 225 319
562 6 1104 Triengen 849 180 534
2235 6 1138 Ohmstal 445 119 304
2463 6 1090 Kulmerau 362 122 213
1296 6 1068 Vitznau 891 502 197
1846 6 1121 Alberswil 354 65 264
1364 6 1091 Mauensee 716 123 491
161 6 1103 Sursee 586 128 180
617 6 1094 Nottwil 1031 92 803
2521 6 1038 Retschwil 261 64 172
2487 6 1034 Lieli 367 102 253
1205 6 1006 Marbach (LU) 4510 2136 1151
1145 6 1084 Eich 594 115 400
2599 6 1042 Sulz (LU) 385 114 251
862 6 1132 Hergiswil bei Willisau 3135 1150 1819
2084 6 1133 Kottwil 611 105 468
1534 6 1056 Greppen 332 131 143
55 6 1060 Littau 1325 280 695
2103 6 1124 Buchs (LU) 467 98 327
2145 6 1022 Altwis 292 84 185
899 6 1128 Ettiswil 646 112 454
1654 6 1141 Richenthal 723 185 484
551 6 1026 Eschenbach (LU) 1323 191 978
1578 6 1007 Romoos 3732 2303 1144
1627 6 1096 Pfeffikon 248 126 96
1221 3 2702 Bettingen 223 100 74
30 3 2703 Riehen 1086 276 283
3 3 2701 Basel 2391 97 93
56 5 3001 Herisau 2518 678 1428
1956 5 3003 Schonengrund 520 157 320
595 5 3022 Gais 2123 938 943
422 5 3023 Speicher 821 238 450
1298 5 3031 Grub (AR) 425 144 229
895 5 3034 Rehetobel 675 236 362
911 5 3038 Wolfhalden 695 212 404
1297 5 3002 Hundwil 2396 751 960
695 5 3006 Urnasch 4825 1977 1457
1438 5 3036 Wald (AR) 682 195 451
1595 5 3035 Reute (AR) 499 260 199
1087 5 3033 Lutzenberg 226 59 126
743 5 3037 Walzenhausen 698 241 367
1068 5 3005 Stein (AR) 937 207 642
261 5 3024 Teufen (AR) 1525 456 848
1018 5 3004 Schwellbrunn 1740 579 1052
906 5 3007 Waldstatt 674 174 417
951 5 3021 Buhler 561 145 350
846 5 3025 Trogen 1001 411 491
391 5 3032 Heiden 748 229 394
824 5 3105 Schwende 5753 1848 679
1058 5 3102 Gonten 2471 791 1230
1228 5 3104 Schlatt-Haslen 1792 544 1110
873 5 3111 Oberegg 1467 623 727
564 5 3103 Rute 4083 1301 1364
270 5 3101 Appenzell 1686 371 1079
453 2 6457 Marin-Epagnier 335 43 123
1936 2 6482 Montmollin 406 151 162
1572 2 6511 Les Verrieres 2868 1460 645
1016 2 6451 Cornaux 472 112 229
371 2 6455 Le Landeron 1031 454 425
110 2 6436 Le Locle 2314 701 753
2359 2 6433 Brot-Plamboz 1603 459 862
2059 2 6435 La Chaux-du-Milieu 1728 840 487
1201 2 6510 Travers 2465 1062 830
613 2 6505 Couvet 1641 733 401
2070 2 6486 Villiers 1059 577 269
2536 2 6411 Montalchez 641 249 274
2147 2 6501 Les Bayards 1915 981 559
991 2 6475 Dombresson 1277 466 714
1756 2 6509 Saint-Sulpice (NE) 1309 740 262
1839 2 6504 La Cote-aux-Fees 1285 529 451
884 2 6410 Gorgier 1398 788 382
1681 2 6432 La Brevine 4182 1803 1336
1112 2 6437 Les Ponts-de-Martel 1823 556 683
883 2 6403 Bole 258 119 72
2051 2 6485 Valangin 377 189 150
2520 2 6415 Vaumarcus 178 46 110
1402 2 6479 Fontaines (NE) 1014 292 353
16 2 6458 Neuchatel 1810 998 197
1594 2 6471 Boudevilliers 1257 410 621
946 2 6478 Fontainemelon 245 98 61
1339 2 6481 Les Hauts-Geneveys 795 362 131
1543 2 6477 Fenin-Vilars-Saules 648 322 289
411 2 6407 Corcelles-Cormondreche 486 202 163
1719 2 6474 Coffrane 649 169 328
521 2 6459 Saint-Blaise 887 479 240
1398 2 6456 Lignieres 1251 400 683
1468 2 6507 Motiers (NE) 642 322 168
2425 2 6483 Le Paquier (NE) 958 488 356
1687 2 6503 Buttes 1820 1091 339
1213 2 6431 Les Brenets 1153 571 321
433 2 6506 Fleurier 774 408 143
2751 2 6405 Brot-Dessous 495 403 58
354 2 6408 Cortaillod 369 42 195
273 2 6412 Peseux 343 241 10
948 2 6473 Chezard-Saint-Martin 1305 502 621
2244 2 6434 Le Cerneux-Pequignot 1567 634 555
1044 2 6480 Les Geneveys-sur-Coffrane 786 312 220
829 2 6472 Cernier 911 357 233
629 2 6454 Hauterive 212 127 31
454 2 6402 Bevaix 1078 516 398
307 2 6406 Colombier (NE) 452 92 185
2454 2 6409 Fresens 158 13 129
985 2 6401 Auvernier 168 7 82
2792 2 6476 Engollon 262 33 214
1771 2 6460 Thielle-Wavre 209 2 165
1944 2 6508 Noiraigue 638 445 97
2438 2 6422 Les Planchettes 1173 678 262
1326 2 6423 La Sagne 2555 954 1130
2125 2 6502 Boveresse 1290 587 355
1300 2 6413 Rochefort 2089 1302 311
828 2 6452 Cressier (NE) 855 466 247
1410 2 6484 Savagnier 853 377 430
280 2 6404 Boudry 1677 1063 405
12 2 6421 La Chaux-de-Fonds 5566 1572 2221
672 2 6414 Saint-Aubin-Sauges 771 301 280
2289 2 6453 Enges 959 508 353
Alp Airbat Airind P00BMTOT P00BWTOT Pop020 Pop2040 Pop4065 Pop65P
1560 0 14 3 364 382 248 194 235 69
605 0 70 6 1371 1360 844 856 698 333
1007 475 57 5 752 737 441 435 441 172
8 0 524 27 27324 32172 9533 18843 18177 12943
993 279 59 6 814 706 473 405 385 257
20 0 296 116 13315 13570 6621 7924 8608 3732
23 56 241 61 11954 12788 5656 7082 8108 3896
1182 0 39 4 603 603 352 344 368 142
817 0 40 3 966 980 625 567 599 155
764 0 46 7 1095 1017 571 658 684 199
505 654 86 8 1635 1594 969 880 838 542
264 0 98 12 2754 2763 1675 1705 1601 536
856 0 53 3 923 908 588 543 519 181
2365 0 8 1 128 126 80 74 75 25
1977 0 15 0 233 211 147 122 141 34
766 0 53 5 1059 1045 607 674 702 121
415 336 100 14 1981 1916 1142 1106 997 652
1129 0 28 14 657 622 391 446 331 111
2040 0 20 0 217 187 130 123 101 50
382 0 106 25 2074 2031 1172 1210 1142 581
229 0 91 39 3142 3097 1694 1836 2073 636
1636 0 25 0 360 318 196 211 186 85
532 0 47 13 1549 1516 911 966 796 392
897 3419 102 7 935 802 511 499 480 247
908 940 35 2 889 812 506 474 465 256
2364 0 10 0 139 115 97 78 51 28
1786 0 24 1 293 274 157 179 167 64
298 0 83 8 2502 2508 1550 1251 1873 336
385 0 74 16 2018 2072 1113 1145 1225 607
468 0 57 8 1728 1755 1030 1018 1112 323
1519 0 34 2 390 398 253 217 246 72
1434 0 26 3 471 412 261 281 234 107
837 0 46 17 974 918 512 566 560 254
1078 0 41 11 661 681 364 353 414 211
489 0 63 20 1672 1639 895 977 981 458
543 0 36 14 1460 1536 630 906 929 531
852 0 36 3 945 906 554 475 660 162
960 0 47 4 832 751 458 462 486 177
2229 0 5 0 161 159 119 94 99 8
100 0 152 17 5501 5821 2940 3183 3840 1359
480 688 106 12 1717 1649 973 921 901 571
723 0 66 3 1153 1087 816 588 610 226
452 119 107 1 1721 1895 748 999 1169 700
1128 0 17 11 661 618 389 405 387 98
916 0 32 7 863 808 481 492 498 200
914 0 41 1 845 830 480 426 565 204
849 0 83 2 974 883 558 539 556 204
1605 0 19 0 354 355 251 201 198 59
622 0 58 9 1337 1324 733 889 763 276
244 0 147 3 2851 3072 1309 1395 2151 1068
1701 0 22 1 326 308 166 209 188 71
1675 0 18 1 330 323 214 200 171 68
1581 0 25 1 390 339 217 213 193 106
1986 0 12 0 230 209 152 129 120 38
451 0 58 43 1853 1772 928 1211 1066 420
1517 0 21 1 409 380 233 243 210 103
386 0 61 8 2050 2036 1137 1211 1121 617
1169 0 29 5 629 597 385 400 329 112
1494 0 27 1 425 383 267 226 217 98
1469 0 15 3 447 400 228 291 279 49
835 0 42 12 967 928 585 582 538 190
83 0 188 13 6200 6448 2568 3828 4190 2062
619 0 67 11 1341 1339 859 786 739 296
2402 0 8 2 131 108 72 65 78 24
1312 0 39 5 518 501 284 301 338 96
235 0 135 25 3136 2991 1836 1833 1691 767
680 0 80 4 1185 1205 608 628 795 359
301 0 48 18 2526 2450 1444 1504 1587 441
2419 0 9 0 121 112 76 61 63 33
1367 0 34 3 494 456 282 264 299 105
1680 0 18 2 330 318 213 175 152 108
709 0 42 9 1073 1220 704 647 640 302
226 0 144 16 3208 3054 1885 1954 1653 770
755 0 37 18 1091 1050 577 629 655 280
1280 0 31 1 538 520 330 292 324 112
577 23 70 19 1449 1401 872 899 746 333
810 0 41 16 968 1007 566 562 555 292
879 0 53 11 910 873 503 609 496 175
690 0 34 5 1179 1179 691 645 661 361
165 0 96 41 3574 4187 2170 2242 2149 1200
1165 0 47 3 655 574 353 385 353 138
801 0 46 19 1016 1001 650 591 537 239
1333 0 18 13 545 445 289 311 274 116
562 0 61 10 1488 1438 888 855 839 344
2235 0 10 1 168 149 96 102 78 41
2463 0 6 0 107 105 67 52 62 31
1296 90 42 1 518 521 258 264 338 179
1846 0 12 2 269 254 158 168 141 56
1364 0 22 4 464 494 303 304 269 82
161 0 105 54 3809 4250 1908 2513 2509 1129
617 0 86 2 1351 1331 750 914 793 225
2521 0 10 0 99 92 65 61 47 18
2487 0 9 0 112 90 66 48 69 19
1205 891 47 3 617 558 327 310 356 182
1145 0 31 0 632 624 347 354 448 107
2599 0 12 0 80 77 49 39 56 13
862 40 54 2 932 877 535 528 447 299
2084 0 16 0 197 189 130 136 77 43
1534 25 26 1 389 381 230 227 250 63
55 0 139 57 7881 8048 3962 4998 5098 1871
2103 0 8 0 185 189 123 110 87 54
2145 0 12 2 187 167 105 101 110 38
899 0 47 3 869 866 506 538 474 217
1654 0 27 1 341 321 199 204 178 81
551 0 58 10 1500 1466 779 913 872 402
1578 146 30 0 381 351 259 166 185 122
1627 0 17 2 335 351 168 163 201 154
1221 0 27 0 553 598 235 330 385 201
30 0 302 2 9467 10903 4039 4191 7097 5043
3 0 1023 213 78736 87822 28161 50349 53734 34314
56 0 212 36 7875 8007 3804 4487 5049 2542
1956 21 14 0 233 226 141 117 142 59
595 97 70 7 1392 1378 779 651 882 458
422 0 82 5 1906 1947 965 950 1395 543
1298 0 34 2 508 530 303 250 332 153
895 0 48 0 830 912 426 436 540 340
911 0 55 3 865 824 425 419 562 283
1297 447 29 3 542 496 338 264 280 156
695 1141 69 8 1183 1153 653 591 698 394
1438 0 26 0 452 429 273 208 269 131
1595 0 22 0 367 349 186 171 220 139
1087 0 31 0 697 626 353 392 400 178
743 0 60 4 1110 1071 552 547 725 357
1068 0 47 2 674 681 388 349 414 204
261 0 131 2 2685 2850 1243 1373 1848 1071
1018 18 50 3 745 723 456 369 433 210
906 0 43 4 858 852 468 440 531 271
951 0 28 7 785 813 397 438 522 241
846 18 47 0 939 928 550 412 622 283
391 0 79 8 1971 2092 1042 983 1330 708
824 2086 57 1 1000 935 605 522 529 279
1058 295 47 6 717 662 434 334 370 241
1228 49 38 4 587 559 398 290 287 171
873 33 39 4 903 893 495 437 554 310
564 914 76 7 1518 1397 915 804 776 420
270 44 106 14 2720 2727 1451 1509 1574 913
453 0 59 24 1821 1790 838 990 1339 444
1936 65 14 1 239 234 108 150 159 56
1572 678 34 4 389 346 199 195 209 132
1016 0 19 44 730 743 377 422 509 165
371 0 70 0 2038 2189 1048 1166 1442 571
110 480 179 19 5082 5447 2369 2678 3299 2183
2359 232 21 0 134 121 70 66 95 24
2059 354 23 0 199 199 110 95 139 54
1201 441 41 6 589 591 291 289 370 230
613 334 70 14 1286 1417 627 673 849 554
2070 191 14 0 201 193 126 109 131 28
2536 93 11 1 95 89 39 57 54 34
2147 320 21 1 186 168 88 89 118 59
991 39 34 3 794 727 477 469 396 179
1756 230 21 2 289 301 162 172 177 79
1839 255 23 3 264 265 97 128 172 132
884 103 59 3 918 857 410 454 599 312
1681 878 41 0 329 318 178 162 201 106
1112 403 35 3 618 679 350 341 350 256
883 0 45 2 849 929 383 467 609 319
2051 0 10 1 207 193 86 134 115 65
2520 0 12 0 101 90 49 44 73 25
1402 289 23 3 459 453 268 241 314 89
16 7 323 26 15542 17372 6356 11204 9576 5778
1594 148 20 0 353 365 190 212 197 119
946 44 29 1 777 828 388 469 495 253
1339 254 21 0 477 506 221 306 349 107
1543 0 24 0 375 390 176 209 280 100
411 0 86 6 1869 2045 923 1035 1237 719
1719 92 20 4 306 312 184 155 196 83
521 0 49 6 1512 1605 675 869 1035 538
1398 85 33 3 452 462 245 295 283 91
1468 92 27 2 413 436 251 197 274 127
2425 85 14 0 119 112 63 65 77 26
1687 318 19 5 301 343 144 175 187 138
1213 159 38 0 555 609 272 299 392 201
433 84 67 8 1781 1980 945 920 1131 765
2751 8 8 0 46 49 17 26 29 23
354 0 72 20 2145 2228 1024 1293 1449 607
273 0 66 6 2551 2836 1179 1653 1581 974
948 115 41 1 789 812 455 412 542 192
2244 305 15 2 160 153 82 94 90 47
1044 195 31 6 717 692 339 443 454 173
829 258 41 1 933 988 471 496 647 307
629 0 31 2 1293 1344 528 815 909 385
454 13 96 8 1795 1808 902 927 1224 550
307 0 101 10 2393 2504 1184 1270 1724 719
2454 0 5 1 105 113 50 51 54 63
985 0 30 1 758 775 311 376 552 294
2792 0 3 0 40 34 13 25 21 15
1771 0 18 2 297 283 183 173 189 35
1944 35 13 5 239 227 126 121 151 68
2438 172 17 0 118 107 78 53 73 21
1326 402 37 0 476 521 237 254 300 206
2125 293 14 0 190 173 89 89 118 67
1300 383 55 0 529 507 259 296 346 135
828 0 30 59 950 973 506 566 585 266
1410 0 27 4 459 444 263 267 293 80
280 0 93 17 2587 2724 1273 1579 1772 687
12 867 467 56 17535 19481 8319 10551 11258 6888
672 99 56 2 1178 1248 565 620 828 413
2289 77 9 0 143 146 71 72 114 32
H00PTOT H00P01 H00P02 H00P03 H00P04 POPTOT CT ID_unit Prob
1560 249 59 62 30 98 746 3 1560 0.1923077
605 905 217 234 107 347 2731 3 605 0.1923077
1007 511 106 160 72 173 1489 3 1007 0.1923077
8 30586 15452 9280 2777 3077 59496 3 8 0.1923077
993 438 84 101 63 190 1520 3 993 0.1923077
20 11165 3720 3558 1483 2404 26885 3 20 0.1923077
23 10830 3760 3659 1358 2053 24742 3 23 0.1923077
1182 410 113 118 54 125 1206 3 1182 0.1923077
817 708 198 184 103 223 1946 3 817 0.1923077
764 791 197 233 138 223 2112 3 764 0.1923077
505 1133 342 274 138 379 3229 3 505 0.1923077
264 1949 458 590 261 640 5517 3 264 0.1923077
856 637 141 179 97 220 1831 3 856 0.1923077
2365 89 20 26 11 32 254 3 2365 0.1923077
1977 148 26 48 25 49 444 3 1977 0.1923077
766 766 160 236 133 237 2104 3 766 0.1923077
415 1297 329 355 172 441 3897 3 415 0.1923077
1129 461 131 116 57 157 1279 3 1129 0.1923077
2040 122 27 24 20 51 404 3 2040 0.1923077
382 1347 298 393 192 464 4105 3 382 0.1923077
229 2330 555 734 352 689 6239 3 229 0.1923077
1636 223 37 72 35 79 678 3 1636 0.1923077
532 1130 320 306 149 355 3065 3 532 0.1923077
897 595 177 163 67 188 1737 3 897 0.1923077
908 583 132 166 84 201 1701 3 908 0.1923077
2364 77 15 12 17 33 254 3 2364 0.1923077
1786 183 34 51 24 74 567 3 1786 0.1923077
298 1783 358 533 275 617 5010 3 298 0.1923077
385 1517 436 433 217 431 4090 3 385 0.1923077
468 1237 285 361 179 412 3483 3 468 0.1923077
1519 265 53 79 34 99 788 3 1519 0.1923077
1434 301 53 98 42 108 883 3 1434 0.1923077
837 697 188 214 88 207 1892 3 837 0.1923077
1078 514 159 152 73 130 1342 3 1078 0.1923077
489 1205 328 347 165 365 3311 3 489 0.1923077
543 1272 460 393 167 252 2996 3 543 0.1923077
852 663 135 208 110 210 1851 3 852 0.1923077
960 564 129 167 88 180 1583 3 960 0.1923077
2229 104 19 28 12 45 320 3 2229 0.1923077
100 4476 1337 1393 625 1121 11322 3 100 0.1923077
480 1157 326 325 131 375 3366 3 480 0.1923077
723 671 140 170 84 277 2240 3 723 0.1923077
452 1412 458 562 155 237 3616 3 452 0.1923077
1128 482 125 140 80 137 1279 3 1128 0.1923077
916 588 144 164 94 186 1671 3 916 0.1923077
914 594 120 195 94 185 1675 3 914 0.1923077
849 615 129 170 92 224 1857 3 849 0.1923077
1605 233 53 59 41 80 709 3 1605 0.1923077
622 991 281 283 140 287 2661 3 622 0.1923077
244 2479 752 895 333 499 5923 3 244 0.1923077
1701 237 58 68 46 65 634 3 1701 0.1923077
1675 217 47 56 25 89 653 3 1675 0.1923077
1581 242 39 82 39 82 729 3 1581 0.1923077
1986 148 31 42 17 58 439 3 1986 0.1923077
451 1432 427 452 198 355 3625 3 451 0.1923077
1517 267 67 67 37 96 789 3 1517 0.1923077
386 1440 375 417 206 442 4086 3 386 0.1923077
1169 442 119 121 53 149 1226 3 1169 0.1923077
1494 262 59 73 30 100 808 3 1494 0.1923077
1469 323 74 108 53 88 847 3 1469 0.1923077
835 672 169 165 117 221 1895 3 835 0.1923077
83 5426 1951 1888 652 935 12648 3 83 0.1923077
619 896 198 241 119 338 2680 3 619 0.1923077
2402 81 17 22 16 26 239 3 2402 0.1923077
1312 385 101 118 53 113 1019 3 1312 0.1923077
235 2197 615 602 300 680 6127 3 235 0.1923077
680 780 205 216 119 240 2390 3 680 0.1923077
301 1893 469 605 264 555 4976 3 301 0.1923077
2419 78 18 22 9 29 233 3 2419 0.1923077
1367 359 96 117 48 98 950 3 1367 0.1923077
1680 214 43 63 25 83 648 3 1680 0.1923077
709 780 198 218 129 235 2293 3 709 0.1923077
226 2136 529 566 293 748 6262 3 226 0.1923077
755 800 208 238 120 234 2141 3 755 0.1923077
1280 348 75 87 54 132 1058 3 1280 0.1923077
577 929 215 243 126 345 2850 3 577 0.1923077
810 655 163 163 85 244 1975 3 810 0.1923077
879 662 163 225 77 197 1783 3 879 0.1923077
690 849 253 230 115 251 2358 3 690 0.1923077
165 2774 815 812 351 796 7761 3 165 0.1923077
1165 414 77 121 72 144 1229 3 1165 0.1923077
801 682 144 188 101 249 2017 3 801 0.1923077
1333 330 79 75 51 125 990 3 1333 0.1923077
562 1028 262 289 137 340 2926 3 562 0.1923077
2235 102 16 33 16 37 317 3 2235 0.1923077
2463 75 20 17 15 23 212 3 2463 0.1923077
1296 433 149 139 51 94 1039 3 1296 0.1923077
1846 186 43 57 23 63 523 3 1846 0.1923077
1364 325 69 88 51 117 958 3 1364 0.1923077
161 3288 1135 1002 424 727 8059 3 161 0.1923077
617 985 279 278 140 288 2682 3 617 0.1923077
2521 60 13 12 11 24 191 3 2521 0.1923077
2487 65 15 16 10 24 202 3 2487 0.1923077
1205 444 134 121 59 130 1175 3 1205 0.1923077
1145 447 76 162 72 137 1256 3 1145 0.1923077
2599 45 6 14 4 21 157 3 2599 0.1923077
862 550 102 130 92 226 1809 3 862 0.1923077
2084 128 27 37 19 45 386 3 2084 0.1923077
1534 293 82 88 38 85 770 3 1534 0.1923077
55 6636 2218 2002 905 1511 15929 3 55 0.1923077
2103 123 26 35 14 48 374 3 2103 0.1923077
2145 132 41 34 18 39 354 3 2145 0.1923077
899 612 160 160 84 208 1735 3 899 0.1923077
1654 227 52 56 39 80 662 3 1654 0.1923077
551 1070 274 331 140 325 2966 3 551 0.1923077
1578 218 45 54 28 91 732 3 1578 0.1923077
1627 289 95 97 38 59 686 3 1627 0.1923077
1221 427 125 154 60 88 1151 12 1221 0.1923077
30 9201 3248 3438 1070 1445 20370 12 30 0.1923077
3 86371 44469 24838 7890 9174 166558 12 3 0.1923077
56 6784 2540 2079 781 1384 15882 15 56 0.1923077
1956 165 43 51 22 49 459 15 1956 0.1923077
595 1097 347 354 123 273 2770 15 595 0.1923077
422 1624 521 543 206 354 3853 15 422 0.1923077
1298 384 105 122 60 97 1038 15 1298 0.1923077
895 652 185 252 69 146 1742 15 895 0.1923077
911 694 214 234 100 146 1689 15 911 0.1923077
1297 326 83 98 30 115 1038 15 1297 0.1923077
695 903 314 260 108 221 2336 15 695 0.1923077
1438 327 100 85 39 103 881 15 1438 0.1923077
1595 277 93 84 33 67 716 15 1595 0.1923077
1087 476 131 158 71 116 1323 15 1087 0.1923077
743 875 314 259 109 193 2181 15 743 0.1923077
1068 509 138 162 60 149 1355 15 1068 0.1923077
261 2354 810 801 276 467 5535 15 261 0.1923077
1018 524 149 150 64 161 1468 15 1018 0.1923077
906 636 177 195 81 183 1710 15 906 0.1923077
951 639 189 221 88 141 1598 15 951 0.1923077
846 657 196 196 85 180 1867 15 846 0.1923077
391 1669 563 525 217 364 4063 15 391 0.1923077
824 686 175 185 93 233 1935 16 824 0.1923077
1058 447 124 115 52 156 1379 16 1058 0.1923077
1228 356 77 89 38 152 1146 16 1228 0.1923077
873 680 173 235 85 187 1796 16 873 0.1923077
564 1014 254 264 144 352 2915 16 564 0.1923077
270 2107 693 611 259 544 5447 16 270 0.1923077
453 1497 470 465 246 316 3611 24 453 0.1923077
1936 201 60 67 35 39 473 24 1936 0.1923077
1572 294 117 85 31 61 735 24 1572 0.1923077
1016 603 179 187 90 147 1473 24 1016 0.1923077
371 1800 602 582 225 391 4227 24 371 0.1923077
110 4713 1787 1568 565 793 10529 24 110 0.1923077
2359 94 23 30 17 24 255 24 2359 0.1923077
2059 144 34 42 26 42 398 24 2059 0.1923077
1201 493 148 174 75 96 1180 24 1201 0.1923077
613 1187 422 380 156 229 2703 24 613 0.1923077
2070 135 26 37 23 49 394 24 2070 0.1923077
2536 72 17 27 12 16 184 24 2536 0.1923077
2147 141 49 42 17 33 354 24 2147 0.1923077
991 567 164 162 85 156 1521 24 991 0.1923077
1756 238 73 76 29 60 590 24 1756 0.1923077
1839 219 85 74 27 33 529 24 1839 0.1923077
884 716 198 272 100 146 1775 24 884 0.1923077
1681 249 66 88 30 65 647 24 1681 0.1923077
1112 497 155 173 50 119 1297 24 1112 0.1923077
883 756 231 266 109 150 1778 24 883 0.1923077
2051 167 55 52 23 37 400 24 2051 0.1923077
2520 77 21 28 10 18 191 24 2520 0.1923077
1402 331 76 88 62 105 912 24 1402 0.1923077
16 15937 7348 4763 1817 2009 32914 24 16 0.1923077
1594 252 64 82 48 58 718 24 1594 0.1923077
946 681 218 236 82 145 1605 24 946 0.1923077
1339 384 125 125 57 77 983 24 1339 0.1923077
1543 283 61 114 38 70 765 24 1543 0.1923077
411 1688 565 579 217 327 3914 24 411 0.1923077
1719 238 71 69 26 72 618 24 1719 0.1923077
521 1337 405 485 198 249 3117 24 521 0.1923077
1398 340 85 109 49 97 914 24 1398 0.1923077
1468 353 117 110 37 89 849 24 1468 0.1923077
2425 83 20 28 10 25 231 24 2425 0.1923077
1687 265 103 77 26 59 644 24 1687 0.1923077
1213 482 142 177 63 100 1164 24 1213 0.1923077
433 1609 597 492 210 310 3761 24 433 0.1923077
2751 47 20 15 7 5 95 24 2751 0.1923077
354 1851 599 617 241 394 4373 24 354 0.1923077
273 2528 1050 764 304 410 5387 24 273 0.1923077
948 608 151 208 91 158 1601 24 948 0.1923077
2244 115 21 43 17 34 313 24 2244 0.1923077
1044 558 163 175 87 133 1409 24 1044 0.1923077
829 859 310 283 98 168 1921 24 829 0.1923077
629 1227 475 381 169 202 2637 24 629 0.1923077
454 1443 401 484 217 341 3603 24 454 0.1923077
307 2032 639 660 295 438 4897 24 307 0.1923077
2454 62 10 23 11 18 218 24 2454 0.1923077
985 680 242 237 75 126 1533 24 985 0.1923077
2792 31 11 10 6 4 74 24 2792 0.1923077
1771 187 26 53 41 67 580 24 1771 0.1923077
1944 193 64 61 28 40 466 24 1944 0.1923077
2438 72 16 22 10 24 225 24 2438 0.1923077
1326 374 108 124 42 100 997 24 1326 0.1923077
2125 148 40 51 22 35 363 24 2125 0.1923077
1300 393 111 134 59 89 1036 24 1300 0.1923077
828 699 178 201 114 206 1923 24 828 0.1923077
1410 338 80 108 50 100 903 24 1410 0.1923077
280 2156 711 682 303 460 5311 24 280 0.1923077
12 17207 7087 5293 2053 2774 37016 24 12 0.1923077
672 1008 336 332 125 215 2426 24 672 0.1923077
2289 110 21 48 11 30 289 24 2289 0.1923077
###Stratified sampling We use the same dataset and this time use the region (REG) as the stratification variable. (The function, as implemented, requires the ordering of the dataset according to the values of the stratification variable. This is an implementation flaw.)
Data = swissmunicipalities[order(swissmunicipalities$REG), ]
st = strata(Data, "REG", size = rep(20, 7), description = TRUE) # we select 20 municipalities in each of the 7 regions
Stratum 1
Population total and number of selected units: 589 20
Stratum 2
Population total and number of selected units: 913 20
Stratum 3
Population total and number of selected units: 321 20
Stratum 4
Population total and number of selected units: 171 20
Stratum 5
Population total and number of selected units: 471 20
Stratum 6
Population total and number of selected units: 186 20
Stratum 7
Population total and number of selected units: 245 20
Number of strata 7
Total number of selected units 140
11.2 Experimental design
Experimental designs were perhaps first developed in the context of agricultural experiments. So it is fitting that use the following package, which gathers a very large number of statistical tools for agricultural research.
The “plots” below stand for the experimental units. (These would be individuals in clinical trials.) The “book” is short for “field book”, and specifies the treatment assignments. Completely randomized design
treat = c("A", "B", "C") # treatments
r = c(13, 13, 14) # number of replicates per treatment
outdesign = design.crd(treat, r)
book = outdesign$book # field book (treatment assignments)
subset(book, select = c("plots", "treat"))
plots treat
1 101 C
2 102 C
3 103 B
4 104 C
5 105 B
6 106 B
7 107 B
8 108 B
9 109 C
10 110 A
11 111 C
12 112 C
13 113 B
14 114 A
15 115 A
16 116 A
17 117 A
18 118 C
19 119 C
20 120 B
21 121 A
22 122 A
23 123 C
24 124 A
25 125 B
26 126 C
27 127 A
28 128 A
29 129 C
30 130 A
31 131 B
32 132 A
33 133 B
34 134 A
35 135 B
36 136 C
37 137 C
38 138 B
39 139 C
40 140 B
Randomized complete block design
treat = c("A", "B", "C", "D", "E")
r = 4 # number of blocks
outdesign = design.rcbd(treat, r)
book = outdesign$book
book
plots block treat
1 101 1 A
2 102 1 E
3 103 1 C
4 104 1 B
5 105 1 D
6 201 2 C
7 202 2 E
8 203 2 B
9 204 2 A
10 205 2 D
11 301 3 C
12 302 3 B
13 303 3 E
14 304 3 D
15 305 3 A
16 401 4 A
17 402 4 E
18 403 4 D
19 404 4 B
20 405 4 C
Matched-pairs design (which is a special case, with blocks of size 2 representing the pairs)
treat = c("A", "B") # two treatments
r = 10 # number of pairs
outdesign = design.rcbd(treat, r)
book = outdesign$book
names(book) = c("subject", "pair", "treat")
book
subject pair treat
1 101 1 B
2 102 1 A
3 201 2 B
4 202 2 A
5 301 3 B
6 302 3 A
7 401 4 A
8 402 4 B
9 501 5 A
10 502 5 B
11 601 6 A
12 602 6 B
13 701 7 B
14 702 7 A
15 801 8 B
16 802 8 A
17 901 9 B
18 902 9 A
19 1001 10 B
20 1002 10 A
Balanced incomplete block design (The number of blocks is determined automatically so that the design is balanced.)
treat = c("A", "B", "C", "D", "E" )
k = 4 # number of units per block
outdesign = design.bib(treat, k)
Parameters BIB
==============
Lambda : 3
treatmeans : 5
Block size : 4
Blocks : 5
Replication: 4
Efficiency factor 0.9375
<<< Book >>>
plots block treat
1 101 1 E
2 102 1 C
3 103 1 A
4 104 1 D
5 201 2 B
6 202 2 C
7 203 2 E
8 204 2 D
9 301 3 D
10 302 3 B
11 303 3 C
12 304 3 A
13 401 4 C
14 402 4 B
15 403 4 E
16 404 4 A
17 501 5 A
18 502 5 D
19 503 5 B
20 504 5 E
Split plot design (When the number of replicates is set to \(r = 1\), the number of whole plots is )
treat1 = c("A", "B", "C", "D") # treatment at the whole plot level
treat2 = c("a", "b", "c") # treatment at the subplot level
outdesign = design.split(treat1, treat2, r = 1)
book = outdesign$book
# There seems to be a bug in that even when r = 1 there appear to be 3 replicates. To remedy the situation, we simply extract one replication.
number_splots = length(treat1) * length(treat2)
book[1:number_splots, -3]
plots splots treat1 treat2
1 101 1 A a
2 101 2 A b
3 101 3 A c
4 102 1 B c
5 102 2 B a
6 102 3 B b
7 103 1 D c
8 103 2 D a
9 103 3 D b
10 104 1 C c
11 104 2 C a
12 104 3 C b
Latin square design This design can be used for simple repeated measures or a crossover designs. (There are packages dedicated to crossover designs.)
treat = c("A", "B", "C", "D")
outdesign = design.lsd(treat)
book = outdesign$book[, -1]
names(book) = c("subject", "period", "treat") # for a crossover design
book
subject period treat
1 1 1 A
2 1 2 D
3 1 3 B
4 1 4 C
5 2 1 C
6 2 2 B
7 2 3 D
8 2 4 A
9 3 1 D
10 3 2 C
11 3 3 A
12 3 4 B
13 4 1 B
14 4 2 A
15 4 3 C
16 4 4 D
# or in matrix form
book_matrix = matrix(book$treat, byrow = TRUE, 4, 4)
rownames(book_matrix) = paste("subject", 1:4)
colnames(book_matrix) = paste("period", 1:4)
prmatrix(book_matrix, quote = FALSE)
period 1 period 2 period 3 period 4
subject 1 A D B C
subject 2 C B D A
subject 3 D C A B
subject 4 B A C D
11.3 Observational studies
We simply look at how to perform a matching in R. Such a matching can be used in a matched-pairs experimental design, or in the analysis of an observational study.
We look at the following dataset on women of Pima Indian heritage. (The Pima Indians are known to have among the highest rates of type 2 diabetes in the world.) We merge the training and test sets into a single dataset (with a total of 532 observations).
In a case-control framework, we might want to understand what distinguishes women with diabetes (cases) and women without (controls).
We match one-to-one according to age. (This is a particularly simple problem as we are matching on a single, one-dimensional variable.) We built the matrix of differences in age between cases and controls.
D = abs(outer(Pima$age[cases], Pima$age[controls], "-"))
rownames(D) = cases
colnames(D) = controls
M = pairmatch(D)
print(M, grouped = TRUE) # each row corresponds to a pair (case, control) given in the 2nd column
Group Members
1.1 10, 15
1.10 118, 496
1.100 344, 312
1.101 345, 471
1.102 35, 305
1.103 355, 448
1.104 356, 169
1.105 357, 95
1.106 362, 446
1.107 371, 366
1.108 372, 179
1.109 377, 440
1.11 120, 251
1.110 378, 296
1.111 380, 444
1.112 384, 519
1.113 385, 461
1.114 386, 425
1.115 390, 192
1.116 391, 176
1.117 392, 185
1.118 396, 133
1.119 398, 433
1.12 123, 82
1.120 399, 513
1.121 41, 9
1.122 412, 266
1.123 415, 459
1.124 421, 182
1.125 428, 22
1.126 430, 409
1.127 431, 12
1.128 438, 285
1.129 439, 432
1.13 125, 80
1.130 442, 420
1.131 443, 36
1.132 457, 404
1.133 458, 245
1.134 465, 511
1.135 468, 244
1.136 470, 521
1.137 472, 64
1.138 478, 48
1.139 481, 500
1.14 13, 411
1.140 482, 379
1.141 484, 422
1.142 487, 403
1.143 488, 351
1.144 49, 436
1.145 493, 354
1.146 497, 405
1.147 498, 226
1.148 50, 494
1.149 501, 454
1.15 130, 111
1.150 503, 112
1.151 505, 308
1.152 506, 57
1.153 512, 429
1.154 517, 274
1.155 522, 275
1.156 523, 135
1.157 526, 326
1.158 527, 323
1.159 529, 427
1.16 131, 373
1.160 53, 449
1.161 6, 220
1.162 60, 129
1.163 61, 376
1.164 66, 408
1.165 67, 381
1.166 69, 492
1.167 71, 47
1.168 72, 402
1.169 73, 530
1.17 14, 502
1.170 75, 352
1.171 76, 227
1.172 79, 447
1.173 83, 214
1.174 84, 159
1.175 87, 56
1.176 93, 417
1.177 96, 292
1.18 141, 277
1.19 142, 456
1.2 100, 324
1.20 148, 516
1.21 152, 215
1.22 153, 524
1.23 154, 528
1.24 156, 395
1.25 157, 358
1.26 160, 499
1.27 161, 103
1.28 167, 514
1.29 171, 273
1.3 102, 531
1.30 173, 406
1.31 174, 140
1.32 175, 508
1.33 18, 163
1.34 184, 63
1.35 186, 504
1.36 187, 419
1.37 188, 23
1.38 19, 261
1.39 190, 426
1.4 104, 483
1.40 193, 486
1.41 197, 423
1.42 2, 116
1.43 200, 150
1.44 201, 101
1.45 204, 518
1.46 205, 106
1.47 206, 491
1.48 207, 88
1.49 210, 474
1.5 108, 450
1.50 216, 509
1.51 217, 480
1.52 218, 132
1.53 221, 507
1.54 222, 525
1.55 231, 264
1.56 234, 479
1.57 243, 178
1.58 248, 46
1.59 253, 3
1.6 11, 124
1.60 255, 510
1.61 257, 469
1.62 258, 239
1.63 259, 208
1.64 26, 340
1.65 260, 360
1.66 268, 393
1.67 270, 230
1.68 272, 315
1.69 276, 346
1.7 113, 202
1.70 278, 110
1.71 279, 400
1.72 28, 476
1.73 280, 29
1.74 281, 43
1.75 282, 387
1.76 286, 212
1.77 289, 495
1.78 291, 284
1.79 295, 463
1.8 114, 347
1.80 298, 515
1.81 300, 490
1.82 301, 532
1.83 304, 180
1.84 306, 241
1.85 311, 475
1.86 313, 151
1.87 317, 92
1.88 320, 441
1.89 325, 413
1.9 117, 290
1.90 328, 466
1.91 33, 145
1.92 330, 333
1.93 332, 460
1.94 334, 489
1.95 335, 455
1.96 336, 359
1.97 337, 467
1.98 339, 520
1.99 341, 168