********** Announcements ************
· Please read along with the examples in Survival_Notes.pdf to reinforce the theory we talk about in the lectures.
· There is a part-time statistician position open in the School of Medicine for first-year MS students who are completing the 281 and 282 sequences.
· There are also unpaid summer intern opportunities.
· Final papers are listed below.
Overview: We will continue from the previous quarter(s) and discuss the large-sample properties of the likelihood-based tests, then cover common nonparametric rank-based tests. The rest of the course (at least 7 weeks) will be on semi-parametric survival analysis methods. The weekly topics listed below are from the last time I taught this course; they will remain similar but will be revised as the course progresses.
Important Note: You are strongly encouraged to attend lectures and take notes. Since there is no TA or lab, you are also strongly encouraged to take advantage of the office hours to discuss any questions or problems that you have. Note that you can also make appointments for office hours!
Lecture: MWF 1:00-1:50pm, AP&M 5402
Instructor: Ronghui (Lily) Xu
Office: AP&M 5856
Office hours: M 2:00pm - 3:00pm, or by appointment
Reference books: (first 4 on reserve at S&E library)
1. Cox and Oakes, Analysis of Survival Data, Chapman & Hall, 1984
2. Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data (2nd ed), Wiley, 2002
3. Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991
4. Kendall and Gibbons, Rank Correlation Methods (5th ed), Oxford University Press, 1990
5. O'Quigley, Proportional Hazards Regression, Springer, 2008
6. Lehmann and Romano, Testing Statistical Hypotheses, Springer, 2005
Reference papers:
1. Xu R, Li X. Comparison of parametric versus permutation methods with applications to microarray gene expression data. Bioinformatics. 2003; 19(10): 1284-1289. [pdf]
2. Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika, 1982; 69(3): 553-566.
3. Gill R. Understanding Cox’s regression model: a martingale approach. J Amer Stat Assoc (JASA). 1984; 79: 441-447.
4. Murphy SA, van der Vaart AW. On profile likelihood (with discussion). JASA. 2000; 95: 449-485.
5. O’Quigley J, Xu R, Stare J. Explained randomness in proportional hazards models. Statistics in Medicine, 2005; 24: 479-489. [pdf]
6. Vaida F, Xu R. Proportional hazards model with random effects. Statistics in Medicine, 2000; 19: 3309-3324. [pdf]
7. Tsai, Jewell and Wang, A note on the product-limit estimator under right censoring and left truncation. Biometrika, 1987; 74: 883-6.
8. Struthers and Farewell. A mixture model for time to AIDS data with left truncation and an uncertain origin. Biometrika, 1989; 76: 814-7.
9. Tsiatis AA. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences USA, 1975; 72: 20-22.
10. Andersen PK and Gill RD. Cox’s regression model for counting processes: a large sample theory. The Annals of Statistics, 1982; 10: 1100-1120.
11. Johansen S. An extension of Cox’s regression model. International Statistical Review, 1983; 51: 165-174.
12. Laird NM. Nonparametric maximum likelihood estimation of a mixing distribution. JASA, 1978; 73: 805-811.
13. Struthers CA, Kalbfleisch JD. Misspecified proportional hazards models. Biometrika, 1986; 73: 363-369.
14. Lagakos SW, Schoenfeld DA. Properties of proportional-hazards score tests under misspecified regression models. Biometrics, 1984; 40: 1037-1048.
15. Xu R, O’Quigley J. Estimating average regression effect under non-proportional hazards. Biostatistics, 2000; 1: 423-439.
16. Kent J. Information gain and a general measure of correlation. Biometrika, 1983; 70: 163-173.
17. Prentice RL. On non-parametric maximum likelihood estimation of the bivariate survivor function. Statistics in Medicine, 1999; 18: 2517-2527.
18. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. JASA, 1989; 84: 1065-1073.
19. Gamst A, Donohue M, Xu R. Asymptotic properties and empirical evaluation of the NPMLE in the proportional hazards mixed-effects model. Statistica Sinica, 2009; 19: 997-1011.
Week 1: Review of the Neyman-Pearson paradigm/lemma, then likelihood-based tests (Wald, score, likelihood ratio); contiguous alternatives.
Week 2: Wilcoxon rank tests, permutation test; no UMP rank tests; asymptotic relative efficiency
Week 3: Right-censored data, left truncation, likelihoods
Week 4: Kaplan-Meier estimate of survival; (weighted) Log-rank test; counting process.
Week 5: G^rho family and their asymptotic relative efficiency; proportional hazards regression
Week 6: partial likelihood and martingale theory; predicting survival; case study
Week 7: stratified Cox model; goodness-of-fit; misspecified PH model
Week 8: estimate beta(t); R-squared measures for Cox model; design of a survival study
Week 9: AFT model; multivariate survival
Week 10: proportional hazards mixed model
Homework: (will be collected 3 times during the quarter)
* Due Monday of week 4, Apr 18:
Mar 30: In your own notation and words, describe the hypothesis testing problem for r-by-c contingency tables: what the data are like (you can give an example), what the hypotheses are, and what the test statistic is. Then derive an approximate distribution of the test statistic under the null hypothesis and, finally, give the rejection region of a level-alpha test. Feel free to consult a textbook if you need to.
Apr 4: For the one-parameter Weibull distribution in problem 3.21 on page 503 of "Theory of Point Estimation" (chapter 6), derive the Wald, score and likelihood ratio tests for testing theta=1 (i.e. the standard exponential distribution) versus otherwise.
Apr 6: Mimicking the simulation in Table 1 of reference paper #1 above, generate two samples of size 5 each under N(0,1) and N(1,1), and carry out 100 simulations to compare the power of the t-test and the permutation t-test. Repeat the above under Exp(1) and Exp(0.5). Comment on anything that you notice. You need to turn in your code as well as the results.
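As a starting point for the Apr 6 problem, here is a minimal sketch of a two-sided permutation t-test for two small samples. It is only one possible setup, not the procedure used in reference paper #1: the function names `t_statistic` and `permutation_pvalue` are illustrative, the pooled-variance t statistic is an assumed choice, and the power simulation itself (repeating this over 100 simulated data sets) is left out. With 5 observations per group there are only 252 ways to split the pooled data, so all of them can be enumerated instead of sampling permutations at random.

```python
import itertools
import math
import random
import statistics


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic (an assumed choice)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value based on |t|.

    Enumerates every way of assigning len(x) of the pooled
    observations to the first group.
    """
    pooled = list(x) + list(y)
    n, nx = len(pooled), len(x)
    observed = abs(t_statistic(x, y))
    count = total = 0
    for idx in itertools.combinations(range(n), nx):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so ties are not lost to rounding error.
        if abs(t_statistic(gx, gy)) >= observed - 1e-12:
            count += 1
    return count / total


# One simulated data set under N(0,1) versus N(1,1); the seed is arbitrary.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5)]
y = [rng.gauss(1, 1) for _ in range(5)]
p_value = permutation_pvalue(x, y)
```

Note that with two groups of size 5 the smallest attainable two-sided p-value is 2/252, which is worth keeping in mind when comparing power at small significance levels.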
* Due Monday of week 7, May 9:
Apr 20: Verify that when there is no censoring, the Kaplan-Meier (KM) estimate is the same as the empirical survival function. Is the KM estimator unbiased? Why or why not? Also compare Greenwood’s variance formula with the variance of the empirical survival function. Finally, try to describe the relationship between comparing the KM estimates for two groups and the log-rank test.
Apr 25: Fill in the mathematical details between Theorems 2.2 and 2.3 of the Harrington and Fleming paper, as outlined in the lecture.
May 2: For a sample size of 50 in each of two groups, simulate event times that follow Exp(1) and Exp(0.5), respectively. For both groups also simulate censoring times that follow Exp(0.25). Do the following: 1) Calculate the probability of censoring in each group, and compare that to the observed percentage of censoring in your simulated data; 2) Which weighted log-rank test should you use to test the equality of the event time distributions of the two groups, and why? Carry out your chosen weighted log-rank test; 3) Fit a Cox proportional hazards model to your data, explain the output and compare with the results from part 2). As before, you need to turn in your code as well as the results.
NOTE: Please take a moment to look back at what we have talked about so far this quarter, and let me know which materials are clear to you and which are not clear but you would like to understand better. Also please let me know how many hours a week, on average, you spend on this course outside of the lectures.
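For the data-generating step of the May 2 problem, the following sketch may help. It assumes that Exp(a) denotes an exponential distribution with rate a (if the mean is intended instead, pass the reciprocals), and that event and censoring times are independent; under those assumptions P(C < T) = censor_rate / (event_rate + censor_rate). The function names `censoring_probability` and `simulate_group` are illustrative only, and the weighted log-rank test and Cox fit are not covered here.

```python
import random


def censoring_probability(event_rate, censor_rate):
    """P(C < T) for independent T ~ Exp(event_rate), C ~ Exp(censor_rate).

    Both arguments are rates (an assumption about the notation).
    """
    return censor_rate / (event_rate + censor_rate)


def simulate_group(n, event_rate, censor_rate, rng):
    """Return (observed_time, event_indicator) pairs; 1 = event, 0 = censored."""
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # latent event time
        c = rng.expovariate(censor_rate)  # latent censoring time
        data.append((min(t, c), 1 if t <= c else 0))
    return data


# Compare the theoretical and observed censoring fractions in each group.
rng = random.Random(2024)  # arbitrary seed
for event_rate in (1.0, 0.5):
    group = simulate_group(50, event_rate, 0.25, rng)
    observed = sum(1 for _, event in group if event == 0) / len(group)
    print(event_rate, censoring_probability(event_rate, 0.25), observed)
```

With only 50 subjects per group the observed fraction can differ noticeably from the theoretical value, which is part of what the comparison in part 1) is meant to show.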
* Due Wednesday of week 10, June 1:
May 13: Consider linear regression with normal error and two covariates X1 and X2. Suppose that X1 is the main predictor of interest, so you may or may not include X2 in the model. Discuss how this affects the estimation of the effect of X1, i.e. its regression coefficient beta1, in terms of the bias and variance of beta1^hat. Hint: you might consider whether X1 and X2 are independent or not; you might also center X1 and X2 so that they have mean 0. Feel free to consult any textbook or literature, but you need to justify all your arguments.
May 20: Consider omitting covariates in the Cox model, and do the following simulation (100 runs, each of sample size n=200; as always, please attach your code):
1) First generate two independent covariates Z1 and Z2, where Z1 is binary, 0 or 1 with probability 0.5 each, and Z2 is uniform on (-2, 2). Let beta1=beta2=1, and let the baseline hazard function be constant, \lambda_0(t)=1, so that you can generate T. For each run, fit the Cox model with both Z1 and Z2, and fit the Cox model with only Z1. Compare the estimates of beta1 under both models in terms of bias, variance, and the coverage probability of the nominal 95% confidence interval.
2) Then repeat the above, but make the distribution of Z2 dependent on Z1.
3) Finally, a bonus: can you generate 30% censoring in your data and repeat the above?
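The data-generating step in part 1) can be sketched as follows. With a constant baseline hazard \lambda_0(t)=1, the hazard given (Z1, Z2) is exp(beta1*Z1 + beta2*Z2) and does not depend on t, so T given the covariates is exponential with that rate. The function name `simulate_cox_data` is illustrative; fitting the Cox models, the dependent-covariate case in part 2) and the censoring in part 3) are left out.

```python
import math
import random


def simulate_cox_data(n, beta1=1.0, beta2=1.0, rng=None):
    """Simulate (T, Z1, Z2) under a Cox model with baseline hazard 1.

    Z1 is Bernoulli(0.5) and Z2 is uniform on (-2, 2), independent of Z1.
    """
    rng = rng or random.Random()
    rows = []
    for _ in range(n):
        z1 = 1 if rng.random() < 0.5 else 0
        z2 = rng.uniform(-2.0, 2.0)
        # Constant hazard exp(beta'Z) means T | Z is exponential with that rate.
        hazard = math.exp(beta1 * z1 + beta2 * z2)
        t = rng.expovariate(hazard)
        rows.append((t, z1, z2))
    return rows


# One run of sample size n=200, as in the homework; the seed is arbitrary.
data = simulate_cox_data(200, rng=random.Random(1))
```

A quick sanity check on the generated data is that T * exp(beta1*Z1 + beta2*Z2) should behave like a standard exponential sample, with mean close to 1.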
Papers to present for final projects: (please see the email announcement)
1. [taken!] Ford et al. Model inconsistency illustrated by the Cox proportional hazards model. Statistics in Medicine, 1995; vol.14, p. 735-746.
2. [taken!] Xu and O’Quigley. Proportional hazards estimate of the conditional survival function. Journal of the Royal Statistical Society, Series B, 2000; vol.62, p. 667-680.
3. [taken!] Gray R. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA, 1992; 87: 942-951.
4. [taken!] Xu R, Adak S. Survival analysis with time-varying regression effects using a tree-based approach. Biometrics, 2002; 58: 305-315.
5. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. International Statistical Review, 1991; 58: 227-240.
6. [taken!] Cox DR, McCullagh P. Some aspects of analysis of covariance. Biometrics, 1982; 38: 541-561.
7. [taken!] Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 1998; 17: 1623-1634.
Papers 2 (section 4, simulation), 5 [taken!], 15 [taken!], 16 [taken!], 17 [taken!] in the reference list above.
Grading: 60% homework + 40% final project/presentation