********** Announcements ************
· The open-source statistical programming language and environment R will be used from time to time during the course, including in homework assignments, for which the code will be provided by the TAs. It is available in the UCSD computing labs and the “virtual lab”, http://acms.ucsd.edu/students/govirtual/index.html (you’ll need to register first).
· At the end of each Friday’s lecture, students in sections A01-A08 take turns handing in a slip summarizing: 1) what you have learned this week; 2) what you don’t understand. Week 1 is for those in A01, week 2 for A02, etc., and we will skip the midterm week (so A06 will turn in theirs on Friday of week 7, A07 in week 8, and A08 in week 9). This gives each student a chance to earn 3 bonus points (out of 100) toward the total score in the class.
· Electronic files from the lectures are posted below; homework solutions will be posted on piazza (please enroll using your ucsd email address).
· The midterm is set for Wed, May 6, in class. Scientific calculators are allowed, but not graphing calculators. One ‘cheat’ sheet is allowed. The material covered runs from the beginning through Section 4.5 (inclusive). Note that the feedback slips described above are pushed back by one week after the midterm. A review sheet is posted under the e-files link below; we will go over it on Monday.
· Textbook is available at the library on reserve.
· We are in the 2nd half of the course, which really deals with statistics (starting with chapter 7, as opposed to probability in chapters 3-6) and has a distinct flavor from before. Also, we don't follow the book closely. You should try to attend lectures, or ask to borrow notes if you have to miss one. Otherwise you can easily fall behind and risk a poor grade.
· There is a glossary pdf file under the e-files link below.
· You are encouraged to fill out the CAPE evaluation, which should provide useful feedback for future iterations of this course. If we get a response rate of 75%, we will drop the lowest homework score; if we get a response rate of 85%, we will drop the two lowest homework scores. (We won’t have a fixed distribution of grades A, B, C, etc., so such a scheme should not be disadvantageous to anyone.) (The final response rate is just above 85% - congrats, you made the 2nd cut!!)
· Final review is posted under the e-files link below. One ‘cheat’ sheet is allowed during the exam.
Overview: we will follow the textbook (see below) from Chapter 1, 2, etc., hoping to get to Chapter 9 (Regression) by the end of the quarter. As a result, we will not cover every detail in these chapters. Students are expected to attend lectures (which will also be podcast to enrolled students), and the homework assignments should generally reflect what is being taught. See also the progress of the course below, which is updated regularly.
Progress of the course: chapters and sections we have covered so far (or will be covering soon) –
Chapter 2: 2.1-2.3;
Chapter 3: 3.1-3.4, 3.5 (selected material), 3.6 – 3.8;
Chapter 4: 4.1 – 4.3.1, 4.4 – 4.7;
Chapter 5: 5.1, 5.2, 5.4 – 5.6;
Chapter 6: 6.1 – 6.3;
Chapter 7: in place of 7.2 we will discuss a more intuitive estimation method that is based on the sample mean; 7.3 – 7.3.1, 7.5; in place of 7.6 we will discuss a general CI based on the CLT [see also week 8 problem 2) below];
Chapter 8: 8.1 – 8.4.2;
Chapter 9: 9.1 – 9.4.1.
Ross, "Introduction to Probability and Statistics for Engineers and Scientists" (5th Edition)
There is an online version of the 4th edition through our library (http://libraries.ucsd.edu), but please be aware that the 5th edition is the official version we refer to during the course (including section numbers, homework assignments, etc.).
“OpenIntro Statistics” (2nd Edition) by Diez et al.
Files from lectures (slides, R code in *.R files, etc., through this link)
About polio: “Belonging to a lower social class usually means having a large family and living under less favourable conditions, resulting in early exposure and reduced risk of being infected with severe polio, and poliomyelitis is therefore believed to be a middle and upper class phenomenon.” – Nielson et al., International Journal of Epidemiology 2002; vol.31(1): 181-186. doi: 10.1093/ije/31.1.181
Grading: 30% Homework + 30% Midterm + 40% Final, or 40% Homework + 20% Midterm + 40% Final, whichever is higher.
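As a rough illustration of the "whichever is higher" rule, the two weightings can be compared in R. This is only a sketch: the function name `course_score` is hypothetical, each component is assumed to be on a 0-100 scale, and the bonus points and dropped homework scores mentioned in the announcements are ignored.

```r
# Hypothetical helper (not part of the course materials): overall score
# under the two grading schemes, assuming each component is out of 100.
course_score <- function(homework, midterm, final) {
  scheme_a <- 0.30 * homework + 0.30 * midterm + 0.40 * final
  scheme_b <- 0.40 * homework + 0.20 * midterm + 0.40 * final
  max(scheme_a, scheme_b)  # the higher of the two schemes counts
}

# Made-up example scores, for illustration only.
example_score <- course_score(homework = 90, midterm = 70, final = 80)
print(example_score)
```

With these made-up scores the first scheme gives 27 + 21 + 32 = 80 and the second gives 36 + 14 + 32 = 82, so the second scheme would apply.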
Homework: due each (following) week at TA sessions or in TA dropbox by the end of that day (11:59pm – note that doors to the building generally lock at 7pm, and the handicap doors lock shortly after 9pm); so week 1 assignment is due Tuesday of week 2, etc.
On the top of each HW please put: FIRST NAME space LAST NAME + PID + SECTION NUMBER.
Week 1: Get R or RStudio working on your computer, or find it on the computers in one of our labs. Then try the code shown in the 1st lecture. Chapter 2: 8, 13, 17 (b-e) [suggest using R to do the computing parts of 13 and 17 – entering data is part of the training!]; Chapter 3: 2, 5, 7, 8, 12.
Week 2: Use the R function ‘rnorm()’ from the week 1 lectures to generate rnorm(n=100, mean=0, sd=1), rnorm(n=100, mean=0, sd=10) and rnorm(n=100, mean=0, sd=100), respectively. For each of them, plot the histogram and calculate the sample variance and sample standard deviation (SD). In comparing the three, describe the meanings of sample variance and sample SD.
Chapter 3: 15, 16, 20, 21, 26, 34.
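For the R part of Week 2, one possible starting point is sketched below. It is not the code from the lectures; the seed, the variable names and the side-by-side plot layout are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

sds <- c(1, 10, 100)
# One sample of size 100 for each standard deviation.
samples <- lapply(sds, function(s) rnorm(n = 100, mean = 0, sd = s))

# Three histograms side by side; note the very different x-axis scales.
op <- par(mfrow = c(1, 3))
for (i in seq_along(sds)) {
  hist(samples[[i]], main = paste("sd =", sds[i]), xlab = "x")
}
par(op)

# Sample variance and sample SD for each of the three samples.
sample_var <- sapply(samples, var)
sample_sd  <- sapply(samples, sd)
print(data.frame(sd_used = sds, sample_var, sample_sd))
```

The sample SD is the square root of the sample variance, so it is on the same scale as the data, which is useful to keep in mind when comparing the three histograms.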
Week 3: Chapter 3: 37, 39 – 41, Chapter 4: 4, 6, 10 (plus reading more examples from the book).
Week 4: Chapter 4: 13, 14, 16, 24, 27, 28, 30(b), 32, 34.
Related to #30 above, do the following in R: generate 100 X’s that are distributed as Uniform(0, 1) using the function runif(). Compute X^10 (X to the power of 10) for each of these 100 numbers, plot their histogram, then compute their sample mean. Comment on your results.
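A minimal sketch of this simulation follows; the seed and variable names are arbitrary. For reference, if X is Uniform(0, 1) then E[X^10] is the integral of x^10 over (0, 1), which equals 1/11, so the sample mean can be compared with that value.

```r
set.seed(2)  # arbitrary seed for reproducibility

x <- runif(100, min = 0, max = 1)  # 100 draws from Uniform(0, 1)
y <- x^10                          # raise each draw to the 10th power

hist(y, main = "Histogram of X^10", xlab = "X^10")

sample_mean <- mean(y)
print(sample_mean)
print(1 / 11)  # theoretical E[X^10] for X ~ Uniform(0, 1)
```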
Week 5, 6 (due Tuesday of week 7): Do the following in R, using rnorm() and other functions we talked about before:
1) Generate 100 X’s that are distributed as N(0, 1), and 100 Y’s that are distributed as N(1, 2). Plot the histograms of X, Y and X+Y, and compute the sample means and sample variances of X, Y and X+Y. Comment on your results.
2) Plot density functions of N(0, 1), N(0, 10) and N(0, 100) on the same figure, using the dnorm() function.
Chapter 4: 39, 42, 43, 45, 50, 52; Chapter 5: 11, 16, 21, 23
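The two R tasks for Weeks 5 and 6 could be approached along the following lines. This sketch assumes that the second parameter in N(mean, ·) is the variance (so N(1, 2) has SD sqrt(2)), and that X and Y are generated independently; neither assumption is stated in the assignment. The seed, grid range and variable names are arbitrary.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Task 1: X ~ N(0, 1) and Y ~ N(1, 2), generated independently.
x <- rnorm(100, mean = 0, sd = 1)
y <- rnorm(100, mean = 1, sd = sqrt(2))  # assumes 2 is the variance
s <- x + y

op <- par(mfrow = c(1, 3))
hist(x, main = "X")
hist(y, main = "Y")
hist(s, main = "X + Y")
par(op)

summary_table <- data.frame(
  mean = c(mean(x), mean(y), mean(s)),
  var  = c(var(x), var(y), var(s)),
  row.names = c("X", "Y", "X+Y")
)
print(summary_table)

# Task 2: densities of N(0, 1), N(0, 10), N(0, 100) on one figure,
# again treating the second parameter as the variance.
grid <- seq(-30, 30, length.out = 601)
variances <- c(1, 10, 100)
dens <- sapply(variances, function(v) dnorm(grid, mean = 0, sd = sqrt(v)))
matplot(grid, dens, type = "l", lty = 1, col = 1:3,
        xlab = "x", ylab = "density")
legend("topright", legend = paste0("N(0, ", variances, ")"),
       lty = 1, col = 1:3)
```

The sample mean of X+Y always equals the sum of the two sample means, while the sample variance of X+Y is only approximately the sum of the two sample variances, because the sample covariance of independent draws is close to, but not exactly, zero.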
Week 7: Chapter 6: 3, 11, 16, 17 (instead of text disk use R); Chapter 7: 6.
Week 8: Chapter 7: 8, 13, 14, 26(a)(b), 48, 55;
1) For #13 above, assume that the true mean is 1. Run 1000 simulations as follows. For each simulation, generate a random sample of size 20 from N(1, 0.04), and compute a 99% confidence interval (CI) based on that set of simulated data. Record and comment on how many CI’s out of the 1000 contain the true mean of 1.
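One way to set up the simulation in problem 1) is sketched below. It assumes N(1, 0.04) means variance 0.04 (SD 0.2) and that the variance is treated as unknown, so a t-based interval is used. If #13 takes the variance as known, replace `qt(0.995, df = n - 1) * sd(x)` with `qnorm(0.995) * sigma`. The seed and variable names are arbitrary.

```r
set.seed(4)  # arbitrary seed for reproducibility

n_sim <- 1000        # number of simulated data sets
n     <- 20          # sample size per data set
mu    <- 1           # true mean
sigma <- sqrt(0.04)  # assumes 0.04 is the variance

# For each simulated sample, check whether the 99% t-interval covers mu.
covers <- replicate(n_sim, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qt(0.995, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - mu) <= half_width
})

n_covering <- sum(covers)
print(n_covering)  # number of intervals (out of 1000) containing the true mean
```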
2) Now assume a random sample of size n from Poisson(lambda). Use the central limit theorem (CLT) to derive an approximate 95% CI for lambda based on the sample mean. Take n=100 and lambda=5, and run 1000 simulations similar to problem 1) above to assess the accuracy of the 95% CI that you have derived.
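For problem 2), the simulation could be organized as in the sketch below. It assumes one particular form of the interval: since a Poisson(lambda) variable has mean and variance both equal to lambda, the CLT suggests the plug-in interval (sample mean) ± 1.96 × sqrt(sample mean / n). Your own derivation may lead to a different form, in which case only the `half_width` line needs to change. The seed and variable names are arbitrary.

```r
set.seed(5)  # arbitrary seed for reproducibility

n_sim  <- 1000  # number of simulated data sets
n      <- 100   # sample size per data set
lambda <- 5     # true Poisson mean
z      <- qnorm(0.975)  # about 1.96, for a 95% interval

# For each simulated sample, check whether the plug-in CLT interval covers lambda.
covers <- replicate(n_sim, {
  x <- rpois(n, lambda = lambda)
  half_width <- z * sqrt(mean(x) / n)  # assumed plug-in standard error
  abs(mean(x) - lambda) <= half_width
})

coverage <- mean(covers)
print(coverage)  # empirical coverage, to compare with the nominal 0.95
```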
Week 9: please fill out your CAPE evaluation for this course;
Chapter 8: 2, 3, 5, 14, 17, 26.
Week 10 (not due): you are encouraged to also try some of the following using R –
Chapter 8: 32, 35, 38; Chapter 9 – 5, 6, 12.
Lecture: MWF 12, CENTER 101
Instructor: Ronghui (Lily) Xu
Office: APM 5856
Teaching Assistants: A01, A08: Jiaqi Guo <email@example.com>; A02: Tingyi Zhu <firstname.lastname@example.org>; A03: Pengbo Li <email@example.com>; A04: Hanbo Li <firstname.lastname@example.org>; A05: Jiao Chen <email@example.com>; A06: Kuangyi Yang <firstname.lastname@example.org>; and A07: Yan Yang <email@example.com>.