Spring 2015

**Announcements**

· The open-source statistical programming language and environment **R** will be used from time to time during the course, including in homework assignments, for which the code will be provided by the TAs. It is available in the UCSD computing labs and in the “virtual lab”, http://acms.ucsd.edu/students/govirtual/index.html (you’ll need to register first).

· At the end of each Friday’s lecture, students in sections A01-A08 take turns handing in a slip summarizing: 1) what you have learned this week; 2) what you don’t understand. Week 1 is for those in A01, week 2 for A02, and so on, skipping the midterm week (**so A06 will turn in theirs on Friday of week 7, A07 in week 8, and A08 in week 9**). This gives each student a chance to **earn 3 bonus points** (out of 100) toward the total score in the class.

· Electronic files from the lectures are posted below; homework solutions will be posted on Piazza (please enroll using your UCSD email address).

· The midterm is on Wed, May 6, in class. Scientific calculators are allowed, but graphing calculators are not. One ‘cheat’ sheet is allowed. The material covered runs from the beginning through Section 4.5 (inclusive). __A review sheet is posted under the e-files link below__, which we will go over on Monday. Note also that the feedback slips for the weeks after the midterm are pushed back by one week, as described above.

· The textbook is available on reserve at the library.

· **We are in the 2nd half of the course, which really deals with statistics (starting with Chapter 7, as opposed to probability in Chapters 3-6) and has a distinctly different flavor from before. Also, we don't follow the book closely. You should try to attend lectures, or ask to borrow notes if you have to miss one. Otherwise you can easily fall behind and risk a poor grade.**

· There is a glossary PDF file under the e-files link below.

· **You are encouraged to fill out the CAPE evaluation, which should provide useful feedback for future iterations of this course. If we get a response rate of 75%, we will drop the lowest homework score; if we get a response rate of 85%, we will drop the two lowest homework scores.** (We won’t have a fixed distribution of grades A, B, C, etc., so such a scheme should not disadvantage anyone.) **(The final response rate is just above 85%. Congrats, you made the 2nd cut!!)**

· **The final review is posted under the e-files link below.** One ‘cheat’ sheet is allowed during the exam.

**Overview**: we will follow the textbook (see below) starting from Chapter 1, hoping to reach Chapter 9 (Regression) by the end of the quarter. At this pace we will not cover every detail in these chapters. Students are expected to attend lectures (which will also be podcast to enrolled students), and the homework assignments should generally reflect what is being taught. See also the progress of the course below, which is continually updated.

**Progress of the course:** chapters and sections we have covered so far (or will be covering soon):

Chapter 1;

Chapter 2: 2.1-2.3;

Chapter 3: 3.1-3.4, 3.5 (selected material), 3.6 – 3.8;

Chapter 4: 4.1 – 4.3.1, 4.4 – 4.7;

Chapter 5: 5.1, 5.2, 5.4 – 5.6;

Chapter 6: 6.1 – 6.3;

Chapter 7: in place of 7.2 we will discuss a more intuitive estimation method that is based on the sample mean; 7.3 – 7.3.1; 7.5; in place of 7.6 we will discuss a general CI based on the CLT [see also week 8 problem 2) below];

Chapter 8: 8.1 – 8.4.2;

Chapter 9: 9.1 – 9.4.1.

**Textbook:**

Ross, "Introduction to Probability and Statistics for Engineers and Scientists" (5th Edition)

There is an online version of the 4th edition through our library (http://libraries.ucsd.edu), but please be aware that the 5th edition is the official version we refer to during the course (including section numbers, homework assignments, etc.).

**Reference:**

“OpenIntro Statistics” (2nd Edition) by Diez et al.

https://www.openintro.org/stat/textbook.php

**Files from lectures** (slides, R code in *.R files, etc., through this link)

About polio: “Belonging to a lower social class usually means having a large family and living under less favourable conditions, resulting in early exposure and reduced risk of being infected with severe polio, and poliomyelitis is therefore believed to be a middle and upper class phenomenon.” – Nielson et al., International Journal of Epidemiology 2002; vol. 31(1): 181-186. doi: 10.1093/ije/31.1.181

**Grading:** 30% Homework +
30% Midterm + 40% Final

Or 40% Homework + 20% Midterm + 40%
Final, whichever is higher.

**Homework:** due the following week at TA sessions or in the TA dropbox by the end of that day (11:59pm; note that doors to the building generally lock at 7pm, and the handicap doors lock shortly after 9pm). So the week 1 assignment is due Tuesday of week 2, etc.

At the top of each HW please put: FIRST NAME space LAST NAME + PID + SECTION NUMBER.

__Week 1__: Get R or RStudio working on your computer, or find it on the computers in one of our labs.

Then try the code shown in the 1st lecture. Chapter 2: 8, 13, 17 (b-e) [we suggest using R for the computing parts of 13 and 17; entering data is part of the training!]; Chapter 3: 2, 5, 7, 8, 12.

__Week 2__: Use the R function ‘rnorm()’ from the week 1 lectures to generate rnorm(n=100, mean=0, sd=1), rnorm(n=100, mean=0, sd=10) and rnorm(n=100, mean=0, sd=100), respectively. For each of them, plot the histogram and calculate the sample variance and sample standard deviation (SD). Comparing the three, describe the meanings of sample variance and sample SD.

Chapter 3: 15, 16, 20, 21, 26, 34.
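For the R part of Week 2, the following is one possible sketch rather than the code from the lectures. The seed, the variable names (`sds`, `samples`, `summary_table`) and the side-by-side layout are illustrative choices, not part of the assignment.

```r
set.seed(1)  # arbitrary seed, only so that a rerun gives the same numbers

sds <- c(1, 10, 100)  # the three standard deviations named in the assignment

# One sample of size 100 for each standard deviation
samples <- lapply(sds, function(s) rnorm(n = 100, mean = 0, sd = s))

# Three histograms side by side
par(mfrow = c(1, 3))
for (i in seq_along(sds)) {
  hist(samples[[i]], main = paste("sd =", sds[i]), xlab = "x")
}
par(mfrow = c(1, 1))

# Sample variance and sample SD for each of the three samples
summary_table <- data.frame(
  sd_true    = sds,
  sample_var = sapply(samples, var),
  sample_sd  = sapply(samples, sd)
)
print(summary_table)
```

Note that var() and sd() in R both use the n-1 denominator, so sample_sd is exactly the square root of sample_var.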

__Week 3__: Chapter 3: 37, 39 – 41; Chapter 4: 4, 6, 10 (plus reading more examples from the book).

__Week 4__: Chapter 4: 13, 14, 16, 24, 27, 28, 30(b), 32, 34.

Related to #30 above, do the following in R: generate 100 X’s that are distributed as Uniform(0, 1) using the function runif(). Compute X^10 (X to the power of 10) for each of these 100 numbers, plot their histogram, then compute their sample mean. Comment on your results.
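The Week 4 simulation might be sketched as below. This is only one way to set it up; the seed and variable names are illustrative. The comparison value 1/11 is the integral of x^10 over (0, 1), that is, E[X^10] for X distributed as Uniform(0, 1).

```r
set.seed(2)  # arbitrary seed for reproducibility

x <- runif(100, min = 0, max = 1)  # 100 draws from Uniform(0, 1)
y <- x^10                          # element-wise 10th power

hist(y, main = "Histogram of X^10", xlab = "X^10")

sample_mean      <- mean(y)
theoretical_mean <- 1 / 11  # integral of x^10 from 0 to 1

cat("sample mean:", sample_mean,
    " theoretical mean:", theoretical_mean, "\n")
```

The histogram is typically piled up near 0, since X^10 is tiny unless X is close to 1.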

__Week 5, 6 (due Tuesday of week 7)__: Do the following in R, using rnorm() and other functions we talked about before:

1) Generate 100 X’s that are distributed as N(0, 1), and 100 Y’s that are distributed as N(1, 2). Plot the histograms of X, Y and X+Y, and compute the sample means and sample variances of X, Y and X+Y. Comment on your results.

2) Plot the density functions of N(0, 1), N(0, 10) and N(0, 100) on the same figure, using the dnorm() function.
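Both R problems for Weeks 5 and 6 could be sketched roughly as follows. An important assumption here: the second argument of N(., .) is read as the variance, so N(1, 2) is passed to rnorm() as sd = sqrt(2), and N(0, 10), N(0, 100) are passed to dnorm() as sd = sqrt(10) and sd = 10. If the second argument is meant as the SD, drop the sqrt(). The seed, grid range, and variable names are illustrative.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Problem 1: X ~ N(0, 1), Y ~ N(1, 2), generated independently
x <- rnorm(100, mean = 0, sd = 1)
y <- rnorm(100, mean = 1, sd = sqrt(2))  # assumes "2" is the variance
s <- x + y

par(mfrow = c(1, 3))
hist(x, main = "X")
hist(y, main = "Y")
hist(s, main = "X + Y")
par(mfrow = c(1, 1))

stats <- data.frame(
  variable = c("X", "Y", "X+Y"),
  mean     = c(mean(x), mean(y), mean(s)),
  variance = c(var(x), var(y), var(s))
)
print(stats)

# Problem 2: three normal densities on one figure
grid      <- seq(-30, 30, length.out = 1001)
variances <- c(1, 10, 100)  # assumes the second argument is the variance
dens <- sapply(variances, function(v) dnorm(grid, mean = 0, sd = sqrt(v)))

matplot(grid, dens, type = "l", lty = 1, col = 1:3,
        xlab = "x", ylab = "density")
legend("topright", legend = paste0("N(0, ", variances, ")"),
       col = 1:3, lty = 1)
```

The sample mean of X+Y equals the sum of the two sample means exactly, while the sample variance of X+Y is only approximately the sum of the two sample variances, because the sample covariance of independent draws is close to but not exactly zero.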

Chapter 4: 39, 42, 43, 45, 50, 52; Chapter 5: 11, 16, 21, 23

__Week 7__: Chapter 6: 3, 11, 16, 17 (instead of text disk use R);
Chapter 7: 6.

__Week 8__: Chapter 7: 8, 13, 14, 26(a)(b), 48, 55;

1) For #13 above, assume that the true mean is 1. Run 1000 simulations as follows. For each simulation, generate a random sample of size 20 from N(1, 0.04), and compute a 99% confidence interval (CI) based on that set of simulated data. Record and comment on how many of the 1000 CIs contain the true mean of 1.

2) Now assume a random sample of size n from Poisson(lambda). Use the central limit theorem (CLT) to derive an approximate 95% CI for lambda based on the sample mean. Take n=100 and lambda=5, and run 1000 simulations similar to problem 1) above to assess the accuracy of the 95% CI that you have derived.
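The two Week 8 simulations could be organized along these lines. This is a sketch under stated assumptions, not the official solution. For problem 1 it assumes that 0.04 is the variance (so sd = 0.2) and that the variance is treated as unknown, giving a t-based interval; if #13 treats the variance as known, the half-width would instead be qnorm(0.995) * 0.2 / sqrt(20). For problem 2 it uses one possible CLT-based interval: since the sample mean is approximately N(lambda, lambda/n), plugging the sample mean in for lambda in the standard error gives sample mean ± 1.96 * sqrt(sample mean / n).

```r
set.seed(4)      # arbitrary seed for reproducibility
n_sim <- 1000    # number of simulated data sets

# Problem 1: 99% t-interval for the mean of N(1, 0.04), n = 20
n1    <- 20
mu    <- 1
sigma <- sqrt(0.04)  # assumes 0.04 is the variance
covers1 <- replicate(n_sim, {
  x    <- rnorm(n1, mean = mu, sd = sigma)
  half <- qt(0.995, df = n1 - 1) * sd(x) / sqrt(n1)
  (mean(x) - half <= mu) && (mu <= mean(x) + half)
})
coverage1 <- mean(covers1)  # fraction of CIs that contain the true mean

# Problem 2: approximate 95% CLT interval for a Poisson mean, n = 100
n2     <- 100
lambda <- 5
covers2 <- replicate(n_sim, {
  xbar <- mean(rpois(n2, lambda))
  half <- qnorm(0.975) * sqrt(xbar / n2)  # plug-in standard error
  (xbar - half <= lambda) && (lambda <= xbar + half)
})
coverage2 <- mean(covers2)

cat("Problem 1: CIs containing the true mean:", sum(covers1), "of", n_sim, "\n")
cat("Problem 2: CIs containing lambda:", sum(covers2), "of", n_sim, "\n")
```

The observed counts vary from run to run; the comparison of interest is between coverage1 and the nominal 0.99, and between coverage2 and the nominal 0.95.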

__Week 9__: please fill out your CAPE evaluation for this course;

Chapter 8: 2, 3, 5, 14, 17, 26.

__Week 10 (not due)__: you are encouraged to also try some of the following using R:

Chapter 8: 32, 35, 38; Chapter 9: 5, 6, 12.

**Lecture:** MWF 12

**Instructor:** Ronghui (Lily) Xu

**Office:** APM 5856

**Phone:** 534-6380

**Email:** rxu@ucsd.edu

**Office Hours:**

**M:** 4:00pm – 5:00pm

**F:**

Or, by appointment

**Teaching Assistants:** A01, A08: Jiaqi Guo <jig026@ucsd.edu>;
A02: Tingyi Zhu <t8zhu@ucsd.edu>;
A03: Pengbo Li <pel034@ucsd.edu>;
A04: Hanbo Li <hal123@ucsd.edu>;
A05: Jiao Chen <jic103@ucsd.edu>;
A06: Kuangyi Yang <kuy006@ucsd.edu>;
and A07: Yan Yang <yay046@ucsd.edu>.