BCB720: Introduction to Statistical Modeling
Last Updated: 2013-10-21
Course identifiers: This document describes the syllabus for BCB720 in the Bioinformatics and Computational Biology Curriculum of BBSP.
Time: 10am-11.15am, Tue/Thu
Location: MacNider 322 (primary) and Bondurant G74 (Oct 8 and Oct 15 only).
Materials: All learning materials will be posted on Sakai.
Restrictions: Class is limited to 25 students.
Lead instructor: Prof William Valdar, Room 5113, 120 Mason Farm Road, Genetic Medicine Building, Campus Box # 7264, Chapel Hill NC 27599. Tel: +1 919 843 2833. Email: <william.valdar@unc.edu>. Web: <http://valdarlab.unc.edu>
Co-instructor: Prof Ethan Lange, Room 5111, 120 Mason Farm Road, Genetic Medicine Building, Campus Box # 7264, Chapel Hill NC 27599. Tel: +1 919 966 3356. Email: <ethan_lange@med.unc.edu>. Web: <http://genetics.unc.edu/faculty/ethanlange>
Teaching Assistants: Andrew Morgan <andrew_morgan@med.unc.edu>, Greg Keele <gkeele@email.unc.edu>. Office hours for the TAs will be arranged and posted at the beginning of the first class and updated as necessary.
This module introduces foundational statistical concepts and models that motivate a wide range of analytic methods in bioinformatics, statistical genetics, statistical genomics, and related fields. It is an intensive course, packing a year's worth of probability and statistics into 2/3 of a semester. It covers probability, common distributions, Bayesian inference, maximum likelihood and frequentist inference, linear models, generalized and hierarchical linear models, and causal inference.
This course is targeted at graduate students in BBSP with either a quantitative background or strong quantitative interests who would like to understand and/or develop statistical methods for analyzing complex biological/biomedical data. In particular, it is intended to provide a springboard for BBSP students who would subsequently like to take graduate-level statistical courses elsewhere on campus.
Students are expected to know single-variable calculus (differentiation and integration in 1 dimension), be familiar with matrix algebra, and have some programming experience. The course will include material on partial differentiation of multi-parameter functions, and will use the statistical package R extensively. Familiarity with these will be an advantage but is not assumed. Introductory statistics may or may not be an advantage (depending on how it was taught), but is not assumed.
The course is open to all graduate students of the Biological and Biomedical Sciences Program (BBSP) at UNC Chapel Hill. Other students, staff, or faculty may attend for credit, as auditors, or informally only if:
• They have prior permission from the lead instructor, and
• There is space: that is, if they are not taking up a spot that would otherwise be used by a non-auditing (i.e., full-credit) BBSP student.
Moreover, graduate students from the Department of Biostatistics (BIOS) or the Department of Statistics and Operations Research (STOR) may audit only, and may not receive credit for this course.
1. Probability and distributions
2. Properties of random variables
3. Bayesian and frequentist approaches to statistical inference
4. Hypothesis testing
5. Linear models
6. Generalized linear models
7. Hierarchical models
To obtain full credit, students must attend at least 80% of the lectures, complete all homeworks, and achieve at least a passing overall grade.
Homework assignments will typically be distributed on Tuesdays or Wednesdays after class, with a deadline for electronic submission at least a week later, typically noon on the Friday of the following week. Anonymous student evaluations, required for 5% of the course marks, will be distributed for completion on Sakai within approximately a week of course completion. Students will have a week to complete the student evaluation.
Grades for the course (F, L, P, H) will be based on performance in the homeworks and on completion of the course evaluation. Specifically, the homeworks collectively account for 95% of the course marks, and completion of the anonymous evaluation accounts for the remaining 5%. Each homework will include multiple questions, each with a stated maximum number of points. The total number of points achieved by a student, divided by the total possible, will be scaled to the range 0 to 95 and used as the percentage of the grade arising from coursework. There is no final exam.
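As an illustration of the arithmetic described above, here is a minimal Python sketch. The function and argument names are hypothetical and are not part of any course software; the mapping from the resulting percentage to the letter grades (F, L, P, H) is not stated in this syllabus and is therefore not modeled.

```python
def course_percentage(points_earned, points_possible, evaluation_completed):
    """Combine homework points and the course evaluation into a 0-100 percentage.

    Homework points are scaled to the range 0-95; completing the anonymous
    evaluation contributes the remaining 5. All names here are illustrative.
    """
    if points_possible <= 0:
        raise ValueError("points_possible must be positive")
    homework_component = 95.0 * points_earned / points_possible
    evaluation_component = 5.0 if evaluation_completed else 0.0
    return homework_component + evaluation_component


# Example: 170 of 200 homework points, evaluation completed.
example = course_percentage(170, 200, True)
print(example)
```

For instance, a student with 170 of 200 possible homework points would get 95 x 170/200 = 80.75 from coursework, plus 5 for completing the evaluation.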
Students must attend the entire duration of at least 80% of the lectures unless they have permission of the lead instructor to do otherwise. Students are expected to be prompt, polite, collaborative when (and only when) asked, and to answer questions in class. Failure to hand in a homework on time without reasonable justification (e.g., sickness) will result in automatic loss of 10% of that homework's maximum allowable points for each day over the deadline.
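The late penalty above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the number of days late is passed in as a whole number (the syllabus does not say how partial days are counted), and the adjusted score is assumed never to drop below zero.

```python
def late_adjusted_points(points_awarded, max_points, days_late):
    """Apply a penalty of 10% of the homework's maximum points per day late.

    Assumptions (not stated in the syllabus): days_late is a non-negative
    whole number, and the adjusted score is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = 0.10 * max_points * days_late
    return max(0.0, points_awarded - penalty)


# Example: 18 points awarded on a 20-point homework handed in 2 days late.
adjusted = late_adjusted_points(18, 20, 2)
print(adjusted)
```

In this example the penalty is 10% of 20 points for each of 2 days, i.e. 4 points, so the 18 awarded points become 14.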
Key: (B) = Class held in Bondurant G74 rather than MacNider 322; (C) = Students should bring (or be prepared to share) a laptop
Week  Date         Instructor  Lecture  Description                               Homework
1     Tue, Aug 20  AM          1 (C)    Introduction to R                         Homework 1 (WV)
      Thu, Aug 22  WV          2        Set theory and probability
2     Tue, Aug 27  WV          3        Conditional Probability                   Homework 2 (WV)
      Thu, Aug 29  WV          4        Distribution, Mass and Density functions
3     Tue, Sep 3   WV          5        Expectation and Variance                  Homework 3 (WV)
      Thu, Sep 5   WV          6        Discrete distributions
4     Tue, Sep 10  WV          7        Continuous distributions                  Homework 4 (WV)
      Thu, Sep 12  WV          8        Bayesian inference
5     Tue, Sep 17  WV          9        Estimation                                Homework 5 (WV)
      Thu, Sep 19  EL          10       MLEs, bias
6     Tue, Sep 24  EL          11       Confidence intervals
      Thu, Sep 26  EL          12       Hypothesis testing                        Homework 6 (EL)
7     Tue, Oct 1   EL          13       Introduction to regression
      Thu, Oct 3   EL          14       Multiple regression/ANOVA
8     Tue, Oct 8   EL          15 (B)   Multiple regression/ANOVA
      Thu, Oct 10  EL          16       Multiple regression in-class examples     Homework 7 (EL)
9     Tue, Oct 15  GK          17 (B)   Logistic regression
      Thu, Oct 17                       FALL BREAK                                Homework 8 (WV)
10    Tue, Oct 22  WV          18       Causal inference
      Thu, Oct 24  WV          19 (C)   Bayesian and frequentist regression       Homework 10 (WV)
11    Tue, Oct 29  WV          20 (C)   Decisions about modeling
      Thu, Oct 31  WV          21       Hierarchical and penalized regression
The lead instructor and/or co-instructors reserve the right to make changes to the syllabus, including homework due dates.
There is no course textbook as such because no textbook covers all the material in this course. Some textbooks that may be useful for supplemental reading are given below.
Westfall & Henning (2013) "Understanding Advanced Statistical Methods" - chatty, top recommendation
DeGroot & Schervish (2011) "Probability and Statistics" - thorough explanation of first half of course
Wasserman (2009) "All of Statistics" - last year's recommendation but can be a bit terse
Gelman & Hill (2007) - fantastic for understanding linear models and estimation, but not hypothesis testing
More basic than this course:
Verzani (2004) "Using R for Introductory Statistics" - friendly, chatty book on R, used for GNET course
Johnson & Wichern (2004) "Applied Multivariate Statistical Analysis" - good intro to matrix algebra (chapter 2)
Venables & Ripley (2002) "Modern Applied Statistics with S" - very terse but comprehensive on R (available free online)
More references (e.g., for specific subjects) will be given during and at the end of the course. Students are encouraged to ask the instructors for recommendations for books/resources on specific subjects or books/resources aimed at different levels.
Students may collaborate in class, but each student's homework should be their own. In completing the homework, however, students are nonetheless encouraged to consult the lecture notes, online material, books and any other "passive" sources. They may discuss general strategies and concepts with their classmates and with the TA, and may ask the TA for clarification about the content of questions. The TA may provide guidance as to where they might be able to find example material that addresses problems similar (but not identical) to those posed in the homework.
"BCB 720 is an accelerated and concise overview of probability and statistics from both a frequentist and bayesian perspective. This course introduces students to probability theory, probability distributions, hypothesis testing, and linear modeling."
"Strenuous course which outlines many of the fundamental elements of statistics used in common biological problems with large datasets. Includes both a Bayesian perspective and traditional methods as well, including hypothesis testing and linear and logistic regression formulation."
"A really hard crash-course in probability and statistics for modern-day bioinformatics."
"The course has heavy workload but the material does come up in research and in other classes therefore it can be very valuable."