CS395T Computational Statistics with Application to Bioinformatics

Course data:
Department: The University of Texas at Austin, Department of Computer Sciences
Instructor: Professor William H. Press
Last offered: Spring, 2008
Next offered: Spring, 2009
Unique#: 55899
Meets: MW 1:30 - 3:00 PM in PAR 201

Course description:
This is a practical course in applying (mostly) modern statistical techniques to (mostly) real data, particularly bioinformatic data and large data sets. There is only a small amount of theorem proving; the emphasis is on efficient computation and concise coding, mostly in MATLAB (where we learn various data-parallel language idioms) and C++ (which we learn to interface seamlessly to MATLAB for convenience and computational power).

Topics covered:
Topics covered include probability theory and Bayesian inference; univariate distributions; Central Limit Theorem; generation of random deviates; tail (p-value) tests; multiple hypothesis correction; empirical distributions; model fitting; error estimation; contingency tables; multivariate normal distributions; phylogenetic clustering; Gaussian mixture models; EM methods; maximum likelihood estimation; Markov chain Monte Carlo; principal component analysis; dynamic programming; hidden Markov models; performance measures for classifiers; support vector machines; Wiener filtering; wavelets; multidimensional interpolation; information theory.

Lecture notes and detailed course outline:
A detailed course outline, with links to complete lecture notes (PDF slide files) from Spring, 2008, is available. (Instructors at other institutions may obtain PowerPoint versions of these files on request.)

Prerequisites:
Graduate standing, or upper-division undergraduate standing with consent of the instructor. Mathematics through at least undergraduate multivariable calculus and linear algebra is assumed, as is some programming experience in MATLAB, C++, and/or Java (or, possibly, Mathematica, Fortran, or C). A previous undergraduate-level course in probability and statistics is helpful but not required.

Texts:
There is no required text. However, many lectures will use methods from Numerical Recipes, Third Edition. Enrolled students will be provided with a free electronic subscription to this book, as well as access to its source code.

Some other relevant books are:

Course requirements:
Enrolled students are expected to attend lectures. Occasional (not regular) problem sets or computer exercises will be assigned. Students are expected to contribute to the course wiki. A student project or paper is required (it may be collaborative). There are no written exams, but there will be individual final oral interviews (20-30 min.) covering the lecture material.

Course wiki:
The 2008 course wiki has an earlier version of the lecture notes, with some discussion threads and contributions by students. (Unfortunately, some of these are hard to find, since they are attached to individual slide links in the lectures.)