
CS395T Computational Statistics with Application to Bioinformatics
Prof. William H. Press
Course Lecture Notes (Spring, 2008)
Concepts: probability theorems and examples; inference,
Bayesian inference; marginalization; nuisance parameter; posterior;
Bernoulli trials; conjugate prior
MATLAB: syms, int, ezplot, diff, simplify, solve, pretty
Mathematica: Integrate, GenerateConditions, D, Plot, Simplify, Solve
Concepts: measures of central tendency, mean, median;
normal (Gaussian), Student, Cauchy, lognormal, exponential, gamma, chi-square;
PDF, CDF, characteristic function; Central Limit Theorem
NR3 (C++): Normaldist
Concepts: random number generator (RNG); multiplicative RNG,
p-values, t-values; binomial distribution; chi-square test; 1- vs. 2-point
distribution; Xorshift RNG; combinations of generators; p-value paradigm
MATLAB: uint32, mod, accumarray, betainc, normcdf,
ceil, zeros, chi2cdf
MATLAB API (C): mex functions, mxGetData, mxGetM, mxGetN
NR3 (C++): nr3.h, struct Toyran1, Chisqdist, Ran
Concepts: moments of chi-square variable, how
chi-square becomes normal; chi-square failure for Poisson events; linear
constraints; multiple hypothesis correction, Bonferroni, FDR; stopping
rule paradoxes
MATLAB: symsum, betapdf, quad, linspace
Mathematica: Sum
Concepts: Xorshift generators; matrix powers by successive
squaring; GCD and Gorilla randomness tests; transformation method;
rejection method; ratio of uniforms method; squeezes; Leva's algorithm
MATLAB: ndgrid, eye, spy, jacobian, abs, det
Mathematica: FactorInteger
Concepts: empirical distributions, samples;
Kolmogorov-Smirnov (KS) test; IQagent data structure; genestats.dat
data file, intron and exon lengths; plotting PDFs, uniformity of
errors, PDFs on log scales; resampling; statistical significance vs.
data quantity
MATLAB: readgenestats (custom), fopen, fclose, repmat, cell,
textscan, dataset, error, cell2mat, plot, hold, log10, cdfplot, kstest2,
arrayfun, loglog, semilogy
MATLAB API (C): mxCreateDoubleMatrix
NR3 (C++): IQagent
Concepts: binned data; nonlinear least squares (NLS)
fits; covariance matrix; goodness of fit; linear propagation of errors;
Jacobian matrix; sampling the posterior distribution; bootstrap resampling
MATLAB: hist, bar, nlinfit, nlinfitw (custom), diag, randn,
numel, chi2cdf, jacobian, subs, mvnrnd, mean, std,
randsample, arrayfun
Concepts: contingency tables; null hypothesis;
Pearson statistic; retrospective or case-control; prospective
or longitudinal; cross-sectional or snapshot; nuisance parameters,
marginalization;
hypergeometric distribution; multinomial distribution; Fisher Exact
Test; Wald statistic; nominal, ordinal, cardinal tables; permutation
test; bootstrap resampling; Dirichlet distribution
MATLAB: crosstab, contingencytable (custom), sum,
repmat, size, squeeze, permute, ndgrid, accumarray,
arrayfun, randperm, hist, randsample, gamrnd, mnrnd, reshape
Concepts: multivariate normal distribution;
covariance matrix; spliceosome; linear correlation matrix;
Cholesky decomposition; error ellipses
MATLAB: mean, cov, randsample, mvnrnd, corrcoef,
chol, errorellipse (custom)
Concepts: phylogenetic trees; cladograms, additive
trees, ultrametric trees; distance matrix, neighbor joining; agglomerative
method; vertebrate species; gene chip;
Hamming distance; rooted vs. unrooted; gene co-expression; Pearson r;
TreeView
NR3 (C++): Phylo_nj, newick
Concepts: Gaussian mixture model (GMM); E-step, M-step,
EM method; log-sum-exp; k-means clustering; Jensen's inequality;
missing data problems
MATLAB: sum, repmat, arrayfun, ksdensity, mvnrnd
NR3 MATLAB interface: nr3_matlab.h, mxScalar, mxT, MatDoub, VecDoub
Concepts: likelihood function; Fisher Information
Matrix, Hessian; centered second difference; outliers; Student-t; AIC, BIC
MATLAB: hist, bar, fminsearch, hessian (custom), inv,
jacobian, subs, arrayfun
Concepts: unnormalized distribution, posterior; Markov chain;
detailed balance, ergodicity; Metropolis-Hastings algorithm,
proposal distribution, acceptance probability; Poisson process,
fluctuations
MATLAB: rand, subfunction
Concepts: data matrix, design matrix; standardize;
Singular Value Decomposition (SVD); orthogonal basis; low-rank
approximation; Principal Component Analysis (PCA); main effects;
Gaussian random matrix; order statistic; dimensional reduction;
eigengenes, eigenarrays; non-negative matrix factorization (NMF)
MATLAB: prctile, repmat, colormap, image, svd, axis,
semilogy, randn, cumsum
Concepts: Bellman-Dijkstra-Viterbi algorithm,
forward pass, backward pass; error-correcting code; trellis graph;
soft decision decoding; sequence alignment; Needleman-Wunsch algorithm;
multiple alignment
NR3 (C++): stringalign
Concepts: Markov model; transition probability;
irreducibility, aperiodicity, ergodicity; successive squaring method;
Hidden Markov Model (HMM); symbol probability; state estimation;
forward-backward algorithm, alpha pass, beta pass; Baum-Welch
re-estimation; likelihood;
EM method; Generalized HMM, Hidden Semi-Markov Model
NR3 (C++): HMM
NR3 MATLAB interface: hmmmex
Concepts: confusion matrix, TP, FP, TN, FN;
conservative, liberal; performance curve; TPR, FPR, PPV, NPV, FDR;
accuracy, sensitivity, specificity, precision, recall; ROC curve;
convex hull; precision-recall curve
Mathematica: Solve, FullSimplify, substitution operator (/.)
Concepts: linear separation; fat plane;
maximum margin SVM; quadratic programming; primal vs. dual
problem; soft-margin SVM;
embedding; the kernel trick; linear, power, polynomial,
sigmoid, Gaussian radial basis kernels; mitochondrial genes
Software: SVMlight
Concepts: signal, noise, filter; Wiener filter;
best estimate in L2 norm; Fourier basis; Nyquist frequency; low-pass filter;
signal and noise models; spatial (pixel) basis; smoothed image; wavelet
basis; quadrature mirror filter; orthogonality conditions; moment
conditions; pyramidal algorithm; DAUB; left- and right-derivative
MATLAB: fopen, fread, fclose, flipud, image, axis, fft2,
ndgrid, randsample, ifft2, wiener2, wavelet2 (custom)
Concepts: dimensional explosion; Shepard interpolation;
Radial Basis Function interpolation; multiquadric, inverse multiquadric,
thin plate spline, Gaussian; over- and under-smoothing; Laplace
interpolation; boundary conditions; biconjugate gradient method;
Gaussian process regression; linear prediction; Kriging; variogram
MATLAB: interp1, meshgrid, arrayfun, contour, cell, cellfun,
shepinterp (custom), \-operator, std,
laplaceinterp (custom), krig (custom)
Concepts: character, alphabet, message;
entropy; compression; log cut-down; fair game; payoff odds;
protein, amino acid; monographic, digraphic entropy;
flattened; conditional entropy; mutual information; Lagrange
multiplier; Kelly's formula, proportional betting; CG richness,
3rd codon; Kullback-Leibler distance; log odds