COMPSCI 561/CBB 561 - Computational Sequence Biology

Spring 2022

Mon & Wed 8:30am - 9:45am, virtual and/or LSRC D106


Course Description:

Algorithmic and computational issues in analysis of biological sequences: DNA, RNA, and protein. Emphasizes probabilistic approaches and machine learning methods, e.g. Hidden Markov models. Explores applications in analysis of high-throughput sequencing data, protein and DNA homology detection, gene finding, motif discovery, comparative genomics and phylogenetics, genome segmentation, DNA/RNA/protein structure prediction, with a strong focus on algorithmic aspects. Prerequisites: basic knowledge of algorithmic design (COMPSCI 330 or equivalent), probability and statistics (STA 611 or equivalent), molecular biology (BIO 201L or equivalent), basic computer programming skills (preferred programming languages: Python, Java, C/C++, Perl, R, or Matlab).

Course materials, homeworks, and quizzes are available through Sakai.

Instructor:
Raluca Gordan
Office hours: Wed 9:45am-10:45am (right after class)
Zoom link: same as the class meeting for that day
(In person: D211)
Email: raluca.gordan at duke dot edu

TA:
Kuei-Yueh Ko
Office hours: TBD
Zoom link: TBD
Email: kuei.yueh.ko at duke dot edu

Grading:
Course grade is based on homeworks (70%), pre-class quizzes (15%), and class participation (15%). Homeworks and quizzes will be distributed through Sakai.
You will have 2 weeks to complete each homework. Late homeworks will not be accepted, with one exception: during the course you may submit one homework late, by at most 1 week.
Pre-class quizzes will be due 1 hour before class. The quizzes will test either your background on a subject (to make sure you will be able to follow and participate in the lecture) or your understanding of a subject or paper presented in a previous lecture. You can take each quiz twice; only the higher of the two grades will count.

Collaboration policy:
All homeworks and pre-class quizzes should be completed individually, unless otherwise stated. However, if you have worked on a particular problem for a while and have hit a mental wall, you should consult others to make progress; that is better than giving up entirely. Your first course of action is to speak to the instructor or TA. If for any reason you consult your peers, the interaction must be one of consultation and not collaboration: hints rather than answers. After the consultation, you should still have some thinking to do (otherwise this course will not be very useful for you!). In addition, if you consult with another student, both of you must cite this in your submissions.

Readings/textbook:
We will have readings for the course (which will be available on Sakai), but there is no formal textbook. Useful resources include:

•    Durbin, Eddy, Krogh, Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
•    Cristianini and Hahn, Introduction to Computational Genomics: A Case Studies Approach
•    Jones and Pevzner, An Introduction to Bioinformatics Algorithms
•    Majoros, Methods for Computational Gene Prediction
•    Alberts, Johnson, Lewis, Raff, Roberts, Walter, Molecular Biology of the Cell
•    Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms


Syllabus


This syllabus is tentative and may change (slightly) during the semester. Please check Sakai for the latest version.

1 Jan-5 Introduction; DNA sequencing

2 Jan-10 Global sequence alignment; Needleman-Wunsch
3 Jan-12 Local sequence alignment; Smith-Waterman

Jan-17 NO CLASS - MLK DAY
4 Jan-19 Heuristic search; FASTA; BLAST

5 Jan-24 String matching; suffix arrays
6 Jan-26 Short read alignment; BWA; Bowtie

7 Jan-31 Probabilistic models for biological sequences
8 Feb-2 HMM parsing; Viterbi

9 Feb-7 HMM training; Baum-Welch
10 Feb-9 HMM applications

11 Feb-14 Profile HMMs; PSIBLAST
12 Feb-16 Phylogenetic trees: UPGMA; NJ

13 Feb-21 Unsupervised learning
14 Feb-23 Clustering; non-negative matrix factorization

15 Feb-28 Algorithms in single-cell data analysis
16 Mar-2 Supervised learning; classification and regression

Mar-7 NO CLASS - SPRING BREAK
Mar-9 NO CLASS - SPRING BREAK

17 Mar-14 SVM; string kernels
18 Mar-16 Naive Bayes; logistic regression

19 Mar-21 Deep neural networks
20 Mar-23 Deep neural networks

21 Mar-28 Motif finding: EM and Gibbs sampling
22 Mar-30 Motif finding: Bayesian networks

23 Apr-4 Student presentation
24 Apr-6 Student presentation

25 Apr-11 Student presentation
26 Apr-13 Student presentation





