Huge amounts of valuable data are stored outside of structured databases as human language: text and speech. This course covers modern techniques to extract useful information from this language data.
Week | Date | Topic | Reading | Materials | Assignments |
---|---|---|---|---|---|
1 | Sep 21 | introduction; regular expressions; finite-state automata | SLP3 2.1 | slides 1, slides 2 | HW0 out |
2 | Sep 28 | finite-state transducers; text normalization (e.g., tokenization, stemming) | SLP2 2.2–2.3, IR 2.1–2.2 | slides 1, slides 2, slides 3 | HW0 due (Tue), HW1 out |
3 | Oct 5 | n-gram models; frequency analysis; cooccurrence analysis; edit distance; spelling correction; noisy channel models | SLP3 4, SLP3 5 | slides 1, slides 2 | |
4 | Oct 12 | document classification; naive Bayes; logistic regression; sentiment analysis | SLP3 6 | slides 1, slides 2 | HW1 due |
5 | Oct 19 | indexing and retrieval; Lucene | IR 1, 6 | slides 1, slides 2, slides 3 | HW2 out |
6 | Oct 26 | similarity and clustering; latent semantic analysis; latent Dirichlet allocation; distributed word representations | SLP3 15, SLP3 16, Blei (2012) | slides 1, slides 2 | |
7 | Nov 2 | Class cancelled | | | HW2 due, HW3 out |
8 | Nov 9 | part-of-speech tagging; hidden Markov models; Viterbi algorithm; maximum entropy models | SLP3 9, SLP3 10 | slides 1, slides 2 | |
9 | Nov 16 | named entity recognition; relation extraction; advanced maximum entropy models; coreference; formal grammars; syntactic parsing; wrap-up | SLP3 21 | slides 1, slides 2, slides 3, slides 4, slides 5 | HW3 due, HW4 out |
10 | Nov 23 | Thanksgiving holiday! | |||
11 | Nov 30 | speech recognition for automatic transcription | | slides 1, slides 2 | |
12 | Dec 4 | | | | HW4 due 5pm |