Computational linguistics

Klinton Bicknell, Spring 2017

Computational linguistics enables computers to use language in tasks such as recognizing speech, correcting spelling, and translating between languages. This course introduces students to the field from a modern statistical perspective.


Schedule

Week | Date | Topic | Reading | Materials | Assignments
1.1 | Mar 28 | What is computational linguistics; unix/linux | JM 1; unix/linux tutorial (through tutorial 5) | Connect to the SSCC (instructions) |
1.2 | Mar 30 | No class (Klinton traveling) | | |
2.1 | Apr 4 | Programming in Python 1 | NLTK 1 | Python transcript |
2.2 | Apr 6 | Finite-state automata, regular expressions | JM 2 | NLTK setup, nano tutorial, transcript; optional: emacs tutorial | hw1 out
3.1 | Apr 11 | Programming in Python 2 | NLTK 2–3 | |
3.2 | Apr 13 | Programming in Python 3 | NLTK 4 | lecture notes |
4.1 | Apr 18 | Probability theory, maximum likelihood estimation (MLE), unigram models | | | hw1 due
4.2 | Apr 20 | Graphical models, n-gram models, Markov chains | JM 4.1–4.2; Levy appendix | |
5.1 | Apr 25 | Perplexity, training and test sets, basic information theory | JM 4.3–4.4 | | hw2 out
5.2 | Apr 27 | More smoothing, part-of-speech tagging | JM 4.5–4.7, 4.9.1, 4.10 | |
6.1 | May 2 | Bayesian inference, hidden Markov models (HMMs), part-of-speech tagging | JM 5.1–5.3, 5.5–5.5.2, 5.7 | | hw2 due; hw3 out
6.2 | May 4 | Forward algorithm | JM 5.5.3, 6.1–6.4 | |
7.1 | May 9 | Viterbi decoding; programming best practices | Wilson et al. (2014) | scripts archive, slides |
7.2 | May 11 | Supervised and unsupervised learning; noisy channel models for spelling correction/autocorrect | JM 5.5.4, 5.9 | |
8.1 | May 16 | Context-free grammars (CFGs) for syntax, classes of grammars, regular expressions on trees, basic parsing | JM 12.1–12.6, 13.1–13.4.2 | | hw3 due; project proposals due; hw4 out
8.2 | May 18 | Probabilistic CFGs (PCFGs), statistical parsing | JM 14.1–14.4 | |
9.1 | May 23 | Automatic speech recognition (ASR), machine translation (MT) | JM 9.1–9.2, 9.5–9.6, 25.1–25.3 | |
9.2 | May 25 | Computational psycholinguistics | Bicknell & Levy (2010) | | hw4 due
- | Jun 5 | | | | Final project reports due 5pm

Logistics

Course

Time
Tuesdays & Thursdays 12:30–1:50
Location
Annenberg G29
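Textbooks
Jurafsky & Martin, Speech and Language Processing (JM); Bird, Klein & Loper, Natural Language Processing with Python (NLTK)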
Website
www.klintonbicknell.com/ling334spring2017

Instructor

Name
Klinton Bicknell
Office hours
Thursdays 1:50–2:50 (i.e., immediately following class)
Office
Linguistics Department, 2016 Sheridan Road, Office 107

Policies

Email
Questions that are not personal should be posted on the Piazza forum, where they can be posted anonymously if desired. Students who want to contact the instructor directly are encouraged to come to office hours. For personal questions, students can email the instructor at kbicknell@northwestern.edu.
Description
Hands-on introduction to computational linguistics, viewed from a modern probabilistic perspective. The class begins with an introduction to programming and probability theory, goes through language modeling, hidden Markov models, and syntactic parsing, and ends with state-of-the-art methods in machine translation and automatic speech recognition. Students will also learn practical skills for extracting information from large linguistic datasets using natural language processing techniques, as well as good programming practices.
Academic integrity
Violations of academic integrity will be referred to the Dean’s office, per WCAS policies. Sanctions can be quite severe, including suspension or permanent expulsion from the university. For details and discussion of how to avoid plagiarism, see the Academic Integrity section of the WCAS undergraduate handbook.

Requirements

Course Grade
  • 70% homeworks (4)
  • 30% final project
Homeworks
There will be four homework assignments throughout the quarter. Each will combine programming exercises and short-answer questions. Students are encouraged to discuss the assignments in pairs or small groups, but each student must write their own code and write-up independently. Students must also list on each assignment everyone they discussed it with. Homework must be handed in through Canvas.
Final project
Students will complete a final project on a topic related to the course content. The project will either investigate a language research question using computational techniques or implement a computational linguistic model beyond those covered in class. Projects should generally be completed individually or in pairs, but for especially ambitious projects, groups of three may be allowed with advance permission from the instructor. Students will write short project proposals, due as marked on the schedule, and then a final paper on the project, due on the first day of finals week.
Keeping up
The syllabus (topics, assignments, due dates) may change. Changes will be announced in class, over email (via Piazza), and on the course website. It is the students' responsibility to keep up with them.
Deadlines
All assignments are due at 5pm. For late work turned in between this deadline and 11:59pm the following day (i.e., roughly the first 31 hours after the deadline), I will deduct one percentage point per hour or partial hour. After the following day, I will still give comments and suggestions on work turned in, but it will not receive credit. (Of course, if some unusual external circumstance arises that will make it difficult for you to meet a deadline, please contact the instructor as soon as possible.)
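To make the late-work arithmetic concrete, here is a minimal Python sketch. The function name `late_penalty` is hypothetical and is not an official course tool. The sketch assumes two things: that a partial hour rounds up to a full hour, and that the late window closes at 11:59:59pm on the calendar day after the deadline.

```python
import math
from datetime import datetime, timedelta

def late_penalty(deadline: datetime, submitted: datetime):
    """Hypothetical helper: percentage points deducted, or None for no credit.

    Assumes one point per hour or partial hour, with the late window
    ending at 11:59:59pm on the day after the deadline.
    """
    if submitted <= deadline:
        return 0  # on time: no deduction
    # Assumed end of the late window: 11:59:59pm on the following day.
    cutoff = (deadline + timedelta(days=1)).replace(
        hour=23, minute=59, second=59, microsecond=0)
    if submitted > cutoff:
        return None  # comments only, no credit
    hours_late = (submitted - deadline).total_seconds() / 3600
    return math.ceil(hours_late)  # a partial hour counts as a full hour

deadline = datetime(2017, 4, 18, 17, 0)  # example: a 5pm deadline
print(late_penalty(deadline, datetime(2017, 4, 18, 17, 10)))
print(late_penalty(deadline, datetime(2017, 4, 19, 23, 59)))
```

Under these assumptions, work submitted ten minutes late loses one point. Work submitted at 11:59pm the next day (30 hours 59 minutes late) loses 31 points, which is the maximum deduction.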
AccessibleNU
Any student requesting accommodations related to a disability or other condition is required to register with AccessibleNU (accessiblenu@northwestern.edu; 847-467-5530) and provide professors with an accommodation notification from AccessibleNU, preferably within the first two weeks of class. All information will remain confidential.