COL772: Natural Language Processing - Spring 2018
Monday, Thursday 2-3:20 pm in Bharti 201


Instructor: Mausam
(mausam at cse dot iitd dot ac dot in)
Office hours: by appointment, SIT Building Room 402
TAs (Office hours, by appointment):
Prachi Jain, p6.jain AT gmail.com

Course Contents

NLP concepts: Tokenization, lemmatization, part of speech tagging, noun phrase chunking, named entity recognition, coreference resolution, parsing, information extraction, sentiment analysis, question answering, text classification, document clustering, document summarization, discourse, machine translation.
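As a taste of the first topic above, here is an illustrative sketch (not course material) of a toy regex-based tokenizer; real tokenizers covered in lecture handle far more, such as clitics, abbreviations, and Unicode.

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-word,
    # non-space character (so punctuation becomes its own token).
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Don't panic, NLP is fun!"))
# ['Don', "'", 't', 'panic', ',', 'NLP', 'is', 'fun', '!']
```

Note how even the contraction "Don't" is split naively; handling such cases well is part of what makes tokenization an interesting problem.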

Machine learning concepts: Naive Bayes, MaxEnt classifiers, Hidden Markov Models, Conditional Random Fields, Probabilistic Context Free Grammars, Word2vec models, RNN-based neural models, Sequence to sequence neural models.
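For a preview of the first of these, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written from scratch for illustration; the lecture notes develop the model properly. The toy documents and labels below are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns model statistics."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("good great fun", "pos"), ("bad awful boring", "neg"),
        ("great film", "pos"), ("boring plot", "neg")]
model = train(docs)
print(predict(model, "great fun"))  # -> pos
```

Working in log space avoids numerical underflow when multiplying many small probabilities, a point that recurs throughout the course.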

Schedule

Start - End  Slides  Required Readings  Recommended Readings
Jan 4 - Jan 18  Introduction  J&M Ch 1  Advances in NLP
Jan 18 - Jan 18  Regular Languages and Finite State Automata  SLP3 Ch 2
Jan 29 - Jan 29  Morphology with Finite State Transducers  J&M Ch 3
Feb 1 - Feb 12  Text Categorization using Naive Bayes  Notes (Sections 1-4)
SLP3 (up to Section 7.3)
Gender in Job Postings
Improvements to Multinomial Naive Bayes
Performance Measures
Error Correcting Output Codes
Feb 12 - Feb 15  Sentiment Mining and Lexicon Generation  Survey (Sections 1-4.5)
Tutorial (Sections 1-5)
Semantic Orientation of Adjectives
Unsupervised Classification of Reviews
Feb 15 - Feb 19  Log Linear Models for Classification  Notes (Section 2)
SLP3 (7.4-7.6)
Max Entropy models for WSD
Feb 19 - Feb 19  Generative vs. Max Entropy Models  Max Entropy Tutorial  Intro to Max Entropy Models
Feb 19 - Feb 22  Information Retrieval and Topic Models  SLP3 Ch 15
LSA and PLSA
Detailed Tutorial on LDA
Feb 27 - Mar 27  Assignment 1  Resources

Mar 2 - Mar 19  Project (Part 1)
Mar 5 - Mar 8  Representation Discovery for Words and Documents  Goldberg 8.1-8.4, 10, 11
Doc2VecC (Sections 1-3)
Embeddings vs. Factorization
Trends and Future Directions on Word Embeddings
Mar 8 - Mar 12  N-gram Features with CNNs  Goldberg 13
Practitioner's Guide to CNNs
Mar 12 - Mar 15  RNNs for Variable Length Sequences  Goldberg 14.1-14.3.1, 14.4-14.5
Goldberg 15, 16.1.1, 16.2.2
Understanding LSTMs
Recurrent Additive Networks
RNNs and Vanishing Gradients (Section 4.3)
Mar 15 - Mar 15  Tricks for Training RNNs  Deep Learning for NLP Best Practices
Mar 15 - Mar 15  Domain Adaptation  Paper

Mar 15 - Mar 19  Language Models  SLP3 Ch 4
Goldberg 9, 10.5.5
Character Aware Neural LMs
Exploring limits of Language modeling
Mar 22 - Apr 2  POS Tagging with Hidden Markov Models  SLP3 (Ch 9, 10.1-10.4)

Apr 2 - Apr 5  Named Entity Recognition with CRFs  Notes (Section 4)
Detailed Notes
Non-Local Features and Knowledge in NER
Apr 5 - Apr 5  BiLSTM+CRF and Other Neural Models for Sequence Labeling  Goldberg 19.1-19.3, 19.4.2
Bidirectional LSTM-CRF Models

Apr 5 - Apr 9  Constrained Conditional Models for Sequence Labeling  Paper on CCM Learning (Sections 2, 3.2)


Apr 7 - Apr 21  Assignment 2
Apr 9 - Apr 16  Statistical Natural Language Parsing  SLP3 Ch 12.1-12.2, 13.1-13.5, 13.8
Lecture Notes on PCFGs
Lecture Notes on Lexicalized PCFGs

Apr 16 - Apr 16  Neural Models over Tree Structures  Goldberg 18
Tree LSTMs

Apr 19 - Apr 23  Seq2Seq Models & Attention  Goldberg 17.1, 17.2, 17.4
Attention is All You Need
Apr 23 - Apr 26  Dialog Systems

Apr 26 - Apr 26  Wrap Up


Textbook and Readings

Yoav Goldberg Neural Network Methods for Natural Language Processing,
Morgan and Claypool (2017) (required).

Dan Jurafsky and James Martin Speech and Language Processing, 3rd Edition,
(under development).

Grading

Assignments: 30%; Project: 20%; Minors: 20%; Final: 30%; Class participation, online discussions: extra credit.

Course Administration and Policies

Cheating vs. Collaborating Guidelines

As adapted from Dan Weld's guidelines.

Collaboration is a very good thing. Cheating, on the other hand, is considered a very serious offense. Please don't do it! Concern about cheating creates an unpleasant environment for everyone. If you cheat, you get a zero on the assignment, and you additionally risk losing your position as a student in the department and the institute. The department's policy is to report any cases of cheating to the disciplinary committee. What follows afterwards is not fun.

So how do you draw the line between collaboration and cheating? Here's a reasonable set of ground rules. Failure to understand and follow these rules will constitute cheating, and will be dealt with as per institute guidelines.