Course slot: AA (Mon., Thu. 2:00-3:30)
Information retrieval, also known as “search”, plays a central role in our modern digital lives. In this course we cover the fundamental concepts of information retrieval as well as some recent advances in the field, such as the use of knowledge graphs for retrieval, neural methods for retrieval tasks, and the use of succinct data structures in building efficient search systems.
The course is mostly lecture-driven, with a few student presentations, homework assignments, and paper-reading tasks spread throughout the semester.
Part I: Basic Concepts
- Basic retrieval models - Boolean, tf-idf & vector-space models
- Inverted indexes - basic structure, ranked retrieval with document-at-a-time (DAAT) and term-at-a-time (TAAT) processing
- Retrieval effectiveness evaluation - precision/recall, nDCG
- Benchmarking - Cranfield paradigm, pooled evaluations, TREC collections
Part II: Improving Retrieval Models
- Probabilistic retrieval, language models
- Relevance feedback - Rocchio’s method, implicit relevance feedback, query expansion
- Revisiting inverted indexes - compact representation, top-k processing
- Latent Semantic Indexing and topic models
Part III: Advanced Topics in IR (if time permits; topics are flexible)
- Learning to rank - pointwise, pairwise, listwise approaches
- Knowledge graphs for retrieval
- Neural retrieval models
- Web search engine architecture
Prerequisites
- data structures and algorithms
- comfort with programming in Java/C++, probability and statistics, and linear algebra
- background in machine learning and/or NLP is desirable but not mandatory
Textbook and references
Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press.
I strongly encourage you to own a copy of this book. A high-quality preprint of the book is available from the book website.
Apart from the textbook, the course will use material from the following books (among others):
- Modern Information Retrieval: The Concepts and Technology behind Search by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2010.
- Search Engines: Information Retrieval in Practice by Croft, Metzler and Strohman, 2009.
We will follow the mark weighting scheme commonly used in many other CS courses at IITD (the following description is courtesy of Prof. Arun Kumar):
1. We propose that attendance carry an individual weight in the course marks breakup, perhaps 4 to 5%.
2. Attendance below 75% leads to a drop of one grade.
3. Attendance below 50% leads to failure in the course.
4. This scheme has proven to work in classes such as Prof. Naveen Garg’s TOC (2016-2017, Semester 2) and Prof. Vinay Ribeiro’s Computer Networks (2016-2017, Semester 1).