COL 764 - Information Retrieval and Web Search (2023-24 Sem 1)



Organization

Credit Structure (L-T-P): 3-0-2 (4 credits)

Course slot: AB (Mon., Thu. 3:30-5:00)

Lecture location: LH 606

Teaching Assistants: Rajat Singh

Slides will be posted at regular intervals in the URL (accessible from within IITD)


Grading scheme (tentative)

Activity Weight
Mid-term 15%
Major 20%
Assignments (3 x 10) 30%
Project 30%
Creative Participation, Novel Dataset Creation, Short Video preparation, etc. 5%

FAQ

  1. I really want to take this course, but there are no vacancies, what should I do?

    • The course registration limit will not be increased. You may have to wait for someone to drop out (invariably someone will in the next few weeks) and then try to register.
    • If you are a Ph.D. or M.S.(R) student from the CSE or SIT departments, send me an email with some details about why you need or want to take this course. We may make an exception and add you to the course.
  2. What background is required to do well in this course?

    • Strong programming proficiency (Java, C++ or Python) is essential, along with good knowledge of data structures.
    • If you are weak in Probability and Statistics, then you should not take this course. MTL106 (or its equivalent) is recommended.
    • A background in ML/NLP/AI will help, but is not essential for succeeding in this course.
  3. I am registered for the course, but I cannot see it in Teams. What should I do?

    • Check if you are using xxx@iitd.ac.in credentials for Teams login where xxx is your LDAP id. (Note the domain carefully).
    • If you have registered on eadmin and your registration request is confirmed, you will be added to the course team automatically. You may have to wait a couple of days for the two systems to synchronize. Please do not send emails about it.
  4. Can I sit through the course or audit it?

    • Only Ph.D. students can sit through the course. They can send me an email so that they can be manually added to the team.
    • Auditing the course is possible, although not encouraged (by me).

NEWS / UPDATES

[13 Aug 2023] Release of Assignment 1. Deadline 28th August 2023.

[31 Jul 2023] Collecting HPC account details of students.

[21 Jul 2023] There will be no class on 27th July 2023.

[21 Jul 2023] First lecture will be held on 24th July 2023 at 3:30pm in Room LH606. Please attend even if you are not able to register.



About the Course

Information retrieval (aka “search”) plays a central role in our modern digital lives. In this course we cover the fundamental concepts of information retrieval as well as some of the recent advances in the field, such as the use of knowledge graphs for retrieval, neural methods for retrieval tasks, issues of fairness and fake news, and the use of succinct data structures in building efficient search systems.

Objectives

Understand and be able to discuss concepts such as document representation methods, information needs, search result effectiveness metrics and web search engine architectures. Implement and use retrieval algorithms, and test them on standard and large-scale data collections. Apply information retrieval and web search methods to solve real-world problems, and appreciate their impact on modern everyday life.

Contents

(Tentative, may slightly deviate to focus on recent advances)

  • Retrieval models (Boolean, vector-space, probabilistic, language-model, Markov random fields, diversity-aware);
  • Design of test collections (TREC, crowd-sourcing) and retrieval effectiveness measures (micro-/macro-F measure, nDCG, BPref);
  • Collection models (multinomial repr.; topic mixtures) and topic modeling (LSA/LSI, LDA);
  • Search engine architecture (crawling, indexing, and web-page ranking);
  • Learning to rank including neural ranking;
  • Knowledge graphs;
  • Responsible IR (e.g., handling bias and fake-news, privacy, etc.);
  • Use of LLMs for retrieval

NOTE: The course will involve a significant amount of programming to process large datasets, with a focus on efficiency as well as quality of results.
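To give a rough sense of the kind of programming involved, here is a minimal sketch of two of the topics listed above: an inverted index and Boolean (AND) retrieval over it. It is an illustration only, not course material or an assignment solution. The function names (build_inverted_index, boolean_and), the whitespace tokenization and the three-document toy collection are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc ids containing it.

    Assumption: tokens are obtained by naive whitespace splitting.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, terms):
    """Return the sorted doc ids that contain every query term."""
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Toy collection, made up for illustration.
docs = [
    "web search engines rank pages",
    "inverted index for search",
    "neural ranking models",
]
index = build_inverted_index(docs)
print(boolean_and(index, ["search"]))           # [0, 1]
print(boolean_and(index, ["search", "index"]))  # [1]
```

A real system would intersect sorted postings lists directly and compress them on disk, rather than holding Python sets in memory.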

Prerequisites:

  • data structures and algorithms (COL106)
  • probability and statistics (MTL106 or equivalent)
  • comfort with programming in Java/C++/Python, and with linear algebra
  • background in machine learning and/or NLP is not mandatory (although it will help if you have it)

Textbooks

  1. Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press. I strongly encourage you to own a copy of this book. A high-quality preprint of the book is available from the book website.
  2. Modern Information Retrieval: The Concepts and Technology behind Search by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2010.
  3. Search Engines: Information Retrieval in Practice by Croft, Metzler and Strohman, 2010.
  4. Information Retrieval – Implementing and Evaluating Search Engines by Büttcher, Clarke and Cormack, MIT Press, 2010.

Calendar

Date Topic
24-Jul-2023 Organization and Introduction
31-Jul-2023 Document to Document-term Matrix
03-Aug-2023 Inverted Index-based Retrieval
07-Aug-2023 Implementation Details of Inverted Indexes
10-Aug-2023 Vector-space Retrieval
14-Aug-2023 Evaluation: Approaches and Metrics
Srikanta Bedathur
DS Chair of Artificial Intelligence
