COL 362/632 - Introduction to Database Management Systems (2022-23 Sem 2)

Table of Contents

All material in this website is subject to change. Please keep checking at regular intervals.

Organization

Credit Structure (L-T-P):3-0-2 (4 credits)

Course slot: D (Tue.,Wed., Fri. 9:00-10:00)

Lecture location: LH 410

Teaching Assistants: Anjali, Rohan Debbarma, Harsh Pandey, Nikhil Sharma, Neeraj Chaudhary, Sahil Manchanda (part-time), Rakshita Chaudhary (UG-TA), Aditya (UG-TA), Tushar Singla (UG-TA), Mohit Sharma (UG-TA)


Office hours

Location: tentatively it is Room 425 in Bharti Building. Please contact the TA ahead of time over Piazza to confirm the location.

Instructor / TA Office hour
Anjali 11:30 - 12:30 Thu.
Garima 11:00 - 12:00 Tue.
Neeraj 10:00 - 11:00 Mon.
Rohan 10:00 - 11:00 Tue.
Nikhil 10:00 - 11:00 Mon.
Harsh 16:30 - 17:30 Tue.
Srikanta 17:00 - 18:00 Tue.

Grading scheme

Activity Weight
Minor1 10%
Minor2 10%
Major 25%
Assignments (3 x 10% each) 30%
Project 25%

NEWS / UPDATES

[01 Mar 2023] Minor 1 Marks uploaded to Moodle (and frozen)

[01 Mar 2023] Demo for interpretation of query diagrams on Picasso (recording link on Piazza)

[27 Feb 2023] Assignment 1 Marks - Initial Release

[23 Feb 2023] Picasso Help Session

[18 Feb 2023] Assignment 2 Released

[17 Feb 2023] Project Proposal Submission Deadline

[06 Feb 2023] Minor-1 Begins

[27 Jan 2023] Slides for Week 4 uploaded in the shared folder

[25 Jan 2023] Assignment 1 released on Moodle. Deadline 3rd February 2023.

[22 Jan 2023] Only from Assignment 1 onwards, we will be using PostgreSQL 8.4.22. Please note the change in the recommended version. The build process remains the same.

[20 Jan 2023] Slides for Week 3 uploaded in the shared folder

[13 Jan 2023] Slides for Week 2 uploaded in the shared folder (link below)

[10 Jan 2023] Ungraded homework 1 is uploaded on Moodle

[6 Jan 2023] Slides for Week 1 uploaded on Moodle

[4 Jan 2023] Attendance policy was announced. A minimum attendance of 75% is required. Failure to meet this will result in a grade reduction.

[3 Jan 2023]

  • Piazza link: signup code will be shared only in the class Signups will close after early drop deadline.

  • We will be using PostreSQL 8.3.23 (note the version). You can download it from PostreSQL website.


Calendar

Date             Topic Self-study
03-Jan Organization and Introduction
04-Jan ER Modeling
06-Jan ER Modeling (contd.) + Intro to Relational Model Chapter 6 of DB-Book: Subclass, Weak Entitysets, Cardinality / Domain constraints, multi-way -> binary
10-Jan Relational model
11-Jan Intro to FDs
13-Jan FDs and intro. to Normalization
17-Jan Normalization Decomposition algorithms for BCNF
18-Jan Relational Algebra
20-Jan Relational Algebra and intro. SQL Data defn. in SQL
24-Jan Composition of operators, and Subqueries SQL for Modifying Tables
25-Jan Subqueries & Recursion JDBC/ODBC for SQL within programming languages (Chapter 5.1 of DB-Book)
27-Jan Constraints Authorization and ACL in SQL
31-Jan Constraints, Views, Indexes in SQL
01-Feb Intro to SQL Query Processing - Logical Plan Diagrams and Algebraic Optimization
03-Feb Algebraic Plans to Physical Plans
06-Feb-09-Feb Minor 1
10-Feb No Class (will be compensated by video lecture)
14-Feb Intro to Physical Plan Selection
15-Feb Intro to Query Optimization and Storage
17-Feb Database Storage
21-Feb Storage and Buffer Replacement Strategies
22-Feb Database Storage Organization
24-Feb No Class due to Project Presentations (will be compensated by video lecture)
28-Feb Introduction to Database Indexes
01-Mar Index Updates and Intro to B+-Trees
03-Mar B+-Trees: Query processing, Insert/Delete

About the Course

Objectives

Understand fundamentals of database management systems – with emphasis on relational data modeling, querying using SQL, and internals of RDBMS. Introduction to relational algebra/calculus, query optimization techniques, and transaction processing will be given. With this course the students will be equipped to develop database-backed applications and have a good understanding of the internal workings of a database management system.

Contents

Data models (ER, relational models, constraints, normalization), declarative querying (relational algebra, datalog, SQL), query processing/optimization (basics of indexes, logical/physical query plans, views) and transaction management (introduction to concurrency control and recovery). Overview of big data systems (Hadoop, Spark, etc.), no-sql systems (document, graph, key-value stores), in-memory and column stores. Course project – choice of web-based application, or database system extension.

Prerequisites:

  • Datastructures and algorithms (COL106)
  • Comfortable with programming in C/C++ (and Java)

Textbooks

  1. Database System Concepts (7 ed.) by Silberschatz, Korth and Sudarshan, McGraw-Hill. (resource website)
  2. Database Systems: The Complete Book by Garcia-Molina, Ullman, Widom, Prentice Hall.

Projects

Logistics

  1. Each project team can have up to 3 members. The teams once formed and registered will not be changed. If someone drops out of the course, then the remainder of the team will continue on. The deadline for team formation was 22nd January 2023.
  2. Two kinds of projects possible:
    • Data-base Backed Application: Develop a web-based application that will have a back-end database running over a PostgreSQL instance. We will soon share the PostgreSQL instance where you will develop you database.
    • Database Systems Internals: Develop a novel extension / feature on top of a fully-functional, open-source database system (preferably a relational database system, but others also can be explored with instructor’s permission).
  3. Your team must decide on the project type and an initial project description by 14th February 2023. Instructor will confirm / reject the proposal by 17th February 2023. You may proceed accordingly.

Guidelines

Application Project

  1. Following are the necessary features of the application:
    • A web-based front-end (can be developed using any Web application framework such as Flask, Ruby on Rails, Django, Laravel, etc.)
    • The backend will be PostgreSQL 12. There will be shared instance that we will set up, and give access to each project team to set up their own database on this shared instance. The specific instructions for this will be posted over Piazza by mid February, 2023.
  2. Following are the minimum requirements on the database you use:
    • There should be at least 5-6 relations
    • The database should be real (not synthetically generated) and consist of at least 5,000 tuples (the larger the better)
    • There should be usecases for analytics as well as transactional aspects to the database – it can not be ``read-only” database
    • You must identify and justify all the functional dependencies in your setting, show how your database preserves all the FDs, and develop an E-R / schema diagram
  3. Projects will evaluated (partially) based on the database features that you make use of – such as, triggers, constraints, indexes, views, etc.
  4. There will be multiple intermediate submission deadlines (to be announced soon) at regular intervals (typicaly 10 day intervals).
  5. Some example projects include:
    • COVID data management, with regular updates from various health agencies
    • A fully functional social network application
    • Database systems for specialist domains like healthcare, judiciary, education, stock markets, jobsearch, public utilities etc.
    • Do not flog the dead horse – e.g., library book lending application, bibliography management system, sports-related applicatin etc. These have been developed repeatedly and with no interesting features.

Systems Project

  1. Following are the necessary features:
    • Your project should functionally extend in a non-trivial manner, a state-of-the-art (only exception is an older version of PostgreSQL) fully-functional, open-source database system.
    • Extensions proposed should not break any existing features of the database system, and should not involve a simple porting of an existing extension (for e.g., do not take PostgreSQL extension for version 9 and reimplement it on a different version).
    • You must gather large dataset which can be used to demonstrate the functioning of new features.
    • There should be clear, measureable metrics that can be used for evaluating your extensions – for e.g., you can not have a metric more beautiful visualizations or support storing images (without any operators on them).
  2. Projects are expected to non-trivially extend the underlying database system (in C/C++/Java) in all aspects such as transactions, query processing, storage, indexing, language extensions etc. You are encouraged to base your work on research paper(s) appearing in venues like SIGMOD, PVLDB and CIDR within the last 4-5 years.
  3. Although you are strongly encouraged to use a relational database management system, you are not disallowed from using a RDF, graph, document databases.
  4. Projects will evaluated (partially) based on the sophistication and utility of the added feature, implementation effort and design/implementation choices made, understanding of the underlying database system, etc.
  5. Some example projects include:
    • Database cracking project – physical reorganisation of data so that queries are optimized without having to build indexes separately (and the data is physically clustered)
    • Reasoner on top of PostgreSQL - add support for query-time OWL like reasoning.
    • Support image data in PostgreSQL – add support in terms of adding image object detection like semantics for querying images.
    • Integrate pre-trained foundational models into enriching semantic querying capabilities of PostgreSQL.

FAQ

  1. I really want to take this course, but there are no vacancies, what should I do?

    • The course registration limit will not be increased. You may have to wait for someone to drop out –invariably someone will in the next few weeks, and try to register.
  2. Which programming language(s) are used in this course?

    • We will use C/C++ (and in some rare cases, Java) for all programming assignments. Python or any other language will not be used.
  3. What background is required to do well in this course?

    • Strong programming proficiency is essential (in C/C++) along with good knowledge of data-structures.
    • You are also expected to know (or be willing to learn) how to read, modify, debug large-scale code.
  4. Can I sit-through the course or audit the course?

    • Auditing the course is possible, although not encouraged (by me).
    • Audit-pass criteria is: at least B- grade, and you must submit all assignments.

Avatar
Srikanta Bedathur
DS Chair of Artificial Intelligence

Related