Table of Contents
All material in this website is subject to change. Please keep checking at regular intervals.
Credit Structure (L-T-P):3-0-2 (4 credits)
Course slot: D (Tue.,Wed., Fri. 9:00-10:00)
Lecture location: LH 410
Teaching Assistants: Anjali, Rohan Debbarma, Harsh Pandey, Nikhil Sharma, Neeraj Chaudhary, Sahil Manchanda (part-time), Rakshita Chaudhary (UG-TA), Aditya (UG-TA), Tushar Singla (UG-TA), Mohit Sharma (UG-TA)
Location: tentatively it is Room 425 in Bharti Building. Please contact the TA ahead of time over Piazza to confirm the location.
|Instructor / TA||Office hour|
|Anjali||11:30 - 12:30 Thu.|
|Garima||11:00 - 12:00 Tue.|
|Neeraj||10:00 - 11:00 Mon.|
|Rohan||10:00 - 11:00 Tue.|
|Nikhil||10:00 - 11:00 Mon.|
|Harsh||16:30 - 17:30 Tue.|
|Srikanta||17:00 - 18:00 Tue.|
|Assignments (3 x 10% each)||30%|
NEWS / UPDATES
01 Mar 2023] Minor 1 Marks uploaded to Moodle (and frozen)
01 Mar 2023] Demo for interpretation of query diagrams on Picasso (recording link on Piazza)
27 Feb 2023] Assignment 1 Marks - Initial Release
23 Feb 2023] Picasso Help Session
18 Feb 2023] Assignment 2 Released
17 Feb 2023] Project Proposal Submission Deadline
06 Feb 2023] Minor-1 Begins
27 Jan 2023] Slides for Week 4 uploaded in the shared folder
25 Jan 2023] Assignment 1 released on Moodle. Deadline 3rd February 2023.
22 Jan 2023] Only from Assignment 1 onwards, we will be using PostgreSQL 8.4.22. Please note the change in the recommended version. The build process remains the same.
20 Jan 2023] Slides for Week 3 uploaded in the shared folder
13 Jan 2023] Slides for Week 2 uploaded in the shared folder (link below)
10 Jan 2023] Ungraded homework 1 is uploaded on Moodle
6 Jan 2023] Slides for Week 1 uploaded on Moodle
4 Jan 2023] Attendance policy was announced. A minimum attendance of 75% is required. Failure to meet this will result in a grade reduction.
3 Jan 2023]
Piazza link: signup code will be shared only in the class Signups will close after early drop deadline.
We will be using PostreSQL 8.3.23 (note the version). You can download it from PostreSQL website.
||Organization and Introduction|
||ER Modeling (contd.) + Intro to Relational Model||Chapter 6 of DB-Book: Subclass, Weak Entitysets, Cardinality / Domain constraints, multi-way -> binary|
||Intro to FDs|
||FDs and intro. to Normalization|
||Normalization||Decomposition algorithms for BCNF|
||Relational Algebra and intro. SQL||Data defn. in SQL|
||Composition of operators, and Subqueries||SQL for Modifying Tables|
||Subqueries & Recursion||JDBC/ODBC for SQL within programming languages (Chapter 5.1 of DB-Book)|
||Constraints||Authorization and ACL in SQL|
||Constraints, Views, Indexes in SQL|
||Intro to SQL Query Processing - Logical Plan Diagrams and Algebraic Optimization|
||Algebraic Plans to Physical Plans|
||No Class||(will be compensated by video lecture)|
||Intro to Physical Plan Selection|
||Intro to Query Optimization and Storage|
||Storage and Buffer Replacement Strategies|
||Database Storage Organization|
||No Class due to Project Presentations||(will be compensated by video lecture)|
||Introduction to Database Indexes|
||Index Updates and Intro to B+-Trees|
||B+-Trees: Query processing, Insert/Delete|
All slides will be uploaded at the following link at the end of each week: Slides
About the Course
Understand fundamentals of database management systems – with emphasis on relational data modeling, querying using SQL, and internals of RDBMS. Introduction to relational algebra/calculus, query optimization techniques, and transaction processing will be given. With this course the students will be equipped to develop database-backed applications and have a good understanding of the internal workings of a database management system.
Data models (ER, relational models, constraints, normalization), declarative querying (relational algebra, datalog, SQL), query processing/optimization (basics of indexes, logical/physical query plans, views) and transaction management (introduction to concurrency control and recovery). Overview of big data systems (Hadoop, Spark, etc.), no-sql systems (document, graph, key-value stores), in-memory and column stores. Course project – choice of web-based application, or database system extension.
- Datastructures and algorithms (COL106)
- Comfortable with programming in C/C++ (and Java)
- Database System Concepts (7 ed.) by Silberschatz, Korth and Sudarshan, McGraw-Hill. (resource website)
- Database Systems: The Complete Book by Garcia-Molina, Ullman, Widom, Prentice Hall.
- Each project team can have up to 3 members. The teams once formed and registered will not be changed. If someone drops out of the course, then the remainder of the team will continue on. The deadline for team formation was 22nd January 2023.
- Two kinds of projects possible:
- Data-base Backed Application: Develop a web-based application that will have a back-end database running over a PostgreSQL instance. We will soon share the PostgreSQL instance where you will develop you database.
- Database Systems Internals: Develop a novel extension / feature on top of a fully-functional, open-source database system (preferably a relational database system, but others also can be explored with instructor’s permission).
- Your team must decide on the project type and an initial project description by 14th February 2023. Instructor will confirm / reject the proposal by 17th February 2023. You may proceed accordingly.
- Following are the necessary features of the application:
- A web-based front-end (can be developed using any Web application framework such as Flask, Ruby on Rails, Django, Laravel, etc.)
- The backend will be PostgreSQL 12. There will be shared instance that we will set up, and give access to each project team to set up their own database on this shared instance. The specific instructions for this will be posted over Piazza by mid February, 2023.
- Following are the minimum requirements on the database you use:
- There should be at least 5-6 relations
- The database should be real (not synthetically generated) and consist of at least 5,000 tuples (the larger the better)
- There should be usecases for analytics as well as transactional aspects to the database – it can not be ``read-only” database
- You must identify and justify all the functional dependencies in your setting, show how your database preserves all the FDs, and develop an E-R / schema diagram
- Projects will evaluated (partially) based on the database features that you make use of – such as, triggers, constraints, indexes, views, etc.
- There will be multiple intermediate submission deadlines (to be announced soon) at regular intervals (typicaly 10 day intervals).
- Some example projects include:
- COVID data management, with regular updates from various health agencies
- A fully functional social network application
- Database systems for specialist domains like healthcare, judiciary, education, stock markets, jobsearch, public utilities etc.
- Do not flog the dead horse – e.g., library book lending application, bibliography management system, sports-related applicatin etc. These have been developed repeatedly and with no interesting features.
- Following are the necessary features:
- Your project should functionally extend in a non-trivial manner, a state-of-the-art (only exception is an older version of PostgreSQL) fully-functional, open-source database system.
- Extensions proposed should not break any existing features of the database system, and should not involve a simple porting of an existing extension (for e.g., do not take PostgreSQL extension for version 9 and reimplement it on a different version).
- You must gather large dataset which can be used to demonstrate the functioning of new features.
- There should be clear, measureable metrics that can be used for evaluating your extensions – for e.g., you can not have a metric
more beautiful visualizations or support storing images(without any operators on them).
- Projects are expected to non-trivially extend the underlying database system (in C/C++/Java) in all aspects such as transactions, query processing, storage, indexing, language extensions etc. You are encouraged to base your work on research paper(s) appearing in venues like SIGMOD, PVLDB and CIDR within the last 4-5 years.
- Although you are strongly encouraged to use a relational database management system, you are not disallowed from using a RDF, graph, document databases.
- Projects will evaluated (partially) based on the sophistication and utility of the added feature, implementation effort and design/implementation choices made, understanding of the underlying database system, etc.
- Some example projects include:
- Database cracking project – physical reorganisation of data so that queries are optimized without having to build indexes separately (and the data is physically clustered)
- Reasoner on top of PostgreSQL - add support for query-time OWL like reasoning.
- Support image data in PostgreSQL – add support in terms of adding image object detection like semantics for querying images.
- Integrate pre-trained foundational models into enriching semantic querying capabilities of PostgreSQL.
I really want to take this course, but there are no vacancies, what should I do?
- The course registration limit will not be increased. You may have to wait for someone to drop out –invariably someone will in the next few weeks, and try to register.
Which programming language(s) are used in this course?
- We will use C/C++ (and in some rare cases, Java) for all programming assignments. Python or any other language will not be used.
What background is required to do well in this course?
- Strong programming proficiency is essential (in C/C++) along with good knowledge of data-structures.
- You are also expected to know (or be willing to learn) how to read, modify, debug large-scale code.
Can I sit-through the course or audit the course?
- Auditing the course is possible, although not encouraged (by me).
- Audit-pass criteria is: at least B- grade, and you must submit all assignments.