Introduction to Machine Learning (ELL784)


General Information

No one shall be permitted to audit the course in an off-line (contact) semester. People are welcome to sit through it, however. For an online semester, the special IITD audit rules apply. Owing to number constraints, we are compelled to open this course primarily to M.Tech., M.S.(R) and Ph.D. students of the EE Department and the Bharti School, i.e., students with the following programme codes:
EEN/EEE/EEA/EET/EES/EEP/EEY/BSY/EEZ/BSZ/JVL/JOP/JTM/JVY
For others: it will be a first-come-first-served process. This course is not open to B.Tech. and Dual Degree students, who are supposed to opt for ELL409 (Machine Intelligence and Learning). This is a Departmental Elective (DE), one of the `essential electives' for the Cognitive and Intelligent Systems (CIS) stream of the Computer Technology Group, Department of Electrical Engineering. A general note for all EE Machine Learning courses: students will be permitted to take only one of the following courses: ELL409 (Machine Intelligence and Learning), and the two CSE Machine Learning courses, COL341 Machine Learning and COL774 Machine Learning. Those who do not fulfil the above criteria and are found enrolled after the completion of the add-drop period will be forcibly removed from the course.

For those who fulfil the above criteria, please note that there is a cap of 50, on a first-come-first-served basis.
People are welcome to sit through the lectures without a formal registration. Please drop an email to the instructor with your preferred email address for communication. You can additionally join the WhatsApp group for the course: https://chat.whatsapp.com/FmFkLWe6NuO4d1AKRMbxss

Credits: 3 (LTP: 3-0-0) [Slot C]

Schedule for Classes:

Please note that unless further information is received, the classes will be in-person classes, conducted simultaneously over MS-Teams. The instructor will teach on MS-Teams in the classroom (LH-318) with a USB tablet attachment and a USB microphone connected to his computer, which will enable automatic attendance logging and lecture recording. All participants are to carry a device to the classroom (a laptop computer, tablet PC (`Tab'), or mobile phone) which has MS-Teams, either through an app or on a browser (such as Google Chrome). Participants are to keep the microphone (`mic') muted on their devices to prevent variable-delay echoing and reverberations. Headphones/headsets are optional.
Tuesday
08:00 am - 09:00 am
LH-318 + MS-Teams (online)
Wednesday
08:00 am - 09:00 am
LH-318 + MS-Teams (online)
Friday
08:00 am - 09:00 am
LH-318 + MS-Teams (online)

Schedule for Examinations:

Minor-1: 07 Feb (Tue), 08:00am-09:00am, LH-408
Minor-2: 24 Mar (Fri), 07:30am-08:30am, LH-408
Major: 09 May (Tue), 08:00am-09:00am, LH-408

Teaching Assistants: 

Bidisha Dhara
Ravali Kuchibhotla

Books, Papers and other Documentation

Textbook:

Reference Books:

Papers:

Some Interesting Web Links:


Lecture Schedule, Links to Material

Please see the link to the II Sem (Spring) 2021-2022 offering of this course, for an idea of the approximate structure of the course.
S.No.
Topics
Lectures
Instructor
References/Notes
0
Introduction to Machine Learning
01-01
SDR
Flavours of Machine Learning: unsupervised, supervised, reinforcement, and hybrid models. Decision boundaries: crisp and non-crisp; optimisation problems. Examples of unsupervised learning.
03 Jan (Tue) {lecture#01}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]

MS-Teams folder: slides_k_means_em1_03jan23.pdf, video_k_means_em1_03jan23.mp4
1
Unsupervised Learning:
K-Means, Gaussian Mixture Models, EM
01-06
SDR
[Bishop Chap.9], [Do: Gaussians], [Do: More on Gaussians], [Ng: K-Means], [Ng: GMM], [Ng: EM], [Smyth: EM]
The K-Means algorithm: Introduction. Algorithms: history, flavours. A mathematical formulation of the K-Means algorithm. The objective function to minimise.
03 Jan (Tue) {lecture#01}
SDR
MS-Teams folder: slides_k_means_em1_03jan23.pdf, video_k_means_em1_03jan23.mp4
The basic K-Means algorithm, computational complexity issues: per step, and overall. Limitations of K-Means. (A short code sketch follows this entry.)
04 Jan (Wed) {lecture#02}
SDR
MS-Teams folder: video_k_means_em2_04jan23.mp4, slides_k_means_em2_04jan23.pdf
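A minimal NumPy sketch of the basic K-Means (Lloyd's) iteration and its objective function, to accompany the formulation above; the initialisation, convergence test and variable names here are illustrative assumptions rather than anything prescribed in the lectures.

    import numpy as np

    def k_means(X, K, n_iters=100, seed=0):
        # X: (N, D) data matrix; K: number of clusters.
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), K, replace=False)]  # random initialisation
        for _ in range(n_iters):
            # Assignment step: nearest centre under squared Euclidean distance.
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)  # (N, K)
            labels = d2.argmin(axis=1)
            # Update step: each centre moves to the mean of its assigned points.
            new_centres = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                    else centres[k] for k in range(K)])
            if np.allclose(new_centres, centres):  # no appreciable movement: stop
                break
            centres = new_centres
        # Objective J: sum of squared distances of points to their assigned centres.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        J = d2[np.arange(len(X)), labels].sum()
        return centres, labels, J

Each iteration costs O(NKD) for the assignment step and O(ND) for the update, which is the kind of per-step complexity accounting referred to above.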
Limitations of K-Means. K-Means: Alternate formulation with a distance threshold. An introduction to Gaussian Mixture Models.
06 Jan (Fri) {lecture#03}
SDR
MS-Teams folder: video_k_means_em3_06jan23.mp4, slides_k_means_em3_06jan23.pdf, lecture_notes_k_means_em3_06jan23.pdf
The Bayes rule and responsibilities. Maximum Likelihood Estimation: parameter estimation for a mixture of Gaussians, starting with the simple case of a single 1-D Gaussian and moving to the general case of K D-dimensional Gaussians. The Mahalanobis Distance.
10 Jan (Tue) {lecture#04}
SDR
MS-Teams folder: video_k_means_em4_10jan23.mp4, slides_k_means_em4_10jan23.pdf
The general case of K D-dimensional Gaussians (contd.). Where direct maximisation gets stuck; using Lagrange Multipliers. The EM Algorithm for Gaussian Mixtures. (A short code sketch of one EM iteration follows this entry.)
Application: Assignment 1: The Stauffer and Grimson Adaptive Background Subtraction Algorithm. An introduction to the basic set of interesting heuristics!
11 Jan (Wed) {lecture#05}
SDR
MS-Teams folder: video_k_means_em5_11jan23.mp4, slides_k_means_em5_11jan23.pdf
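To make the E and M steps above concrete, here is a minimal NumPy/SciPy sketch of one EM iteration for a mixture of K D-dimensional Gaussians; initialisation and the stopping criterion are omitted, and the variable names are illustrative assumptions.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, pi, mu, Sigma):
        # X: (N, D) data; pi: (K,) mixing coefficients; mu: (K, D) means; Sigma: (K, D, D) covariances.
        N, K = X.shape[0], pi.shape[0]
        # E-step: responsibilities gamma(z_nk) via the Bayes rule.
        dens = np.stack([multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                         for k in range(K)], axis=1)          # (N, K)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters using the responsibilities as soft counts.
        Nk = gamma.sum(axis=0)
        mu_new = (gamma.T @ X) / Nk[:, None]
        Sigma_new = np.zeros_like(Sigma)
        for k in range(K):
            Xc = X - mu_new[k]
            Sigma_new[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]
        pi_new = Nk / N
        return pi_new, mu_new, Sigma_new

Iterating em_step until the log-likelihood stops improving gives the EM algorithm for Gaussian mixtures described above.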
The Stauffer and Grimson algorithm (contd)
13 Jan (Fri) {lecture#06}
SDR
MS-Teams folder:
2
Unsupervised Learning: EigenAnalysis:
PCA, LDA and Subspaces
06-10
SDR
[Ng: PCA], [Ng: ICA], [Burges: Dimension Reduction], [Bishop Chap.12]
Introduction to Eigenvalues and Eigenvectors. Properties of Eigenvalues and Eigenvectors.
17 Jan (Tue) {lecture#07}
SDR
MS-Teams folder: video_eigen1_17jan23.mp4, slides_eigen1_17jan23.pdf
Properties of Eigenvalues and Eigenvectors (contd). Gram-Schmidt Orthogonalisation, other properties. The KL Transform (contd). The SVD and its properties.
18 Jan (Wed) {lecture#08}
SDR
MS-Teams folder: video_eigen2_18jan23.mp4, slides_eigen2_18jan23.pdf
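As an illustration of the eigen-decomposition/SVD connection above, here is a minimal NumPy sketch of PCA (the discrete KL transform) computed via the SVD of the mean-centred data matrix; the function name and the choice of M retained components are illustrative assumptions.

    import numpy as np

    def pca_via_svd(X, M):
        # X: (N, D) data matrix; M: number of principal components to keep.
        Xc = X - X.mean(axis=0)                  # centre the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:M]                      # rows are the top-M eigenvectors of the sample covariance
        eigvals = (S ** 2) / (len(X) - 1)        # corresponding eigenvalues (variances along each direction)
        Z = Xc @ components.T                    # (N, M) projections: the KL coefficients
        return Z, components, eigvals[:M]

The same eigenvectors could be obtained from np.linalg.eigh applied to the sample covariance matrix; the SVD route avoids forming that D x D matrix explicitly.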
The SVD and its properties (contd). Application: Assignment 2: Eigenfaces and Fisherfaces
20 Jan (Fri) {lecture#09}
SDR
MS-Teams folder: video_eigen3_linear1_20jan23.mp4, slides_eigen3_20jan23.pdf
3
Linear Models for Regression, Classification
10-14
SDR
[Bishop Chap.3], [Bishop Chap.4], [Ng: Supervised, Discriminant Analysis], [Ng: Generative]
General introduction to Regression and Classification.
20 Jan (Fri) {lecture#09}
SDR
MS-Teams folder: video_eigen3_linear1_20jan23.mp4, slides_linear1_20jan23.pdf
Linearity and restricted non-linearity. Maximum Likelihood and Least Squares. The Moore-Penrose Pseudo-inverse. Regularised Least Squares. (A short code sketch follows this entry.)
Three approaches to classification. Restricted non-linear models: a linear combination of possibly non-linear feature transformations. Introduction to linear models: the equation of a line in terms of the physical significance of the space, and the weights w.
24 Jan (Tue) {lecture#10}
SDR
MS-Teams folder: video_linear2_24jan23.mp4, slides_linear2_24jan23.pdf, lecture_notes_linear2_24jan23.pdf
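A minimal NumPy sketch of the least-squares and regularised least-squares solutions mentioned above, written in terms of a design matrix of (possibly non-linear) basis functions; the regularisation coefficient is an illustrative assumption.

    import numpy as np

    def fit_least_squares(Phi, t):
        # Phi: (N, M) design matrix; t: (N,) targets.
        # Maximum-likelihood / least-squares solution w = Phi^+ t, via the Moore-Penrose pseudo-inverse.
        return np.linalg.pinv(Phi) @ t

    def fit_ridge(Phi, t, lam=1e-2):
        # Regularised least squares: w = (lambda*I + Phi^T Phi)^{-1} Phi^T t.
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)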
Introduction to linear models: equation of a line in terms of the physical significance of the space, and the weights w (contd). Linear Discriminant Functions: 2 classes, and K classes. Fisher's Linear Discriminant (basic build-up).
25 Jan (Wed) {lecture#11}
SDR
MS-Teams folder: video_linear3_25jan23.mp4, slides_linear3_25jan23.pdf
Fisher's Linear Discriminant. Application: Assignment 2: Eigenfaces and Fisherfaces
26 Jan (Thu) {lecture#12}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]

(Make-up lecture for the missed 27 Jan (Fri) lecture)


MS-Teams folder: video_linear4_svm1_26jan23.mp4, slides_linear4_26jan23.pdf,
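For the two-class Fisher Linear Discriminant discussed above, a minimal NumPy sketch follows; the small ridge term added to the within-class scatter matrix is an illustrative assumption, there only to keep the matrix inversion well-behaved.

    import numpy as np

    def fisher_direction(X1, X2, eps=1e-6):
        # X1: (N1, D) and X2: (N2, D) samples of the two classes.
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        # Within-class scatter S_W: sum of the centred outer products of each class.
        Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        # Fisher's criterion is maximised by w proportional to S_W^{-1} (m2 - m1).
        w = np.linalg.solve(Sw + eps * np.eye(Sw.shape[0]), m2 - m1)
        return w / np.linalg.norm(w)

Projecting the data onto w and thresholding gives the two-class discriminant; in Eigenfaces/Fisherfaces-style applications, a PCA step is typically applied first so that S_W is invertible.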
---
27 Jan (Fri) {lecture#xx}
SDR
(No class)

4
SVMs and Kernels
12-18
SDR
[Bishop Chap.7], [Alex: SVMs], [Ng: SVMs], [Burges: SVMs], [Bishop Chap.6]
SVMs: the concept of the margin.
26 Jan (Thu) {lecture#12}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]

(Make-up lecture for the missed 27 Jan (Fri) lecture)


MS-Teams folder: video_linear4_svm1_26jan23.mp4, slides_svm1_26jan23.pdf
SVMs: the optimisation problem, getting the physical significance of the y = +1 and y = -1 lines. The two `golden' regions for the 2-class perfectly separable case. The generalised canonical representation in terms of one inequation. The basic SVM optimisation: the primal and the dual problems. An illustration of the kernel trick.
29 Jan (Sun) {lecture#13}
SDR
[Online-only class: MS-Teams: 07:30pm-08:30pm]

(Make-up lecture for the missed 03 Feb (Fri) lecture)


MS-Teams folder: video_svm2_29jan23.mp4, slides_svm2_29jan23.pdf
Lagrange Multipliers and the KKT Conditions. An illustration of the kernel trick.
31 Jan (Tue) {lecture#14}
SDR
MS-Teams folder: video_svm3_31jan23.mp4, slides_svm3_31jan23.pdf
Lagrange Multipliers and the KKT Conditions (contd). The Soft-Margin SVM.
01 Feb (Wed) {lecture#15}
SDR
MS-Teams folder: video_svm4_01feb23.mp4, slides_svm4_01feb23.pdf
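As a complement to the dual/QP treatment in the lectures, here is a minimal NumPy sketch of the soft-margin objective (hinge loss plus the margin-maximising penalty on ||w||) minimised by stochastic subgradient descent on the primal; the learning rate, epoch count and C value are illustrative assumptions.

    import numpy as np

    def soft_margin_svm_sgd(X, y, C=1.0, n_epochs=50, lr=0.01, seed=0):
        # X: (N, D) inputs; y: (N,) labels in {-1, +1}.
        rng = np.random.default_rng(seed)
        N, D = X.shape
        w, b = np.zeros(D), 0.0
        for _ in range(n_epochs):
            for i in rng.permutation(N):
                margin = y[i] * (X[i] @ w + b)
                # Subgradient of (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b)).
                if margin < 1:                      # point inside the margin, or misclassified
                    w -= lr * (w - C * y[i] * X[i])
                    b += lr * C * y[i]
                else:                               # point outside the margin: only the regulariser acts
                    w -= lr * w
        return w, b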
---
03 Feb (Fri) {lecture#xx}
SDR
(No class)

---
Minor-1
Minor-1: 07 Feb (Tue), 08:00am-09:00am, LH-408
---
The Soft-Margin SVM (contd). Abstracting the basic concepts of the hard-margin SVM, to use in a similar formulation. The function to optimise, the inequality constraints, the KKT conditions from Lagrange's theory. The Primal and Dual formulations. Lagrange Multipliers and the KKT Conditions.

Introduction to Kernels.
10 Feb (Fri) {lecture#16}
SDR
MS-Teams folder: video_svm5_kernel1_10feb23.mp4, slides_svm5_kernel1_10feb23.pdf, lecture_notes_kernel1_10feb23.pdf
Introduction to Kernels. Kernels in Regression.
14 Feb (Tue) {lecture#17}
SDR
MS-Teams folder: video_kernel2_14feb23.mp4, lecture_notes_kernel2_14feb23.pdf
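A minimal NumPy sketch of kernels in regression (kernel ridge regression with a Gaussian/RBF kernel), showing the kernel trick of working with the N x N Gram matrix rather than an explicit feature map; the kernel width and regularisation value are illustrative assumptions.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2)

    def kernel_ridge_fit_predict(X_train, t_train, X_test, lam=1e-2, gamma=1.0):
        K = rbf_kernel(X_train, X_train, gamma)
        # Dual solution a = (K + lambda*I)^{-1} t; the prediction at x is k(x)^T a.
        a = np.linalg.solve(K + lam * np.eye(len(X_train)), t_train)
        return rbf_kernel(X_test, X_train, gamma) @ a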
Kernels: construction, properties, examples.
15 Feb (Wed) {lecture#18}
SDR
MS-Teams folder: video_kernel3_nn1_15feb23.mp4, lecture_notes_kernel3_15feb23.pdf
5
Neural Networks
18-41
SDR
[Bishop Chap.5], [DL Chap.6], [DL Chap.9]
Introduction to Neural Networks. Perceptron: a linear classifier. A non-neural network interpretation.
15 Feb (Wed) {lecture#18}
SDR
MS-Teams folder: video_kernel3_nn1_15feb23.mp4, slides_nn1_15feb23.pdf
The Perceptron: a neural interpretation. The Perceptron optimisation. Weight update. Conventions of the Multi-layer Perceptron (MLP). The X-OR problem with the Perceptron.
17 Feb (Fri) {lecture#19}
SDR
MS-Teams folder: video_nn2_17feb23.mp4, lecture_notes_nn2_17feb23.pdf, slides_nn2_17feb23.pdf
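A minimal NumPy sketch of the perceptron weight-update rule above (update only on misclassified points, with the bias absorbed into the weight vector); the learning rate and epoch cap are illustrative assumptions.

    import numpy as np

    def perceptron_train(X, y, eta=1.0, n_epochs=100):
        # X: (N, D) inputs; y: (N,) labels in {-1, +1}.
        Xa = np.hstack([X, np.ones((len(X), 1))])   # append 1 to each input to absorb the bias
        w = np.zeros(Xa.shape[1])
        for _ in range(n_epochs):
            mistakes = 0
            for xi, yi in zip(Xa, y):
                if yi * (w @ xi) <= 0:               # misclassified (or on the boundary)
                    w += eta * yi * xi               # the perceptron update rule
                    mistakes += 1
            if mistakes == 0:                        # converged: every point is correctly classified
                break
        return w

For a linearly separable problem this converges in a finite number of updates; for XOR it never does, which is the point of the XOR discussion above.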
The XOR problem with the Perceptron with a kernel function. Examples of neural network activation functions. Factorisation: probability, differential calculus.
21 Feb (Tue) {lecture#20}
SDR
MS-Teams folder: video_nn3_21feb23.mp4, slides_nn3_21feb23.pdf, lecture_notes_nn3_21feb23.pdf
Pre-backpropagation mathematical fundamentals (contd). Factorisation, and the chain rule in differential calculus. The Perceptron and the Multi-Layer Perceptron (MLP).
22 Feb (Wed) {lecture#21}
SDR
MS-Teams folder: video_nn4_22feb23.mp4, lecture_notes_nn4_22feb23.pdf
The BackPropagation Algorithm: a computational mechanism for the chain rule in differential calculus.
28 Feb (Tue) {lecture#22}
SDR
MS-Teams folder: video_nn5_28feb23.mp4, lecture_notes_nn5_28feb23.pdf
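A minimal NumPy sketch of one backpropagation step for a single-hidden-layer MLP (sigmoid hidden units, one linear output unit, squared-error loss), to show the chain-rule bookkeeping described above; the architecture, loss and step size are illustrative assumptions.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, W1, b1, W2, b2, eta=0.1):
        # x: (D,) input; t: scalar target; W1: (H, D); b1: (H,); W2: (H,); b2: scalar.
        # Forward pass.
        a1 = W1 @ x + b1
        z = sigmoid(a1)                       # hidden activations
        y = W2 @ z + b2                       # linear output unit
        # Backward pass (chain rule) for E = 0.5 * (y - t)^2.
        delta2 = y - t                        # dE/da at the output unit
        delta1 = (W2 * delta2) * z * (1 - z)  # propagate the error back through the sigmoid
        # Gradient-descent updates.
        W2 -= eta * delta2 * z
        b2 -= eta * delta2
        W1 -= eta * np.outer(delta1, x)
        b1 -= eta * delta1
        return y, W1, b1, W2, b2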
Interesting interpretations of MLPs. Three ways to have kernel functions/feature transformations. The interpretation of a hidden layer of an MLP as a kernel function or a (possibly non-linear) feature transformation. Trying the XOR problem as a regression problem: it will not work, as expected. Now, trying this with a hand-crafted example of an MLP with hand-crafted weights: a single hidden layer with 2 neurons, with the ReLU activation function in the hidden layer. (A short code sketch follows this entry.)
Emboldened by this, we try implementing other "basic" Boolean functions with handcrafted neural networks. Attempts with a traditional sigmoid activation function. Getting some hands-on feel with weight values and sigmoid-specific constraints for ranges of inputs. The NOT and AND functions.
01 Mar (Wed) {lecture#23}
SDR
MS-Teams folder: video_nn6_01mar23.mp4, lecture_notes_nn6_01mar23.pdf
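One possible hand-crafted weight assignment of the kind described above, with a single hidden layer of 2 ReLU neurons solving XOR; these specific weight values are an illustrative assumption, and they are only one of many choices that work.

    import numpy as np

    def relu(a):
        return np.maximum(0, a)

    def xor_mlp(x1, x2):
        # Hidden layer: h1 = ReLU(x1 + x2), h2 = ReLU(x1 + x2 - 1).
        h = relu(np.array([x1 + x2, x1 + x2 - 1.0]))
        # Linear output: y = h1 - 2*h2, which equals XOR(x1, x2) on {0, 1}^2.
        return h[0] - 2.0 * h[1]

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_mlp(x1, x2))   # prints 0, 1, 1, 0 for the four input pairs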
Other "basic" Boolean functions with hand-crafted neural networks (contd).
From MLPs, a few steps towards deep neural networks. Inputs as images. The interpretation of a weight vector as an image. Empirical observations regarding the initial layers of neural networks, and the overall number of connections: the initial layers compute directional derivatives, and the connections are such that, effectively, only a small number of local connections matter in the neighbourhood of a neuron.
03 Mar (Fri) {lecture#24}
SDR
MS-Teams folder: video_nn7_03mar23.mp4, lecture_notes_nn7_03mar23.pdf
Some insight into 2-D inputs, interpreting first-layer weights as images, and its importance for CNNs (contd.): an empirical observation, and evidence for the idea of local receptive fields and low-level differentiation/edge features, with some biological motivation as well, from the visual cortex. The empirical result about the interpretation of weights in the first layer of a neural network with many layers.
Yet another example of an XOR implementation (semi-hand-crafted), with a different architecture, a different activation function, and different inputs. The minterm connotation of the hidden layer. Some insight into the expressive power of feed-forward neural networks: the insight from shallow networks with a large width. An example with asymmetric values for inputs and outputs for a digital circuit, and estimating neural network parameters, for D input neurons, 2^D neurons in one hidden layer, and one output neuron.
Convolution and Correlation: a domain-independent introduction. Linear Shift/Time Invariant Systems.
14 Mar (Tue) {lecture#25}
SDR
MS-Teams folder: video_nn8_14mar23.mp4, lecture_notes_nn8_14mar23.pdf
Linear Shift/Time Invariant Systems: characterisation of a system in terms of a `standard' input (the unit impulse: the Kronecker Delta in the discrete domain, and the Dirac Delta in the continuous domain). The impulse response gives a hardware-invariant characterisation of a system. Examples from Electrical and Civil engineering.
Why is Convolution so fundamental in Linear Shift/Time Invariant systems? A graphical proof.
Introduction to invariance in neural networks: translational invariance (`Where's Wally?' or `Where's Waldo?').
15 Mar (Wed) {lecture#26}
SDR
MS-Teams folder: video_nn9_15mar23.mp4, lecture_notes_nn9_15mar23.pdf
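A minimal NumPy sketch of the discrete convolution sum y[n] = sum_k x[k] h[n-k] for a linear shift-invariant system, checked against np.convolve; the example input signal and impulse response are illustrative assumptions.

    import numpy as np

    def convolve_direct(x, h):
        # For finite sequences the output has length len(x) + len(h) - 1.
        y = np.zeros(len(x) + len(h) - 1)
        for n in range(len(y)):
            for k in range(len(x)):
                if 0 <= n - k < len(h):
                    y[n] += x[k] * h[n - k]   # y[n] = sum_k x[k] * h[n - k]
        return y

    x = np.array([1.0, 2.0, 3.0])             # input signal
    h = np.array([1.0, -1.0])                 # impulse response (a first-difference filter)
    print(convolve_direct(x, h))              # [ 1.  1.  1. -3.]
    print(np.convolve(x, h))                  # identical: the LSI output is the convolution with h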
Some characteristics of deep neural networks: local receptive fields, strided convolutions, parameter sharing, pooling.
Two basic deep neural network architectures: the contractive structure, and the bow-tie structure.
Introduction to LeNet-5 (1998), the first successful basic deep neural network architecture. A basic contractive architecture. Illustration of the basic operations in the first two layers.
Introduction to an Auto-encoder: a basic bow-tie structure.
17 Mar (Fri) {lecture#27}
SDR
MS-Teams folder: video_nn10_17mar23.mp4, lecture_notes_nn10_17mar23.pdf
Autoencoder basics (contd.).
Summarising a contractive architecture: convolution, pooling and, finally, a fully connected layer. A timeline of deep networks and trends.
An example of the first successful contractive architecture, LeNet-5 (1998).
19 Mar (Sun) {lecture#28}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]

(Make-up lecture for the missed 31 Mar (Fri) lecture)

MS-Teams folder: video_nn11_19mar23.mp4, lecture_notes_nn11_19mar23.pdf
LeNet-5 detailed description (contd). Layers C1, S2, the weird C3 with the strange asymmetric mapping.
21 Mar (Tue) {lecture#29}
SDR
MS-Teams folder: video_nn12_21mar23.mp4, lecture_notes_nn12_21mar23.pdf
LeNet-5 detailed description (contd). Completing the LeNet-5 layers: including the fully connected layers at the end, and the SoftMax activation function in the output.
An introduction to some deep learning concepts prior to exploring the second successful representative architecture, AlexNet. Pooling. Local Response Normalisation.
22 Mar (Wed) {lecture#30}
SDR
MS-Teams folder: video_nn13_22mar23.mp4, lecture_notes_nn13_22mar23.pdf
---
Minor-2
Minor-2: 24 Mar (Fri), 07:30am-08:30am, LH-408
---
An introduction to some concepts in deep networks (contd.): Batch Normalisation, Residual/Skip/Highway Connections.
28 Mar (Tue) {lecture#31}
SDR
MS-Teams folder: video_nn14_28mar23.mp4, lecture_notes_nn14_28mar23.pdf
An introduction to some concepts in deep networks (contd.): Residual/Skip/Highway Connections, Dropout.
A detailed description of the second class of successful deep architectures, AlexNet.
30 Mar (Thu) {lecture#32}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]

(Make-up lecture for the missed 01 Apr (Sat) lecture)

MS-Teams folder: video_nn15_30mar23.mp4, lecture_notes_nn15_30mar23.pdf

31 Mar (Fri) {lecture#xx}
SDR
(No class)

01 Apr (Sat) {lecture#xx}
SDR
(No class)
AlexNet (contd).
05 Apr (Wed) {lecture#33}
SDR
MS-Teams folder: video_nn16_05apr23.mp4, lecture_notes_nn16_05apr23.pdf
AlexNet (contd).
The VGG family: VGG-16 and VGG-19.
11 Apr (Tue) {lecture#34}
SDR
MS-Teams folder: video_nn17_11apr23.mp4, lecture_notes_nn17_11apr23.pdf
The VGG family: VGG-16 and VGG-19 (contd.): the basic idea of small 3x3 convolutions replacing larger convolutions while keeping a small parameter count.
The ResNet family: the main points have been covered before, in the discussion on residual connections and their use.
Introduction to the Inception architecture and its facets. Carrying over the small convolution kernel idea from VGG-16/VGG-19. The significance of concatenating scale information from the input to get a possibly larger output size. Asymmetric convolutions.
12 Apr (Wed) {lecture#35}
SDR
MS-Teams folder: video_nn18_12apr23.mp4, lecture_notes_nn18_12apr23.pdf
The Inception architecture (contd): the use of asymmetric convolutions, and feature concatenation (to get richer information in terms of scale). The physical significance of 1x1 convolutions.
Recurrent Neural Networks: an introduction in terms of hardware saving; advantages in modelling sequential processes and inputs which are not of a fixed size; and the disadvantage of the short-term memory problem: backpropagation through time with the vanishing gradient problem.
14 Apr (Fri) {lecture#36}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]
(Make-up lecture for the missed 28 Apr (Fri) lecture)

MS-Teams folder: video_nn19_14apr23.mp4, lecture_notes_nn19_14apr23.pdf
Recurrent Neural Networks (contd.): more on the `backpropagation through time' concept, leading to `Short-Term Memory'.
RNNs: dealing with text inputs and outputs: 1-hot-encoding.
RNNs: what situations can an RNN model?
An introduction to the two ways out: LSTMs and GRUs. LSTM cell basics.
18 Apr (Tue) {lecture#37}
SDR
MS-Teams folder: video_nn20_18apr23.mp4, lecture_notes_nn20_18apr23.pdf
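A minimal NumPy sketch of 1-hot encoding for text inputs and a single vanilla RNN step h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), as in the discussion above; the toy vocabulary, hidden size and random weights are illustrative assumptions.

    import numpy as np

    vocab = sorted(set("hello"))                 # toy character vocabulary, e.g. ['e', 'h', 'l', 'o']
    idx = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        v = np.zeros(len(vocab))
        v[idx[ch]] = 1.0                         # exactly one component is 1
        return v

    def rnn_step(x_t, h_prev, Wxh, Whh, b):
        # The same weights are reused at every time step (parameter sharing over time).
        return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

    H, V = 8, len(vocab)
    rng = np.random.default_rng(0)
    Wxh, Whh, b = rng.normal(size=(H, V)), rng.normal(size=(H, H)), np.zeros(H)
    h = np.zeros(H)
    for ch in "hello":                           # unrolling the RNN over the input sequence
        h = rnn_step(one_hot(ch), h, Wxh, Whh, b)

Backpropagating through this unrolled chain is `backpropagation through time'; the repeated multiplication by factors involving Whh is what produces the vanishing-gradient/short-term-memory problem mentioned above.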
LSTM cell basics (contd).
19 Apr (Wed) {lecture#38}
SDR
MS-Teams folder: video_nn21_19apr23.mp4, lecture_notes_nn21_19apr23.pdf
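A minimal NumPy sketch of a single LSTM cell step, with the forget, input and output gates and the separate cell state discussed above; the stacked weight layout and shapes are illustrative assumptions.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        # W: (4H, D + H) stacked weights for the forget, input and output gates and the candidate; b: (4H,).
        H = h_prev.shape[0]
        z = W @ np.concatenate([x_t, h_prev]) + b
        f = sigmoid(z[0:H])        # forget gate: how much of the old cell state to keep
        i = sigmoid(z[H:2*H])      # input gate: how much of the candidate to write
        o = sigmoid(z[2*H:3*H])    # output gate: how much of the cell state to expose
        g = np.tanh(z[3*H:4*H])    # candidate cell contents
        c = f * c_prev + i * g     # new cell state: the additive, long-term memory path
        h = o * np.tanh(c)         # new hidden state
        return h, c

The additive update of c is what lets gradients flow over many time steps, easing the short-term memory problem of the plain RNN.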
21 Apr (Fri) {lecture#xx}
SDR
(No class)
LSTMs and GRUs (contd.)
Region CNNs (R-CNNs): basics
23 Apr (Sun) {lecture#39}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]

(Make-up lecture for the missed 29 Apr (Sat) lecture)

MS-Teams folder: video_nn22_23apr23.mp4, lecture_notes_nn22_23apr23.pdf
R-CNNs (contd.). A note on the Viola-Jones face detector (a bit of history on filters applied to regions of images).
25 Apr (Tue) {lecture#40}
SDR
MS-Teams folder: video_nn23_25apr23.mp4, lecture_notes_nn23_25apr23.pdf
RNNs: an example with categorical data with different feature vector lengths.
26 Apr (Wed) {lecture#41}
SDR
MS-Teams folder: video_nn24_26apr23.mp4, lecture_notes_nn24_26apr23.pdf
28 Apr (Fri) {lecture#xx}
SDR
(No class)
29 Apr (Sat) {lecture#xx}
SDR
(No class)
---
Major
09 May (Tue), 08:00am-09:00am, LH-408
---
---
xx
Mathematical Basics for Machine Learning
xx-xx
xx
[Burges: Math for ML], [Do, Kolter: Linear Algebra Notes],

The above list is (obviously!) not exhaustive. Other reference material will be announced in the class. The Web has a vast storehouse of tutorial material on AI, Machine Learning, and other related areas.



Assignments

... A combination of theoretical work as well as programming work.
Both will be scrutinized in detail for original work and thoroughness.
For programming assignments, there will be credit for good coding.
Spaghetti coding will be penalized.
Program correctness or good programming alone will not fetch you full credit ... also required are the results of extensive experimentation with various program parameters, and an explanation of the results thus obtained.
Assignments will have to be submitted on or before the due date and time.
Late submissions will not be considered at all.
Unfair means will result in assigning as marks, the number said to have been discovered by the ancient Indians, to both parties (un)concerned.
Assignment 1
Assignment 2
Assignment 3

Examinations and Grading Information

The marks distribution is as follows (out of a total of 100):
Minor I: 25
Minor II: 25
Assignments: 25
Major: 25
Grand Total: 100

ELL784 Marks and Grades (Anonymised)

Some points about examinations, including the honour code:

Instructions for online examinations
Unfair means will result in assigning as marks, the number said to have been discovered by the ancient Indians, to both parties (un)concerned.

Attendance Requirements:

Attendance requirements for Online Semesters: in accordance with the IIT Delhi rules for an online semester.
Illness policy: illness is to be certified by a registered medical practitioner.
Attendance in Examinations is Compulsory.


Course Feedback

Link to Course Feedback Form

Sumantra Dutta Roy, Department of Electrical Engineering, IIT Delhi, Hauz Khas,
New Delhi - 110 016, INDIA. sumantra@ee.iitd.ac.in