Introduction to Machine Learning (ELL784)


General Information

Owing to number constraints, we are compelled to open this course primarily to M.Tech., M.S.(R), and Ph.D. students of the EE Department and the Bharti School, i.e., students with the following programme codes:
EEN/EEE/EEA/EET/EES/EEP/EEY/BSY/EEZ/BSZ/JVL/JOP/JTM/JVY
For others, it will be a first-come-first-served process. This course is not open to B.Tech. and Dual Degree students, who are expected to opt for ELL409 (Machine Intelligence and Learning) instead. This is a Departmental Elective (DE), one of the `essential electives' for the Cognitive and Intelligent Systems (CIS) stream of the Computer Technology Group, Department of Electrical Engineering. A general note for all EE Machine Learning courses: students will be permitted to take only one out of the following courses: ELL409 (Machine Intelligence and Learning) and the two CSE Machine Learning courses, COL341 (Machine Learning) and COL774 (Machine Learning). Those who do not fulfil the above criteria and are found enrolled after the completion of the add-drop period will be forcibly removed from the course.

For those who fulfil the above criteria, please note a cap of 50 seats, on a first-come-first-served basis.
Audit pass criterion: overall marks equivalent to a C grade (including the two examinations, and the assignments)
People are welcome to sit through the lectures without formal registration. Please drop an email to the instructor with your preferred email address for communication. You can additionally join the WhatsApp group for the course: https://chat.whatsapp.com/Clo4X7iZqYm2eIaZTijf17

Credits: 3 (LTP: 3-0-0) [Slot C]

Schedule for Classes:

Please note that unless further information is received, classes will be in-person, conducted over MS-Teams. The instructor will teach on MS-Teams in the classroom (LH-318), with a USB tablet attachment and a USB microphone connected to his computer; this enables automatic attendance logging and lecture recording. All participants are to carry a device to the classroom (laptop computer/tablet PC (`Tab')/mobile phone) which has MS-Teams, either through the app or on a browser (such as Google Chrome). Participants are to keep the microphone (`mic') muted on their devices, to prevent variable-delay echoing and reverberation. Headphones/headsets are optional.
Tuesday: 08:00 am - 09:00 am, LH-318 + MS-Teams (online)
Wednesday: 08:00 am - 09:00 am, LH-318 + MS-Teams (online)
Friday: 08:00 am - 09:00 am, LH-318 + MS-Teams (online)

Schedule for Examinations:

Mid-Term: 26 Feb (Mon), 08:00am-10:00am, LH-308, LH-310 (Please check your room and seat number here)
End-Term: 01 May (Wed) 08:00am-10:00am, LH-310 (Please check your seat number here)

Teaching Assistants: 

Aman Verma

Books, Papers and other Documentation

Textbook:

Reference Books:

Papers:

Some Interesting Web Links:


Lecture Schedule, Links to Material

Please see the [link] to the II Sem (Spring) 2022-2023 offering of this course, for an idea of the approximate structure of the course.
In case the MS-Teams forum does not allow access to files, try the following link: [link]
S.No.
Topics
Lectures
Instructor
References/Notes
0
Introduction to Machine Learning
01-02
SDR
Flavours of Machine Learning: Unsupervised, Supervised, Reinforcement, and Hybrid models. Decision boundaries: crisp and non-crisp; optimisation problems.
02 Jan (Tue) {lecture#01}
SDR
MS-Teams folder: slides_k_means_em1_02jan24.pdf, video_k_means_em1_02jan24.mp4
Introduction (contd).
03 Jan (Wed) {lecture#02}
SDR
MS-Teams folder: slides_k_means_em2_03jan24.pdf, video_k_means_em2_03jan24.mp4
1
Unsupervised Learning:
K-Means, Gaussian Mixture Models, EM
02-07
SDR
[Bishop Chap.9], [Do: Gaussians], [Do: More on Gaussians], [Ng: K-Means], [Ng: GMM], [Ng: EM], [Smyth: EM]
The K-Means algorithm: Introduction. Algorithms: history, flavours. A mathematical formulation of the K-Means algorithm: the objective function to minimise. The basic K-Means algorithm, and computational complexity issues: each step, and overall.
03 Jan (Wed) {lecture#02}
SDR
MS-Teams folder: slides_k_means_em2_03jan24.pdf, video_k_means_em2_03jan24.mp4
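For reference, a minimal NumPy sketch of one K-Means iteration (an illustration of the assignment and update steps discussed above, with per-step costs as comments; not the course's reference code):

    # X: (N, D) data matrix; mu: (K, D) current cluster centres.
    import numpy as np

    def kmeans_step(X, mu):
        # Assignment step: each point goes to its nearest centre -- O(NKD).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
        z = d2.argmin(axis=1)                                      # (N,) cluster labels
        # Update step: each centre becomes the mean of its assigned points -- O(ND).
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(len(mu))])
        return z, new_mu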
Limitations of K-Means. K-Means: an alternate formulation with a distance threshold.
An introduction to Gaussian Mixture Models. The Bayes rule, and Responsibilities. Maximum Likelihood Estimation. Parameter estimation for a mixture of Gaussians, starting with the simple case of a single 1-D Gaussian.
05 Jan (Fri) {lecture#03}
SDR
MS-Teams folder: video_k_means_em3_05jan24.mp4, slides_k_means_em3_05jan24.pdf
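For reference, the responsibility that component k takes for point x_n, obtained from the Bayes rule (standard notation, as in Bishop Chap. 9):
\[ \gamma(z_{nk}) = \frac{\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} \]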
ML-Estimation: from the simple case of one 1-D Gaussian to the general case of K D-dimensional Gaussians. The Mahalanobis Distance. Getting stuck, and the use of Lagrange Multipliers.
09 Jan (Tue) {lecture#04}
SDR
MS-Teams folder: video_k_means_em4_09jan23.mp4, slides_k_means_em4_09jan23.pdf
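For reference, the squared Mahalanobis distance of a point x from a Gaussian with mean \mu and covariance \Sigma (standard notation):
\[ \Delta^2 = (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}\, \boldsymbol{\Sigma}^{-1}\, (\mathbf{x} - \boldsymbol{\mu}) \]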
The EM Algorithm for Gaussian Mixtures. Application: Assignment 1: The Stauffer and Grimson Adaptive Background Subtraction Algorithm.
10 Jan (Wed) {lecture#05}
SDR
MS-Teams folder: video_k_means_em5_10jan24.mp4, slides_k_means_em5_10jan24.pdf
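For reference, the M-step updates of the EM algorithm for a Gaussian mixture, given the responsibilities \gamma(z_{nk}) from the E-step (standard notation, as in Bishop Chap. 9):
\[ N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad \boldsymbol{\mu}_k = \frac{1}{N_k}\sum_{n} \gamma(z_{nk})\,\mathbf{x}_n, \qquad \boldsymbol{\Sigma}_k = \frac{1}{N_k}\sum_{n} \gamma(z_{nk})\,(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}}, \qquad \pi_k = \frac{N_k}{N} \]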
The Stauffer and Grimson Adaptive Background Subtraction Algorithm (contd).
12 Jan (Fri) {lecture#06}
SDR
MS-Teams folder: video_k_means_em6_12jan24.mp4, slides_k_means_em6_12jan24.pdf,
The Stauffer and Grimson Adaptive Background Subtraction Algorithm (contd).
16 Jan (Tue) {lecture#07}
SDR
MS-Teams folder: video_k_means_em7_eigen1_16jan24.mp4, slides_k_means_em7_16jan24.pdf
2
Unsupervised Learning: EigenAnalysis:
PCA, LDA and Subspaces
07-10
SDR
[Ng: PCA], [Ng: ICA], [Burges: Dimension Reduction], [Bishop Chap.12]
Introduction to Eigenvalues and Eigenvectors. Properties of Eigenvalues and Eigenvectors.
16 Jan (Tue) {lecture#07}
SDR
MS-Teams folder: video_k_means_em7_eigen1_16jan24.mp4, slides_eigen1_16jan24.pdf
Properties of Eigenvalues and Eigenvectors (contd).
17 Jan (Wed) {lecture#08}
SDR
MS-Teams folder: video_eigen2_17jan24.mp4, slides_eigen2_17jan24.pdf
Gram-Schmidt Orthogonalisation: an introduction. The KL Transform.
19 Jan (Fri) {lecture#09}
SDR
MS-Teams folder: video_eigen3_19jan24.mp4, slides_eigen3_19jan24.pdf
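For reference, a minimal NumPy sketch of classical Gram-Schmidt orthonormalisation (illustrative only; it assumes the input columns are linearly independent):

    import numpy as np

    def gram_schmidt(A):
        # Columns of A are the input vectors; columns of Q are orthonormal.
        Q = np.zeros_like(A, dtype=float)
        for j in range(A.shape[1]):
            v = A[:, j].astype(float)
            for i in range(j):
                v -= (Q[:, i] @ A[:, j]) * Q[:, i]   # subtract projections onto earlier directions
            Q[:, j] = v / np.linalg.norm(v)          # normalise
        return Q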
The SVD and its properties (contd). Application: Assignment 2: Eigenfaces and Fisherfaces
20 Jan (Sat) {lecture#10}
SDR
MS-Teams folder: video_eigen4_linear1_20jan24.mp4, slides_eigen4_20jan24.pdf
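For reference, a hypothetical sketch of computing eigenfaces via the SVD (each row of X is assumed to be a vectorised face image; this is only an illustration, not the assignment's required interface):

    import numpy as np

    def eigenfaces(X, k):
        mean_face = X.mean(axis=0)
        Xc = X - mean_face                        # centre the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return mean_face, Vt[:k]                  # top-k principal directions (`eigenfaces')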
3
Linear Models for Regression, Classification
10-15
SDR
[Bishop Chap.3], [Bishop Chap.4], [Ng: Supervised, Discriminant Analysis], [Ng: Generative]
General introduction to Regression and Classification.
20 Jan (Sat) {lecture#10}
SDR
MS-Teams folder: video_eigen4_linear1_20jan24.mp4, slides_linear1_20jan24.pdf
General introduction to Regression and Classification (contd). Linearity and restricted non-linearity.
23 Jan (Tue) {lecture#11}
SDR
MS-Teams folder: video_linear2_23jan24.mp4, slides_linear2_23jan24.pdf
Maximum Likelihood and Least Squares. The Moore-Penrose Pseudo-inverse.
24 Jan (Wed) {lecture#12}
SDR
MS-Teams folder: video_linear3_24jan24.mp4, slides_linear3_24jan24.pdf,
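For reference, the maximum-likelihood (least-squares) solution for linear regression in Bishop's notation, with design matrix \Phi and targets t:
\[ \mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathsf{T}}\mathbf{t} = \boldsymbol{\Phi}^{\dagger}\mathbf{t}, \]
where \boldsymbol{\Phi}^{\dagger} is the Moore-Penrose pseudo-inverse.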
Regularised Least Squares.
Classification: Three Approaches.
Two generalisations of linearity: restricted non-linearity.
Discriminant functions for 2 and K classes.
26 Jan (Fri) {lecture#13}
SDR
MS-Teams folder: video_linear4_26jan24.mp4, slides_linear4_26jan24.pdf,
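For reference, the regularised (ridge) least-squares solution in the same notation, with regularisation coefficient \lambda:
\[ \mathbf{w} = (\lambda\mathbf{I} + \boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathsf{T}}\mathbf{t} \]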
An elegant formulation for a K-class discriminant. Physical significance of an optimality formulation.
Fisher's Linear Discriminant. A few futile attempts at a derivation, followed by a constrained optimisation approach.
30 Jan (Tue) {lecture#14}
SDR
MS-Teams folder: video_linear5_30jan24.mp4, slides_linear5_30jan24.pdf
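For reference, the Fisher criterion for the two-class case, and its maximiser (standard notation: S_B and S_W are the between-class and within-class scatter matrices, and m_1, m_2 the class means):
\[ J(\mathbf{w}) = \frac{\mathbf{w}^{\mathsf{T}}\mathbf{S}_{B}\,\mathbf{w}}{\mathbf{w}^{\mathsf{T}}\mathbf{S}_{W}\,\mathbf{w}}, \qquad \mathbf{w} \propto \mathbf{S}_{W}^{-1}(\mathbf{m}_2 - \mathbf{m}_1) \]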
Fisher's Linear Discriminant
31 Jan (Wed) {lecture#15}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]
MS-Teams folder: video_linear6_svm1_31jan24.mp4, slides_linear6_31jan24.pdf
4
SVMs and Kernels
15-23
SDR
[Bishop Chap.7], [Alex: SVMs], [Ng: SVMs], [Burges: SVMs], [Bishop Chap.6]
Introduction to SVMs: an overview.
31 Jan (Wed) {lecture#15}
SDR
[Online-only class: MS-Teams: 07:00am-08:00am]
MS-Teams folder: video_linear6_svm1_31jan24.mp4, slides_svm1_31jan24.pdf
---
02 Feb (Fri) {lecture#xx}
SDR
No class. Make-up class (online-only) on 26 Jan (Fri).
The basic SVM formulation (contd). The concept of a margin. One inequality for two conditions. The three important lines: y = 0 as the decision boundary, and the y = +1 and y = -1 lines, and their physical significance. An elegant formulation, with a particular choice for the margin. Finding the `golden' regions: the formulation in terms of a line in the `golden' region.
06 Feb (Tue) {lecture#16}
SDR
MS-Teams folder: video_svm2_06feb24.mp4, slides_svm2_06feb24.pdf,
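For reference, the resulting hard-margin primal problem in standard notation (targets t_n in {-1, +1}):
\[ \min_{\mathbf{w},\,b}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{subject to} \quad t_n\,(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) \ge 1, \quad n = 1,\dots,N. \]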
The single-inequation characterisation of the `golden' regions.
Building a constrained optimisation formulation for the SVM. This formulation was historically the first, and happens to be incredibly elegant. Building the primal optimisation problem, and obtaining the dual problem. The essence of the kernel trick: the vada and the doughnut example.
07 Feb (Wed) {lecture#17}
SDR
MS-Teams folder: video_svm3_08feb24.mp4, lecture_notes_svm3_08feb24.pdf
Lagrange's theory for multiple equality/inequality constraints: development, and some practical take-home points. The KKT conditions and SVMs.
09 Feb (Fri) {lecture#18}
SDR
MS-Teams folder: video_svm4_09feb24.mp4, lecture_notes_svm4_09feb24.pdf
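For reference, the corresponding dual problem and KKT conditions in standard notation (a_n are the Lagrange multipliers, and y(x_n) = w^T x_n + b):
\[ \max_{\mathbf{a}}\; \sum_{n} a_n - \frac{1}{2}\sum_{n}\sum_{m} a_n a_m\, t_n t_m\, \mathbf{x}_n^{\mathsf{T}}\mathbf{x}_m \quad \text{s.t.} \quad a_n \ge 0, \;\; \sum_{n} a_n t_n = 0, \]
with KKT conditions a_n \ge 0, \; t_n y(\mathbf{x}_n) - 1 \ge 0, \; a_n\{t_n y(\mathbf{x}_n) - 1\} = 0.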
Plain-vanilla computation of SVM parameters from the Lagrange multipliers a. The physical significance of the particular SVM formulation chosen.
The soft-margin SVM. The penalty function, and its physical significance.
13 Feb (Tue) {lecture#19}
SDR
MS-Teams folder: video_svm5_13feb24.mp4, lecture_notes_svm5_13feb24.pdf
The physical significance of the penalty function (contd). Taking stock.
The basic soft-margin SVM formulation. The primal problem, the dual, and box constraints.
14 Feb (Wed) {lecture#20}
SDR
MS-Teams folder: video_svm6_14feb24.mp4, slides_svm6_14feb24.pdf
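For reference, the soft-margin primal problem with slack variables \xi_n and penalty parameter C (standard notation):
\[ \min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{n}\xi_n \quad \text{s.t.} \quad t_n\,(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n + b) \ge 1 - \xi_n, \;\; \xi_n \ge 0; \]
its dual is the hard-margin dual with the box constraints 0 \le a_n \le C.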
Introduction to kernels. What is a kernel? Basic properties. Why does one use a kernel? Some examples of kernels.
Kernels in regression.
16 Feb (Fri) {lecture#21}
SDR
MS-Teams folder: video_kernel1_16feb24.mp4, lecture_notes_kernel1_16feb24.pdf, slides_kernel1_16feb24.pdf
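Two standard examples of kernels, for reference (the polynomial and the Gaussian/RBF kernel; not necessarily the only ones covered in class):
\[ k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^{\mathsf{T}}\mathbf{x}' + c)^{d}, \qquad k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert\mathbf{x} - \mathbf{x}'\rVert^{2}}{2\sigma^{2}}\right), \]
each computing an inner product in some feature space without constructing the feature map explicitly.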
---
Mid-Term
Mid-Term: 26 Feb (Mon), 08:00am-10:00am, LH-308, LH-310 (Please check your room and seat number here)
---
Kernels in regression (contd).
04 Mar (Mon) {lecture#22}
SDR
MS-Teams folder: video_kernel2_04mar24.mp4, lecture_notes_kernel2_04mar24.pdf
Properties of kernels.
05 Mar (Tue) {lecture#23}
SDR
MS-Teams folder: video_kernel3_nn1_05mar24.mp4, lecture_notes_kernel3_05mar24.pdf
5
Neural Networks
23-xx
SDR
[Bishop Chap.5], [DL Chap.6], [DL Chap.9]
The Perceptron: A linear classifier. A non-neural connotation, and a neural one. The history of neural networks: Rosenblatt, Minsky, and the XOR problem.
05 Mar (Tue) {lecture#23}
SDR
MS-Teams folder: video_kernel3_nn1_05mar24.mp4, slides_nn1_05mar24.pdf
The Perceptron learning criterion. Iterative Weight Update. A Multi-Layer Perceptron: basic structure and notations. A note about activation functions.
06 Mar (Wed) {lecture#24}
SDR
MS-Teams folder: video_nn2_06mar24.mp4, slides_nn2_06mar24.pdf
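For reference, the perceptron criterion and the corresponding iterative weight update (standard notation, as in Bishop Chap. 4: \mathcal{M} is the set of misclassified patterns, \eta the learning rate, and the update uses a misclassified x_n):
\[ E_{P}(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^{\mathsf{T}}\boldsymbol{\phi}(\mathbf{x}_n)\, t_n, \qquad \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta\,\boldsymbol{\phi}(\mathbf{x}_n)\, t_n. \]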
More about activation functions.
One way of implementing an XOR function: the kernel interpretation of the hidden layer.
12 Mar (Tue) {lecture#25}
SDR
video_nn3_12mar24.mp4, slides_nn3_12mar24.pdf, lecture_notes_nn3_12mar24.pdf
A discussion on activation functions.
A (failed) XOR attempt using regression.
13 Mar (Wed) {lecture#26}
SDR
MS-Teams folder: video_nn4_13mar24.mp4, slides_nn4_13mar24.pdf, lecture_notes_nn4_13mar24.pdf
Another XOR implementation, this time, a completely hand-crafted `compact' example, with a ReLU activation function.
Another piece of basic philosophy, when designing neural networks.
Yet another XOR implementation, hand-crafted again, with a sigmoid activation function.
15 Mar (Fri) {lecture#27}
SDR
MS-Teams folder: video_nn5_15mar24.mp4, slides_nn5_15mar24.pdf, lecture_notes_nn5_15mar24.pdf
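For reference, a hand-crafted XOR network with ReLU hidden units, in the spirit of the `compact' example above (the weights below are one well-known solution, given as an illustration; they are not necessarily the ones used in class):

    import numpy as np

    def xor_net(x1, x2):
        x = np.array([x1, x2], dtype=float)
        W1 = np.array([[1.0, 1.0], [1.0, 1.0]])    # both hidden units see x1 + x2
        b1 = np.array([0.0, -1.0])
        h = np.maximum(0.0, W1 @ x + b1)           # ReLU: [x1+x2, max(0, x1+x2-1)]
        w2 = np.array([1.0, -2.0])                 # output: (x1+x2) - 2*max(0, x1+x2-1)
        return w2 @ h

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))             # outputs 0, 1, 1, 0 for the four inputs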
The previous XOR implementation (contd).
The build-up to Backpropagation: factorisation basics. The same philosophy appears in probability, and in differentiation! Partial derivatives and Taylor's series.
19 Mar (Tue) {lecture#28}
SDR
MS-Teams folder: video_nn6_19mar24.mp4, lecture_notes_nn6_19mar24.pdf
The build-up to Backpropagation: factorisation basics. The case of 3 or more variables, and having another variable on which all of the previous variables depend. Extending this to multiple variables. Illustration of the chain rule.
Backpropagation. Why is this needed? The basic philosophy. A simple example. The first two steps.
20 Mar (Wed) {lecture#29}
SDR
MS-Teams folder: video_nn7_20mar24.mp4, lecture_notes_nn7_20mar24.pdf
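For reference, a minimal sketch of one forward/backward pass for a single hidden layer with a tanh activation and a squared-error loss (an illustration of the chain-rule bookkeeping; the variable names are assumptions, not the lecture's notation):

    import numpy as np

    def forward_backward(x, t, W1, W2):
        a1 = W1 @ x                                 # hidden pre-activations
        z1 = np.tanh(a1)                            # hidden activations
        y = W2 @ z1                                 # linear outputs
        delta2 = y - t                              # dE/dy for squared error
        delta1 = (1.0 - z1 ** 2) * (W2.T @ delta2)  # chain rule back through tanh
        dW2 = np.outer(delta2, z1)                  # gradients w.r.t. the weights
        dW1 = np.outer(delta1, x)
        return y, dW1, dW2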
Backpropagation (contd). Alternatives.
22 Mar (Fri) {lecture#30}
SDR
MS-Teams folder: video_nn8_22mar24.mp4, lecture_notes_nn8_22mar24.pdf
A summary of some intuitive results of a neural network as a function approximator.
Empirical observations, in going from a multi-layer perceptron to a deep neural network. Visualising weights as an image, the concept of local receptive fields, the significance of the first layer of connections as local 2-D image derivative operators. Why the term `Convolutional Neural Network' is an oxymoron.
23 Mar (Sat) {lecture#31}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]
MS-Teams folder: video_nn9_23mar24.mp4, lecture_notes_nn9_23mar24.pdf
Why does one want to relate the inner product `correlation' operation to convolution? The well-accepted Linear Shift Invariant Systems theory in Electrical Engineering, which carries over to other disciplines as well. Examples.
24 Mar (Sun) {lecture#32}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]
MS-Teams folder: video_nn10_24mar24.mp4, lecture_notes_nn10_24mar24.pdf
The physical significance of Linear Shift Invariant (LSI) systems, and why Convolution is a fundamental operation for LSI systems.
Characteristics of Deep Neural Networks: Local Receptive Fields, Strided Convolutions, Weight/Parameter Sharing.
02 Apr (Tue) {lecture#33}
SDR
MS-Teams folder: video_nn11_02apr24.mp4, slides_nn11_02apr24.pdf, lecture_notes_nn11_02apr24.pdf
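For reference, the standard output-size relation for a strided convolution (with input width N, filter width F, padding P, and stride S; the exact convention used in class may differ):
\[ \text{output width} = \left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1 \]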
Another feature of deep networks: pooling.
LeNet-5 (1989): the first two layers.
03 Apr (Wed) {lecture#34}
SDR
MS-Teams folder: video_nn12_03apr24.mp4, lecture_notes_nn12_03apr24.pdf
LeNet-5 (1989) (contd).
05 Apr (Fri) {lecture#35}
SDR
MS-Teams folder: video_nn13_05apr24.mp4, lecture_notes_nn13_05apr24.pdf
More features of deep networks: pooling, local response normalisation (AlexNet, 2012), Batch Normalisation (2015), Residual Connections (ResNet, 2015), Dropout (AlexNet, 2012).
09 Apr (Tue) {lecture#36}
SDR
MS-Teams folder: video_nn14_09apr24.mp4, lecture_notes_nn14_09apr24.pdf
AlexNet and CaffeNet (2012): some details. The second prominent deep learning architecture.
10 Apr (Wed) {lecture#37}
SDR
MS-Teams folder: video_nn15_10apr24.mp4, lecture_notes_nn15_10apr24.pdf
AlexNet and CaffeNet (2012) (contd).
VGG16 and VGG19 (2014): simpler constant-sized (3x3) convolutions lead to fewer parameters, covering the same area of interest (albeit in a filtered way). 16-19 weight layers in place of the 5 convolutional layers of AlexNet.
12 Apr (Fri) {lecture#38}
SDR
MS-Teams folder: video_nn16_12apr24.mp4, lecture_notes_nn16_12apr24.pdf
---
16 Apr (Tue) {lecture#xx}
SDR
No class. Make-up class (online-only) on 23 Mar (Sat).
Parameter-saving examples from the VGG architecture.
Motivation for ResNet.
The Inception architecture.
19 Apr (Fri) {lecture#39}
SDR
MS-Teams folder: video_nn17_19apr24.mp4, lecture_notes_nn17_19apr24.pdf
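A worked version of the standard parameter-saving argument (assuming C input and C output channels; a sketch of the kind of example referred to above): two stacked 3x3 convolutions cover a 5x5 receptive field, and three stacked 3x3 convolutions a 7x7 field, with
\[ 2 \times (3 \times 3)\, C^{2} = 18C^{2} < (5 \times 5)\, C^{2} = 25C^{2}, \qquad 3 \times (3 \times 3)\, C^{2} = 27C^{2} < (7 \times 7)\, C^{2} = 49C^{2}. \]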
---
20 Apr (Sat) {lecture#xx}
SDR
No class. Make-up class (online-only) on 21 Apr (Sun).
A gentle introduction to GANs.
21 Apr (Sun) {lecture#40}
SDR
[Online-only class: MS-Teams: 08:00am-09:00am]
MS-Teams folder: video_nn18_21apr24.mp4, slides_nn18_21apr24.pdf
A gentle introduction to GANs (contd).
Recurrent Neural Networks.
23 Apr (Tue) {lecture#41}
SDR
MS-Teams folder: video_nn19_23apr24.mp4, slides_nn19_23apr24.pdf, lecture_notes_nn19_23apr24.pdf
Recurrent Neural Networks (contd).
24 Apr (Wed) {lecture#42}
SDR
MS-Teams folder: video_nn20_24apr24.mp4, lecture_notes_nn20_24apr24.pdf
---
27 Apr (Sat) {lecture#xx}
SDR
No class. Make-up class (online-only) on 24 Mar (Sun).
---
End-Term
01 May (Wed) 08:00am-10:00am, LH-310 (Please check your seat number here)
---
---
xx
Mathematical Basics for Machine Learning
xx-xx
xx
[Burges: Math for ML], [Do, Kolter: Linear Algebra Notes],

The above list is (obviously!) not exhaustive. Other reference material will be announced in the class. The Web has a vast storehouse of tutorial material on AI, Machine Learning, and other related areas.



Assignments

... A combination of theoretical work as well as programming work.
Both will be scrutinized in detail for original work and thoroughness.
For programming assignments, there will be credit for good coding.
Spaghetti coding will be penalized.
Program correctness or good programming alone will not fetch you full credit ... also required are the results of extensive experimentation with varying program parameters, and an explanation of the results thus obtained.
Assignments will have to be submitted on or before the due date and time.
Late submissions will not be considered at all.
Unfair means will result in assigning as marks, the number said to have been discovered by the ancient Indians, to both parties (un)concerned.
Assignment 1
Assignment 2
Assignment 3

Examinations and Grading Information

The marks distribution is as follows (out of a total of 100):
Mid-Term: 37
Assignments: 25
End-Term: 38
Grand Total: 100

ELL784 Marks and Grades (Anonymised)

Some points about examinations, including the honour code:

Instructions for online examinations
Unfair means will result in assigning as marks, the number said to have been discovered by the ancient Indians, to both parties (un)concerned.

Attendance Requirements:

Attendance requirements for Online Semesters: in accordance with the IIT Delhi rules for an online semester.
Illness policy: illness to be certified by a registered medical practitioner.
Attendance in Examinations is Compulsory.


Course Feedback

Link to Course Feedback Form

Sumantra Dutta Roy, Department of Electrical Engineering, IIT Delhi, Hauz Khas,
New Delhi - 110 016, INDIA. sumantra@ee.iitd.ac.in