COL 341: Fundamentals of Machine Learning

 

Course Objectives

 

After successfully completing the course, the students are expected to develop:

o   Understanding of the basic concepts in machine learning

o   Deeper mathematical understanding of the foundations of machine learning methods

o   Skills in using popular tools, libraries and software for problem solving

o   Hands-on experience in solving problems that may be encountered in industry

o   Introductory understanding of, and exposure to, selected research topics in machine learning

 

Course Outline

 

Weekly schedule: each entry lists the topics to be covered, followed by references. Short illustrative sketches accompany selected weeks.

Week 1: Introduction to the course and basic concepts in machine learning

o   Pattern Recognition and Machine Learning (PRML), Christopher M. Bishop, Chapter 1.1

o   Python Numpy Tutorial by Justin Johnson

Week 2: Linear regression, feature creation, ridge regression, cross-validation (sketch below)

o   Andrew Ng's CS229 lecture notes on linear and logistic regression

o   Convex Optimization, Stephen Boyd and Lieven Vandenberghe

o   Linear Algebra Review and Reference by Zico Kolter
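
As a taste of the lab work this week, here is a minimal NumPy sketch of ridge regression via its closed-form solution (the data and names are illustrative, not from the course materials):

    import numpy as np

    def ridge_fit(X, y, lam):
        """Closed-form ridge solution: theta = (X^T X + lam*I)^{-1} X^T y."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # Toy data: y = 2*x + noise, with a bias column appended to X.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.normal(size=100), np.ones(100)])
    y = 2 * X[:, 0] + 0.1 * rng.normal(size=100)
    print(ridge_fit(X, y, lam=0.1))  # roughly [2.0, 0.0]

Cross-validation then amounts to refitting this on training splits over a grid of lam values and keeping the value with the best held-out error.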

Week 3: Lasso regression, logistic regression, optimization basics, gradient descent and its variants, step-size selection (sketch below)

o   Regression Shrinkage and Selection via the Lasso, Robert Tibshirani

o   Least Angle Regression, Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani

o   Compressive Sensing Resources
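
A bare-bones sketch of the gradient-descent portion, on the least-squares cost (the step size and iteration count are illustrative; too large a step diverges):

    import numpy as np

    def gradient_descent(X, y, step=0.01, iters=1000):
        """Minimise J(theta) = 0.5 * ||X theta - y||^2 by gradient descent."""
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            grad = X.T @ (X @ theta - y)  # gradient of J at theta
            theta -= step * grad
        return theta

A fixed step converges here only when it is smaller than 2 divided by the largest eigenvalue of X^T X, which is exactly the kind of question the step-size selection topic addresses.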

Week 4: Multi-class classification, evaluation metrics, introduction to neural networks (sketch below)

o   PRML, Bishop, Chapter 4.1

o   Classification model metrics: precision and recall, sensitivity and specificity, ROC curves
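
A minimal sketch of the two headline metrics, for binary 0/1 label arrays (names illustrative):

    import numpy as np

    def precision_recall(y_true, y_pred):
        """Precision = TP/(TP+FP); recall (= sensitivity) = TP/(TP+FN)."""
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        return tp / (tp + fp), tp / (tp + fn)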

Week 5: Neural networks, the Hessian, overfitting in neural networks, convolutional neural networks (CNNs) and deep learning (sketch below)

o   PRML, Bishop, Chapters 5.1-5.4

o   Neural networks and back-propagation

o   Roi Livni (Princeton) lecture notes on back-propagation

o   CS231n (Stanford) slides on back-propagation

o   Matt Gormley (CMU) slides on back-propagation

o   Convolutional Neural Networks

o   CNN notes from CS231n (Stanford University): understanding and visualizing CNNs, transfer learning

o   CS231n slides on CNNs

o   Barnabás Póczos's lecture notes on CNNs
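
To make the back-propagation references concrete, here is a two-layer network trained with manual backward passes on a toy regression task (the sizes, tanh activation and learning rate are all illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 2))
    y = X[:, :1] * X[:, 1:]                      # target: x1 * x2
    W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

    for step in range(2000):
        h = np.tanh(X @ W1 + b1)                 # forward pass
        out = h @ W2 + b2
        g_out = (out - y) / len(X)               # gradient of the MSE loss
        gW2 = h.T @ g_out; gb2 = g_out.sum(0)    # backward through layer 2
        g_h = (g_out @ W2.T) * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
        gW1 = X.T @ g_h; gb1 = g_h.sum(0)        # backward through layer 1
        for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
            p -= 0.1 * g                         # plain gradient step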

Week 6: Generative models, Gaussian Discriminant Analysis (GDA), naïve Bayes classification (sketch below)

o   Andrew Ng's CS229 lecture notes on generative models
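
A compact sketch of the Gaussian naive Bayes idea from those notes: fit one Gaussian per class and feature, then classify by the largest log-posterior (names illustrative):

    import numpy as np

    def fit_gnb(X, y):
        """Per-class feature means, variances and class priors."""
        return {c: (X[y == c].mean(0), X[y == c].var(0) + 1e-9,
                    np.mean(y == c)) for c in np.unique(y)}

    def predict_gnb(stats, x):
        """argmax_c of log p(c) + sum_j log N(x_j | mu_cj, var_cj)."""
        def log_post(c):
            mu, var, prior = stats[c]
            return np.log(prior) - 0.5 * np.sum(
                np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return max(stats, key=log_post)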

Week 7: Understanding multivariate Gaussian distributions; linear regression revisited (sketch below)

o   Multivariate Gaussian Distribution

o   Short reference on multivariate Gaussian distributions, Leon Gu, CMU

o   Multivariate Gaussian Distributions, by Chuong B. Do, Stanford University

o   Probabilistic modelling – Linear regression & Gaussian processes by Fredrik Lindsten, Thomas B. Schön, Andreas Svensson and Niklas Wahlström
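
One numerically sensible way to evaluate the multivariate Gaussian log-density these references discuss (a sketch; the Cholesky route avoids forming the inverse of Sigma explicitly):

    import numpy as np

    def mvn_logpdf(x, mu, Sigma):
        """log N(x | mu, Sigma) via a Cholesky factorisation of Sigma."""
        L = np.linalg.cholesky(Sigma)
        z = np.linalg.solve(L, x - mu)            # solves L z = x - mu
        logdet = 2 * np.sum(np.log(np.diag(L)))   # log |Sigma|
        return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + z @ z)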

Week 8: Estimation theory (worked example below)

o   Estimation theory notes and slides, Cristiano Porciani

o   Point estimation notes, Herman Bennett
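
As a one-line worked example of point estimation, the maximum-likelihood estimate of a Gaussian mean (variance known) reduces to the sample average:

    \hat{\mu}_{\mathrm{ML}}
      = \arg\max_{\mu} \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, \sigma^2)
      = \arg\min_{\mu} \sum_{i=1}^{n} (x_i - \mu)^2
      = \frac{1}{n} \sum_{i=1}^{n} x_i .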

Week 9: Density estimation, Parzen windows, the k-NN classifier (sketch below)

o   PRML, Bishop, Chapter 2.5

o   Lecture Notes on Nonparametrics by Bruce E. Hansen
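
The k-NN classifier fits in a few lines of NumPy (a sketch; k and the Euclidean metric are choices, not prescriptions):

    import numpy as np

    def knn_predict(X_train, y_train, x, k=5):
        """Majority vote among the k training points nearest to x."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]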

Week 10: Decision trees, random forests (sketch below)

o   Chapters 8.1-8.4 from Pattern Classification, 2nd Edition, Duda, Hart and Stork

o   Chapter on Decision Trees by Lior Rokach and Oded Maimon

o   Random Forests, Leo Breiman (the classic paper)
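
The core computation in tree induction is scoring candidate splits; a sketch using Gini impurity, scanning thresholds on a single feature (names illustrative):

    import numpy as np

    def gini(y):
        """Gini impurity: 1 - sum_c p_c^2 over the class frequencies."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def best_threshold(x, y):
        """Pick the split x <= t minimising the weighted child impurity."""
        best_score, best_t = np.inf, None
        for t in np.unique(x)[:-1]:        # splitting at the maximum is vacuous
            left, right = y[x <= t], y[x > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best_t = score, t
        return best_t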

Week 11: Support Vector Machines (SVMs) (sketch below)

o   Andrew Ng's CS229 lecture notes on Support Vector Machines (SVMs)

o   Convex Optimization, Stephen Boyd and Lieven Vandenberghe, Chapters 5.1-5.5
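
Alongside the dual treatment in these references, the primal hinge-loss view gives a very short training loop; a subgradient-descent sketch (labels in {-1, +1}; lam, step and epochs are illustrative):

    import numpy as np

    def linear_svm(X, y, lam=0.01, step=0.1, epochs=200):
        """Subgradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y*(X w)))."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            active = y * (X @ w) < 1           # points violating the margin
            grad = lam * w - (y[active][:, None] * X[active]).sum(0) / len(X)
            w -= step * grad
        return w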

Week 12: Unsupervised learning, Principal Component Analysis (PCA) (sketch below)

o   A Tutorial on Principal Component Analysis (PCA) by Jonathon Shlens

o   Slides on PCA by Barnabás Póczos
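
PCA via the SVD is short enough to quote in full (a sketch; in practice k is chosen from the explained-variance profile):

    import numpy as np

    def pca(X, k):
        """Scores and the top-k principal directions of X, via the SVD."""
        Xc = X - X.mean(0)                         # centre the data first
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T, Vt[:k]               # projections, components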

Week 13: Non-negative Matrix Factorization (NMF), k-means, non-linear dimensionality reduction (sketch below)

o   Lee, Daniel D., and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, pp. 556-562. 2001.

o   Lee, Daniel D., and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401, no. 6755 (1999): 788.

o   Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.

o   PRML, Bishop, Chapter 9.1

o   A Global Geometric Framework for Nonlinear Dimensionality Reduction, Joshua B. Tenenbaum, Vin de Silva and John C. Langford

o   Nonlinear Dimensionality Reduction by Locally Linear Embedding, Sam T. Roweis and Lawrence K. Saul
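
A sketch of Lloyd's algorithm for k-means (illustrative random initialisation; an empty cluster simply keeps its previous centre here):

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        """Alternate nearest-centre assignment and centroid recomputation."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
            centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return labels, centers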

Week 14: Boosting, generative adversarial networks (key formulas below)

o   Yoav Freund and Robert E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.

o   VC Dimension & the Fundamental Theorem, lecture notes from Roi Livni's class

o   Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in neural information processing systems, pp. 2672-2680. 2014.
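
The heart of AdaBoost, as presented in the Freund and Schapire survey above, is a pair of update rules: the weight of the weak learner h_t and the re-weighting of the training examples,

    \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
    \qquad
    D_{t+1}(i) = \frac{D_t(i) \, \exp\big( -\alpha_t \, y_i \, h_t(x_i) \big)}{Z_t},

where \epsilon_t is the weighted error of h_t and Z_t normalises D_{t+1} to a distribution.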


Grading structure (tentative; weights out of 100)

·         Exams: 45 (Minors: 25, Major: 20)

·         Assignments/project: 45 (Assignments: 30, Project: 15)

·         Quizzes/homework/class participation: 10

 

Homework assignments (ungraded)

·         Week 1

o   Learn Python, NumPy and SciPy

o   Write a matrix-multiplication program in Python (a) using for loops and (b) using NumPy/SciPy. Plot the GFLOPS achieved by each as a function of matrix size (a starter skeleton is below)
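
A hypothetical starter skeleton for this exercise (time both versions; extend the size grid as far as your machine tolerates):

    import time
    import numpy as np
    import matplotlib.pyplot as plt

    def matmul_loops(A, B):
        """Textbook triple-loop matrix multiplication."""
        n, m, p = A.shape[0], A.shape[1], B.shape[1]
        C = np.zeros((n, p))
        for i in range(n):
            for j in range(p):
                for k in range(m):
                    C[i, j] += A[i, k] * B[k, j]
        return C

    sizes, gflops = [32, 64, 128], {"loops": [], "numpy": []}
    for n in sizes:
        A, B = np.random.rand(n, n), np.random.rand(n, n)
        for name, fn in (("loops", matmul_loops), ("numpy", np.matmul)):
            t0 = time.perf_counter(); fn(A, B); t1 = time.perf_counter()
            gflops[name].append(2 * n ** 3 / (t1 - t0) / 1e9)  # ~2n^3 flops
    for name, vals in gflops.items():
        plt.plot(sizes, vals, marker="o", label=name)
    plt.xlabel("matrix size n"); plt.ylabel("GFLOPS"); plt.legend(); plt.show()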

·         Week 2

o   Prove that J(\theta) is convex in \theta (see Andrew Ng's lecture notes on linear and logistic regression for the definition of J(\theta)); a proof outline follows this list

o   Read Linear Algebra Review and Reference by Zico Kolter
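
For the convexity exercise above, assuming J(\theta) is the least-squares cost from the CS229 notes, the standard argument goes through the Hessian:

    J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \big( \theta^{\top} x^{(i)} - y^{(i)} \big)^2,
    \qquad
    \nabla^2 J(\theta) = \sum_{i=1}^{n} x^{(i)} x^{(i)\top} = X^{\top} X \succeq 0,

and a twice-differentiable function whose Hessian is positive semidefinite everywhere is convex (here v^T X^T X v = ||Xv||^2 >= 0 for every v).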

·         Week 3

o   Read chapters 2.1, 2.2, 2.3, 2.5, 9.2, 9.3 and optionally 9.4 of the Convex Optimization textbook

·         Week 4

o   Review probability and statistics

o   Read about 1D, 2D and 3D convolutions (a 1-D sanity check is sketched after this list)

o   Learn Keras/TensorFlow
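
Before moving to Keras, it helps to pin down the convolution convention by hand; a minimal 1-D check (the kernel is an illustrative edge detector, and note that np.convolve flips it):

    import numpy as np

    x = np.array([1., 2., 3., 4.])
    w = np.array([1., 0., -1.])        # illustrative edge-detector kernel

    # "Valid" 1-D convolution written out with an explicit loop...
    manual = np.array([np.dot(x[i:i + len(w)], w[::-1])
                       for i in range(len(x) - len(w) + 1)])
    # ...which should agree with the library routine.
    print(manual, np.convolve(x, w, mode="valid"))   # both give [2. 2.]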