So after having an awesome Day 1 @ MIT, I was in the CSAIL library and met Pedro Ortega, NIPS 2015 Program Manager @adaptiveagents. Celebrity sighting!
Today on Day 2, Dr. Jaakkola (Bio) (Personal Webpage), professor in the Department of Electrical Engineering and Computer Science and the Computer Science and Artificial Intelligence Laboratory (CSAIL), went over the following:
- Non-linear classification and regression, kernels
- Passive-aggressive algorithm
- Overfitting, regularization, generalization
- Content recommendation
Dr. Jaakkola's Socratic method of posing common-sense questions ingrains the core concepts in students' minds. The class started with a follow-up on the perceptron from yesterday and quickly turned into a session on when NOT to use the perceptron, such as on non-linearly separable problems. Today's lecture was derived from 6.867 Machine Learning Lecture 8. The discussion extended to the Support Vector Machine (and Statistical Learning Theory) Tutorial, which is also well explained in An Idiot's guide to Support vector machines (SVMs) by R. Berwick, Village Idiot.
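To see concretely why the perceptron breaks down on non-linearly separable data, here is a minimal sketch (my own illustration, not from the lecture) of a standard perceptron hammering away at XOR, the classic non-separable problem:

```python
import numpy as np

# XOR: no single line separates the positives from the negatives,
# so the perceptron update rule never converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])  # labels in {-1, +1}

X_aug = np.hstack([X, np.ones((4, 1))])  # fold the bias into the features
w = np.zeros(3)

for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X_aug, y):
        if yi * np.dot(w, xi) <= 0:  # misclassified -> perceptron update
            w += yi * xi
            mistakes += 1
    if mistakes == 0:
        print(f"converged after {epoch} epochs")
        break
else:
    print("never converged: XOR is not linearly separable")
```

Run it and the loop exhausts all 100 epochs without a mistake-free pass; a kernelized perceptron or SVM (below) handles the same data easily.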
Speaking of SVMs and dimensionality, Dr. Jaakkola posed the question of whether ranking can also be cast as a classification problem. Learning to rank, or machine-learned ranking (MLR), is a fascinating topic where common intuitions, like the number of items displayed, error functions between the user's preference and the display order, and sparseness, fall flat. Microsoft Research has some excellent reference papers and tutorials on learning to rank which are definitely worth poring over if you are interested in this topic. Label ranking by learning pairwise preferences is another topic discussed in detail during the class; a sketch of the pairwise reduction follows the reference list below. Some reference papers follow:
- A Short Introduction to Learning to Rank
- Reviewing Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales
- LETOR: Learning to Rank for Information Retrieval Tutorials on Learning to Rank
- Ranking Methods in Machine Learning A Tutorial Introduction
- Yahoo! Learning to Rank Challenge Datasets
- Large Scale Learning to Rank
- Yahoo! Learning to Rank Challenge Overview
- Multiclass Classification: One-vs-all
- Zipf, Power-laws, and Pareto - a ranking tutorial Lada A. Adamic
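As promised above, here is a minimal sketch of the pairwise idea behind learning to rank: turn preference pairs into a binary classification problem on feature differences, then rank new items by the learned linear score. The synthetic data and model choice are my own illustration, not any specific paper's method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pairwise reduction: for items a, b with features x_a, x_b, the example
# (x_a - x_b) is labeled +1 if a should rank above b, else -1.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
scores = X @ true_w                       # hidden relevance scores

# Build preference pairs from random item pairs
i, j = rng.integers(0, n, 500), rng.integers(0, n, 500)
keep = i != j
diffs = X[i[keep]] - X[j[keep]]
labels = np.where(scores[i[keep]] > scores[j[keep]], 1, -1)

clf = LogisticRegression().fit(diffs, labels)

# Ranking new items = sorting by the learned linear score
test = rng.normal(size=(10, d))
order = np.argsort(-(test @ clf.coef_.ravel()))
print("predicted ranking:", order)
```

The appeal of this reduction is exactly the point raised in class: once ranking is phrased over pairs, any binary classifier can be pressed into service.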
Indeed with SVMs, the natural progression led to the 'k' word: kernel functions. A brief introduction to kernel classifiers by Mark Johnson (Brown University) is a good starting point, and The difference of kernels in SVM? and how to select a kernel for SVM provide good background material for understanding the practical aspects of kernels, as does Kernels and the Kernel Trick by Martin Hofmann from the Reading Club "Support Vector Machines".
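The kernel trick itself fits in a few lines: a kernel computes the inner product in a high-dimensional feature space without ever constructing that space. A tiny sketch with the degree-2 polynomial kernel (the explicit map phi here is the standard textbook one, chosen for illustration):

```python
import numpy as np

# For the degree-2 polynomial kernel in 2-D, the explicit feature map is
# phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2), and k(x, z) = (x . z)^2.
def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))   # inner product in feature space
kernel   = np.dot(x, z) ** 2        # same value, no phi ever built
print(explicit, kernel)             # both print 16.0
```

For the RBF kernel the implicit feature space is infinite-dimensional, which is exactly why the trick matters: the kernel value is still just one cheap computation on the original inputs.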
The afternoon topic was anomaly detection; use cases included aberrant behavior in financial transactions, insurance fraud, bot detection, manufacturing quality control, etc. One of the most comprehensive presentations on Anomaly Detection Data Mining Techniques is by Francesco Tamberi, which is great for background (see the one-class SVM sketch after the reference list below). Several problems worked on during the class were from 6.867 Machine Learning, which shows how carefully the instructors catered the program for practitioners, with the right content from graduate-level courses as well as industry use cases. Other topics discussed included linear versus nonlinear classifiers, and we learned that a decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. Class discussions and Q&A touched on a wide variety of subjects, including but not limited to How to increase accuracy of classifiers?, Recommendation Systems, and A Comparative Study of Collaborative Filtering Algorithms, which eventually led to the Deep Learning Tutorial: From Perceptrons to Deep Networks, which performed really well on the MNIST database of handwritten digits.
- Caltech 101
- THE MNIST DATABASE of handwritten digits
- Why do naive Bayesian classifiers perform so well?
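As mentioned before the list, here is a minimal anomaly-detection sketch using a one-class SVM with an RBF kernel, one technique among those surveyed in the readings. The data is synthetic and the nu/gamma values are illustrative, not tuned:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # "legitimate" behavior
outliers = rng.uniform(low=-6, high=6, size=(10, 2))    # aberrant points

# Fit only on normal data; nu upper-bounds the fraction of training
# points treated as outliers (roughly the false-alarm rate).
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.2).fit(normal)

print(model.predict(outliers))                 # -1 = flagged as anomaly
print((model.predict(normal) == -1).mean())    # false-alarm rate ~ nu
```

The same pattern, fit a model of "normal" and flag what falls outside it, underlies the fraud and bot-detection use cases discussed in the afternoon session.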
A discussion of linear vs. non-linear classifiers followed, where Dr. Jaakkola spoke about why logistic regression is a linear classifier; see more on Linear classifier, Kernel Methods for General Pattern Analysis, Kernel methods in Machine learning, How do we determine the linearity or nonlinearity of a classification problem?, and a review of Kernel Methods in Machine Learning.
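The one-line reason logistic regression is linear, despite the nonlinear sigmoid: the sigmoid is monotone, so the predicted class depends only on the sign of the linear score, and the decision boundary is the hyperplane where that score is zero:

```latex
\[
P(y=1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
\sigma(w^\top x + b) \ge \tfrac{1}{2} \iff w^\top x + b \ge 0 .
\]
```

The sigmoid only rescales the score into a probability; the boundary $w^\top x + b = 0$ is as linear as the perceptron's.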
Misc. discussions of Kernel Methods, So you think you have a power law, Radial basis function kernel, and Kernel Perceptron in Python surfaced, some of which are briefly reviewed in Machine Learning: Perceptrons - Kernel Perceptron Learning Part 3/4, along with Shape Fitting with Outliers and the SIGIR 2003 tutorial Support Vector and Kernel Methods, which covers radial basis functions. Other topics included kernel-based anomaly detection with the Multiple Kernel Anomaly Detection (MKAD) algorithm, Support Vector Machines: Model Selection Using Cross-Validation and Grid-Search, LIBSVM -- A Library for Support Vector Machines, A Practical Guide to Support Vector Classification, Outlier Detection with Kernel Density Functions, and a Classification Framework for Anomaly Detection as relevant readings.
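In the spirit of the cross-validation and grid-search readings (the LIBSVM practical guide recommends an exponential grid over C and gamma), here is a short sketch using scikit-learn as a stand-in for LIBSVM; the grid values are illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Cross-validated grid search over (C, gamma) for an RBF-kernel SVM,
# on an exponential grid as the LIBSVM guide suggests.
X, y = load_digits(return_X_y=True)
param_grid = {
    "C": [2**k for k in range(-2, 6, 2)],       # 0.25 ... 16
    "gamma": [2**k for k in range(-9, -1, 2)],  # ~0.002 ... 0.125
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

A coarse grid like this, refined around the best cell, is usually all the model selection an RBF SVM needs in practice.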
For a linear algebra refresher, Dr. Barzilay recommended Prof. Gilbert Strang's MIT Open Course Number 18.06, or Gilbert Strang's lectures on Linear Algebra via video lectures.
Looking forward to the Deep Learning and Boosting sessions tomorrow! Dr. Barzilay said it's going to be pretty cool.