On day 4 of the Machine Learning course, the agenda was:
- Unsupervised learning, clustering
- Dimensionality reduction, matrix factorization, and
- Collaborative filtering, recommender problems
The day started with Regina Barzilay's (Bio) (Personal Webpage) talk on determining the number of clusters in a data set and approaches for finding the correct number of clusters. The core ideas addressed were the differences between supervised, unsupervised, and semi-supervised feature selection algorithms, and Supervised/Unsupervised/Semi-supervised Feature Selection for Multi-Cluster/Class Data. Dr. Barzilay discussed Voronoi diagrams and VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS, leading to Lloyd's algorithm, also known as Voronoi iteration. The lecture also covered the simple yet effective k-means clustering and k-medoids; the k-medoids algorithm is a clustering algorithm related to both k-means and the medoidshift algorithm. The elbow method was also briefly discussed.
Lloyd's Algorithm
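To make Lloyd's iteration concrete, here is a minimal sketch of its two alternating steps (assign each point to the nearest centroid, then recompute the centroids), followed by the elbow method using scikit-learn's KMeans. The synthetic data, the choice of k, and the iteration counts are illustrative assumptions, not values from the lecture.

```python
# Minimal sketch: Lloyd's algorithm (k-means) plus the elbow method.
# Synthetic data and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=0)

def lloyd(X, k, n_iter=50, seed=0):
    """Plain Lloyd's iteration: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

centers, labels = lloyd(X, k=4)

# Elbow method: watch the within-cluster sum of squares (inertia) as k grows
# and look for the point where further increases stop paying off.
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.1f}")
```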
Choosing the number of clusters, as well as what to 'cluster around', are both quite interesting problems. How Google News clusters stories, and how to measure the appeal of a product, are topics of immense interest. The instructor was asked about Under The Hood: Google News & Ranking Stories, and the link Patent Application Reveals Key Factors In Google News Algorithm provides some insight.
High Performance Text Processing in Machine Learning by Daniel Krasner
The second part of the class was with Dr. Tommi Jaakkola (Bio) (Personal Webpage), who focused mainly on examples of mixture models, revisited the K-means algorithm for finding clusters in a data set, reviewed the latent variable view of mixture distributions, discussed how to assign data points to specific components of a mixture, and covered general techniques for finding maximum likelihood estimators in latent variable models.
The Expectation Maximization (EM) algorithm, explained in the context of the Gaussian mixture models that motivate it, took the majority of the time. Dr. Jaakkola talked about mixture models as a framework for building complex probability distributions, a method for clustering data, and using social media to reinforce learning. The topic then moved on to sequence learning and a brief introduction to decision trees and random forests, Mixtures, EM, Non-parametric models, as well as Machine Learning from Data Gaussian Mixture Models.
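As a hedged illustration of EM for Gaussian mixtures, the sketch below fits a GMM with scikit-learn's GaussianMixture (which runs EM internally) and inspects the soft responsibilities that the E-step produces. The synthetic data, component count, and covariance type are assumptions chosen for the example, not values used in the lecture.

```python
# Sketch: fitting a Gaussian mixture model with EM via scikit-learn.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=1000, centers=3, cluster_std=1.5, random_state=1)

# Fit a 3-component GMM; GaussianMixture runs EM until the log-likelihood
# stops improving (or max_iter is reached).
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=1)
gmm.fit(X)

# E-step output: soft "responsibilities", i.e. the posterior probability of
# each component having generated each point.
print("responsibilities for the first 5 points:")
print(gmm.predict_proba(X[:5]).round(3))

# Hard cluster assignments and the average log-likelihood that EM maximizes.
labels = gmm.predict(X)
print("mean log-likelihood per sample:", gmm.score(X))
```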
Expectation Maximization (EM) is an iterative procedure that is very sensitive to initial conditions. The principle of garbage in, garbage out applies here, so we need a good and fast initialization procedure. The Expectation Maximization Algorithm: A short tutorial explains a few such techniques, including K-Means, hierarchical K-Means, Gaussian splitting, etc.
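A small sketch of that sensitivity, under assumed synthetic data and settings: compare a single random start, a k-means-based start, and multiple random restarts, and look at the converged log-likelihood bound each one reaches.

```python
# Sketch: EM's sensitivity to initialization, compared via scikit-learn's
# init_params and n_init options. Data and settings are illustrative.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=800, centers=5, cluster_std=2.0, random_state=3)

for name, kwargs in [
    ("single random start", dict(init_params="random", n_init=1)),
    ("k-means start", dict(init_params="kmeans", n_init=1)),
    ("10 random restarts", dict(init_params="random", n_init=10)),
]:
    gmm = GaussianMixture(n_components=5, random_state=0, **kwargs).fit(X)
    # lower_bound_ is the converged lower bound on the per-sample
    # log-likelihood; a higher value means a better local optimum was found.
    print(f"{name}: converged log-likelihood bound = {gmm.lower_bound_:.3f}")
```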
Here is a great tutorial by MathematicalMonk on (ML 16.3) Expectation-Maximization (EM) algorithm.
Mixture Models and EM Tutorial by Sargur Srihari
(ML 16.6) Gaussian mixture model (Mixture of Gaussians): an introduction to the mixture of Gaussians, a.k.a. the Gaussian mixture model (GMM), which is often used for density estimation and clustering.
In response to Henry Tan's query about how tensor analysis is applied to machine learning, Dr. Barzilay pointed to one of her papers as a resource.
The rest of the class continued with ML topics and practical advice on things like the use of log-likelihood, why clustering in high dimensions is extremely tricky, Dimensionality reduction for supervised learning, Random Projections, Dimensionality reduction Feature selection, and, last but not least, the BIC: Model Selection Lecture V: The Bayesian Information Criterion.
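For the BIC in particular, here is a brief sketch of how it can be used to pick the number of mixture components; the synthetic data, candidate range, and settings are illustrative assumptions rather than anything shown in class.

```python
# Sketch: using BIC to choose the number of GMM components.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.2, random_state=7)

# Fit GMMs with 1..8 components and score each with the Bayesian Information
# Criterion: BIC = -2 * log-likelihood + (number of parameters) * log(n).
# Lower BIC is better; it rewards fit but penalizes model complexity.
bics = {}
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics[k] = gmm.bic(X)

best_k = min(bics, key=bics.get)
print({k: round(v, 1) for k, v in bics.items()})
print("BIC-preferred number of components:", best_k)
```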
Looking forward to tomorrow's final class on generative models, mixtures, the EM algorithm, semi-supervised and active learning, as well as tagging and information extraction.
Misc
- Machine Learning Mastery - Getting Started
- Data Scientist In a Can
- Atlas of Knowledge
- Machine Learning Math
- Math for machine learning
- What if I’m Not Good at Mathematics
- Machine Learning with Scikit-Learn (I) - PyCon 2015