MIT Machine Learning for Big Data and Text Processing Class Notes Day 4

On day 4 of the machine learning course, the agenda was as follows:

  • Unsupervised learning, clustering
  • Dimensionality reduction, matrix factorization, and
  • Collaborative filtering, recommender problems


The day started with a talk by Regina Barzilay on determining the number of clusters in a data set and on approaches for choosing the correct number of clusters. A core idea addressed was the difference between supervised, unsupervised, and semi-supervised feature selection algorithms (see Supervised/Unsupervised/Semi-supervised Feature Selection for Multi-Cluster/Class Data). Dr. Barzilay discussed Voronoi diagrams (see Voronoi Diagram Based Clustering Algorithms), leading to Lloyd's algorithm, also known as Voronoi iteration. The lecture also covered the simple yet effective k-means clustering algorithm and k-medoids, a clustering algorithm related to k-means and to the medoidshift algorithm. The elbow method was also briefly discussed.
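Lloyd's algorithm alternates two steps: assign every point to its nearest centroid (its Voronoi cell), then move each centroid to the mean of the points in its cell. The Python sketch below is a minimal, illustrative version of that loop, not the code used in the lecture; the function names (`lloyd_kmeans`, `assign`, `squared_distance`) and the initialization from randomly chosen data points are assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Voronoi assignment: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate Voronoi assignment and centroid (mean) update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Mean of the cluster, coordinate by coordinate.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, assign(points, centroids)
```

On two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the loop settles on centroids at `(1/3, 1/3)` and `(31/3, 31/3)`. Practical implementations usually add smarter seeding (for example k-means++) and several restarts, because the result depends on the starting centroids.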

Lloyd's Algorithm
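The k-medoids algorithm mentioned above keeps the Lloyd-style alternation but requires each cluster center to be an actual data point (the medoid), which makes it less sensitive to outliers. The sketch below is a simple alternating variant, not the classic PAM swap procedure; the naive initialization with the first k points, the Euclidean distance via `math.dist` (Python 3.8+), and the helper names are assumptions for illustration.

```python
import math

def nearest_medoid_labels(points, medoids):
    """Index of the closest medoid for every point."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    """Alternate: assign points to the nearest medoid, then re-pick each medoid."""
    medoids = [points[i] for i in range(k)]  # naive start: the first k points
    for _ in range(max_iter):
        labels = nearest_medoid_labels(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_medoids.append(medoids[j])  # keep an empty cluster's medoid
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda c: sum(math.dist(c, q) for q in members)))
        if new_medoids == medoids:
            break  # medoids unchanged, so the clustering is stable
        medoids = new_medoids
    return medoids, nearest_medoid_labels(points, medoids)
```

On `[(0, 0), (10, 0), (1, 0), (2, 0), (11, 0), (12, 0), (13, 0), (40, 0)]` with `k=2`, the medoids come out as `(1, 0)` and `(12, 0)`: the outlier at `(40, 0)` joins the second cluster but does not drag its center, whereas the mean of that cluster would be `(17.2, 0)`.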

Choosing the number of clusters and deciding what to 'cluster around' are both quite interesting problems. How the Google News algorithms use clustering, how they measure the appeal of a product, and which features determine how Google News clusters stories is a topic of immense interest. The instructor was asked about Under The Hood: Google News & Ranking Stories, and the article Patent Application Reveals Key Factors In Google News Algorithm provides some insight.
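For the first of these problems, the elbow method mentioned earlier runs k-means for increasing k, tracks the within-cluster sum of squares (WCSS), and looks for the k where the curve bends and extra clusters buy little. The sketch below computes that curve for 1-D data; `kmeans_1d`, `elbow_curve`, and the deterministic, spread-out initialization are illustrative assumptions, and reading off the elbow is left to the eye, as in the usual plot.

```python
def kmeans_1d(xs, k, iterations=100):
    """1-D Lloyd iteration with a deterministic start; returns (centers, WCSS)."""
    ordered = sorted(xs)
    # Spread the starting centers across the sorted data (an assumption for reproducibility).
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in ordered:
            groups[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    # WCSS: squared distance from every point to its nearest center, summed.
    wcss = sum(min((x - c) ** 2 for c in centers) for x in ordered)
    return centers, wcss

def elbow_curve(xs, k_max):
    """WCSS for k = 1..k_max."""
    return [kmeans_1d(xs, k)[1] for k in range(1, k_max + 1)]
```

For the toy data `[1, 2, 3, 10, 11, 12, 20, 21, 22]`, `elbow_curve(data, 5)` gives roughly `[548, 127.5, 6, 4.5, 3]`: the WCSS drops sharply up to k = 3 and then flattens, so k = 3 is the elbow.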


High Performance Text Processing in Machine Learning by Daniel Krasner

The second part of the class was with Dr. Tommi Jaakkola, who focused mainly on examples of mixture models. He revisited the k-means algorithm for finding clusters in a data set, reviewed the latent variable view of mixture distributions and how to assign data points to specific components of a mixture, and covered general techniques for finding maximum likelihood (ML) estimators in latent variable models.
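In the latent variable view, each point is generated by first drawing a hidden component z and then drawing x from that component; assigning a point to a component means computing the posterior p(z | x) by Bayes' rule. The sketch below illustrates this for a 1-D mixture of Gaussians; the function names and the example parameters are hypothetical, chosen only for illustration.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def sample_mixture(weights, means, stds, n, seed=0):
    """Generative view: z ~ Categorical(weights), then x ~ N(means[z], stds[z]^2).
    Returns (z, x) pairs so the hidden component can be checked."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z = rng.choices(range(len(weights)), weights=weights)[0]
        pairs.append((z, rng.gauss(means[z], stds[z])))
    return pairs

def responsibilities(x, weights, means, stds):
    """Posterior p(z = j | x) for every component j (soft assignment)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [p / total for p in joint]

def hard_assign(x, weights, means, stds):
    """Assign x to the component with the largest posterior probability."""
    r = responsibilities(x, weights, means, stds)
    return max(range(len(r)), key=r.__getitem__)
```

With two unit-variance components at -2 and 2 and equal weights, a point at 0 gets responsibilities of 0.5 each, while a point at -2 is assigned almost entirely to the first component. K-means can be seen as the hard-assignment limit of this picture (equal weights, a shared variance shrinking to zero).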

The expectation maximization (EM) algorithm, explained in the context of Gaussian mixture models (which motivate EM), took up the majority of the time. Dr. Jaakkola talked about a framework for building complex probability distributions and a method for clustering data, as well as using social media to reinforce learning. The topic then moved on to sequence learning and a brief introduction to decision trees and random forests. Related material: Mixtures, EM, Non-parametric models, as well as Machine Learning from Data: Gaussian Mixture Models.

Expectation maximization (EM) is an iterative procedure that is very sensitive to initial conditions. The principle of "garbage in, garbage out" applies here, so we need a good and fast initialization procedure. The Expectation Maximization Algorithm: A short tutorial explains a few techniques, including K-means, hierarchical K-means, and Gaussian splitting.
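A minimal 1-D sketch of EM for a Gaussian mixture, initialized with a short K-means run as the tutorial suggests, might look like the following. It is illustrative only: the function names, the equal starting weights, the shared starting variance, the variance floor `min_var`, and the stopping tolerance are all assumptions, not taken from the lecture or the tutorial.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def kmeans_means(xs, k, iterations=20):
    """Short 1-D k-means run used only to pick the initial component means."""
    ordered = sorted(xs)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def em_gmm_1d(xs, k, iterations=200, tol=1e-8, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM, starting from k-means means."""
    n = len(xs)
    means = kmeans_means(xs, k)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    weights = [1.0 / k] * k          # equal mixing weights to start
    variances = [overall_var] * k    # every component starts with the data's variance
    prev_ll = -math.inf
    for _ in range(iterations):
        # E-step: responsibilities r[i][j] = p(z_i = j | x_i) under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps a component from collapsing
        # Stop once the log-likelihood (of the pre-M-step parameters) stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances
```

For example, on `xs = [-3 + 0.1 * i for i in range(-10, 11)] + [3 + 0.1 * i for i in range(-10, 11)]`, `em_gmm_1d(xs, 2)` recovers means close to -3 and 3 with weights close to 0.5. The sensitivity to initial conditions is easy to see in this setting: if both means started at the same value, the two components would stay identical at every iteration.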

Here is a great tutorial by MathematicalMonk on (ML 16.3) Expectation-Maximization (EM) algorithm.

Mixture Models and EM Tutorial by Sargur Srihari

(ML 16.6) Gaussian mixture model (Mixture of Gaussians): an introduction to the mixture of Gaussians, a.k.a. the Gaussian mixture model (GMM), which is often used for density estimation and clustering.
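For reference, the GMM density used for density estimation, and the posterior "responsibility" used to softly assign a point to a component for clustering, are usually written as below. The notation (mixing weights pi_k, means mu_k, covariances Sigma_k, responsibilities gamma_k) is the standard textbook one, not necessarily that of the video or the lecture slides.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1

\gamma_k(x) = p(z = k \mid x)
  = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
         {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```

Clustering by "largest responsibility" assigns x to the component k with the largest gamma_k(x); density estimation uses p(x) directly.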

In response to Henry Tan's question about how tensor analysis is applied to machine learning, Dr. Barzilay pointed to one of her papers as a resource:

Low-Rank Tensors for Scoring Dependency Structures, by Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, and Tommi Jaakkola

The rest of the class continued with ML topics and practical advice on things like the use of log-likelihood, clustering in high dimensions (which is extremely tricky), dimensionality reduction for supervised learning, random projections, feature selection, and last but not least, the BIC (see Model Selection Lecture V: The Bayesian Information Criterion).
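The BIC trades goodness of fit against model size: BIC = p ln(n) - 2 ln(L), where L is the maximized likelihood, p the number of free parameters, and n the number of data points; lower is better. The sketch below uses it to choose the number of GMM components. The helper names and the full-covariance parameter count are assumptions for illustration, and the maximized log-likelihoods would come from fitting each candidate model (for example with EM).

```python
import math

def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion: p * ln(n) - 2 * ln(L_hat). Lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood

def gmm_num_params(k, d):
    """Free parameters of a k-component, d-dimensional GMM with full covariances:
    k*d means + k*d*(d+1)/2 covariance entries + (k-1) free mixing weights."""
    return k * d + k * d * (d + 1) // 2 + (k - 1)

def select_k_by_bic(loglik_by_k, d, num_samples):
    """Return the k with the lowest BIC.
    loglik_by_k maps k -> maximized log-likelihood of a GMM fitted with k components."""
    return min(loglik_by_k,
               key=lambda k: bic(loglik_by_k[k], gmm_num_params(k, d), num_samples))
```

The ln(n) penalty means a model with more components must raise the log-likelihood by enough to pay for its extra parameters; a small improvement in fit is not enough to win.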

Looking forward to tomorrow's final class on generative models, mixtures, the EM algorithm, semi-supervised and active learning, as well as tagging and information extraction.





Machine Learning with Scikit-Learn (I) - PyCon 2015