ACM's KDD 2008 Conference – Day 1 Proceedings

ACM's KDD is the premier annual international forum for
data mining researchers and practitioners from academia, industry, and
government to share their ideas, research results, and experiences. This year
the event was held at the Loews Lake Las Vegas resort, where Jeff Bergman and I
attended. Details of the program can be found here; a summary follows.

9:00 am - 5:00 pm

Full Day Workshop W1 - ADKDD'08
Full Day Workshop W2 - WEBKDD'08
Full Day Workshop W3 - Sensor-KDD
Full Day Workshop W4 - PinKDD'08
Full Day Workshop W5 - SNA-KDD
Full Day Workshop W13 - Multimedia Data Mining

9:00 am - 12:00 pm
Half Day Workshop W6 - KDD CUP and Mining Medical data
Half Day Workshop W7 - Multiple Information Sources
Half Day Workshop W11 - BIOKDD08
Half Day Workshop W12 - Mining for Business Applications

9:00 am - 12:00 pm
Tutorial - Mining Massive RFID, Trajectory, and Traffic Data Sets
Tutorial - Predictive Modeling with Social Networks
Tutorial - Mining Uncertain and Probabilistic Data: Problems, Challenges,
Methods, and Applications
Tutorial - Detecting Clusters in Moderate-to-High Dimensional Data: Subspace
Clustering, Pattern-based Clustering, and Correlation Clustering

2:00 pm - 5:30 pm
Half Day Workshop W8 - Large Scale Recommender Systems and Netflix Prize
Half Day Workshop W10 - Mining using Matrices and Tensors

2:00 pm - 5:00 pm
Tutorial - Blogosphere: Research Issues, Applications, and Tools
Tutorial - Graph Mining and Graph Kernels
Tutorial - Applied Text Mining

6:15 pm - 6:45 pm : Award Presentations

6:45 pm - 7:30 pm : Innovation Award Talk

Day 1 was very informative and provided a good learning experience. The program
included several full-day workshops and the tutorials listed below.

J. Han, J. Lee, H. Gonzalez, X. Li, "Mining
Massive RFID, Trajectory, and Traffic Data Sets"
Jiawei Han, Jae-Gil Lee, Hector Gonzalez, Xiaolei Li
Department of Computer Science, University of Illinois at Urbana-Champaign

J. Neville, F. Provost, "Predictive
Modeling with Social Networks"
Jennifer Neville, Purdue University
Foster Provost, New York University

J. Pei, M. Hua, Y. Tao, X. Lin, "Mining
Uncertain and Probabilistic Data: Problems, Challenges, Methods, and
Applications"
Jian Pei, Simon Fraser University, Canada
Ming Hua, Simon Fraser University, Canada
Yufei Tao, The Chinese University of Hong Kong
Xuemin Lin, The University of New South Wales, Australia

H. Kriegel, P. Kröger, A. Zimek, "Detecting
Clusters in Moderate-to-High Dimensional Data: Subspace Clustering,
Pattern-based Clustering, and Correlation Clustering"
Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek
Institute for Informatics, Ludwig-Maximilians-Universität München, Germany

H. Liu, N. Agarwal, "Blogosphere:
Research Issues, Applications, and Tools"
Huan Liu, Arizona State University
Nitin Agarwal, Arizona State University

R. Feldman, L. Ungar, "Applied Text Mining"

With social networking being the prominent theme at the
conference, I decided to get a head start by attending the half-day tutorial on
"Predictive Modeling with Social Networks" by Jennifer Neville and
Foster Provost. The abstract from the
tutorial is as follows.

Recently there has been a surge of interest in methods for
analyzing complex social networks: from communication networks, to friendship
networks, to professional and organizational networks. The dependencies among
linked entities in the networks present an opportunity to improve inference
about properties of individuals, as birds of a feather do indeed flock
together. For example, when deciding how to market a product to people in
MySpace or Facebook, it may be helpful to consider whether a person's friends
are likely to purchase the product.

This tutorial will explore the unique opportunities and
challenges for modeling social network data. We will begin with a description
of the problem setting, including examples of various applications of social
network mining (e.g., marketing, fraud detection). We will then present a
number of characteristics of social network data that differentiate it from
traditional inference and learning settings, and outline the resulting
opportunities for significantly improved inference and learning. We will
discuss specific techniques for capitalizing on each of the opportunities in
statistical models, and outline both methodological issues and potential
modeling pathologies that are unique to network data. We will give links to the
recent literature to guide study, and present results demonstrating the effectiveness
of the techniques.

Dr. Provost started by establishing the core foundations of
social networking and then went into depth on network targeting, disjoint
inference, learning and classification, wvRN, ACORA, RBC, RPT, SLR, and the
context of collective inference. Dr. Neville then continued with Gaussian
random fields and elaborated on her work on questionable broker detection.
Semi-supervised learning, conventional bias vs. variance analysis, homophily,
social influence, external factors, and open research issues were also part of
the tutorial. Later, in a discussion with Dr. Provost, he mentioned that the
collaborative techniques described can also be applied to outlier analysis,
which was encouraging.
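
To make one of these acronyms concrete, here is a minimal sketch of the idea behind wvRN (the weighted-vote relational neighbor classifier): an unlabeled node's score is the weighted average of its neighbors' scores, repeated until the estimates settle. This is my own simplified illustration, not code from the tutorial. The toy graph, the names, and the `wvrn` function are all hypothetical, and the sketch omits refinements such as damping that a full relaxation-labeling implementation might use.

```python
from collections import defaultdict


def wvrn(edges, known, n_iter=20):
    """Estimate P(label = 1) for every node by weighted neighbor voting.

    edges: iterable of (node_a, node_b, weight) tuples, weights assumed positive.
    known: dict mapping some nodes to a fixed label (0 or 1).
    """
    # Build a symmetric weighted adjacency map.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Unlabeled nodes start at the class prior of the labeled nodes.
    prior = sum(known.values()) / len(known)
    scores = {node: float(known.get(node, prior)) for node in neighbors}

    for _ in range(n_iter):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in known:
                continue  # labeled nodes keep their known value
            total_weight = sum(nbrs.values())
            updated[node] = sum(w * scores[nb] for nb, w in nbrs.items()) / total_weight
        scores = updated
    return scores


# Hypothetical friendship chain: ann bought the product, eve did not.
edges = [
    ("ann", "bob", 1.0),
    ("bob", "cat", 1.0),
    ("cat", "dan", 1.0),
    ("dan", "eve", 1.0),
]
known = {"ann": 1, "eve": 0}
scores = wvrn(edges, known)
for name in ("bob", "cat", "dan"):
    print(name, round(scores[name], 2))
```

On this toy chain the scores fall off with distance from the known buyer: bob gets 0.75, cat 0.5, and dan 0.25, which is the "birds of a feather" intuition from the abstract in numeric form.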

For the second tutorial, I attended the "Graph Mining
and Graph Kernels" tutorial by Karsten M. Borgwardt
and Xifeng Yan (IBM Research Center). This
tutorial presented a comprehensive overview of the techniques developed in
graph mining and graph kernels and examined the connection between them. As described by the authors, “The goal of this
tutorial is i) to introduce newcomers to the field of graph mining, ii) to
introduce people with database background to graph mining using kernel
machines, iii) to introduce people with machine learning background to
database-oriented graph mining, and iv) to present exciting research problems
at the interface of both fields.”

The Applied Text Mining tutorial by Dr. Ronen Feldman and Dr.
Lyle Ungar was also an excellent talk. Dr. Feldman, author of Applied Text
Mining, has a great style of pragmatic discussion and connects with the
audience really well. I am looking forward to his future presentation and
to discussing natural language corpus extraction in text
mining for my Urdu machine translation work; he must have some great ideas
about it.

After the tutorials, Bing Liu, the program chair, presented
conference statistics. Among the salient numbers: 323 papers were
submitted from the US, of which 81 were accepted. In total
there were 593 submissions and 118 acceptances, a rate of just under 20%, or
fewer than 1 in 5! These guys are picky.
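
For anyone who wants to check the arithmetic, here is a quick sketch using only the four counts quoted above. The variable and function names are my own.

```python
# Counts as quoted from the program chair's statistics.
submitted_total, accepted_total = 593, 118
submitted_us, accepted_us = 323, 81


def acceptance_rate(accepted, submitted):
    """Fraction of submitted papers that were accepted."""
    return accepted / submitted


overall_rate = acceptance_rate(accepted_total, submitted_total)
us_rate = acceptance_rate(accepted_us, submitted_us)

print(f"Overall acceptance rate: {overall_rate:.1%}")  # 19.9%
print(f"US acceptance rate: {us_rate:.1%}")            # 25.1%
```

So the overall rate is indeed just below 1 in 5, while US submissions fared slightly better at roughly 1 in 4.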

Then came the best research paper award, best application
paper award, student travel awards, KDD dissertation award, KDD Cup awards, and KDD
innovation award, and the evening concluded with the innovation award talk by Raghu
Ramakrishnan. The KDD Cup 2008 task,
in medical data mining, was a highly practical and quite challenging
problem. Details of the cup submissions can be seen here.

Dr. Ramakrishnan is the author of the “Cow Book”, and his final talk
of the day covered his past research and a broad spectrum of future directions
in information retrieval. With educated “predictions” from a seasoned data miner, the first day

I’m very much looking forward to tomorrow’s sessions; till
then, happy mining.

I've taken a lot of photos of the presentations. Photos of the event are shared on Facebook. Click here
to see them.