Events
KDD 2008 – Day 3 & 4.
Day 3 started with an invited talk by Dr. Michael Schwarz of Yahoo! Research on “Internet Advertising and Optimal Auction Design”, in which he discussed generalized English auctions, Internet advertising, and generalized second-price auctions. (Details can be read on Akshay Java’s Social Media research blog here). It was an interesting talk on how almost every transaction follows the modern auction model and which approaches can be used to maximize throughput and ROI.
Later, during the regular sessions, I attended the Discovery and Detection research session, which focused on outlier analysis. It comprised the following presentations.
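As a rough illustration of the generalized second-price idea from the talk, the sketch below ranks bidders by bid and charges each slot winner the bid of the bidder ranked just below. The bidder names, bid values and the `gsp_allocate` helper are hypothetical, and this is not taken from Dr. Schwarz's slides; real ad auctions typically also weight bids by quality scores and apply reserve prices, which this sketch ignores.

```python
def gsp_allocate(bids, num_slots):
    """Assign ad slots by descending bid; each winner pays the next-highest bid.

    bids: dict mapping bidder name -> bid per click (hypothetical values).
    Returns a list of (slot, bidder, price_per_click) tuples.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for slot, (bidder, _bid) in enumerate(ranked[:num_slots]):
        # The price is the bid of the bidder ranked immediately below, or 0 if none.
        next_bid = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((slot + 1, bidder, next_bid))
    return results


# Three hypothetical advertisers competing for two slots.
example = gsp_allocate({"a": 3.0, "b": 2.0, "c": 1.0}, num_slots=2)
```

The point of the second-price rule is that a winner's payment depends on the competitor below rather than on the winner's own bid.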
25-minute presentations
- Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge. D. D. Jensen, A. S. Fast, B. J. Taylor, M. E. Maier.
- Discrimination-aware Data Mining. D. Pedreschi, S. Ruggieri, F. Turini.
15-minute presentations
- Local Peculiarity Factor and Its Application in Outlier Detection. J. Yang, N. Zhong, Y. Yao, J. Wang.
- Angle-Based Outlier Detection in High-dimensional Data. H. Kriegel, M. Schubert, A. Zimek.
- Anomaly Pattern Detection in Categorical Datasets. K. Das, J. Schneider, D. B. Neill.
Lunch was sponsored by Yahoo!, and the décor was pretty cool, with their gadgets, puzzles and YahooDokus. Later in the afternoon I attended
25-minute presentations
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Y. Koren.
- Combinational Collaborative Filtering for Personalized Community Recommendation. W. Chen, D. Zhang, E. Y. Chang.
And then
- (R) Scaling Up Text Classification for Large File Systems. G. Forman, S. Rajaram.
- (I) Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. R. Grossman, Y. Gu Short.
I later got a chance to talk to Dr. Grossman about the cloud computing initiative that he is very passionate about. He discussed in his presentation how Sector is approximately twice as fast as Hadoop and how Sector has been used to distribute the Sloan Digital Sky Survey (SDSS) via the web site sdss.ncdm.uic.edu.
The day concluded with the poster reception, where I got to talk to several authors and presenters, including Wen-Yen Chen, Pooja Mittal, Yabo-Arber Xu, Kaustav Das of Anomaly Pattern Detection in Categorical Datasets, and one of the authors of Information Extraction from Wikipedia (not sure who).
Day 4
The last day of the conference started with Jitendra Malik’s invited talk on “The Future of Image Search”. (Greg Linden’s Blog Post about the talk). It was a great talk in which Jitendra discussed the evolution of vision, image search, and the shortcomings of tagging and textual taxonomies, and pushed for "category recognition" for objects in images.
Later there were the following excellent sessions.
- Context-Aware Query Suggestion by Mining Click-Through and Session Data. H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, H. Li. (Best Application Paper Award Winner)
- Scalable and Near Real-Time Burst Detection from eCommerce Queries. N. Parikh, N. Sundaresan.
- Using Predictive Analysis to Improve Invoice-to-Cash Collection. S. Zeng, P. Melville, C. A. Lang, I. Boier-Martin, C. Murphy.
- TagMark: Reliable Estimations of RFID Tags for Business Processes. L. W. F. Chaves, E. Buchmann, K. Böhm.
The conference concluded with closing remarks from the general chair, Ying Li.
Call for Volunteers – SoCal Code Camp Oct 25-26.
Greetings Southern California .NET Community
I was wondering if any of you would be able to volunteer to help register speakers and attendees and help with general organization on the Code Camp event days (Saturday/Sunday, October 25/26).
I am heading up Volunteer Coordination, so if you are available please contact me directly (and put “volunteer” in the subject line) or sign up via the “Contacts” pull-down of www.socalcodecamp.com.
Thanks!
Adnan Masood
Volunteer Coordinator
SoCal.NET Code Camp
www.SoCalCodeCamp.com
President & Co-Founder
San Gabriel Valley .NET Developers Group
www.SGVdotNet.org
KDD 2008 Day 2
Day 2 started with Trevor Hastie’s keynote on regularization paths and coordinate descent. It was great to see Dr. Hastie speaking passionately about coordinate descent, logistic regression and fitting. The keynote’s topic was “Regularization Paths and Coordinate Descent”, and the following is a brief abstract of the talk.
“In a statistical world faced with an explosion of data, regularization has become an important ingredient. In many problems, we have many more variables than observations, and the lasso penalty and its hybrids have become increasingly useful. This talk presents some effective algorithms based on coordinate descent for fitting large scale regularization paths for a variety of problems. Joint work with Rob Tibshirani and Jerome Friedman”
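To make the abstract a little more concrete, here is a minimal sketch of cyclic coordinate descent for the lasso, assuming the common objective (1/(2n))·||y − Xb||² + λ·||b||₁. It is not the authors' implementation: the function names and the tiny data set are hypothetical, and it omits the warm starts, standardization and convergence checks a serious solver would use.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot contribute
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: two orthogonal columns, one strong and one weak signal.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
# A crude "regularization path": refit over a decreasing grid of penalties.
path = [lasso_coordinate_descent(X, y, lam) for lam in (1.0, 0.5, 0.25)]
```

As the penalty shrinks, the strong coefficient enters the model and grows while the weak one stays at exactly zero, which is the sparsity behaviour that makes the lasso attractive when there are many more variables than observations.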
After the keynote talk, there were combined and research sessions. I attended the one on social networks, which comprised the following presentations.
25-minute presentations
- The Structure of Information Pathways in a Social Communication Network. G. Kossinets, J. Kleinberg, D. Watts.
- Influence and Correlation in Social Networks. A. Anagnostopoulos, R. Kumar, M. Mahdian.
- Weighted Graphs and Disconnected Components. M. McGlohon, L. Akoglu, C. Faloutsos.
15-minute presentations
- Microscopic Evolution of Social Networks. J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins.
- Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, J. Leskovec.
- Feedback Effects between Similarity and Social Influence in Online Communities. D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, S. Suri.
During lunch, which was sponsored by Microsoft adCenter Labs, they talked about the challenges in advertising and how to get the best out of revenue share and context-based hits. They also announced the adCenter Labs challenge, about which I have yet to find any information online.
Later in the afternoon, Microsoft Research’s Thore Graepel talked about Large Scale Data Analysis and Modeling in Online Services and Advertising. It was a very interesting and pragmatic presentation about real-world problems in online search and advertising. Even though the first part of the presentation was heavy on online ranking and matchmaking, the later discussion on advertising made up for it.
Then there were 25-minute presentations
- ArnetMiner: Extraction and Mining of Academic Social Networks. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su.
- Identifying Authoritative Actors in Question-Answering Forums – The Case of Yahoo! Answers. M. Bouguessa, B. Dumoulin, S. Wang. (the paper was on Belinko’s website but not there anymore, interesting?)
And on a separate track,
15-minute presentations
- Automated Cyclone Discovery and Tracking using Knowledge Sharing in Multiple Heterogeneous Satellite Data. S.-S. Ho, A. Talukder.
- Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. N. Koenigstein, Y. Shavitt, T. Tankel.
- Land Cover Change Detection: A Case Study. S. Boriah, V. Kumar, M. Steinbach, C. Potter, S. Klooster.
Later, after the automated cyclone discovery session, I got a chance to meet Dr. Talukder of the Jet Propulsion Laboratory/NASA to talk about a mutual acquaintance, Dr. Homayoun Seraji.
This concluded day 2 of the conference.
Post Event Resources – MSDN Webcast on REST and WCF
The MSDN webcast, geekSpeak: REST and Windows Communication Foundation 3.5, went very well. Since REST is a very broad topic and there were tons of questions, I didn’t get a chance to show all the demos; however, the sample code can be downloaded from here.
Also, keep an eye on the geekSpeak blog for future updates. Overall, there is a lot of concern about security in REST. I’ll be doing a series of blog posts on security in REST in the near future; in the meantime, the following resources provide a good starting point.
Mark O’Neill’s Radio Weblog
Message Level Security in REST
Taking Amazon S3 as a model is one way to implement security in REST services. As mentioned in this article by Eric Heuveneers:
“Amazon S3 REST resources are secure. This is important not just for your own purposes, but also because customers are billed depending on how their S3 buckets and objects are used. An AWSSecretKey is assigned to each AWS customer, and this key is identified by an AWSAccessKeyID. The key must be kept secret and will be used to digitally sign REST requests. S3 security features are:
- Authentication: Requests include AWSAccessKeyID
- Authorization: Access Control List (ACL) could be applied to each resource
- Integrity: Requests are digitally signed with AWSSecretKey
- Confidentiality: S3 is available through both HTTP and HTTPS
- Non repudiation: Requests are time stamped (with integrity, it’s a proof of transaction)
The signing algorithm is HMAC/SHA1 (Hashing for Message Authentication with SHA1).”
Reference: Introduction to Amazon S3 with Java and REST
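The integrity and non-repudiation points in the quote can be sketched with a few lines of HMAC-SHA1 signing. This is only an illustration of the general technique: the string-to-sign below is a simplified assumption (the real S3 canonical format has more fields), and the key, path and helper names are hypothetical placeholders.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, method, path, timestamp):
    """Return a Base64-encoded HMAC-SHA1 signature over a simplified string-to-sign."""
    # Assumption: a simplified string-to-sign, not S3's actual canonical format.
    string_to_sign = "\n".join([method, path, timestamp])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key, method, path, timestamp, signature):
    """Recompute the signature on the server side and compare in constant time."""
    expected = sign_request(secret_key, method, path, timestamp)
    return hmac.compare_digest(expected, signature)


# Hypothetical placeholder credentials and request, for illustration only.
SECRET = "example-secret-key"
STAMP = "Wed, 03 Sep 2008 19:00:00 GMT"
sig = sign_request(SECRET, "GET", "/my-bucket/report.txt", STAMP)
```

Because the timestamp is part of the signed string, a tampered path or a replayed request with a different time stamp fails verification; the secret key itself never travels with the request.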
Links to the books and reference articles mentioned in the webcast are as follows. Please feel free to email me your questions and comments.
Books
- RESTFul .NET
- RESTFul Web Services
References
- The Weekly Source Code 32- Atom, AtomPub and BlogSvc, an AtomPub Server in WCF
- WCF 3.5 Utilities
- WCF 3.5 RSS / ATOM Syndication Support
- Dare Obasanjo aka Carnage4Life – ETech 2005 Trip Report Building a New Web Service at Google
- Defining REST based Formats
- InfoQ Dan Diephouse on Atom, AtomPub, REST and Web Services
- MIX07 Buzzcast #12 – Steve Maine – Navigating the Programmable Web MIX07 Buzzcast Channel 9
- Scott Hanselman’s Computer Zen – Web Services
- Showcase of Live ASP.NET MVC Sites – Mike Bosch’s Blog on .NET
- The Highs and Lows of REST
- TRUVEO – Complete ASP.NET sample REST Service Calls
- WCF & REST at MIX08 The Tale of MySpace APIs
- WCF Web Programming Model Overview
- WebServiceStudio – Home (CodePlex Project)
- YABE Yet Another Blogging Engine – Home (CodePlex Project)
- WADL (CodePlex Project)
MSDN Webcast: geekSpeak: REST and Windows Communication Foundation 3.5 with Adnan Masood (Level 200)
I’ll be doing a webcast on September 3rd on geekSpeak. The topic is “REST and Windows Communication Foundation 3.5”. Details are as follows.
Audience(s): Developer.
Duration: 60 Minutes
Start Date:
Wednesday, September 03, 2008 12:00 PM Pacific Time (US & Canada)
Event Overview
The geekSpeak webcast series brings you industry experts in a “talk-radio” format hosted by developer evangelists from Microsoft. These experts share their knowledge and experience about a particular developer technology, and they are ready to answer your questions in real time during the webcast.
This geekSpeak is a very RESTful one. Distributed systems guru Adnan Masood introduces the Representational State Transfer (REST) architectural style and its design principles, and he discusses how they can be implemented using Windows Communication Foundation (WCF) 3.5. Adnan offers guidance and takes questions on when to choose a RESTful design over SOAP-based services and how WCF fits into the spectrum of Microsoft technologies that include ADO.NET Data Services (Astoria) and ASP.NET MVC. Your hosts for this geekSpeak are Lynn Langit and Glen Gordon.
To ask a question in advance of the live webcast, or for post-show resources, be sure to visit the geekSpeak blog.
Guest Presenter: Adnan Masood, Senior Software Engineer, Green Dot Corporation
Adnan Masood works as a senior software engineer and technical lead in a Monrovia-based financial institution where he develops middle-tier architectures, distributed systems, and Web-applications using the Microsoft .NET framework. He holds various professional memberships (ACM, BCS, and ACS) and several technical certifications, including MCSD .NET, MCAD .NET, and SCJP-II. Adnan is attributed and published in print media and on the Web, holds a master’s degree in computer science from Nova Southeastern University, and is currently pursuing his doctoral studies in machine learning. Adnan has taught Windows Communication Foundation (WCF) courses at the University of California at San Diego and regularly presents at local code camps. He is actively involved in the .NET community as cofounder and president of the San Gabriel Valley .NET Developers group. Adnan is a recent recipient of an INETA Community Champion Award for his contributions to the developer community in Southern California.
Event ID: 1032387085
KDD 2008 Conference Photos
ACM’s KDD 2008 Conference – Day 1 Proceedings
ACM’s KDD 2008 is the premier annual international forum for
data mining researchers and practitioners from academia, industry, and
government to share their ideas, research results and experiences. This year
the event was held at the Loews Lake Las Vegas resort, where Jeff Bergman and I
attended it. Details of the program can be found at http://www.kdd2008.com/program.html and the summary is as follows.
9:00 am – 5:00 pm
Full Day Workshop W1 – ADKDD’08
Full Day Workshop W2 – WEBKDD’08
Full Day Workshop W3 – Sensor-KDD
Full Day Workshop W4 – PinKDD’08
Full Day Workshop W5 – SNA-KDD
Full Day Workshop W13 – Multimedia Data Mining
9:00 am – 12:00 pm
Half Day Workshop W6 – KDD CUP and Mining Medical data
Half Day Workshop W7 – Multiple Information Sources
Half Day Workshop W11 – BIOKDD08
Half Day Workshop W12 – Mining for Business Applications
9:00 am – 12:00 pm
Tutorial – Mining Massive RFID, Trajectory, and Traffic Data Sets
Tutorial – Predictive Modeling with Social Networks
Tutorial – Mining Uncertain and Probabilistic Data: Problems, Challenges,
Methods, and Applications
Tutorial – Detecting Clusters in Moderate-to-High Dimensional Data: Subspace
Clustering, Pattern-based Clustering, and Correlation Clustering
2:00 pm – 5:30 pm Half Day Workshop
W8 – Large Scale Recommender Systems and NetFlix Prize
W10 – Mining using Matrices and Tensors
2:00 pm – 5:00 pm
Tutorial – Blogosphere: Research Issues, Applications, and Tools
Tutorial – Graph Mining and Graph Kernels
Tutorial – Applied Text Mining
6:15 pm – 6:45 pm : Award Presentations
6:45 pm – 7:30 pm : Innovation Award Talk
Day 1 was very informative and provided a good learning experience. The program
included several full-day workshops and the tutorials listed below.
- J. Han, J. Lee, H. Gonzalez, X. Li, “Mining Massive RFID, Trajectory, and Traffic Data Sets”. Jiawei Han, Jae-Gil Lee, Hector Gonzalez, Xiaolei Li, Department of Computer Science, University of Illinois at Urbana-Champaign.
- J. Neville, F. Provost, “Predictive Modeling with Social Networks”. Jennifer Neville, Purdue University; Foster Provost, New York University.
- J. Pei, M. Hua, Y. Tao, X. Lin, “Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications”. Jian Pei, Simon Fraser University, Canada; Ming Hua, Simon Fraser University, Canada; Yufei Tao, The Chinese University of Hong Kong; Xuemin Lin, The University of New South Wales, Australia.
- H. Kriegel, P. Kröger, A. Zimek, “Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering”. Hans-Peter Kriegel, Peer Kröger, and Arthur Zimek, Institute for Informatics, Ludwig-Maximilians-Universität München, Germany.
- H. Liu, N. Agarwal, “Blogosphere: Research Issues, Applications, and Tools”. Huan Liu, Arizona State University; Nitin Agarwal, Arizona State University.
- R. Feldman, L. Ungar, “Applied Text Mining”.
With social networking being the prominent theme at the
conference, I decided to get a head start by attending the half-day tutorial
“Predictive Modeling with Social Networks” by Jennifer Neville and
Foster Provost. The abstract of the
tutorial is as follows.

Recently there has been a surge of interest in methods for
analyzing complex social networks: from communication networks, to friendship
networks, to professional and organizational networks. The dependencies among
linked entities in the networks present an opportunity to improve inference
about properties of individuals, as birds of a feather do indeed flock
together. For example, when deciding how to market a product to people in
MySpace or Facebook, it may be helpful to consider whether a person’s friends
are likely to purchase the product.
This tutorial will explore the unique opportunities and
challenges for modeling social network data. We will begin with a description
of the problem setting, including examples of various applications of social
network mining (e.g., marketing, fraud detection). We will then present a
number of characteristics of social network data that differentiate it from
traditional inference and learning settings, and outline the resulting
opportunities for significantly improved inference and learning. We will
discuss specific techniques for capitalizing on each of the opportunities in
statistical models, and outline both methodological issues and potential
modeling pathologies that are unique to network data. We will give links to the
recent literature to guide study, and present results demonstrating the effectiveness
of the techniques.

Dr. Provost started by establishing the core foundation for
social networking and then went into depth on network targeting, disjoint
inference, learning & classification, wvRN, ACORA, RBC, RPT, SLR and
the context of collective inference. Dr. Neville then continued with Gaussian
random fields and elaborated on her work on questionable broker detection.
Semi-supervised learning, conventional bias vs. variance analysis, homophily,
social influence, external factors and open research issues were also part of
the tutorial. Later, in a discussion, Dr. Provost mentioned that the
collaborative techniques described can also be applied to outlier analysis,
which was encouraging.
For the second tutorial, I attended the “Graph Mining
and Graph Kernels” tutorial by Karsten M. Borgwardt
(http://mlg.eng.cam.ac.uk/~karsten/) and Xifeng Yan (IBM Research Center). This
tutorial presented a comprehensive overview of the techniques developed in
graph mining and graph kernels and examined the connection between them. As described by the authors, “The goal of this
tutorial is i) to introduce newcomers to the field of graph mining, ii) to
introduce people with database background to graph mining using kernel
machines, iii) to introduce people with machine learning background to
database-oriented graph mining, and iv) to present exciting research problems
at the interface of both fields.”

The Applied Text Mining tutorial by Dr. Ronen Feldman & Dr.
Lyle Ungar was also excellent. Dr. Feldman, author of applied text
mining, has a great style of pragmatic discussion and connects with the
audience really well. I am looking forward to his future presentations and
to discussing with him the idea of natural language corpus extraction implementations in text
mining for my Urdu machine translation work; he must have some great ideas
about it.

After the tutorials, Bing Liu, the program chair, presented
conference statistics. Among the salient numbers: 323 papers were
submitted from the US, of which 81 were accepted. In total
there were 593 submissions and 118 accepted papers, an acceptance rate of just under 20%, or less than
1 in 5! These guys are picky.


Then came the best research paper award, best application
paper award, student travel awards, KDD dissertation award, KDD Cup awards and KDD
innovation award, and the day concluded with the innovation award talk by Raghu
Ramakrishnan. The winners of KDD Cup 2008 were announced; this year’s task,
in medical data mining, was a highly practical and quite challenging
problem. Details of the cup submissions can be seen here: http://www.kdd2008.com/kddcup.html
Dr. Ramakrishnan is the author of the “Cow Book”, and his final talk
for the day covered his past research and a broad spectrum of future directions
in information retrieval. With educated “predictions” from a seasoned data miner, the first day
concluded.

I’m very much looking forward to tomorrow’s sessions; till
then, happy mining.
I’ve taken a lot of photos of the presentations. Photos of the event are shared on Facebook. Click here
to see them.
Going Places – PDC, KDD and IASA Connections and Teaching WCF @ UCSD
August and the next couple of months look
really busy. I’ll be teaching WCF at
UCSD and attending the following conferences, along with a doctoral
cluster meeting. Therefore I am seriously considering “The Terminal” style living.
KDD 2008, 24 – 27 Aug 2008, Loews Lake Las Vegas, Las Vegas, NV
The annual ACM SIGKDD conference is the premier international forum for data mining
researchers and practitioners from academia, industry, and government to share
their ideas, research results and experiences. KDD-08 will feature keynote
presentations, oral paper presentations, poster sessions, workshops, tutorials,
panels, exhibits, demonstrations, and the KDD Cup competition.
IASA Connections, October 6 – 8, 2008, San Francisco Marriott, San Francisco, CA
I’ll be speaking at the IASA Connections
conference in San Francisco on Aspect Oriented Programming in Distributed
Systems. More details here.
Microsoft PDC 2008 – 27 – 30 Oct, Los Angeles Convention Center, Los Angeles, CA
Since 1991, the Professional Developers Conference (PDC) has been Microsoft’s premier
gathering of leading-edge developers and architects. Attend the PDC to
understand the future of the Microsoft platform and to exchange ideas with
fellow professionals. You’ll learn about upcoming products, meet Microsoft’s
leaders and top engineers, write some code, and be inspired! Unplug for a few
days and think about the future.
Programming Windows Communication Foundation (WCF) (Summer 2008)
Sa, 8:00 a.m. – 5:00 p.m.
8/9/2008 – 8/23/2008
Room 134, UCSD Extension Complex, 9600 N Torrey Pines Rd, La Jolla
Programming Windows Communication Foundation (WCF) (Fall 2008)
Sa, 8:00 a.m. – 5:00 p.m.
10/4/2008 – 10/18/2008
Room 110, UCSD Extension Sorrento Mesa Center, 6925 Lusk Blvd, San Diego
REST and WCF 3.5 Talk Slides and Code Samples
On Thursday, July 17th, I presented “RESTFul Web Services – UriTemplates and REST support with WCF 3.5”
to the SoCal.NET architecture group (http://www.socaldotnetarchitecture.org/). It was well received and I got good feedback.
The code samples and slides are as follows.
- REST using WCF 3.5.pdf – Slides in PDF (529.21 KB)
- REST using WCF 3.5.pptx – Slides in Powerpoint (184.54 KB)
- RESTful Communication.docx (Excerpt from David Chappell’s whitepaper)
- RestService.zip – Simple Service Source Code (3.18 KB)
- WCF3.5-SyndicationRESTSamples.zip (Uri Templates, REST-Sample MSDN Source Code) (162.61 KB)
Thanks to all the attendees especially Mike Vincent and David Wells for arranging this talk.
INETA’s David Yack on ADO.NET and My REST Talk at SoCal Architecture Group.
Tomorrow, July 16th, David Yack will be speaking on Exploring the Entity Framework at the SGV.NET User Group (www.sgvdotnet.org). It’s an INETA-sponsored event; if you are interested in understanding a core part of Microsoft’s data access strategy, please join us. David will walk us through how the Entity Framework aims to reduce the mismatch between data storage and data usage by applications. In his talk he will explore the Entity Data Model and the various techniques for accessing it using the client libraries that are part of the Entity Framework. With V1 of the Entity Framework almost ready to go out the door, David will also touch on efforts already underway for V2.
Speaking of speaking, on Thursday, July 17th, I’ll be presenting to the SoCal.NET architecture group (http://www.socaldotnetarchitecture.org/) on “RESTFul Web Services – UriTemplates and REST support with WCF 3.5”. The abstract of the talk is as follows.
“REST (Representational State Transfer) is an architectural style for building distributed systems in a URI-centric way, focusing on resource addressing via an HTTP-style “command line” interface. The REST style of service development improves server scalability, allows systems to be more robust and promotes long-term compatibility and evolvability. Related technologies using similar design principles are ASP.NET MVC and ADO.NET Data Services (Astoria). Support for REST is introduced in WCF 3.5 with a new WCF binding (webHttpBinding), giving .NET developers the option of building lightweight REST-style services in contrast with traditional SOAP/RPC-style development.
The presentation focuses on REST design principles and how they can be implemented using Windows Communication Foundation (WCF) 3.5. New features such as support for UriTemplates, the Web HTTP binding, syndication support and the new web programming model, which leverages a RESTful design of web services within the unified WCF programming model, will be addressed from both architectural and implementation perspectives.”
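For readers who have not seen URI templates before, the idea can be sketched in a few lines. The example below is a language-neutral illustration in Python, not WCF code and not the WCF `UriTemplate` API: the routes, handler names and the `compile_template` and `dispatch` helpers are all hypothetical.

```python
import re


def compile_template(template):
    """Turn a template such as 'customers/{id}' into a regex with named groups."""
    pattern = ""
    # Splitting on a capturing group keeps the '{name}' placeholders in the result.
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # A placeholder matches exactly one path segment.
            pattern += "(?P<%s>[^/]+)" % part[1:-1]
        else:
            pattern += re.escape(part)
    return re.compile("^" + pattern + "$")


# Hypothetical resource routes: (HTTP method, compiled template, handler name).
ROUTES = [
    ("GET", compile_template("customers/{id}"), "get_customer"),
    ("GET", compile_template("customers/{id}/orders/{orderId}"), "get_order"),
]


def dispatch(method, path):
    """Return (handler_name, bound_variables) for the first matching route, else None."""
    for route_method, regex, handler_name in ROUTES:
        match = regex.match(path)
        if route_method == method and match:
            return handler_name, match.groupdict()
    return None


order = dispatch("GET", "customers/42/orders/7")
```

The resource is addressed entirely by the URI and the HTTP verb, and the template binds path segments to named variables that a service operation would receive as parameters.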