Archive for September, 2008
F# Eye for the C# Guy
WADL (Web Application Description Language) – Call for Contributors for Open Source CodePlex Project
Those who use Windows Communication Foundation extensively know that, unlike SOAP-based services, REST services lack contract support in WCF. To address this need, I have started this project, WADL on CodePlex, to build a tool that generates code for RESTful Web services and RESTful Web service clients from WADL contract files.
With YABE and everything else going on, I will need some significant help from open source volunteers and contributing developers who are looking to work in the area of REST interoperability. Interested? Email me or contact me via CodePlex.
The intent of this project is to do for .wadl files and REST-based services exactly what the WSDL tooling does for SOAP-based services. Further details about the WADL file format are as follows.
A .wadl file is an XML document written in an XML grammar called Web Application Description Language (WADL). This file defines how a REST-based Web service behaves and instructs clients how to interact with the service. When you use Wadl.exe to create a proxy class, a single source file is created in the programming language that you specify. In the process of generating the source code for the proxy class, the tool determines the best type to use for objects specified in the service description.
wadl: Web Application Description Language (WADL) – Specification …
Web Application Description Language (WADL) https://wadl.dev.java.net/wadl20061109.pdf
The WADL specification (PDF) and WADL schema describe the features of the language in detail but a few points are worth highlighting: [1]
- Generative URIs are handled by including support for parameterization of URI components.
- The base set of HTTP methods (GET, POST, PUT, DELETE, HEAD) are specifically supported by the WADL schema but the method enumeration is open to use with other methods such as those specified by WebDAV.
- Representation parameters are hints to processors that point out interesting parts of a resource representation. As such they may be used or ignored as desired. I think they’ll be useful for RPC-like interactions but less so for more document oriented work.
- Sibling representation elements represent alternative representations of the same resource. This means you can describe a resource that is offered in alternate formats, e.g. XML or HTML.
- The design center is XML/HTTP but this doesn’t preclude use with alternate representation formats: the representation/@mediaType attribute provides the necessary hook.
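To make these points a little more concrete, here is a minimal sketch in Python (not the C# that the CodePlex tool is meant to emit). It reads a small hand-written WADL fragment and lists the operations it describes. The sample document, the example.com URL and the list_operations helper are all made up for illustration. The namespace is the one I believe the November 2006 draft uses, so treat it as an assumption and check it against the specification.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the 2006-11-09 WADL draft; verify against the spec.
WADL_NS = {"w": "http://research.sun.com/wadl/2006/10"}

# Hand-written, hypothetical service description (not validated against the schema).
SAMPLE_WADL = """
<application xmlns="http://research.sun.com/wadl/2006/10"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <resources base="http://example.com/api/">
    <resource path="posts/{id}">
      <param name="id" style="template" type="xsd:int"/>
      <method name="GET">
        <response>
          <representation mediaType="application/xml"/>
          <representation mediaType="text/html"/>
        </response>
      </method>
      <method name="DELETE"/>
    </resource>
  </resources>
</application>
"""


def list_operations(wadl_text):
    """Return (HTTP method, URI template, response media types) per operation."""
    root = ET.fromstring(wadl_text)
    operations = []
    for resources in root.findall("w:resources", WADL_NS):
        base = resources.get("base", "")
        for resource in resources.findall("w:resource", WADL_NS):
            # The path may contain {name} placeholders: the "generative URI" case.
            uri_template = base + resource.get("path", "")
            for method in resource.findall("w:method", WADL_NS):
                # Sibling representation elements are alternative formats.
                media_types = [
                    rep.get("mediaType")
                    for rep in method.findall("w:response/w:representation", WADL_NS)
                ]
                operations.append((method.get("name"), uri_template, media_types))
    return operations


if __name__ == "__main__":
    for name, uri, media in list_operations(SAMPLE_WADL):
        print(name, uri, media)
```

A code generator would walk the same structure, but emit a proxy method per operation instead of printing it.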
Reference
[1] http://weblogs.java.net/blog/mhadley/archive/2005/05/introducing_wad.html
Concluding Thoughts on KDD 2008 and High Resolution Posters.
(Update: I finally got around to uploading the conference posters in their original, higher resolution. These weren’t taken with my usual Canon, so sorry for the slightly grainy result, but they should be fairly readable.)
KDD 2008 was a great learning experience, providing opportunities for life-long learning, career development, and professional networking. It helped a great deal in getting to know the data mining community and the current trends in knowledge discovery and machine learning, and in finding out that all these prolific authors and researchers are actually humans, not robots as previously thought.
Social networking was the dominant theme of the conference and its research areas, no doubt about that. The key sessions were as follows.
- Trevor Hastie of Stanford University on “Regularization Paths and Coordinate Descent”
- Thore Graepel of Microsoft Research on “Large Scale Data Analysis and Modeling in Online Services and Advertising”
- Michael Schwarz of Yahoo! Research on “Internet Advertising and Optimal Auction Design”
- Jitendra Malik of the University of California Berkeley on “The Future of Image Search”
One of my personal favorites was Foster Provost and Jennifer Neville’s tutorial session on predictive modeling in social networks. Also, I got a chance to meet and talk to the following luminaries of the field.
- Professor Foster Provost, Editor-in-Chief of the journal Machine Learning
- Ron Kohavi (GM of Microsoft’s Experimentation Platform)
- Gregory Piatetsky-Shapiro (Chair, ACM SIGKDD)
And to see the following
- Christos Faloutsos
- Usama Fayyad
- Trevor Hastie
- Raghu Ramakrishnan (author of the “cow book”)
However, I missed the chance to meet Dr. Jiawei Han; he had to leave early.
It was a well-organized event with breaks. I have a few suggestions for improvement.
1. Provide a voting mechanism that lets people choose their preferred sessions in advance, and allocate room sizes according to interest. This might not be perfect but would provide a good estimate. Some of the rooms were so packed that people were sitting on the floor in the aisles, while others were half empty.
2. Full disclosure and reproducibility are important in academia and research. Some of the data used in the papers and presentations was unavailable for verification of the claims because of its proprietary nature, especially in some of the vendor-specific presentations (Yahoo, Microsoft, Orkut, etc.). There are very effective anonymization and privacy-preserving techniques that would allow such data to be shared.
3. Slides of the presentations should be made available to the attendees.
and next time, J’aime la vie en Paris!
KDD 2008 – Day 3 & 4.
Day 3 started with the invited talk by Dr. Michael Schwarz of Yahoo! Research on “Internet Advertising and Optimal Auction Design”, in which he discussed generalized English auctions, Internet advertising and generalized second-price auctions. (Details can be read on Akshay Java’s Social Media research blog here.) It was an interesting talk on how almost every transaction follows the modern auction model and what approaches can be used to maximize throughput and ROI.
Later, during the regular sessions, I attended the Discovery and Detection research session, which focused on outlier analysis. It comprised the following presentations.
25-minute presentations
- Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge. D. D. Jensen, A. S. Fast, B. J. Taylor, M. E. Maier.
- Discrimination-aware Data Mining. D. Pedreschi, S. Ruggieri, F. Turini.
15-minute presentations
- Local Peculiarity Factor and Its Application in Outlier Detection. J. Yang, N. Zhong, Y. Yao, J. Wang.
- Angle-Based Outlier Detection in High-dimensional Data. H. Kriegel, M. Schubert, A. Zimek.
- Anomaly Pattern Detection in Categorical Datasets. K. Das, J. Schneider, D. B. Neill.
Lunch was sponsored by Yahoo!, with pretty cool décor featuring their gadgets, puzzles and YahooDokus. Later in the afternoon I attended the following.
25-minute presentations
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Y. Koren.
- Combinational Collaborative Filtering for Personalized Community Recommendation. W. Chen, D. Zhang, E. Y. Chang.
And then
- (R) Scaling Up Text Classification for Large File Systems. G. Forman, S. Rajaram.
- (I) Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. R. Grossman, Y. Gu. (Short)
I later got a chance to talk to Dr. Grossman about the cloud computing initiative that he is very passionate about. He discussed in his presentation how Sector is approximately twice as fast as Hadoop and how Sector has been used to distribute the Sloan Digital Sky Survey (SDSS) via the web site sdss.ncdm.uic.edu.
The day concluded with the poster reception, where I got to talk to several authors and presenters, including Wen-Yen Chen, Pooja Mittal, Yabo-Arber Xu, Kaustav Das (of Anomaly Pattern Detection in Categorical Datasets) and one of the authors of Information Extraction from Wikipedia, though I am not sure which one.
Day 4
The last day of the conference started with Jitendra Malik’s invited talk on “The Future of Image Search”. (See Greg Linden’s blog post about the talk.) It was a great talk in which Jitendra discussed the evolution of vision, image search, and the shortcomings of tagging and textual taxonomies, and pushed for "category recognition" for objects in images.
Later there were the following excellent sessions.
- Context-Aware Query Suggestion by Mining Click-Through and Session Data. H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, H. Li. (Best Application Paper Award Winner)
- Scalable and Near Real-Time Burst Detection from eCommerce Queries. N. Parikh, N. Sundaresan.
- Using Predictive Analysis to Improve Invoice-to-Cash Collection. S. Zeng, P. Melville, C. A. Lang, I. Boier-Martin, C. Murphy.
- TagMark: Reliable Estimations of RFID Tags for Business Processes. L. W. F. Chaves, E. Buchmann, K. Böhm.
The conference concluded with closing remarks from the general chair, Ying Li.
Call for Volunteers – SoCal Code Camp Oct 25-26.
Greetings Southern California .NET Community
I was wondering if any of you would be able to volunteer to help register speakers and attendees, and help with general organization, etc. on the Code Camp event days (Saturday/Sunday October 25/26).
I am heading up Volunteer Coordination, so if you are available please contact me directly (and put “volunteer” in the subject line) or sign up via the “Contacts” pull-down of www.socalcodecamp.com.
Thanks!
Adnan Masood
Volunteer Coordinator
SoCal.NET Code Camp
www.SoCalCodeCamp.com
President & Co-Founder
San Gabriel Valley .NET Developers Group
www.SGVdotNet.org
KDD 2008 Day 2
Day 2 started with Trevor Hastie’s keynote, “Regularization Paths and Coordinate Descent”. It was great to see Dr. Hastie speaking so passionately about coordinate descent, logistic regression and fitting. The following is a brief abstract of the talk.
“In a statistical world faced with an explosion of data, regularization has become an important ingredient. In many problems, we have many more variables than observations, and the lasso penalty and its hybrids have become increasingly useful. This talk presents some effective algorithms based on coordinate descent for fitting large scale regularization paths for a variety of problems. Joint work with Rob Tibshirani and Jerome Friedman”
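For readers who have not seen coordinate descent applied to the lasso, here is a minimal pure-Python sketch of the general idea: cycle through the coefficients and update each one with a soft-thresholding step while the others are held fixed. This is a textbook-style illustration under my own assumptions (objective scaled by 1/(2n), a fixed number of sweeps, no intercept or standardization, hypothetical function names). It is not the implementation presented in the talk.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is a list of n rows with p features each, y a list of n targets.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            column = [X[i][j] for i in range(n)]
            norm_sq = sum(v * v for v in column) / n
            if norm_sq == 0.0:
                continue  # an all-zero feature cannot explain anything
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(column[i] * residual[i] for i in range(n)) / n
            rho += norm_sq * beta[j]
            new_beta = soft_threshold(rho, lam) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= column[i] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Tiny made-up data set: the second feature carries almost no signal.
    X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    y = [2.0, 0.1, -2.0, -0.1]
    print(lasso_coordinate_descent(X, y, lam=0.1))
```

With this penalty the weak second coefficient is driven exactly to zero, which is the variable-selection behavior that makes the lasso attractive when there are more variables than observations.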
After the keynote, there were combined and research sessions. I attended the one on social networks, which comprised the following presentations.
25-minute presentations
- The Structure of Information Pathways in a Social Communication Network. G. Kossinets, J. Kleinberg, D. Watts.
- Influence and Correlation in Social Networks. A. Anagnostopoulos, R. Kumar, M. Mahdian.
- Weighted Graphs and Disconnected Components. M. McGlohon, L. Akoglu, C. Faloutsos.
15-minute presentations
- Microscopic Evolution of Social Networks. J. Leskovec, L. Backstrom, R. Kumar, A. Tomkins.
- Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, J. Leskovec.
- Feedback Effects between Similarity and Social Influence in Online Communities. D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, S. Suri.
During lunch, which was sponsored by Microsoft adCenter Labs, they talked about challenges in advertising and how to get the most out of revenue share and context-based hits. They also announced the adCenter Labs challenge, about which I have yet to find any information online.
Later in the afternoon, Microsoft Research’s Thore Graepel talked about “Large Scale Data Analysis and Modeling in Online Services and Advertising”. It was a very interesting and pragmatic presentation about real-world problems in online search and advertising. Even though the first part of the presentation was heavy on online ranking and matchmaking, the later discussion on advertising made up for it.
Then there were 25-minute presentations
- ArnetMiner: Extraction and Mining of Academic Social Networks. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su.
- Identifying Authoritative Actors in Question-Answering Forums – The Case of Yahoo! Answers. M. Bouguessa, B. Dumoulin, S. Wang. (the paper was on Belinko’s website but not there anymore, interesting?)
And on a separate track,
15-minute presentations
- Automated Cyclone Discovery and Tracking using Knowledge Sharing in Multiple Heterogeneous Satellite Data. S.-S. Ho, A. Talukder.
- Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. N. Koenigstein, Y. Shavitt, T. Tankel.
- Land Cover Change Detection: A Case Study. S. Boriah, V. Kumar, M. Steinbach, C. Potter, S. Klooster.
Later, after the automated cyclone discovery session, I got a chance to meet Dr. Talukder of NASA’s Jet Propulsion Laboratory and talk about a mutual acquaintance, Dr. Homayoun Seraji.
This concluded day 2 of the conference.
Post Event Resources – MSDN Webcast on REST and WCF
The MSDN geekSpeak webcast, “REST and Windows Communication Foundation 3.5”, went very well. Since REST is a very broad topic and there were tons of questions, I didn’t get a chance to show all the demos; however, the sample code can be downloaded from here.
Also, keep an eye on the geekSpeak blog for future updates. Overall, there is a lot of concern about security in REST. I’ll be doing a series of blog posts on security in REST in the near future; in the meantime, the following resources provide a good starting point.
Mark O’Neill’s Radio Weblog
Message Level Security in REST
Taking Amazon S3 as a model for secure REST services is one way to implement security in REST. As mentioned in this article by Eric Heuveneers:
“Amazon S3 REST resources are secure. This is important not just for your own purposes, but also because customers are billed depending on how their S3 buckets and objects are used. An AWSSecretKey is assigned to each AWS customer, and this key is identified by an AWSAccessKeyID. The key must be kept secret and will be used to digitally sign REST requests. S3 security features are:
- Authentication: Requests include AWSAccessKeyID
- Authorization: Access Control List (ACL) could be applied to each resource
- Integrity: Requests are digitally signed with AWSSecretKey
- Confidentiality: S3 is available through both HTTP and HTTPS
- Non repudiation: Requests are time stamped (with integrity, it’s a proof of transaction)
The signing algorithm is HMAC/SHA1 (Hashing for Message Authentication with SHA1).”
Reference: Introduction to Amazon S3 with Java and REST
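The signing step quoted above can be sketched with nothing but the Python standard library. This is a minimal illustration of HMAC-SHA1 request signing, not Amazon’s actual scheme. The string-to-sign layout, the key and the helper names sign_request and verify_request are hypothetical, and the real S3 canonicalization rules are in the AWS documentation.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, string_to_sign: str) -> str:
    """Return the Base64-encoded HMAC-SHA1 of string_to_sign."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: str, string_to_sign: str, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret_key, string_to_sign)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # Hypothetical values: a made-up key and a simplified string to sign
    # (verb, resource, timestamp). The timestamp is what lets a server
    # reject stale or replayed requests.
    secret = "not-a-real-secret"
    request = "GET\n/my-bucket/photo.jpg\nTue, 02 Sep 2008 10:00:00 GMT"

    signature = sign_request(secret, request)
    print(signature)
    print(verify_request(secret, request, signature))
    # Changing any signed field invalidates the signature.
    print(verify_request(secret, request.replace("GET", "DELETE"), signature))
```

The client sends its access key ID and the signature with the request. The server looks up the matching secret, recomputes the signature over the same fields and rejects the request if the two differ, which covers the authentication and integrity items in the list above.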
Links to the books and reference articles mentioned in the webcast are as follows. Please feel free to send me your questions and comments by email.
Books
- RESTful .NET
- RESTful Web Services
References
- The Weekly Source Code 32- Atom, AtomPub and BlogSvc, an AtomPub Server in WCF
- WCF 3.5 Utilities
- WCF 3.5 RSS / ATOM Syndication Support
- Dare Obasanjo aka Carnage4Life – ETech 2005 Trip Report Building a New Web Service at Google
- Defining REST based Formats
- InfoQ Dan Diephouse on Atom, AtomPub, REST and Web Services
- MIX07 Buzzcast #12 – Steve Maine – Navigating the Programmable Web MIX07 Buzzcast Channel 9
- Scott Hanselman’s Computer Zen – Web Services
- Showcase of Live ASP.NET MVC Sites – Mike Bosch’s Blog on .NET
- The Highs and Lows of REST
- TRUVEO – Complete ASP.NET sample REST Service Calls
- WCF & REST at MIX08 The Tale of MySpace APIs
- WCF Web Programming Model Overview
- WebServiceStudio – Home (CodePlex Project)
- YABE Yet Another Blogging Engine – Home (CodePlex Project)
- WADL (CodePlex Project)