More Interesting Challenges in Spam and OCR

I’ll never know who actually takes stock purchase advice from email messages that start with a bunch of garbled text and promise insider information about a price skyrocketing next week. Regardless of its financial significance, or lack of it, this kind of spam has made the job harder for anti-spam tools, since it’s no longer just about text: the pitch arrives as an image. To make things difficult for OCR and CAPTCHA breakers, the text in the image is also made hard to read, as shown in the examples below, which I regularly receive in my Gmail.

There have been several discussions about CAPTCHA's effectiveness over the years. Jay Allen has spoken; I have yet to see what Paul Graham has to say about it. "Breaking a Visual CAPTCHA: High Level Description" describes the underlying mechanics. The bigger question is whether these methodologies are effective, or just band-aids over the existing tricks that will work only until spammers find a new way around them. I think I agree with the author of On Intelligence on this matter when he suggests that our approach to AI is inherently wrong. How does a person look at an image or an email and know that it's unsolicited commercial email? The cerebral cortex? Let's map it. It is possibly the problem of the century, but we will have it sorted out sooner rather than later.

References and Further Reading

Telling humans and computers apart automatically
Luis von Ahn, Manuel Blum, John Langford
Communications of the ACM, Volume 47 Issue 2

Shape Matching and Object Recognition
Berkeley Computer Vision page

Email and security: Designing human friendly human interaction proofs (HIPs)
Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski
Proceedings of the SIGCHI conference on Human factors in computing systems

Poster 2: applications track: IMAGINATION: a robust image-based CAPTCHA generation system
Ritendra Datta, Jia Li, James Z. Wang
Proceedings of the 13th annual ACM international conference on Multimedia MULTIMEDIA '05

Games: Preventing bots from playing online games
Philippe Golle, Nicolas Ducheneaut
Computers in Entertainment (CIE), Volume 3 Issue 3

Invited workshop on conceptual information retrieval and clustering of documents: Spam filters: bayes vs. chi-squared; letters vs. words
Cormac O'Brien, Carl Vogel
Proceedings of the 1st international symposium on Information and communication technologies ISICT '03

Breaking a Visual CAPTCHA

15 Seconds : Fighting Spambots with .NET and AI -- Cont'd
