# Algorithms

## P≠NP - A Definitive Proof by Contradiction

Following the great scholarly acceptance and outstanding academic success of "The Clairvoyant Load Balancing Algorithm for Highly Available Service Oriented Architectures", this year I present **P Not Equal to NP - A Definitive Proof by Contradiction**.

Click here to read the entire paper in PDF. **P Not Equal to NP - A Definitive Proof by Contradiction**.

## Machine Learning - On the Art and Science of Algorithms with Peter Flach

Over a decade ago, Peter Flach of Bristol University wrote a paper titled "On the state of the art in machine learning: A personal review", in which he reviewed several then-recent books on developments in machine learning. These included Pat Langley’s Elements of Machine Learning (Morgan Kaufmann), Tom Mitchell’s Machine Learning (McGraw-Hill), and Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations by Ian Witten and Eibe Frank (Morgan Kaufmann), among many others. Dr. Flach singled out Michael Berry and Gordon Linoff’s Data Mining Techniques for Marketing, Sales, and Customer Support (John Wiley) for its excellent writing style, citing the paragraphs below and commenting, "I wish that all computer science textbooks were written like this."

“People often find it hard to understand why the training set and test set are “tainted” once they have been used to build a model. An analogy may help: Imagine yourself back in the 5th grade. The class is taking a spelling test. Suppose that, at the end of the test period, the teacher asks you to estimate your own grade on the quiz by marking the words you got wrong. You will give yourself a very good grade, but your spelling will not improve. If, at the beginning of the period, you thought there should be an ‘e’ at the end of “tomato”, nothing will have happened to change your mind when you grade your paper. No new data has entered the system. You need a test set!

Now, imagine that at the end of the test the teacher allows you to look at the papers of several neighbors before grading your own. If they all agree that “tomato” has no final ‘e’, you may decide to mark your own answer wrong. If the teacher gives the same quiz tomorrow, you will do better. But how much better? If you use the papers of the very same neighbors to evaluate your performance tomorrow, you may still be fooling yourself. If they all agree that “potatoes” has no more need of an ‘e’ than “tomato”, and you have changed your own guess to agree with theirs, then you will overestimate your actual grade on the second quiz as well. That is why the evaluation set should be different from the test set.” [3, pp. 76–77]
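The analogy can be made concrete with a small sketch. It keeps the quote's vocabulary (a training set to build the model, a test set to tune it, an evaluation set for the final score); the synthetic data, the k-nearest-neighbour model and the candidate values of `k` are all assumptions chosen for illustration, not anything taken from the book.

```python
import random

def make_data(n, rng):
    """Points x in [0, 1) labelled 1 when x > 0.5, with 20% of labels flipped as noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

def predict_knn(train, x, k):
    """Majority vote among the k training points closest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def accuracy(train, data, k):
    return sum(predict_knn(train, x, k) == y for x, y in data) / len(data)

rng = random.Random(0)
train, test, evaluation = (make_data(200, rng) for _ in range(3))

# Grading your own paper: with k=1 every training point is its own nearest neighbour.
train_acc = accuracy(train, train, k=1)

# Looking at the neighbours' papers: the test set is used to pick k ...
candidates = (1, 5, 15, 41)
best_k = max(candidates, key=lambda k: accuracy(train, test, k))
test_acc = accuracy(train, test, best_k)

# ... so only the untouched evaluation set gives an honest estimate.
eval_acc = accuracy(train, evaluation, best_k)

print(f"training accuracy (k=1):  {train_acc:.2f}")
print(f"test accuracy (k={best_k}):   {test_acc:.2f}")
print(f"evaluation accuracy:      {eval_acc:.2f}")
```

The training score is perfect by construction and says nothing about generalization, while the test score was used to choose `k` and is therefore also slightly "tainted"; the evaluation score is the only one computed on data that played no part in building or tuning the model.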

That is why, when I recently came across *"Machine Learning: The Art and Science of Algorithms that Make Sense of Data"*, I decided to check it out and wasn't disappointed. Dr. Flach is Professor of Artificial Intelligence at the University of Bristol, and in this "future classic" he leaves no stone unturned when it comes to clarity and explainability. The book starts with a machine learning sampler and introduces the ingredients of machine learning, progressing quickly to binary classification and beyond. Written as a textbook, full of examples, footnotes and figures, it covers concept learning, tree models, rule models, linear models, distance-based models, probabilistic models, features and ensembles, concluding with machine learning experiments. I really enjoyed the "Important points to remember" section of the book as a quick refresher on the machine learning commandments.

The concept learning section seems to have been influenced by the author's own research interests and is not discussed in as much detail in contemporary machine learning texts. I also found the frequent summarization of concepts quite helpful. Contrary to its subtitle, and compared to its counterparts, the book is however light on algorithms and code, possibly on purpose. While it explains the concepts with examples, the number of formal algorithms is kept to a minimum. This may aid clarity and help avoid recipe-book syndrome, but it also makes the book potentially less accessible to practitioners. Great at the basics, the text falls short on intermediate to advanced topics such as LDA, kernel methods, PCA, RKHS, and convex optimization. For instance, chapter 10's "Matrix transformations and decompositions" could have been made an appendix while expanding upon meaningful topics like LSA and use cases of sparse matrices (pg 327). This is definitely not the book's fault, but rather that of this reader expecting too much from an introductory text just because the author explains everything so well!

As a textbook on the art and science of algorithms, Peter Flach's book definitely delivers on the promise of clarity, with well-chosen illustrations and an example-based approach. Highly recommended reading for all who would like to understand the principles behind machine learning techniques.

The accompanying materials can be downloaded from here. They generously include excerpts with background material and literature references, and a full set of 540 lecture slides in PDF, including all figures in the book, along with the LaTeX beamer source.

## Demystification of Demystifying Machine Learning using nuML w/ Seth Juarez

Going for a little Benoit B. Mandelbrot recursion joke here with the title.

Seth Juarez (github) recently spoke to the Pasadena .NET user group on the topic of Practical Machine Learning using nuML. Seth is a wonderful speaker and educator, and nuML is an excellent library for getting started with machine learning in .NET. His explanations are very intuitive, even for people who have been working in the field for a while. During the talk and follow-up discussions, various technical references were made which went beyond the scope of the talk. To be fair to Seth, he covered a lot of material in an hour and a half, probably a couple of weeks' worth in a traditional ML course.

Therefore I decided to provide links to these underlying topics for the benefit of attendees in case anyone is interested in knowing more about them.

- No free lunch in search and optimization
- Probably approximately correct learning
- Kernelized Sorting for NLP Presentation - Paper by Seth
- QP Solver
- NP-Complete Problems
- Intuitive Explanation of Expectation Maximization
- Multi-class classification
- REPL
- Roslyn and Roslyn CTP Introduces Interactive Code for C#
- Expando Objects
- Cardinality vs Selectivity
- Microsoft Automatic Graph Layout Library
- Positive Definite Matrix
- Kernel Perceptron in Python
- Perceptrons and Kernels
- math.net numerics
- Matrix Slicing
- Vectors and Matrices
- CodeMash 2013 Repo and readme
- What is EM algorithm?
- k-means clustering
- Clustering Algorithms
- Bag of Words Model
- Cosine similarity vs Hamming distance
- Time series regression and generalized least squares
- Machine Learning Techniques for Stock Prediction
- Causality, Correlation and Brownian Motion
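Two of the topics above, the bag of words model and the comparison of cosine similarity with Hamming distance, fit in a few lines. This is a minimal sketch in plain Python rather than nuML, and the three example documents are made up for illustration.

```python
import math
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors; ignores document length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hamming_distance(a, b):
    """Number of vocabulary words present in one document but not the other."""
    return sum((x > 0) != (y > 0) for x, y in zip(a, b))

docs = [
    "the cat sat on the mat",
    "the cat sat on the mat the cat sat on the mat",  # same text, repeated twice
    "dogs chase cats",
]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

for i, j in [(0, 1), (0, 2)]:
    print(i, j,
          round(cosine_similarity(vectors[i], vectors[j]), 3),
          hamming_distance(vectors[i], vectors[j]))
```

Cosine similarity works on the counts and is insensitive to document length, so the repeated document looks identical to the original. The Hamming distance used here only looks at word presence, so it counts the words the two documents do not share.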

Happy Machine Learning!

## The Clairvoyant Load Balancing Algorithm for Highly Available Service Oriented Architectures

*Clairvoyant* for optimal yet unrealizable distribution of traffic.

*Clairvoyant*, by utilizing the ensemble of anomalous cognition, ESP, remote viewing and psychometry, can provide a high performance yet irreproducible load balancing approach. The *Clairvoyant* load balancing algorithm helps the system administrator fine-tune how traffic is distributed across connections in a psychic manner. Backed by parapsychological research [1], each load balancer is equipped with an enterprise grade channeling medium with features to fulfill potential special deployment requirements. Building upon the techniques proposed in RFC 5984, using extrasensory perception to achieve "infinite bandwidth" in IP networks, *Clairvoyant* can achieve negative latency as well as negative transmission time difference with appropriate parameters, unachievable by traditional methods [6, 3]. The algorithm uses claircognizance to redirect traffic to one of the unused or even non-existent nodes. Clairaudience allows setting up the connection priority order; however, early experiments suggest that using 0x8 spherical surfaces also achieves the same level of performance when compared using ROC/AUC.

*packet forwarding that will provide unsurpassed end user performance regardless of link capacity, distance, and number of hops*. Detailed algorithm and findings will be published in The Journal of Irreproducible Results by 4/1/2014.

# References

[1] Psychic Routing: Upper Bounds on Routing in Private DTNs. 2011.

[2] Black-boxing the user: internet protocol over xylophone players (IPoXP). *Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts*: 71-80, 2012.

[3] Simple efficient load balancing algorithms for peer-to-peer systems. *Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures*: 36-43, 2004.

[4] Increasing Throughput in IP Networks with ESP-Based Forwarding: ESPBasedForwarding. 2011.

[5] Service Undiscovery Using Hide-and-Go-Seek for the Domain Pseudonym System (DPS). 2012.

[6] Performance analysis of load balancing algorithms. *World Academy of Science, Engineering and Technology*, 38: 269-272, 2008.

## LA Machine Learning event on Mining Time Series Data w/ Sylvia Halasz

Last night's LA Machine Learning event on Mining Time Series Data w/ Sylvia Halasz of YP at OpenX Pasadena was quite interesting and well attended. Dr. Halasz spoke about the Adaptive Ensemble Kalman Filter and her work on correlating n-grams with flu outbreaks. Some of the associated papers follow.

- The ngram chief complaint classifier: A novel method of automatically creating chief complaint classifiers based on international classification of diseases groupings
- Detecting the start of the flu season
- Syndrome Surveillance - CDC
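For readers new to Kalman filtering, here is a minimal sketch of the basic predict/update cycle. It is a plain scalar Kalman filter for a slowly drifting level, which is far simpler than the adaptive ensemble variant discussed in the talk; the signal, the noise levels and the function name `kalman_1d` are assumptions made up for this example.

```python
import random

def kalman_1d(measurements, process_var, measurement_var):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    estimate, variance = measurements[0], measurement_var
    estimates = [estimate]
    for z in measurements[1:]:
        # Predict: a random walk keeps the estimate but grows its uncertainty.
        variance += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        estimates.append(estimate)
    return estimates

rng = random.Random(1)
true_level = 10.0  # hypothetical constant signal
measurements = [true_level + rng.gauss(0, 2.0) for _ in range(200)]
estimates = kalman_1d(measurements, process_var=1e-4, measurement_var=4.0)

def mse(values):
    """Mean squared error against the known true level."""
    return sum((v - true_level) ** 2 for v in values) / len(values)

print(f"raw measurement MSE: {mse(measurements):.3f}")
print(f"filtered MSE:        {mse(estimates):.3f}")
```

An ensemble Kalman filter replaces the single variance tracked here with a set of sampled states, which is what makes it practical for high-dimensional time series.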

## Causality, Probability, and Time - A Temporo-Philosophical Primer to Causal Inference with Case Studies

Causality, Probability, and Time by Dr. Samantha Kleinberg is a whirlwind yet original tour of the interdisciplinary study of probabilistic temporal logic and causal inference. Probabilistic causation is a fairly demanding area that examines the relationship between cause and effect using the tools of probability theory. Judea Pearl, in his seminal text "Causality: Models, Reasoning, and Inference", refers to this quandary by stating that

(causality) connotes lawlike necessity, whereas probabilities connote exceptionality, doubt, and lack of regularity.

Dr. Kleinberg's work provides a balanced introduction to the background work on this topic while breaking new ground with a well-positioned approach to causality based on temporal logic. The envisioning problem is the problem of deducing the set of facts, possibly resulting from our own actions, that leads to the decision problem. This is compounded by the need to find a timely and useful way to represent our knowledge about time, change, and chance.

In this ~260 page book, Dr. Kleinberg begins with a brief history of causality, leading to probability, logic and probabilistic temporal logic. The author then defines causality from several angles, proceeding to causal inference, token causality and finally the case studies. With practical examples and algorithms, the author devises simple mathematical tools for analyzing the relationships between causal connections, inference, causal significance, model complexity, statistical associations, actions and observations.
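To give a flavor of what "probabilistic" and "temporal" mean together, here is a minimal sketch of the classic probability-raising test: an event is a candidate cause if it occurs earlier and raises the probability of the effect within a time window. This is a deliberate simplification and not the book's algorithm; the two event series and the helper `prima_facie` are hypothetical.

```python
def prima_facie(cause, effect, max_lag):
    """Return (P(effect within max_lag steps | cause now), P(effect within max_lag steps)).

    cause and effect are equal-length lists of 0/1 flags, one per time step.
    """
    n = len(cause) - max_lag  # time steps with a complete look-ahead window

    def effect_follows(t):
        return any(effect[t + d] for d in range(1, max_lag + 1))

    cause_times = [t for t in range(n) if cause[t]]
    if not cause_times:
        raise ValueError("the cause never occurs inside the usable window")

    p_effect = sum(effect_follows(t) for t in range(n)) / n
    p_effect_given_cause = sum(effect_follows(t) for t in cause_times) / len(cause_times)
    return p_effect_given_cause, p_effect

# Hypothetical event series, one flag per time step.
cause = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
effect = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

p_given_cause, p_base = prima_facie(cause, effect, max_lag=2)
print(p_given_cause, p_base)  # 1.0 0.5
print("raises probability:", p_given_cause > p_base)
```

Probability raising alone cannot separate genuine causes from spurious ones such as common effects of a hidden cause, which is why measures of causal significance are needed on top of it.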

Exploiting the temporal nature of probabilistic events, Dr. Kleinberg's research is a thought-provoking and valuable addition for the scientific community interested in learning causal effects and inference with respect to time. Built upon the works of the likes of Heckerman, Breese, Santos and Young, this book will shape the way probabilistic reasoning researchers think about temporal effects on causality for years to come.

David Hume believed that causes are invariably followed by their effects: "We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second." So, would you like a well-written, margin-annotation-laden text which provides a formal and practical case-study-based approach to this somewhat abstract concept of causality? Then look no further!

## Selected Papers on Interestingness Measures, Knowledge Discovery and Outlier Mining

- S. Abe and T. Inoue. **Fuzzy support vector machines for multiclass problems.** In ESANN 2002 Proceedings, pages 113-118, 2002.
- E.L. Allwein, R.E. Schapire, and Y. Singer. **Reducing multiclass to binary: a unifying approach for margin classifiers.** Journal of Machine Learning Research, 1:113-141, 2000.
- P. Baldi and K. Hornik. **Neural networks and principal component analysis: Learning from examples without local minima.** Neural Networks, 2(1):53-58, 1989.
- Marc Benioff. **Data, data everywhere: A special report on managing information.** The Economist, February 2010.
- J.C. Bezdek, J.M. Keller, R. Krishnapuram, L.I. Kuncheva, and N.R. Pal. **Will the real iris data please stand up?** IEEE Transactions on Fuzzy Systems, 7:3, 1999.
- C. J. C. Burges. **A tutorial on support vector machines for pattern recognition.** Data Mining and Knowledge Discovery, 2:121-167, 1998.
- C. Chen, A. Liaw, and L. Breiman. **Using random forest to learn imbalanced data.** Technical report, Department of Statistics, UC Berkeley, 2004.
- V. Cherkassky and F. Mulier. **Learning from Data: Concepts, Theory and Methods.** John Wiley & Sons, Inc., 1998.
- R. Cilibrasi and P. Vitanyi. **Clustering by compression.** IEEE Transactions on Information Theory, 51(4):1523-1545, 2005.
- R. Cilibrasi and P. Vitanyi. **Normalized web distance and word similarity.** CoRR, abs/0905.4039, 2009.
- R. Cilibrasi, P. Vitanyi, and R. de Wolf. **Algorithmic clustering of music.** In WEDELMUSIC, pages 110-117, 2004.
- T. Downs, I. Wood, and M. Gallagher. **Empirical evidence for ultrametric structure in multi layer perceptron error surfaces.** Neural Processing Letters, 16(2):177-186, 2002.
- A.A. Freitas. **Are we really discovering "interesting" knowledge from data?** Expert Update (the BCS-SGAI Magazine), 9(1):41-47, October 2006.
- L. Geng and H. J. Hamilton. **Interestingness measures for data mining: A survey.** ACM Comput. Surv., 38(3), 2006.
- M. Gori and F. Scarselli. **Are multilayer perceptrons adequate for pattern recognition and verification?** IEEE Trans. Pattern Anal. Mach. Intell., 20(11):1121-1132, 1998.
- P.M. Granitto, P.F. Verdes, and H.A. Cecatto. **Neural network ensembles: evaluation of aggregation algorithms.** arXiv, arXiv:cs.AI/0502006v1, 2005.
- S. Hashemi and T.P. Trappenberg. **Using svm for classification in data sets with ambiguous data.** In International Conference on Information Systems, Analysis and Synthesis (SCI 2002), 2002.
- M. Hassoun. **Fundamentals of Artificial Neural Networks.** Massachusetts Institute of Technology, 1995.
- S. Haykin. **Neural Networks: A Comprehensive Foundation.** Prentice-Hall Inc., second edition, 1999.
- Z. He, X. Xu, and S. Deng. **Discovering cluster-based local outliers.** Pattern Recognition Letters, 24(9-10):1641-1650, 2003.
- S. Hettich and S. D. Bay. **KDD Cup 1999 data.** UCI KDD Archive [http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html], 1999.
- L. Itti and P. Baldi. **Bayesian surprise attracts human attention.** In Proceedings Neural Information Processing Systems, 2005.
- B. Liu. **Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications).** Springer, January 2007.
- M. Markou and S. Singh. **Novelty detection: a review part 1: statistical approaches.** Signal Processing, 83(12):2481-2497, December 2003.
- K. McGarry. **A survey of interestingness measures for knowledge discovery.** The Knowledge Engineering Review, 00:0:1-24, 2005.
- P. M. Murphy and M. J. Pazzani. **Exploring the decision forest: an empirical investigation of Occam's razor in decision tree induction.** J. Artif. Int. Res., 1(1):257-275, 1993.
- A. Orriols-Puig, J. Casillas, and E. Bernado-Mansilla. **First approach toward online evolution of association rules with learning classifier systems.** In GECCO '08: Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, pages 2031-2038, New York, NY, USA, 2008. ACM.
- Y.H. Pao and C.-Y. Shen. **Visualization of pattern data through learning of non-linear variance-conserving dimension-reduction mapping.** Pattern Recognition, 30(10):1705-1717, 1997.
- J.M. Puche, J.M. Benitez, and J.L. Mantas. **Fuzzy pairwise multiclass support vector machines.** In A. Gelbukh and C.A. Reyes-Garcia, editors, Mexican International Conference on Artificial Intelligence (MICAI), volume LNAI, pages 562-571. Springer-Verlag, 2006.
- M. Robnik-Sikonja. **Improving random forests.** In J.F. Boulicaut et al., editor, Machine Learning, ECML 2004, 2004.
- J. Schmidhuber. **Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes.** CoRR, abs/0812.4360, 2009.
- C. Shirky. **It's not information overload. It's filter failure.** Keynote Speech, September 2008.
- E. Suzuki. **Data mining methods for discovering interesting exceptions from an unsupervised table.** Journal of Universal Computer Science, 12(6):627-653, 2006. http://www.jucs.org/jucs_12_6/data_mining_methods_for.
- E. Suzuki. **Lecture Notes in Computer Science,** volume 5579/2009, chapter Compression-Based Measures for Mining Interesting Rules, pages 741-746. Springer Berlin / Heidelberg, 2009.
- P.-N. Tan, M. Steinbach, and V. Kumar. **Introduction to Data Mining (First Edition).** Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.
- D. Tax and R. Duin. **Experiments with classifier combining rules.** In Lecture Notes in Computer Science, volume 1857, pages 16-29, Berlin, 2000. Springer-Verlag.
- D. Tax and R.P.W. Duin. **Using two-class classifiers for multi class classification.** In C. Suen, R. Kasturi, D. Laurendeau, editors, Proceedings 16th International Conference on Pattern Recognition, volume II, pages 124-127, Quebec City, Canada, Aug. 11-15 2002. IEEE Computer Society Press.
- I. Tsang, J. Kwok, P. Cheung, and N. Cristianini. **Core vector machines: Fast svm training on very large data sets.** Journal of Machine Learning Research, 6:363-392, 2005.
- L. H. Tsoukalas and R. E. Uhrig. **Fuzzy and Neural Approaches in Engineering.** John Wiley & Sons, Inc., New York, NY, USA, 1996.
- C. S. Wallace and D. M. Boulton. **An information measure for classification.** Computer Journal, 11(2):185-194, 1968.
- J.-S. Wang and J.-C. Chiang. **An efficient data preprocessing procedure for support vector clustering.** Journal of Universal Computer Science, 15(4):705-721, 2009. http://www.jucs.org/jucs_15_4/an_efficient_data_preprocessing.
- J. D. Williams. **The Compleat Strategyst: Being a Primer on the Theory of Games of Strategy.** Dover Publications, 1986.

## Hilary Mason - Machine Learning for Hackers

An interesting beginner's talk for machine learning enthusiasts.

Ever tried to use a regular expression to parse an unstructured street address? This talk is an introduction to a few machine learning algorithms and some tips for integrating them where they make the most sense and will save you the most headaches.

Hilary Mason - Machine Learning for Hackers from BACON: things developers love on Vimeo.

## On Bayesian Sensitivity Analysis in Digital Forensics

The idea of using Bayesian Belief Networks in digital forensics to quantify evidence has been around for a while. Providing qualitative approaches to Bayesian evidential reasoning in digital meta-forensics is, however, relatively new in decision support systems research. For law enforcement, decision support and the application of data mining techniques to “soft” forensic evidence form a large area of Bayesian forensic statistics, which has shown how Bayesian networks can be used to infer the probability of defense and prosecution statements from forensic evidence. Kevin B. Korb and Ann E. Nicholson's study, Sally Clark is Wrongly Convicted of Murdering Her Children, and the work on Linguistic Bayesian Networks for reasoning with subjective probabilities in forensic statistics give insight into an important development that helps quantify the meaning of forensic expert testimony such as "strong support".

The IEEE paper on Sensitivity Analysis of a Bayesian Network for Reasoning about Digital Forensic Evidence, published at the 3rd International Conference on Human-Centric Computing (HumanCom), 2010, is of particular interest since it has a comprehensive real-world list of evidence items and hypotheses.

Bayesian network representing an actual prosecuted case of illegal file sharing over a peer-to-peer network has been subjected to a systematic and rigorous sensitivity analysis. Our results demonstrate that such networks are usefully insensitive both to the occurrence of missing evidential traces and to the choice of conditional evidential probabilities.

One of the co-authors, Dr. Overill, has also covered the grounds for A Complexity Based Forensic Analysis of the Trojan Horse Defence.

The evidence nodes are as follows.

- Modification time of the destination file equals that of the source file
- Creation time of the destination file is after its own modification time
- Hash value of the destination file matches that of the source file
- BitTorrent client software is installed on the seized computer
- File link for the shared file is created
- Shared file exists on the hard disk
- Torrent file creation record is found
- Torrent file exists on the hard disk
- Peer connection information is found
- Tracker server login record is found
- Torrent file activation time is corroborated by its MAC time and link file
- Internet history record about the publishing website is found
- Internet connection is available
- Cookie of the publishing website is found
- URL of the publishing website is stored in the web browser
- Web browser software is available
- Internet cache record about the publishing of the torrent file is found
- Internet history record about the tracker server connection is found
- The seized computer was used as the initial seeder to share the pirated file on a BitTorrent network

The hypotheses are as follows.

- The pirated file was copied from the seized optical disk to the seized computer
- A torrent file was created from the copied file
- The torrent file was sent to newsgroups for publishing
- The torrent file was activated, which caused the seized computer to connect to the tracker server
- The connection between the seized computer and the tracker server was maintained

The authors conclude that the model tolerates sparse evidence:

The sensitivity analysis reported in this paper demonstrates that the BT BBN used in is insensitive to the occurrence of missing evidence and also to the choice of evidential likelihoods to an unexpected degree.

Our overall finding is gratifying because it implies that the exact choice of values for the inherently subjective evidential likelihoods is not as critical as might have been expected. Values falling within the consensus of experienced expert investigators are sufficiently reliable to be used in the BBN model. Furthermore, our results imply that the inability to recover one or more evidential traces in a digital forensic investigation is not generally critical for the probability of the investigatory hypothesis under consideration.
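The kind of sensitivity analysis described above can be sketched on a toy model. The example assumes a single hypothesis with evidence items that are conditionally independent given that hypothesis, which is much simpler than the paper's network; the prior, the evidence names and every likelihood value below are invented for illustration and are not taken from the paper.

```python
def posterior(prior, likelihoods, observed):
    """P(H | observed evidence), assuming evidence items are independent given H."""
    p_h, p_not_h = prior, 1.0 - prior
    for name in observed:
        p_e_given_h, p_e_given_not_h = likelihoods[name]
        p_h *= p_e_given_h
        p_not_h *= p_e_given_not_h
    return p_h / (p_h + p_not_h)

# Hypothetical values: name -> (P(evidence | H), P(evidence | not H)).
likelihoods = {
    "hash_match": (0.95, 0.05),
    "client_installed": (0.90, 0.30),
    "torrent_file_found": (0.85, 0.10),
    "tracker_login_found": (0.80, 0.10),
}
prior = 0.5
all_items = list(likelihoods)
baseline = posterior(prior, likelihoods, all_items)

# Missing evidence: recompute the posterior with one trace left unobserved.
drop_one = {
    name: posterior(prior, likelihoods, [n for n in all_items if n != name])
    for name in all_items
}

def perturbed(name, delta):
    """Posterior after shifting one P(evidence | H) by delta, clamped to [0.01, 0.99]."""
    changed = dict(likelihoods)
    p_e_given_h, p_e_given_not_h = changed[name]
    changed[name] = (min(max(p_e_given_h + delta, 0.01), 0.99), p_e_given_not_h)
    return posterior(prior, changed, all_items)

print(f"baseline posterior: {baseline:.4f}")
for name in all_items:
    print(f"{name:>20}: missing={drop_one[name]:.4f} "
          f"-0.1={perturbed(name, -0.1):.4f} +0.1={perturbed(name, +0.1):.4f}")
```

With several reasonably strong and mutually supporting evidence items, dropping any single trace or nudging one subjective likelihood barely moves the posterior in this toy setup, which mirrors the qualitative conclusion quoted above.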

For some reason, this reminded me of a recent read, SuperFreakonomics, in which the authors devise a terrorist-detection algorithm with the following black-box variable.

“What finally made it work was one last metric that dramatically sharpened the algorithm. In the interest of national security, we have been asked to not disclose the particulars; we’ll call it Variable X.

What makes Variable X so special?

For one, it is a behavioral metric, not a demographic one. The dream of anti-terrorist authorities everywhere is to somehow become a fly on the wall in a room full of terrorists. In one small important way, Variable X accomplishes that. Unlike most other metrics in the algorithm, which produce a yes or no answer, Variable X measures the intensity of a particular banking activity. While not unusual in low intensities among the general population, this behavior occurs in high intensities much more frequently among those who have other terrorist markers.

This ultimately gave the algorithm great predictive power. Starting with a database of millions of bank customers, Horsley was able to generate a list of about 30 highly suspicious individuals. According to his rather conservative estimate, at least 5 of those 30 are almost certainly involved in activities. Five out of 30 isn’t perfect (the algorithm misses many terrorists and still falsely identifies some innocents), but it sure beats 495 out of 500,495.”
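The comparison at the end of the quote is a statement about precision, the fraction of flagged people who are true positives. The arithmetic is easy to check using only the two ratios given in the quote:

```python
from fractions import Fraction

def precision(true_positives, flagged):
    """Fraction of flagged individuals who are actual positives."""
    return Fraction(true_positives, flagged)

with_variable_x = precision(5, 30)
without_variable_x = precision(495, 500_495)
improvement = with_variable_x / without_variable_x

print(f"with Variable X:    {float(with_variable_x):.2%}")     # 16.67%
print(f"without Variable X: {float(without_variable_x):.4%}")  # 0.0989%
print(f"improvement:        {float(improvement):.1f}x")        # 168.5x
```

Precision is only half of the picture: as the quote itself admits, the shorter list misses many true positives, so recall drops sharply even as precision improves by two orders of magnitude.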

Bayesian Belief Networks can definitely serve as a better probabilistic graphical model here, offering improved visibility into the prior and posterior probabilities behind such network-related algorithms.

## pgm.HelloWorld() with Wainwright & Jordan

I recently came across Wainwright & Jordan's paper on exponential families, graphical models, and variational inference and found it to be quite a comprehensive and unifying introduction to the topic. Probabilistic graphical models use a graph-based representation as the basis for compactly encoding a complex distribution over a high-dimensional space. If you are familiar with Koller and Friedman's work on Probabilistic Modeling, Wainwright and Jordan's paper provides a less mathematically terse and more unifying view of the area.
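To see what "compactly encoding" buys, compare the number of free parameters in an unrestricted joint distribution over binary variables with the number needed when the graph is a simple chain. The chain is just one convenient example structure, chosen here for illustration.

```python
def full_joint_parameters(n):
    """Free parameters in an unrestricted joint over n binary variables."""
    return 2 ** n - 1

def chain_parameters(n):
    """Free parameters for a Markov chain X1 -> X2 -> ... -> Xn of binary variables.

    P(X1) needs one number; each P(X_i | X_{i-1}) needs two (one per parent value).
    """
    return 1 + 2 * (n - 1)

for n in (5, 10, 20):
    print(n, full_joint_parameters(n), chain_parameters(n))
```

The full table grows exponentially with the number of variables while the chain grows linearly, and that gap is what the graph structure is exploiting.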

Graphical Models, Exponential Families, and Variational Inference

As compared to Pearl's work on Causality, this paper provides a contemporary look at Message-passing Algorithms for Approximate Inference, Connection to Max-Product Message-Passing and detailed insight into Moment Matrices, Semidefinite Constraints, and Conic Programming Relaxation. Due to its clarity and detailed explanation, the background material on Graphs, hypergraphs, exponential families and duality is definitely worth reading even if you don't need a refresher.
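As a quick refresher on the exponential family form that this background material builds on, here is the standard definition together with the Bernoulli distribution as a worked instance. The notation (canonical parameters θ, sufficient statistics φ, log-partition function A, base measure ν) follows common usage and may differ in detail from the paper's.

```latex
p_\theta(x) = \exp\bigl\{ \langle \theta, \phi(x) \rangle - A(\theta) \bigr\},
\qquad
A(\theta) = \log \int \exp\langle \theta, \phi(x) \rangle \, \nu(dx)

% Bernoulli example with mean \mu:
\phi(x) = x, \quad
\theta = \log\frac{\mu}{1-\mu}, \quad
A(\theta) = \log\bigl(1 + e^{\theta}\bigr), \quad
\frac{dA}{d\theta} = \frac{e^{\theta}}{1 + e^{\theta}} = \mu
```

The last identity is the one-dimensional case of a general fact: derivatives of the log-partition function yield the mean parameters, which is the link between exponential families and the duality material.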

In lieu of Pearl's polytree approach, Wainwright & Jordan's work discusses Graphical Models as Exponential Families before delving into Computational Challenges with High-Dimensional Models. Later chapters deal with Sum-Product, Bethe–Kikuchi, and Expectation-Propagation, Mean Field Methods, Variational Methods in Parameter Estimation, Convex Relaxations and Upper Bounds, Integer Programming, Max-product, and Linear Programming Relaxations, concluding with Moment Matrices, Semidefinite Constraints, and Conic Programming Relaxation. For a computer scientist, it is always interesting to observe the statistical perspective on machine learning. This contemporary insight into Graphical Models, Exponential Families, and Variational Inference was published in Foundations and Trends in Machine Learning, and builds upon the researchers' earlier work on Variational inference for Dirichlet process mixtures and Variational inference in graphical models: The view from the marginal polytope.
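Sum-product is the easiest of these algorithms to try out. The sketch below runs it on a chain, where message passing is exact, and checks the result against brute-force enumeration; the potentials are arbitrary numbers picked for illustration and the function names are my own.

```python
from itertools import product

def chain_marginal(unary, pairwise, target):
    """Marginal of variable `target` on a chain, using forward and backward messages.

    unary[i][s] is the potential of x_i = s; pairwise[i][r][s] couples x_i = r and x_{i+1} = s.
    """
    n, k = len(unary), len(unary[0])
    # forward[i][s]: summed weight of everything to the left of node i, given x_i = s.
    forward = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for s in range(k):
            forward[i][s] = sum(
                forward[i - 1][r] * unary[i - 1][r] * pairwise[i - 1][r][s]
                for r in range(k)
            )
    # backward[i][s]: summed weight of everything to the right of node i, given x_i = s.
    backward = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            backward[i][s] = sum(
                pairwise[i][s][r] * unary[i + 1][r] * backward[i + 1][r]
                for r in range(k)
            )
    belief = [forward[target][s] * unary[target][s] * backward[target][s] for s in range(k)]
    z = sum(belief)
    return [b / z for b in belief]

def brute_force_marginal(unary, pairwise, target):
    """Same marginal by enumerating every joint assignment (exponential in chain length)."""
    n, k = len(unary), len(unary[0])
    totals = [0.0] * k
    for x in product(range(k), repeat=n):
        weight = 1.0
        for i in range(n):
            weight *= unary[i][x[i]]
        for i in range(n - 1):
            weight *= pairwise[i][x[i]][x[i + 1]]
        totals[x[target]] += weight
    z = sum(totals)
    return [t / z for t in totals]

# Hypothetical potentials for a four-node binary chain that favours equal neighbours.
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]
pairwise = [[[2.0, 1.0], [1.0, 2.0]] for _ in range(3)]

for node in range(len(unary)):
    print(node, [round(p, 4) for p in chain_marginal(unary, pairwise, node)])
```

On a chain the messages cost time linear in the number of nodes, whereas enumeration is exponential. On graphs with cycles the same updates are only approximate, which is where the Bethe and Kikuchi approximations enter.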

As an appetizer, I would also recommend Bishop's chapter on Graphical Models.