Algorithms

The Clairvoyant Load Balancing Algorithm for Highly Available Service Oriented Architectures

Abstract: Load balancing allows network devices to distribute workload across multiple processing resources, including server clusters and storage devices. This distribution helps maximize throughput, achieve optimal resource utilization, minimize response time, and use hardware effectively across multiple data-center locations. As a meta-heuristic enhancement to Psychic Routing [1], the researchers present early work on a novel algorithm, Clairvoyant, for optimal yet unrealizable distribution of traffic.
Among many earlier works, including [5, 4], the main inspiration for this algorithm is RFC 1149, a standard for the transmission of IP datagrams on avian carriers. A study of the literature suggests that earlier work [7, 2] on internet protocol over xylophone players (IPoXP) has also had a huge impact on the classical OSI network model. Typical application load balancing is based on techniques including round robin, weighted round robin, least connections, shortest response, SNMP, weighted balance, priority, overflow, persistence, least used, lowest latency, and enforced traffic flow [6]. The researchers propose that Clairvoyant, by utilizing an ensemble of anomalous cognition, ESP, remote viewing, and psychometry, can provide a high-performance yet irreproducible load balancing approach. The Clairvoyant load balancing algorithm helps the system administrator fine-tune how traffic is distributed across connections in a psychic manner. Backed by parapsychological research [1], each load balancer is equipped with an enterprise-grade channeling medium with features to fulfill potential special deployment requirements. Building upon the techniques proposed in RFC 5984, which uses extrasensory perception to achieve "infinite bandwidth" in IP networks, Clairvoyant can achieve negative latency as well as negative transmission time difference with appropriate parameters, unachievable by traditional methods [6, 3]. The algorithm uses claircognizance to redirect traffic to one of the unused or even nonexistent nodes. Clairaudience allows setting up the connection priority order; however, early experiments suggest that using 0x8 spherical surfaces also achieves the same level of performance when compared using ROC/AUC.
Although irreproducible in most non-REM environments, the researchers see potential for this load balancing algorithm in most high-performing service-oriented architectures, where its packet forwarding would provide unsurpassed end-user performance regardless of link capacity, distance, and number of hops. The detailed algorithm and findings will be published in The Journal of Irreproducible Results by 4/1/2014.
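
For contrast with the psychic approach, two of the conventional techniques named in the abstract, round robin and least connections, can be sketched in a few lines of Java. This is only an illustrative sketch: the class name `Balancers`, the use of integer node indices, and the tie-breaking rule are assumptions made for the example and are not taken from any of the cited works.

```java
/** Minimal sketch of two classical node-selection policies (all names are hypothetical). */
final class Balancers {

    /** Round robin: hands out node indices 0, 1, ..., n-1 and then wraps around. */
    static final class RoundRobin {
        private final int nodeCount;
        private int next = 0;

        RoundRobin(int nodeCount) {
            if (nodeCount <= 0) {
                throw new IllegalArgumentException("need at least one node");
            }
            this.nodeCount = nodeCount;
        }

        int pick() {
            int chosen = next;
            next = (next + 1) % nodeCount;
            return chosen;
        }
    }

    /** Least connections: index of the node with the fewest open connections (lowest index wins ties). */
    static int leastConnections(int[] openConnections) {
        if (openConnections.length == 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        int best = 0;
        for (int i = 1; i < openConnections.length; i++) {
            if (openConnections[i] < openConnections[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(3);
        for (int request = 0; request < 4; request++) {
            System.out.println("round robin -> node " + rr.pick());
        }
        System.out.println("least connections -> node " + leastConnections(new int[]{4, 1, 3}));
    }
}
```

Unlike Clairvoyant, both policies only ever select nodes that actually exist.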

References

[1] Jonathan Anderson, Frank Stajano. Psychic Routing: Upper Bounds on Routing in Private DTNs. 2011.

[2] R Stuart Geiger, Yoon Jung Jeong, Emily Manders. Black-boxing the user: internet protocol over xylophone players (IPoXP). Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts:71-80, 2012.

[3] David R Karger, Matthias Ruhl. Simple efficient load balancing algorithms for peer-to-peer systems. Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures:36-43, 2004.

[4] KM Moller. Increasing Throughput in IP Networks with ESP-Based Forwarding: ESPBasedForwarding. 2011.

[5] C Pignataro, G Salgueiro, J Clarke. Service Undiscovery Using Hide-and-Go-Seek for the Domain Pseudonym System (DPS). 2012.

[6] Sandeep Sharma, Sarabjit Singh, Meenakshi Sharma. Performance analysis of load balancing algorithms. World Academy of Science, Engineering and Technology, 38:269-272, 2008.

[7] Emily Wagner, Yoon Jeong, R Stuart Geiger. IPoXP: Internet Protocol over Xylophone Players.

 


LA Machine Learning event on Mining Time Series Data w/ Sylvia Halasz

Last night's LA Machine Learning event on Mining Time Series Data w/ Sylvia Halasz of YP at OpenX Pasadena was quite interesting and well attended. Dr. Halasz spoke about the adaptive ensemble Kalman filter and her work on correlating n-grams with flu outbreaks. Some of the associated papers follow.

 


Causality, Probability, and Time - A Temporo-Philosophical Primer to Causal Inference with Case Studies

Causality, Probability, and Time by Dr. Samantha Kleinberg is a whirlwind yet original journey through the interdisciplinary study of probabilistic temporal logic and causal inference. Probabilistic causation is a fairly demanding area that examines the relationship between cause and effect using the tools of probability theory. Judea Pearl, in his seminal text "Causality: Models, Reasoning, and Inference", refers to this quandary by stating that

(causality) connotes lawlike necessity, whereas probabilities connote exceptionality, doubt, and lack of regularity.

 

Dr. Kleinberg's work provides a balanced introduction to background work on this topic while breaking new ground with a well-positioned approach to causality based on temporal logic. The envisioning problem is the problem of deducing the set of facts, possibly as the result of our actions, leading to the decision problem. This is compounded by the need to find a timely and useful way to represent our knowledge about time, change, and chance.

In this ~260-page book, Dr. Kleinberg begins with a brief history of causality, leading to probability, logic, and probabilistic temporal logic. The author then defines causality from several facets, proceeding to causal inference, token causality, and finally the case studies. With practical examples and algorithms, the author devises simple mathematical tools for analyzing the relationships between causal connections, inference, causal significance, model complexity, statistical associations, actions, and observations.

Exploiting the temporal nature of probabilistic events, Dr. Kleinberg's research is a thought-provoking and valuable contribution for the scientific community interested in learning causal effects and inference with respect to time. Built upon the works of the likes of Heckerman, Breese, Santos and Young, this book will shape the way probabilistic reasoning researchers think about temporal effects on causality for years to come.

David Hume believed that causes are invariably followed by their effects: "We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second." So, would you like a well-written, margin-annotation-laden text that provides a formal and practical, case-study-based approach to this somewhat abstract concept of causality? Then look no further!


Selected Papers on Interestingness Measures, Knowledge Discovery and Outlier Mining

  • S. Hettich and S. D. Bay. KDD Cup 1999 data. UCI KDD Archive [http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html], 1999.
  • E. Suzuki. Lecture Notes in Computer Science, volume 5579/2009, chapter Compression-Based Measures for Mining Interesting Rules, pages 741-746. Springer Berlin / Heidelberg, 2009.

 

 


Hilary Mason - Machine Learning for Hackers

An interesting beginner's talk for machine learning enthusiasts.

Ever tried to use a regular expression to parse an unstructured street address? This talk is an introduction to a few machine learning algorithms and some tips for integrating them where they make the most sense and will save you the most headaches.

Hilary Mason - Machine Learning for Hackers from BACON: things developers love on Vimeo.


On Bayesian Sensitivity Analysis in Digital Forensics

The idea of using Bayesian belief networks in digital forensics to quantify evidence has been around for a while. Providing qualitative approaches to Bayesian evidential reasoning in digital meta-forensics is, however, relatively new in decision support systems research. For law enforcement, decision support and the application of data mining techniques to “soft” forensic evidence form a large area of Bayesian forensic statistics, which has shown how Bayesian networks can be used to infer the probability of defense and prosecution statements based on forensic evidence. Kevin B. Korb and Ann E. Nicholson's study, Sally Clark is Wrongly Convicted of Murdering Her Children, and the work on Linguistic Bayesian Networks for reasoning with subjective probabilities in forensic statistics give insight into an important development that helps quantify the meaning of forensic expert testimony such as "strong support".
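
As a brief reminder of how prosecution and defense statements are usually compared in this setting, the odds form of Bayes' theorem is sketched below. The symbols $H_p$ (prosecution hypothesis), $H_d$ (defense hypothesis) and $E$ (evidence) are notation assumed for this illustration and are not drawn from the studies mentioned above.

```latex
\[
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\]
```

A verbal scale such as "strong support" is typically a label attached to a range of the likelihood ratio, which is why quantifying it matters.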

The IEEE paper on Sensitivity Analysis of a Bayesian Network for Reasoning about Digital Forensic Evidence, published at the 3rd International Conference on Human-Centric Computing (HumanCom), 2010, is of particular interest since it has a comprehensive real-world list of evidence items and hypotheses.

A Bayesian network representing an actual prosecuted case of illegal file sharing over a peer-to-peer network has been subjected to a systematic and rigorous sensitivity analysis. Our results demonstrate that such networks are usefully insensitive both to the occurrence of missing evidential traces and to the choice of conditional evidential probabilities.

One of the co-authors, Dr. Overill, has also covered ground in A Complexity Based Forensic Analysis of the Trojan Horse Defence.

The evidence nodes are as follows.

  • Modification time of the destination file equals that of the source file
  • Creation time of the destination file is after its own modification time
  • Hash value of the destination file matches that of the source file
  • BitTorrent client software is installed on the seized computer
  • File link for the shared file is created
  • Shared file exists on the hard disk
  • Torrent file creation record is found
  • Torrent file exists on the hard disk
  • Peer connection information is found
  • Tracker server login record is found
  • Torrent file activation time is corroborated by its MAC time and link file
  • Internet history record about the publishing website is found
  • Internet connection is available
  • Cookie of the publishing website is found
  • URL of the publishing website is stored in the web browser
  • Web browser software is available
  • Internet cache record about the publishing of the torrent file is found
  • Internet history record about the tracker server connection is found
  • The seized computer was used as the initial seeder to share the pirated file on a BitTorrent network

while the following hypotheses stand.

  • The pirated file was copied from the seized optical disk to the seized computer
  • A torrent file was created from the copied file
  • The torrent file was sent to newsgroups for publishing
  • The torrent file was activated, which caused the seized computer to connect to the tracker server
  • The connection between the seized computer and the tracker server was maintained

 

The authors conclude with a finding that vindicates working with sparse evidence:

The sensitivity analysis reported in this paper demonstrates that the BT BBN used in is insensitive to the occurrence of missing evidence and also to the choice of evidential likelihoods to an unexpected degree.

 

 

Our overall finding is gratifying because it implies that the exact choice of values for the inherently subjective evidential likelihoods is not as critical as might have been expected. Values falling within the consensus of experienced expert investigators are sufficiently reliable to be used in the BBN model. Furthermore, our results imply that the inability to recover one or more evidential traces in a digital forensic investigation is not generally critical for the probability of the investigatory hypothesis under consideration.
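
To make the idea of such a sensitivity analysis concrete, here is a deliberately simplified Java sketch. It is not the authors' BT BBN: it assumes a single binary hypothesis with conditionally independent evidence items (a naive-Bayes structure), treats an unrecovered trace as simply unobserved, and uses made-up likelihood values. The class and method names are hypothetical.

```java
/** Toy sensitivity check on a naive-Bayes stand-in for a forensic network (all numbers are invented). */
final class SensitivitySketch {

    /**
     * Posterior P(H | observed evidence), assuming the evidence items are independent given H.
     * observed[i] == false means trace i was not recovered and is left out of the product.
     */
    static double posterior(double prior, double[] pGivenH, double[] pGivenNotH, boolean[] observed) {
        double h = prior;
        double notH = 1.0 - prior;
        for (int i = 0; i < pGivenH.length; i++) {
            if (!observed[i]) {
                continue;
            }
            h *= pGivenH[i];
            notH *= pGivenNotH[i];
        }
        return h / (h + notH);
    }

    /** Range (max minus min) of the posterior while likelihood k is swept from lo to hi. */
    static double posteriorSpread(double prior, double[] pGivenH, double[] pGivenNotH,
                                  boolean[] observed, int k, double lo, double hi, int steps) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double[] varied = pGivenH.clone();
        for (int s = 0; s <= steps; s++) {
            varied[k] = lo + (hi - lo) * s / steps;
            double p = posterior(prior, varied, pGivenNotH, observed);
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        return max - min;
    }

    public static void main(String[] args) {
        double prior = 0.5;                              // assumed prior for the hypothesis
        double[] pGivenH = {0.9, 0.8, 0.85, 0.9};        // hypothetical P(evidence_i | H)
        double[] pGivenNotH = {0.1, 0.2, 0.15, 0.1};     // hypothetical P(evidence_i | not H)
        boolean[] all = {true, true, true, true};
        boolean[] oneMissing = {false, true, true, true};

        System.out.println("all traces found:  " + posterior(prior, pGivenH, pGivenNotH, all));
        System.out.println("one trace missing: " + posterior(prior, pGivenH, pGivenNotH, oneMissing));
        System.out.println("spread when P(e1|H) varies over [0.6, 0.95]: "
                + posteriorSpread(prior, pGivenH, pGivenNotH, all, 1, 0.6, 0.95, 35));
    }
}
```

With several mutually corroborating evidence items, the posterior in this toy model barely moves when one trace is dropped or one likelihood is perturbed, which mirrors the kind of insensitivity the authors report for their much richer network.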

 

For some reason, this reminded me of a recent read, SuperFreakonomics, in which the authors describe a terrorist-detection algorithm with the following black-box variable.

“What finally made it work was one last metric that dramatically sharpened the algorithm. In the interest of national security, we have been asked to not disclose the particulars; we’ll call it Variable X.

What makes Variable X so special? For one, it is a behavioral metric, not a demographic one. The dream of anti-terrorist authorities everywhere is to somehow become a fly on the wall in a room full of terrorists. In one small important way, Variable X accomplishes that. Unlike most other metrics in the algorithm, which produce a yes or no answer, Variable X measures the intensity of a particular banking activity. While not unusual in low intensities among the general population, this behavior occurs in high intensities much more frequently among those who have other terrorist markers.

This ultimately gave the algorithm great predictive power. Starting with a database of millions of bank customers, Horsley was able to generate a list of about 30 highly suspicious individuals. According to his rather conservative estimate, at least 5 of those 30 are almost certainly involved in terrorist activities. Five out of 30 isn’t perfect (the algorithm misses many terrorists and still falsely identifies some innocents), but it sure beats 495 out of 500,495.”

Bayesian belief networks can definitely serve as a better probabilistic graphical model here, offering improved visibility into the prior and posterior probabilities behind such network-related algorithms.
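
The comparison at the end of the quote is a base-rate argument, and the arithmetic is easy to reproduce. The Java sketch below assumes the inputs that would yield "495 out of 500,495": roughly 50 million innocent people, 500 terrorists, and a detector that is 99% accurate on both groups. Those inputs are assumptions for illustration; they are not stated in the passage quoted above. The class name is hypothetical.

```java
/** Base-rate arithmetic behind "495 out of 500,495" versus "5 out of 30" (inputs are assumed). */
final class BaseRate {

    /** Fraction of flagged individuals who are genuine positives. */
    static double precision(long truePositives, long falsePositives) {
        return (double) truePositives / (truePositives + falsePositives);
    }

    public static void main(String[] args) {
        long innocents = 50_000_000L;  // assumed number of innocent people screened
        long terrorists = 500L;        // assumed number of actual terrorists

        // A detector that is right 99% of the time on each group.
        long truePositives = terrorists * 99 / 100;  // 495 terrorists correctly flagged
        long falsePositives = innocents / 100;       // 500,000 innocents wrongly flagged

        double naive = precision(truePositives, falsePositives);
        double withVariableX = precision(5, 25);     // 5 genuine hits in a list of 30

        System.out.println("flagged by the 99% detector: " + (truePositives + falsePositives));
        System.out.println("precision of the 99% detector: " + naive);
        System.out.println("precision of the 30-person list: " + withVariableX);
        System.out.println("improvement factor: " + withVariableX / naive);
    }
}
```

Even a very accurate test drowns in false positives when the condition is rare, which is exactly where an explicit prior in a Bayesian network earns its keep.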

 



pgm.HelloWorld() with Wainwright & Jordan

I recently came across Wainwright & Jordan's paper on exponential families, graphical models, and variational inference and found it to be quite a comprehensive and unifying introduction to the topic. Probabilistic graphical models use a graph-based representation as the basis for compactly encoding a complex distribution over a high-dimensional space. If you are familiar with Koller and Friedman's work on Probabilistic Modeling, Wainwright and Jordan's paper provides a less mathematically terse and more unifying view of the area.

Graphical Models, Exponential Families, and Variational Inference

Compared to Pearl's work on Causality, this paper provides a contemporary look at Message-passing Algorithms for Approximate Inference, Connection to Max-Product Message-Passing, and detailed insight into Moment Matrices, Semidefinite Constraints, and Conic Programming Relaxation. Due to its clarity and detailed explanation, the background material on graphs, hypergraphs, exponential families, and duality is definitely worth reading even if you don't need a refresher.

In lieu of Pearl's polytree approach, Wainwright & Jordan's work discusses Graphical Models as Exponential Families before delving into Computational Challenges with High-Dimensional Models. Later chapters deal with Sum-Product, Bethe–Kikuchi, and Expectation-Propagation, Mean Field Methods, Variational Methods in Parameter Estimation, Convex Relaxations and Upper Bounds, Integer Programming, Max-product, and Linear Programming Relaxations, concluding with Moment Matrices, Semidefinite Constraints, and Conic Programming Relaxation. For a computer scientist, it is always interesting to observe the statistical perspective of machine learning. This contemporary insight into Graphical Models, Exponential Families, and Variational Inference was published in Foundations and Trends in Machine Learning and builds upon the researchers' earlier work on Variational inference for Dirichlet process mixtures and Variational inference in graphical models: The view from the marginal polytope.
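
For readers who want the central objects in front of them, the two formulas that organize the paper are reproduced here as a quick sketch. The notation ($\theta$ for canonical parameters, $\phi$ for sufficient statistics, $A$ for the log partition function, $\mathcal{M}$ for the set of realizable mean parameters, $A^{*}$ for the conjugate dual of $A$) follows common usage and should be checked against the paper itself.

```latex
% Exponential family in canonical form
\[
p_{\theta}(x) = \exp\bigl\{ \langle \theta, \phi(x) \rangle - A(\theta) \bigr\},
\qquad
A(\theta) = \log \int \exp \langle \theta, \phi(x) \rangle \, \nu(dx)
\]

% Variational representation of the log partition function
\[
A(\theta) = \sup_{\mu \in \mathcal{M}} \bigl\{ \langle \theta, \mu \rangle - A^{*}(\mu) \bigr\}
\]
```

Read this way, the later chapters are variations on one theme: mean field methods restrict the supremum to a tractable subset of $\mathcal{M}$ and so obtain a lower bound on $A(\theta)$, while Bethe-type methods replace both $\mathcal{M}$ and $A^{*}$ with tractable approximations.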

As an appetizer, I would also recommend Bishop's chapter on Graphical Models.


Continued adventures in SamIam - Back to Basics with Code Bandit

As discussed in a previous post on Customizing Conditional Probability using Code Generation with SamIam, I have touched upon the importance of having programmatic and declarative control over the network. Working with SamIam (and with Infer.NET to some extent) gives a researcher this flexibility, which is hard to find in proprietary tools.

Here is a simple example of a typical textbook belief network. Once the network is drawn graphically, SamIam's Code Bandit allows you to extract the model as a class.

This class hard-codes the network ...code\samiam30_windows_amd64\samiam\BeliefNet.net; one can operate on the BayesianNetwork object and populate the nodes from a different data source rather than hard-coding them. Once a simple, readable model structure is available as raw code, there are many possibilities for data population. To build, ensure that inflib.jar is on the command-line classpath, e.g. javac -classpath inflib.jar ModelTutorial.java

 

public BayesianNetwork createBayesianNetwork()
  {
    /* Create a domain of size 5. */
    Domain domain = new Domain(5);

    /* Add a discrete variable called "H" to the domain,
       with states "True", "False". */
    String     name0 = "H";
    String[] values0 = new String[]{ "True", "False" };
    int          id0 = domain.addDim( name0, values0 );

    /* Add a discrete variable called "B" to the domain,
       with states "True", "False". */
    String     name1 = "B";
    String[] values1 = new String[]{ "True", "False" };
    int          id1 = domain.addDim( name1, values1 );

    /* Add a discrete variable called "L" to the domain,
       with states "True", "False". */
    String     name2 = "L";
    String[] values2 = new String[]{ "True", "False" };
    int          id2 = domain.addDim( name2, values2 );

    /* Add a discrete variable called "C" to the domain,
       with states "True", "False". */
    String     name3 = "C";
    String[] values3 = new String[]{ "True", "False" };
    int          id3 = domain.addDim( name3, values3 );

    /* Add a discrete variable called "F" to the domain,
       with states "True", "False". */
    String     name4 = "F";
    String[] values4 = new String[]{ "True", "False" };
    int          id4 = domain.addDim( name4, values4 );

    /* For the cpts, create arrays of double-precision floating point values. */
    //H     Value
    //True  0.2
    //False 0.8
    double[] cpt0 = new double[]{ 0.2, 0.8 };
    //B     H     Value
    //True  True  0.25
    //True  False 0.05
    //False True  0.75
    //False False 0.95
    double[] cpt1 = new double[]{ 0.25, 0.05, 0.75, 0.95 };
    //L     H     Value
    //True  True  0.03
    //True  False 5.0E-4
    //False True  0.97
    //False False 0.9995
    double[] cpt2 = new double[]{ 0.03, 5.0E-4, 0.97, 0.9995 };
    //C     L     Value
    //True  True  0.6
    //True  False 0.02
    //False True  0.4
    //False False 0.98
    double[] cpt3 = new double[]{ 0.6, 0.02, 0.4, 0.98 };
    //F     L     B     Value
    //True  True  True  0.75
    //True  True  False 0.1
    //True  False True  0.5
    //True  False False 0.05
    //False True  True  0.25
    //False True  False 0.9
    //False False True  0.5
    //False False False 0.95
    double[] cpt4 = new double[]{ 0.75, 0.1, 0.5, 0.05, 0.25, 0.9, 0.5, 0.95 };
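
When the CPT values come from another data source instead of being hard-coded, it helps to sanity-check the flattened arrays before handing them to SamIam. The standalone Java sketch below does not use inflib.jar; it only assumes the layout shown in the comments above, where the child state varies slowest and the parent configuration fastest. The class and method names are hypothetical.

```java
/** Standalone sanity checks for flattened CPT arrays laid out as in the comments above. */
final class CptLayoutCheck {

    /** Entry for a given child state and parent configuration (child varies slowest). */
    static double at(double[] cpt, int parentConfigs, int childState, int parentConfig) {
        return cpt[childState * parentConfigs + parentConfig];
    }

    /** True if, for every parent configuration, the child-state probabilities sum to 1. */
    static boolean isNormalised(double[] cpt, int childStates) {
        if (childStates <= 0 || cpt.length % childStates != 0) {
            return false;
        }
        int parentConfigs = cpt.length / childStates;
        for (int p = 0; p < parentConfigs; p++) {
            double sum = 0.0;
            for (int c = 0; c < childStates; c++) {
                sum += at(cpt, parentConfigs, c, p);
            }
            if (Math.abs(sum - 1.0) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    /** P(child = first state) for a single-parent CPT, weighting each parent state by its prior. */
    static double marginalFirstState(double[] cpt, double[] parentPrior) {
        double total = 0.0;
        for (int p = 0; p < parentPrior.length; p++) {
            total += at(cpt, parentPrior.length, 0, p) * parentPrior[p];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] cpt0 = {0.2, 0.8};                                        // P(H)
        double[] cpt1 = {0.25, 0.05, 0.75, 0.95};                          // P(B | H)
        double[] cpt4 = {0.75, 0.1, 0.5, 0.05, 0.25, 0.9, 0.5, 0.95};      // P(F | L, B)

        System.out.println("cpt1 normalised: " + isNormalised(cpt1, 2));
        System.out.println("cpt4 normalised: " + isNormalised(cpt4, 2));
        System.out.println("P(B = True): " + marginalFirstState(cpt1, cpt0));
    }
}
```

A check like this catches the most common mistake when populating from external data, namely writing the values with the parent configuration varying slowest, before the tables are built.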

Later on, SamIam creates the tables from the CPTs and eventually builds the network from these tables.

/*
Create an IL2 Table for each cpt.
The parameters to the Table constructor are:
(1) the domain,
(2) the variable ids that name the dimensions of the table (in the form of an IntSet),
(3) the cpt data.
*/
Table table0 = new Table( domain, new IntSet( new int[]{ id0 } ), cpt0 );
Table table1 = new Table( domain, new IntSet( new int[]{ id0, id1 } ), cpt1 );
Table table2 = new Table( domain, new IntSet( new int[]{ id0, id2 } ), cpt2 );
Table table3 = new Table( domain, new IntSet( new int[]{ id2, id3 } ), cpt3 );
Table table4 = new Table( domain, new IntSet( new int[]{ id1, id2, id4 } ), cpt4 );

/* Create an array of all the Tables. */
Table[] tables = new Table[]{ table0, table1, table2, table3, table4 };

/*
The simple BayesianNetwork constructor takes only one argument:
an array of Tables.
*/
BayesianNetwork model = new BayesianNetwork( tables );
return model;
}

Upon building, you get the following console output.

Happy inferring!


Tag Cloud for Belief Network Sensitivity as Background Knowledge

While we are having fun visualizing with tag clouds, here is one on the following four key papers in the area of pattern mining with Bayesian networks as background knowledge and discovery of interesting patterns based on Bayesian network background knowledge.

 

 

 


Selected Papers in Machine Learning
