Finding Interesting Outliers - A Belief Network based Approach @ IEEE SoutheastCon 2015

Presented at IEEE SoutheastCon 2015

 

Finding Interesting Outliers - A Belief Network based Approach

 

Abstract: Outliers are deviations from the usual trends in data. Discovering interestingness among outliers, i.e., finding the anomalies that are of real interest to subject matter experts, is an active area of research in the data mining and machine learning community. Because interestingness is subjective, the definition of what counts as 'interesting' varies between domains and subject matter experts. In this research, we explore how to quantify measures of interestingness, using Bayesian belief networks as background knowledge. Mining outliers may help discover potential anomalies and fraudulent activities, and meaningful outliers can be retrieved and analyzed by using domain knowledge. Domain knowledge (or background knowledge) is represented using probabilistic graphical models such as Bayesian belief networks. Bayesian networks are a graph-based representation used to model and encode mutual relationships between entities. Because of their probabilistic graphical nature, belief networks are well suited to capturing sensitivity, causal inference, uncertainty, and background knowledge in real-world data sets. Bayesian networks present the causal relationships between different entities (nodes) using conditional probability. This probabilistic relationship shows the degree of belief between entities, and a quantitative measure that computes changes in this degree of belief acts as a sensitivity measure. In this paper, we provide an overview of interestingness measures and their use in measuring sensitivity in belief networks, and we review the earlier work on the so-called Interestingness Filtering Engine. Building upon these foundations, we introduce our algorithm IBOX (Interestingness based Bayesian Outlier eXplainer), which progressively improves on the performance and sensitivity scoring of the earlier work. IBOX provides an iterative model that uses multiple interestingness measures, resulting in better performance and improved sensitivity analysis. The approach quantitatively validates probabilistic interestingness measures as an effective sensitivity analysis technique in rare class mining.
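
To make the idea of "change in degree of belief as a sensitivity measure" concrete, here is a minimal sketch in Python. It is not the IBOX algorithm or any measure from the paper. It assumes a hypothetical two-node network (Fraud -> LargeTransaction) with made-up probabilities, and uses the absolute difference between posterior and prior as a stand-in for a sensitivity score.

```python
# Illustrative sketch only: a hypothetical two-node belief network
# (Fraud -> LargeTransaction). All probabilities are invented for the
# example and do not come from the paper.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) by Bayes' rule for a binary hypothesis H and evidence E."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


def belief_change(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Absolute change in belief in H after observing E.

    Used here as a simple, assumed stand-in for a sensitivity score.
    """
    return abs(posterior(prior, p_e_given_h, p_e_given_not_h) - prior)


# Hypothetical parameters of the network.
prior_fraud = 0.01            # P(Fraud)
p_large_given_fraud = 0.60    # P(LargeTransaction | Fraud)
p_large_given_legit = 0.05    # P(LargeTransaction | not Fraud)

post = posterior(prior_fraud, p_large_given_fraud, p_large_given_legit)
delta = belief_change(prior_fraud, p_large_given_fraud, p_large_given_legit)

print(f"P(Fraud | LargeTransaction) = {post:.4f}")
print(f"Change in belief            = {delta:.4f}")
```

Under these assumed numbers, observing a large transaction raises the belief in fraud from 1% to roughly 11%. Evidence that is equally likely under both hypotheses would leave the belief unchanged, giving a score of zero.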

Topic Category: Data Mining and Machine Learning

 
