On Multi-criteria evaluation of interesting dependencies

Discovering meaningful, explainable and interesting rules from mined data has been a challenging research problem because interestingness is highly subjective. Exploring and analyzing the relationships between features is regarded as a central task in knowledge discovery in data. In most practical applications it is difficult to determine the most relevant rule for the underlying dataset, mainly because of the large number of rules discovered. This paper by Dominique Francisci and Martine Collard of the University of Nice-Sophia Antipolis, published at the IEEE Congress on Evolutionary Computation, argues experimentally that overall model quality has to be measured according to several quality criteria, such as accuracy, interestingness or domain-dependent criteria.

There are numerous interestingness measures for classification rules, with support and confidence being the most popular and widely used. For a rule of the form IF A THEN B, support is defined as p(A ∩ B) and confidence is the conditional probability p(B | A). There are also more specific measures, such as the Lift or Sebag and Schoenauer's measure. A detailed analysis of these interestingness measures can be found in the Cambridge survey paper A Survey of Interestingness Measures for Knowledge Discovery. According to the authors, the questions addressed here are:

How to combine multiple criteria simultaneously? Is there equivalence among measures? What is the best solution for finding the best rules according to multiple criteria?
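As a small illustration of the measures defined earlier, the sketch below computes support, confidence and lift for a rule IF A THEN B from raw counts. The function name `rule_measures`, its parameters and the counts are hypothetical and chosen only for illustration; they do not come from the paper. Lift is taken here in its usual form, p(A ∩ B) / (p(A) p(B)).

```python
def rule_measures(n_total, n_a, n_b, n_ab):
    """Support, confidence and lift of the rule A -> B from raw counts.

    n_total: number of records
    n_a:     records matching the antecedent A
    n_b:     records matching the consequent B
    n_ab:    records matching both A and B
    """
    if n_total <= 0 or n_a <= 0 or n_b <= 0:
        raise ValueError("n_total, n_a and n_b must be positive")
    support = n_ab / n_total             # p(A and B)
    confidence = n_ab / n_a              # p(B | A)
    lift = confidence / (n_b / n_total)  # p(B | A) / p(B)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical counts, not taken from the paper.
measures = rule_measures(n_total=100, n_a=40, n_b=50, n_ab=30)
print(measures)
```

A lift above 1 indicates that A and B occur together more often than they would if they were independent, which is one reason support and confidence alone may rank rules differently from lift.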

Citing classical works such as “Fast algorithms for mining association rules” by R. Agrawal and R. Srikant and “Discovery, analysis and presentation of strong rules” by G. Piatetsky-Shapiro, as well as relatively newer approaches such as “Discovering interesting patterns for investment decision making with GLOWER” by D. Chou, V. Dhar and F. Provost, the researchers argue firmly for what amounts to an early version of an ensemble technique. Stating that the opportunity to vary and combine different quality criteria is an essential advantage in a data mining process, they propose an evolutionary multi-objective method to address this problem.
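To make the idea of combining criteria concrete, here is a minimal sketch of a non-dominated (Pareto) filter over rules scored on two criteria. It assumes Pareto dominance as the comparison, which multi-objective evolutionary methods commonly use; it is not the authors' algorithm and contains no genetic operators. The rule names and all score pairs are hypothetical.

```python
def dominates(p, q):
    """True if score vector p is at least as good as q on every criterion
    and strictly better on at least one (all criteria are maximized)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(scored_rules):
    """Keep the rules whose score vectors no other rule dominates."""
    return [
        (name, score)
        for name, score in scored_rules
        if not any(dominates(other, score) for _, other in scored_rules)
    ]


# Hypothetical (criterion 1, criterion 2) scores for four hypothetical rules.
rules = [
    ("r1", (0.87, 0.96)),
    ("r2", (0.95, 0.60)),
    ("r3", (0.80, 0.90)),
    ("r4", (0.50, 0.99)),
]
front = pareto_front(rules)
print([name for name, _ in front])
```

In this toy set r3 is dropped because r1 is better on both criteria, while r1, r2 and r4 each represent a different trade-off and are all kept. That is the sense in which no single "best" rule exists under several criteria.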

Through comparative analyses of Sensitivity against Specificity, Support against Conviction, and the Sebag factor against Rule Interest, the researchers show that the rules obtained by the multi-objective genetic algorithm (GA) explain the classification. They report results for the pair (Sensitivity; Specificity) on the Vote data set. This UCI voting data set contains 435 descriptions of political candidates according to 17 categorical attributes. These data were used for classification. The class attribute, politic class, takes the value democrat or republican.

Using the Sensitivity and Specificity measures, the following rule, with (Se = 0.87; Sp = 0.96), is reported:

IF (physician-fee-freeze=y) AND (el-salvador-aid=y) AND (duty-free-exports=n) THEN
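For readers unfamiliar with the two measures, the sketch below shows how sensitivity and specificity of a single rule can be computed, using the standard definitions Se = TP / (TP + FN) and Sp = TN / (TN + FP). The function name and the toy records are hypothetical and are not drawn from the Vote data set; `fires` marks whether the rule's conditions hold for a record and `in_class` marks whether the record belongs to the class the rule predicts.

```python
def sensitivity_specificity(fires, in_class):
    """Sensitivity and specificity of a rule over a set of records.

    fires:    list of bools, True where the rule's conditions hold
    in_class: list of bools, True where the record is in the target class
    """
    pairs = list(zip(fires, in_class))
    tp = sum(1 for f, c in pairs if f and c)          # rule fires, class matches
    fn = sum(1 for f, c in pairs if not f and c)      # rule misses a class member
    tn = sum(1 for f, c in pairs if not f and not c)  # rule correctly silent
    fp = sum(1 for f, c in pairs if f and not c)      # rule fires on other class
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one record in and one outside the class")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, purely illustrative.
fires = [True, True, True, False, False, False, False, True]
in_class = [True, True, True, True, False, False, False, False]
se, sp = sensitivity_specificity(fires, in_class)
print(se, sp)
```

A rule with high sensitivity covers most members of its class, and one with high specificity rarely fires on members of the other class, so the reported pair describes a rule that does both reasonably well.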

In essence, this work by Dominique Francisci and Martine Collard is an early effort to address a key objective of data mining: being able to select useful information, for which the innate relationship between the interestingness measures is of key importance. The researchers' work can be summarized in their own words as follows.

This approach emphasizes the idea that interestingness cannot be defined in an absolute way. It allows to easily parameterize the rule search process and to make varying the parameter values according to different and multiple goals.