A Review of the Objective Interestingness Measures

 

As was noted before, the objective measures of interestingness depend only on the structure of the analyzed pattern and of the underlying data used in the discovery process.

We will study these measures in the specific context of the association rules representation.

Notations: Having the implication, Let:

 represents the total number of items,

 the number of items that sustain the premise,

 the number of items that sustain the conclusion,

 the number of items that sustain both the premise and the conclusion (the number of examples of the rule), and

 the number of items that sustain the premise but contradict the conclusion (the number of contra-examples of the rule).

Where ,,, represent the probabilities of the premise, conclusion, examples and respective contra-examples of the rule computed as , ,  and respective .

Associaitions Rules1

Fig 2.1 Notations Used

 

All the measures that we are going to study are functions of  classified as having a probabilistic foundation [Hilderman 1999]. i.e. these measures compute various relations between the , , and .

 

The most two known objective interestingness measures are the Agrawal and Srikant’s Itemset Measures [Agrawal 1993]  – namely support and confidence. These measures are used by all the KDD systems that discover associations. The itemset measures are used to identify frequently occurring association rules in large databases.

 

1. Support

 

The association  has support s in transaction set , if s% of the transactions in  contain . Support corresponds to statistical significance. Those rules which exceed a predetermined minimum threshold for support and confidence are considered to be interesting. Support is defined as :

.

The support of the implication expresses how often  and are True together. From the marketing perspective, support of an itemset in retail sales data justifies the feasibility of promoting the items together. Support measure is also good for pruning the search space since it possesses a nice downward closure property [Tan 2000, 2002].

Beyond that, it may not serve as a reliable interestingness measure. For example, rules with high support quite often correspond to obvious knowledge about the domain and they are not interesting to a data analyst simply because they not reveal any surprising information, despite gathering sufficiently high support.

           

2. Confidence

 

            The association rule  holds in a transaction set  with confidence c, if c% of the transactions in  that contain  also contain .  From this definition we can see that confidence corresponds to the strength of a rule. Unfortunately, confidence can be misleading in many practical situations as showed by [Brin 1997]. The confidence is defined as:

            .

Critiques of this definition show that confidence is invariable when the size of increases or  varies and that is insensitive to cardinal dilatation. Nevertheless,  is more likely to happen when the size of increases or when the size of  decreases and further more this implication will be more meaningful when the size of all the sets grows in the same proportions.

 

Support and Confidence Extensions

 

We are going now, to make a brief presentation of the extensions brought to the support and confidence measures [Kodratoff 1999].

One important observation that stayed at the base of all these extensions is the fact that classical measures might over-estimate a dependency by not explicitly taking into account what disconfirms it, or they can as well under-estimate a dependency by not using explicitly the contraposition.

In the following section, the implication will always refer , the contraposition , and the disconfirmation .

 

3. Laplace

 

            Is an extension of the confidence measure in order to avoid the selection of highly specific rules. A significance test is applied which ensures that the distribution of examples among classes covered by the rule is significantly different from that which would occur by chance.

            However, while a significance test eliminates rules which are below a certain treshold of significance, there is still the problem of rules which just pass the significance test and that will tend to be preffered over more general and reliable but less apparently accurate rules. An approximate measure does exist to measure the expected accuracy directly, namely the  Laplace expected error estimate. This expected accuracy is given by the formula:

            ;   for  ,

This formula is a special case of the m-probability-estimate developed by Cestnik (1990):

            ,

where uniform prior probabilities  for classes are assumed (i.e. ) and the tunable parameter  is set to .

 

4. Causal Support

 

This measure is not usually presented in the literature. It is an improvement to the (descriptive) support measure because it takes in account the size of  and by adding the confirmation of the contraposition.  Causal support measure is defined by:

.

This measure assigns a zero (0) value in the case of uncorrelated variable, and a value of one (1) for maximum correlation.

 

5. Descriptive Confirm

 

It is appropriate for descriptive implications. It takes into account the confirmation and the disconfirmation of the implication. Descriptive confirm is defined by:

.

Notice that , thus confirmation and disconfirmation are not independent.

 

6. Causal Confirm

           

            It is appropriate for causal implications by taking into account the confirmation, the disconfirmation and the contraposition of the implication. It is defined by:

            .

 

7. Descriptive Confirmed-Confidence

 

            It measures the strength of the dependency by taking into account both the confirmation and the disconfirmation of the implication.

.

It is an asymmetric measure taking values between –1 (for no correlation) and +1 (for maximum correlation). 

 

8. Causal Confidence

           

            Extends the confidence, taking in account the size of  and by adding the confirmation of the contraposition. It is defined by:

            .

 

9.  Causal Confirmed Confidence

 

Extends the causal confidence by taking into account the disconfirmation of the rule. .

 

 

Some other Widely Used Interestingness Measures

 

10. Piatetsky-Shapiro’s Rule Interest Measure

 

            Piatetsky-Shapiro’s Rule function was introduced as a simple function that satisfies all of the three fundamental principles of the objective measures.

            This function represent the differences between the actual number of data tuples that occur together  and the number expected if were independent of .

           

The range of this function is between –0.25 and 0.25. Statistical independence occurs at 0.

 

11. Interest Factor

           

Lift or Interest factor is defined to be the ratio between the joint probability of two variables with respect to their expected probabilities under the independence assumption.

            .

A quite often used variant of this definitions is “Taux de liaision” that is defined in order to respect the 1st principle of Piatetsky Shapiro, that demand a zero (0) returned value in the case of variable statistical independency. “Taux de liaision” is defined as:

Interest Measure take values between zero (0) for negative correlation and infinity () for perfect correlation, returning one (1) in the case of variables independency.

 

12. Cosine IS Measure

 

CosineIS is an interestingness measure derivate from statistical correlation, in the region of low support and high interest values [Tan 2000].

.

IS has many desirable properties despite violating the first principle of the objective measures. First of all, it contains a product of two important quantities, interest factor and support. In other words, this measure takes into account both interestingness and significance of a pattern. Second CosineIS Measure is equivalent to the geometric mean of confidence of rules that can be generated from the itempair i.e. . Another interpretation of this measure is as the cosine angle between two random vectors.

           

13. Conviction

 

            Conviction was introduced as an asymmetric version of the interest factor in order to test the independence between and and it is defined by:

            .

Conviction is different from confidence because it does not suffer from the same problem of producing misleading rules. Unlike Interest factor, Conviction will assign the value  if the confidence of the rule is 1 (regardless of what is). This measure it is strongly affected by the direction of the arrow, so it is fitted to rank implication rules [Tan 2000].

           

14. Pavillon Measure          

 

            This measure, also known as dependency [Kodratoff] or Added Value [Tan 2002] is based on the observation that the probability of meeting  given  should increase when . It is used to measure the strength of dependency between variables.

       

 

15. Putative causal dependency

 

It is an extension of the dependency measure [Kodratoff 1999] by demanding that the probability of meeting B given A should also decrease when  is confirmed (when the disconfirmation is true). Putative causal dependency is defined by:

The classical definition of dependency (Pavillon) expresses the fact that when  then B should occur more often when A is true than when it is not. The complete meaning of the extension of this definition is then the following:

When  holds, for the direct theorem, then B should occur more often when  holds (this asks for a high positive), and  should occur less often when  holds, (this asks for a zero or even a highly negative for the contraposed theorem), then should occur more often when  holds, (this asks for a high positive  ), and  should occur less often when holds, and this asks for a for a zero or even a highly negative .

The main differences between the extended and the classical definition of Pavillon is that it might happen that relationships highly confirmed by their contraposition but that occur on very few cases be forgotten by the classical measure. It may also happen that relationships that are both largely confirmed and largely disconfirmed (because they are actually due to some noise) be accepted by the classical dependency (Pavillon) measure while they will be rejected by the extended one.

 

Interestingness Measures Based on Statistical Models

 

 

16. Correlation Coefficient

 

Correlation Coefficient measures the degree of linearity between a pair of random variables. Theoretically it is defined as the covariance between two variables, divided by their standard deviations.

For binary variables the correlation coefficient between  and can be written as:

.

The correlation coefficient is closely related to the  statistic: . [Tan2000]

Its values resides between –1 and +1 for a negative, respective positive strong dependency. If the two variables are independent[1], then correlation coefficient is 0.

 

17. Intensity of implication 

           

            Intensity of implication measures the statistical surprise of having so few negative examples on a rule as compared with a random draw. [Gras 1996]

Let  be the set of all available observations. Let  be the subset of  such that the assertion  is true. Let  be the subset of  such that the assertion  is true. Then  supports the assertion  and  contradicts this assertion. Consider now two sets of the same cardinality as  and , ’ and ’, that are randomly chosen in .

Let  be the random variable measuring the number of random negative examples, and  the number of negative examples observed on the rule.

Then the relation  can be said to be more confirmed than by random choice if  is much smaller than . The intensity of implication measures the difference between these two quantities: the bigger the difference, the bigger the intensity of the implication.

           

 

a) HyperGeometric Distribution

 

The random variable  is approximated using a hypergeometric law, thus intensity of implication is defined as:

           

b) Poisson Distribution

 

This is the asymmetric extension of the intensity of implication. The random variable  is approximated using a Poisson distribution law, thus intensity of implication is defined as:

where  is given by a computation on the observed values,

 

One of the advantages of the Intensity of implication is that the measure is still selective when ranking the logical rules .

Values of the intensity of implication vary between 0 and 1. A value of 0.5 indicating that the strength of the rule is exactly as it would be under the assumption of statistical independence.

 

18. Intensity of Implication – Entropic Version

 

The implication intensity tends to be very close to 1 when the cardinals involved are large (when n>600).

In order to resolve this problem [GRAS 2000] proposed to consider in the computing of implication intensity also: the difference between  and  in favour of  and the difference between  and  in favour of  (this means that the contraposition  is used to sustain the implication ).

In order to measure these differences the conditional entropy is used:

   

   

 

The inclusion factor is defined then as:

where  is a factor used to give more  importance to the contrast between the low and the upper values of entropy.

Finaly, the entropic intensity of implication of the rule  is computed as follows:

 

 

Measures Derived from the Information Theory

 

 

19. Smyth and Goodmans’s J-Measure

 

The J measure is the average information content of a probabilistic classification rule and is used to find the best rules relating discrete-valued attributes. [Hilderman 1999]

 A probabilistic classiffication rule is a logical implication  with some probability p where the left- and right-hand sides correspond to a single attribute. The right-hand side is restricted to simple single-valued assignment expressions, while the left-hand side may be a conjunction of these simple expressions. The J Measure is given by:

where , and  are the probabilities of occurrence of , , and  respectively, and the terms inside the square brackets is the relative (or cross) entropy. Relative entropy is the similarity (or goodness of fit) of two probability distributions.

J Measure is a symmetric function taking on values between 0 and , that, returns a zero (0) value in the case of variable independency.

High values for J Measure are desirable, but are not necessarily associated with the best rules. For example, rare conditions may be associated with the highest values for J Measure (i.e. where a particular B is highly unlikely), but the resulting rule is insufficiently general to provide any new information. Consequently analysis may be required in which the accuracy of a rule is traded for some level of generality or goodness-of-fit.

 

20. Smyth and Goodmans’s J-Measure Variant

           

            J Measure has a high value (indicating that the underlying rule is a high quality one) when there is a large difference between our a posteriori belief about predicted class B and our a priori belief about it. The problem is that this difference is measured in a symmetric way, i.e., the J Measure will have a high value if either our a posteriori belief about the predicted class is much larger than our a priori belief about B or vice-versa. In the former case the rule has a high predictive accuracy. However, in the later case, the rule has a low predictive accuracy. To avoid the discovery of this kind of rule [Dieferson] used an asymmetrical variant of the J Measure:

 

 

 

 

 

Other Interestingness Measures

 

 

21. Gini Index

 

It is one of the most common measures of income or wealth distributions. The Gini index incorporates the more detailed shares data into a single statistic which summarizes the dispersion of the income shares across the whole income distribution [Riese 1997]. The Gini coefficient may be expressed as a proportion or as a percentage.  For association rules, the Gini Index values lies in the range of 0 (when  and are completely independent) to 0.5 (for perfect correlation) [Tan 2000]. This measure treats both negative and positive correlations rules in the same way and is defined by the following formula:

 

22. Collective Strength

 

            The collective strength of an itemset is defined to be a number between 0 to . A value of 0 indicates perfect negative correlation, while a value ofindicates perfectly positive correlation. A value of 1 indicates that the strength of the rule is exactly as it would be under the assumption of statistical independence. The Collective Strength Measure is defined as:       

.

 

23. Jaccard

 

Jaccard Measure is largely used in the text mining techniques to find the similarities between words. Jaccard similarity is computed by making a comparison between the number of elements shared by two objects, and the total number of elements in their union. The similarity values range from 0.0 to 1.0 with 1.0 indicating a strong dependency. Jaccard Measure for association rules is defined as:

.

 

24. Odds Ratio

 

            The odds ratio can be interpreted as a measure of the magnitude of association between two raters. Generally, it shows the relative increase in the odds of one rater making a given rating, given that the other rater made the same rating-the value is invariant regardless of whether one is concerned with a positive or negative rating, or which rater is the reference and which the comparison. The Odds Ratio in the terms of association rules is defined by:

The odds ratio takes values between zero (0) and infinity. One (1) is the neutral value and means that no association exist between  and ; close to zero or infinity means a strong association.

 

25. Yule’s Q

 

Yule's Q is based on the odds ratio and is a symmetric measure taking on values between -1 and +1. 1 (one) implies perfect negative or positive association, 0 (zero) no association. In two by two tables Yule's Q is equal to Goodman and Kruskal's Gamma. Yule’s Q is defines as:

 

26. Yule’s Y

 

Yule's Y, also called Yule's coefficient of colligation, is a variant on Yule's Q, using the geometric mean of diagonal and off-diagonal pairs rather than the number of pairs, as does Q. It is given by the following formula:

Yule’s Y is based on the odds ratio and a symmetric measure taking on values between -1 and +1. 1 (one) implies perfect negative or positive association, 0 (zero) no association. The measure tends to estimate associations more conservatively than Yule's Q. The measure has little substantive or theoretical meaning.

27. Kappa (Cohen's)

 

Cohen's Kappa is used to measure how well two raters agree, taking into account the agreement which could have occurred just by chance. Values of Kappa vary between +1 (perfect agreement) and -1 (complete disagreement). The value of zero indicates that the observed agreement could have occurred by chance. Kappa is defined as:

.

 

28. Kloesgen

 

Kloesgen is an asymmetrical measure that takes into account both interestingness and significance of a pattern.  Kloesgen is defined as the product between the square root of the support and the dependency measure:

.

 

29. Loevinger

 

Loevinger’s H is based on the ratio of observed Guttman errors to total errors expected under the null assumption that items are totally unrelated. Let  = the probability of a Guttman error and let equal the same probability under the null model of totally unrelated items.  Than, Loevinger’s H is defined as:

The Loevinger’s measure in the terms of association rules is defined by:

The range of the values of the Loevinger’s measure is between  and 1. When there are no contra-examples Loevinger’s measure returns 1(maximal correlation). When the variables are independent (the null model), Loevinger is 0.

 

 

Interestingness Measures Based on the Ratio of Examples and Negative-Examples

 

 

30. Sebag & Schoenauer

 

            Sebag & Schoenauer is an interestingness measure used to quantify the quality of an implication by taking in account the rule disconfirmation. Sebag & Schoenauer is defined as:

            .

Sebag & Schoenauer values varies from 0 to , returningin the case of the logical implications and 0 in the case when no examples exist.

 

31. Example and Contra-Example

           

            ExampleContra-Example is a measure that take in account both the implication and the disconfirmation of the rule. Its values are in the range of (for the worst ranked rules) and 1 (for the best ranked rules).

.


 

Objective measures of interestingness

 

 

No.

 

Interestingness Measure

 

Formula en terms of

N, P(A), P(B), P(A,B)

 

Formula en terms of

n, na, nb, nab

1

Support (s)

2

Confidence ( c )

3

Lift (= interest)

 

 

4

Loevinger

5

Piatetsky-Shapiro

Rule Interest (RI)

       

6

Piatesky-Shapiro Variant

7

Gini index (G)

8

- coefficient

9

Sebag_Schoenauer

10

Descriptive Confirmed Confidence

11

Descriptive Confirm

 

12

Causal Support

13

Causal Confidence

14

Causal Confirmed Confidence

15

J-measure (Goodman)

16

J-measure variant

17

Exemple & Contra exemple


18

Laplace

19

Pavillon (DEPENDENCY)

Added Value (AV)

20

Causal Confirm

21

Implication Intensity

Entropique Version

……………………………………………………………………………………………………

 

22

Odds ratio

=

23

Yule’s Q

Idem

24

Yule’s Y

idem

25

Kappa (K)

Cohen’s

26

Conviction (V)

 

27

Interest (I)

 

28

Cosine IS

39

Taux de liaison

 

30

Certainty factor (F)

   

31

Putative Causal Dependency

33

Collective Strength

34

Jaccard

35

Klosgen

36

Ralambrodrainy

.

37

Wang

 

38

Coefficient de corrélation

 

 

39

Interestingness

 

40

Centred confidence

 

41

Contribution orientée au

42

Coverage

 

P(A)

43

Dice

 

44

Ganascia

 

45

Goodman-Kruskal

46

Goodman and Kruskal’s Predictive association

 

47

Indice d’implication

 

48

Indice d’inclusion

 

49

Information gain

 

50

Implication Index

 

51

IPEE

 

52

Kodratoff’s Dependency

 

53

Kulczynski

 

54

Least contradiction

 

55

Leverage

 

56

Multiplicateur de côtes

57

Mutual Information

58

Opposé de l’indice d’implication

 

59

Piatetsky Shapiro Variant

60

Prevalence

 

61

Probabilistic discriminant index

 

62

Recall

63

Relative risk

 

64

Rogers et Tanimoto

 

65

Specificity

 

66

Vraisemblance du lien

 

67

Yao and Liu's One way Support

 

68

Yao and Liu's Two way Support

 

69

Yao and Liu's Two way Support Variation

 

70

Zhang

 

 
 

 

Some Principles used for evaluating the Interestingness measures

1.     Piatetsky-Shapiro 1st Principle: Qm=0 if (The comportment at the independency).

2.     Piatetsky-Shapiro 2nd Principle: Qm monotonically increases with  when others parameters are fixed.

3.     Piatetsky-Shapiro 3rd Principle: Qm monotonically decreases with or  when the others parameters are fixed.

4.     Major Mangano ’93: Qm monotonically increases with the rule coverage, given a fixed confidence factor greater than the baseline confidence factor.

5.     Freitas ’99: A good Qm for association rules must be asymmetrical.(If the Q measure is symmetric or not, if they give the same rank to the rule A®B and B®A )

6.     Null Invariance Principle [Tan 2002]: the Q measure monotonically decreases when the number of contra example increases .

7.     The Q measure sensitivity: The sensibility of the Q measure in the area of good rules. The number of rules that each Q measures returned as “good”.

10.  Foundation [Hilderman 00]: statistics, information theory, probabilistic, distance.

11.  If a statistical test exist for the specified measure (ex chi 2, correlation coefficient, implication intensity)

12.  The sensitivity to the dataset dimensions  (n is a constant for the specified dataset, it is useful only if we have to compare two associations rules ranking values that came from different datasets.

13.  The time/resurses(memory) of computing each Q measure.

14.   The Q measure should monotonically decrease with  when is fixed (the rule is more interesting if  is small)

 

Limit cases:

1. (nab==0) 

2. (na==nab) && (nb!=nab) && (n!=nb) 

3. (nb==nab) && (na!=nab) && (n!=na) 

4. (na==nab) && (nb==nab) && (n!=na) 

5. (na==n) && (nb!=n)

6. (nb==n) && (na!=n)

7. (na==n) && (nb==n)

 

 

 

 



[1] The expression “variable independency” it is actually referring at the case of variable statistical independency, i.e. when .