A
Review of the Objective Interestingness Measures
As was noted before, the objective measures of
interestingness depend only on the structure of the analyzed
pattern and of the underlying data used in the discovery process.
We will study these measures in the specific context
of the association rules representation.
Notations: Having the
implication
, Let:
represents the total
number of items,
the number of items
that sustain the premise,
the number of items
that sustain the conclusion,
the number of items
that sustain both the premise and the conclusion (the number of examples of the
rule), and
the number of items
that sustain the premise but contradict the conclusion (the number of
contra-examples of the rule).
Where
,
,
,
represent the probabilities of the premise, conclusion,
examples and respective contra-examples of the rule computed as
,
,
and respective
.

Fig 2.1 Notations Used
All the measures that we are going to study are
functions of
classified as having a
probabilistic foundation [Hilderman 1999]. i.e. these measures compute various
relations between the
,
, and
.
The most two known objective
interestingness measures are the Agrawal and Srikant’s Itemset Measures [Agrawal
1993] – namely support and
confidence. These measures are used by all the KDD systems that discover
associations. The itemset measures are used to identify frequently occurring
association rules in large databases.
1. Support
The association
has support s in
transaction set
, if s% of the transactions in
contain
. Support corresponds to statistical significance. Those
rules which exceed a predetermined minimum threshold for support and confidence
are considered to be interesting. Support is defined as :
.
The support of
the implication expresses how often
and
are True together. From the marketing perspective,
support of an itemset in retail sales data justifies the feasibility of
promoting the items together. Support measure is also good for pruning the
search space since it possesses a nice downward closure property [Tan 2000,
2002].
Beyond that,
it may not serve as a reliable interestingness measure. For example, rules with
high support quite often correspond to obvious knowledge about the domain and
they are not interesting to a data analyst simply because they not reveal any
surprising information, despite gathering sufficiently high support.
2.
Confidence
The association rule
holds in a transaction
set
with confidence c, if
c% of the transactions in
that contain
also contain
. From this definition
we can see that confidence corresponds to the strength of a rule.
Unfortunately, confidence can be misleading in many practical situations as
showed by [Brin 1997]. The confidence is defined as:
.
Critiques of this definition show that confidence is
invariable when the size of
increases or
varies and that is
insensitive to cardinal dilatation. Nevertheless,
is more likely to
happen when the size of
increases or when the size of
decreases and further
more this implication will be more meaningful when the size of all the sets
grows in the same proportions.
Support and Confidence
Extensions
We are going now, to make a
brief presentation of the extensions brought to the support and confidence
measures [Kodratoff 1999].
One important
observation that stayed at the base of all these extensions is the fact that
classical measures might over-estimate a dependency by not explicitly taking
into account what disconfirms it, or they can as well under-estimate a
dependency by not using explicitly the contraposition.
In the following section,
the implication will always refer
, the contraposition
, and the disconfirmation
.
3.
Is an extension of the confidence measure in order
to avoid the selection of highly specific rules. A significance test is applied
which ensures that the distribution of examples among classes covered by the
rule is significantly different from that which would occur by chance.
However, while a significance test eliminates rules which
are below a certain treshold of significance, there is still the problem of
rules which just pass the significance test and that will tend to be preffered
over more general and reliable but less apparently accurate rules. An
approximate measure does exist to measure the expected accuracy directly,
namely the
; for
, ![]()
This formula is a special
case of the m-probability-estimate developed by Cestnik (1990):
,
where uniform prior
probabilities
for classes are
assumed (i.e.
) and the tunable parameter
is set to
.
4. Causal Support
This measure is not usually presented
in the literature. It is an improvement to the (descriptive) support measure
because it takes in account the
size of
and
by adding the
confirmation of the contraposition.
Causal support measure is defined by:
.
This measure
assigns a zero (0) value in the case of uncorrelated variable, and a value of
one (1) for maximum correlation.
5. Descriptive
Confirm
It is
appropriate for descriptive implications. It takes into account the confirmation
and the disconfirmation of the implication. Descriptive confirm is defined by:
.
Notice
that
, thus confirmation and
disconfirmation are not independent.
6. Causal Confirm
It is appropriate for causal implications by taking
into account the confirmation, the disconfirmation and the contraposition of
the implication. It is defined by:
.
7. Descriptive
Confirmed-Confidence
It measures the strength of the
dependency by taking into account both the confirmation and the disconfirmation
of the implication.
.
It is an asymmetric
measure taking values between –1 (for no correlation) and +1 (for maximum
correlation).
8.
Causal Confidence
Extends the confidence, taking in account the size of
and
by adding the confirmation of the contraposition. It is defined by:
.
9. Causal Confirmed Confidence
Extends the causal
confidence by taking into account the disconfirmation of the rule.
.
Some other Widely Used
Interestingness Measures
10. Piatetsky-Shapiro’s Rule
Interest Measure
Piatetsky-Shapiro’s Rule function was introduced as
a simple function that satisfies all of the three fundamental principles of the
objective measures.
This function represent the differences between the
actual number of data tuples that occur together
and the number
expected if
were independent of
.
![]()
The
range of this function is between –0.25 and 0.25. Statistical independence
occurs at 0.
11.
Interest Factor
Lift or Interest factor is defined to be the ratio between
the joint probability of two variables with respect to their expected
probabilities under the independence assumption.
.
A
quite often used variant of this definitions is “Taux de liaision” that is
defined in order to respect the 1st principle of Piatetsky Shapiro,
that demand a zero (0) returned value in the case of variable statistical
independency. “Taux de liaision” is defined as:
![]()
Interest
Measure take values between zero (0) for negative correlation and infinity (
) for perfect correlation, returning one (1) in the case of
variables independency.
12.
Cosine IS Measure
CosineIS is an
interestingness measure derivate from statistical correlation, in the region of
low support and high interest values [Tan 2000].
.
IS has many desirable
properties despite violating the first principle of the objective measures.
First of all, it contains a product of two important quantities, interest
factor and support. In other words, this measure takes into account both
interestingness and significance of a pattern. Second CosineIS Measure is
equivalent to the geometric mean of confidence of rules that can be generated
from the itempair i.e.
. Another interpretation of this measure is as the cosine
angle between two random vectors.
13. Conviction
Conviction was introduced as an asymmetric version
of the interest factor in order to test the independence
between
and
and it is defined by:
.
Conviction
is different from confidence because it does not suffer from the same problem
of producing misleading rules. Unlike Interest factor, Conviction will assign
the value
if the confidence of
the rule is 1 (regardless of what
is). This measure it is strongly affected by the direction of
the arrow, so it is fitted to rank implication rules [Tan 2000].
14. Pavillon Measure
This measure, also known as dependency [Kodratoff] or
Added Value [Tan 2002] is based on the observation that the probability of meeting
given
should increase when
. It is used to measure the strength of dependency between variables.
15. Putative causal
dependency
It is an
extension of the dependency measure [Kodratoff 1999] by demanding that the
probability of meeting B given A should also decrease when ![]()
is confirmed (when the
disconfirmation is true). Putative causal dependency is defined by:
![]()
The classical
definition of dependency (Pavillon) expresses the fact that when
then B should occur
more often when A is true than when it is not. The complete meaning of the
extension of this definition is then the following:
When
holds, for the direct theorem, then B should
occur more often when
holds (this asks for a
high positive
), and
should occur less
often when
holds, (this asks for
a zero or even a highly negative
for the contraposed theorem), then
should occur more often when
holds, (this asks for
a high positive
), and
should occur less
often when
holds, and this asks for a for a zero or even a highly
negative
.
The main
differences between the extended and the classical definition of Pavillon is
that it might happen that relationships highly confirmed by their
contraposition but that occur on very few cases be forgotten by the classical
measure. It may also happen that relationships that are both largely confirmed
and largely disconfirmed (because they are actually due to some noise) be
accepted by the classical dependency (Pavillon) measure while they will be
rejected by the extended one.
16.
Correlation Coefficient
Correlation
Coefficient measures the degree of linearity between a pair of random variables.
Theoretically it is defined as the covariance between two variables, divided by
their standard deviations.
For
binary variables the correlation coefficient between
and
can be written as:
.
The
correlation coefficient is closely related to the
statistic:
. [Tan2000]
Its values resides between –1 and +1 for a negative, respective
positive strong dependency. If the two variables are independent[1],
then correlation coefficient is 0.
17. Intensity of implication
Intensity of implication measures
the statistical surprise of having so few negative examples on a rule as
compared with a random draw. [Gras 1996]
Let
be the set of all
available observations. Let
be the subset of
such that the
assertion
is true. Let
be the subset of
such that the
assertion
is true. Then
supports the assertion
and
contradicts this
assertion. Consider now two sets of the same cardinality as
and
,
’ and
’, that are randomly chosen in
.
Let
be the random variable
measuring the number of random negative examples, and
the number of negative
examples observed on the rule.
Then the relation
can be said to be more
confirmed than by random choice if
is much smaller than
. The intensity of implication measures the difference
between these two quantities: the bigger the difference, the bigger the
intensity of the implication.
![]()
a) HyperGeometric Distribution
The random variable
is approximated using
a hypergeometric law, thus intensity of implication is defined as:

b) Poisson Distribution
This is the asymmetric extension of the
intensity of implication. The random variable
is approximated using
a Poisson distribution law, thus intensity of implication is defined as:

where
is given by a
computation on the observed values,

One of the advantages of the Intensity of
implication is that the measure is still selective when ranking the logical
rules
.
Values of the intensity of implication vary
between 0 and 1. A value of 0.5 indicating that the strength of the rule is
exactly as it would be under the assumption of statistical independence.
18. Intensity of Implication – Entropic Version
The implication
intensity tends to be very close to 1 when the cardinals involved are large
(when n>600).
In
order to resolve this problem [GRAS 2000] proposed to consider in the computing
of implication intensity also: the difference between
and
in favour of
and the difference
between
and
in favour of
(this means that the
contraposition
is used to sustain the
implication
).
In
order to measure these differences the conditional entropy is used:
![]()
The
inclusion factor is defined then as:

where
is a factor used to
give more importance to the contrast
between the low and the upper values of entropy.
Finaly,
the entropic intensity of implication of the rule
is computed as follows:
![]()
19. Smyth and
Goodmans’s J-Measure
The J measure
is the average information content of a probabilistic classification rule and
is used to find the best rules relating discrete-valued attributes. [Hilderman 1999]
A probabilistic classiffication rule is a
logical implication
with some probability
p where the left- and right-hand sides correspond to a single attribute. The
right-hand side is restricted to simple single-valued assignment expressions,
while the left-hand side may be a conjunction of these simple expressions. The
J Measure is given by:

where
,
and
are the probabilities
of occurrence of
,
, and
respectively, and the
terms inside the square brackets is the relative (or cross) entropy. Relative
entropy is the similarity (or goodness of fit) of two probability
distributions.
J Measure is a
symmetric function taking on values between 0 and ![]()
, that, returns a zero (0) value in the case of variable
independency.
High values for
J Measure are desirable, but are not necessarily associated with the best
rules. For example, rare conditions may be associated with the highest values
for J Measure (i.e. where a particular B is highly unlikely), but the resulting
rule is insufficiently general to provide any new information. Consequently
analysis may be required in which the accuracy of a rule is traded for some
level of generality or goodness-of-fit.
20. Smyth and
Goodmans’s J-Measure Variant
J Measure
has a high value (indicating that the underlying rule is a high quality one)
when there is a large difference between our a posteriori belief about
predicted class B and our a priori belief about it. The problem is that this
difference is measured in a symmetric way, i.e., the J Measure will have a high
value if either our a posteriori belief about the predicted class is much
larger than our a priori belief about B or vice-versa. In the former case the
rule has a high predictive accuracy. However, in the later case, the rule has a
low predictive accuracy. To avoid the discovery of this kind of rule
[Dieferson] used an
asymmetrical variant of the J Measure:
![]()
21. Gini Index
It
is one of the most common measures of income or wealth distributions. The Gini
index incorporates the more detailed shares data into a single statistic which
summarizes the dispersion of the income shares across the whole income
distribution [Riese 1997]. The Gini
coefficient may be expressed as a proportion or as a percentage. For
association rules, the Gini Index values lies in the range of 0 (when
and
are completely independent) to 0.5 (for perfect correlation)
[Tan 2000]. This measure treats both negative and positive correlations rules
in the same way and is defined by the following formula:
![]()
22. Collective
Strength
The collective strength of an itemset is defined to be a number between
0 to
. A value of 0 indicates perfect negative correlation, while
a value of
indicates perfectly positive correlation. A value of 1 indicates that the
strength of the rule is exactly as it would be under the assumption of
statistical independence. The Collective Strength Measure is defined as:
.
23. Jaccard
Jaccard Measure is
largely used in the text mining techniques to find the similarities between
words. Jaccard similarity is computed by making a comparison between the number
of elements shared by two objects, and the total number of elements in their
union. The similarity values range from 0.0 to 1.0
with 1.0 indicating a strong dependency. Jaccard Measure for association rules
is defined as:
.
24. Odds Ratio
The
odds ratio can be interpreted as a measure of the magnitude of association
between two raters. Generally, it shows the relative increase in the odds of
one rater making a given rating, given that the other rater made the same
rating-the value is invariant regardless of whether one is concerned with a
positive or negative rating, or which rater is the reference and which the
comparison. The Odds Ratio in the terms of association rules is defined by:
![]()
The odds ratio takes values between zero
(0) and infinity. One (1) is the neutral value and means that no association exist
between
and
; close to zero or infinity means a strong association.
25. Yule’s Q
Yule's Q is based on the odds ratio and is a symmetric measure taking on values between -1 and +1. 1 (one) implies perfect negative or positive association, 0 (zero) no association. In two by two tables Yule's Q is equal to Goodman and Kruskal's Gamma. Yule’s Q is defines as:
![]()
26.
Yule’s Y
Yule's Y, also called Yule's coefficient of colligation, is a variant on Yule's Q, using the geometric mean of diagonal and off-diagonal pairs rather than the number of pairs, as does Q. It is given by the following formula:

Yule’s Y is based
on the odds ratio and a symmetric measure taking on values between -1 and +1. 1
(one) implies perfect negative or positive association, 0 (zero) no
association. The measure tends to estimate associations more conservatively
than Yule's Q. The measure has little substantive or theoretical meaning.
27. Kappa (Cohen's)
Cohen's Kappa is used to measure how well two raters agree, taking into account the agreement which could have occurred just by chance. Values of Kappa vary between +1 (perfect agreement) and -1 (complete disagreement). The value of zero indicates that the observed agreement could have occurred by chance. Kappa is defined as:
.
28.
Kloesgen
Kloesgen is an asymmetrical measure that takes into account both interestingness and significance of a pattern. Kloesgen is defined as the product between the square root of the support and the dependency measure:
.
29.
Loevinger
Loevinger’s H
is based on the ratio of observed Guttman errors to total errors expected
under the null assumption that items are totally unrelated. Let
= the probability of a
Guttman error and let
equal the same probability under the null model of totally
unrelated items. Than, Loevinger’s H is
defined as: ![]()
The
Loevinger’s measure in the terms of association rules is defined by:
![]()
The range of
the values of the Loevinger’s measure is between
and 1. When there are
no contra-examples Loevinger’s measure returns 1(maximal correlation). When the
variables are independent (the null model), Loevinger is 0.
Interestingness
Measures Based on the Ratio of Examples and Negative-Examples
30. Sebag & Schoenauer
Sebag & Schoenauer is an interestingness measure used to quantify the quality of an implication by taking in account the rule disconfirmation. Sebag & Schoenauer is defined as:
.
Sebag & Schoenauer values varies from 0 to
, returning
in the case of the logical implications and 0 in the case
when no examples exist.
31.
Example and Contra-Example
ExampleContra-Example is a measure that take in
account both the implication and the disconfirmation of the rule. Its values
are in the range of
(for the worst ranked rules) and 1 (for the best ranked
rules).
.
|
No. |
Interestingness Measure |
Formula en terms ofN, P(A), P(B), P(A,B) |
Formula en terms of n, na, nb, nab |
|
1 |
Support (s) |
|
|
|
2 |
Confidence ( c ) |
|
|
|
3 |
Lift (= interest) |
|
|
|
4 |
Loevinger |
|
|
|
5 |
Piatetsky-Shapiro Rule Interest (RI) |
|
|
|
6 |
Piatesky-Shapiro Variant |
|
|
|
7 |
Gini index (G) |
|
|
|
8 |
|
|
|
|
9 |
Sebag_Schoenauer |
|
|
|
10 |
Descriptive Confirmed Confidence |
|
|
|
11 |
Descriptive Confirm |
|
|
|
12 |
Causal Support |
|
|
|
13 |
Causal Confidence |
|
|
|
14 |
Causal Confirmed Confidence |
|
|
|
15 |
J-measure (Goodman) |
|
|
|
16 |
J-measure variant |
|
|
|
17 |
Exemple & Contra exemple |
|
|
|
18 |
Laplace |
|
|
|
19 |
Pavillon (DEPENDENCY) Added Value (AV) |
|
|
|
20 |
Causal Confirm |
|
|
|
21 |
Implication Intensity Entropique Version |
…………………………………………………………………………………………………… |
|
|
22 |
Odds ratio |
|
|
|
23 |
Yule’s Q |
|
Idem |
|
24 |
Yule’s Y |
|
idem |
|
25 |
Kappa (K) Cohen’s |
|
|
|
26 |
Conviction (V) |
|
|
|
27 |
Interest (I) |
|
|
|
28 |
Cosine IS |
|
|
|
39 |
Taux de liaison |
|
|
|
30 |
Certainty factor (F) |
|
|
|
31 |
Putative Causal Dependency |
|
|
|
33 |
Collective Strength |
|
|
|
34 |
Jaccard |
|
|
|
35 |
Klosgen |
|
|
|
36 |
Ralambrodrainy |
. |
|
|
37 |
Wang |
|
|
|
38 |
Coefficient de corrélation |
|
|
|
39 |
Interestingness |
|
|
|
40 |
Centred
confidence |
|
|
|
41 |
Contribution orientée au |
|
|
|
42 |
Coverage |
P(A) |
|
|
43 |
Dice |
|
|
|
44 |
Ganascia |
|
|
|
45 |
Goodman-Kruskal |
|
|
|
46 |
Goodman and Kruskal’s Predictive association |
|
|
|
47 |
Indice
d’implication |
|
|
|
48 |
Indice
d’inclusion |
|
|
|
49 |
Information gain |
|
|
|
50 |
Implication
Index |
|
|
|
51 |
IPEE |
|
|
|
52 |
Kodratoff’s
Dependency |
|
|
|
53 |
Kulczynski |
|
|
|
54 |
Least
contradiction |
|
|
|
55 |
Leverage |
|
|
|
56 |
Multiplicateur
de côtes |
|
|
|
57 |
Mutual
Information |
|
|
|
58 |
Opposé
de l’indice d’implication |
|
|
|
59 |
Piatetsky
Shapiro Variant |
|
|
|
60 |
Prevalence |
|
|
|
61 |
Probabilistic
discriminant index |
|
|
|
62 |
Recall |
|
|
|
63 |
Relative risk |
|
|
|
64 |
Rogers
et Tanimoto |
|
|
|
65 |
Specificity |
|
|
|
66 |
Vraisemblance
du lien |
|
|
|
67 |
Yao and Liu's One way Support |
|
|
|
68 |
Yao and Liu's Two way Support |
|
|
|
69 |
Yao and Liu's Two way Support Variation |
|
|
|
70 |
Zhang |
|
1.
Piatetsky-Shapiro 1st Principle: Qm=0 if
(The comportment at the independency).
2.
Piatetsky-Shapiro 2nd Principle: Qm
monotonically increases with
when others parameters
are fixed.
3.
Piatetsky-Shapiro 3rd Principle: Qm
monotonically decreases with
or
when the others
parameters are fixed.
4.
Major Mangano ’93: Qm monotonically increases with the
rule coverage, given a fixed confidence factor greater than the baseline
confidence factor.
5.
Freitas ’99: A good Qm for association rules must be
asymmetrical.(If the Q measure is symmetric or not, if they give the same rank
to the rule A®B and B®A )
6.
Null Invariance Principle [Tan 2002]: the Q measure
monotonically decreases when the number of contra example increases
.
7.
The Q measure sensitivity: The sensibility of the Q
measure in the area of good rules. The number of rules that each Q measures
returned as “good”.
10.
Foundation [Hilderman 00]: statistics, information
theory, probabilistic, distance.
11.
If a statistical test exist for the specified measure
(ex chi 2, correlation coefficient, implication intensity)
12.
The sensitivity to the dataset dimensions
(n is a constant for
the specified dataset, it is useful only if we have to compare two associations
rules ranking values that came from different datasets.
13.
The time/resurses(memory) of computing each Q measure.
14.
The Q measure
should monotonically decrease with
when
is fixed (the rule is more interesting if
is small)
Limit
cases:
1.
(nab==0)
2.
(na==nab) && (nb!=nab) && (n!=nb)
3. (nb==nab)
&& (na!=nab) && (n!=na)
4. (na==nab)
&& (nb==nab) && (n!=na)
5. (na==n)
&& (nb!=n)
6. (nb==n)
&& (na!=n)
7. (na==n)
&& (nb==n)
[1]
The expression “variable independency” it is actually referring at the case of
variable statistical independency, i.e. when .