An incremental MaxSAT-based model to learn interpretable and balanced classification rules
Abstract
The increasing advancements in the field of machine learning have led to the development of numerous applications that effectively address a wide range of problems with accurate predictions. However, in certain cases, accuracy alone may not be sufficient: many real-world problems also demand explanations and interpretability behind the predictions. Classification rules are among the most popular interpretable models. This work proposes an incremental model for learning interpretable and balanced rules based on MaxSAT, called IMLIB. This new model is based on two other approaches, one based on SAT and the other on MaxSAT. The one based on SAT limits the size of each generated rule, making it possible to balance them. We suggest that such a set of rules seems more natural to understand compared to a mixture of large and small rules. The approach based on MaxSAT, called IMLI, presents a technique to increase performance that involves learning a set of rules by incrementally applying the model to a dataset. Finally, IMLIB and IMLI are compared using diverse databases. IMLIB obtained results comparable to IMLI in terms of accuracy, generating more balanced rules with smaller sizes.
Keywords:
Interpretable Artificial Intelligence · Explainable Artificial Intelligence · Rule Learning · Maximum Satisfiability

1 Introduction
The success of Machine Learning (ML) in recent years has led to a growing advancement of studies in this area [2, 8, 12]. Several applications have emerged with the aim of tackling a variety of problems and situations [4, 14, 20]. One remaining problem is the lack of explainability of prediction models. This directly affects the reliability of using these applications in critical situations involving, for example, finance, autonomous systems, damage to equipment, the environment, and even lives [1, 7, 23]. That said, some works seek to develop approaches that bring explainability to predictions [13, 21, 22].
Obtaining precise predictions with high levels of interpretability is often not a simple task. Some works try to solve this problem by balancing the accuracy of the prediction with its interpretability [5, 9, 17, 24, 6, 16, 15]. Several of these works use approaches based on the Boolean Satisfiability Problem (SAT) and the Maximum Boolean Satisfiability Problem (MaxSAT). The choice of these approaches has become increasingly recurrent in recent years, and the reasons can be seen in the results obtained by these models.
SAT-based approaches have recently been proposed [18, 19] to learn quantifier-free first-order sentences from a set of classified strings. More specifically, given a set of classified strings, the goal is to find a first-order sentence over strings, of minimum size, that correctly classifies all the strings. One of these approaches is SQFSAT (Synthesis of quantifier-free first-order sentences over strings with SAT). Upon receiving a set of classified strings, this approach generates a quantifier-free first-order sentence over strings in disjunctive normal form (DNF) with a given number of terms. What makes this method stand out is that we can limit both the number of terms and the number of formulas per term in the generated formula. In addition, as the approach generates formulas in DNF, each term of the formula can be seen as a rule. Then, for each rule, its explanation is the conjunction of formulas in the rule, which is interesting for interpretability [11, 18]. On the other hand, as the model is based on the SAT problem, in certain situations it may produce results that are less interesting in terms of interpretability and efficiency, such as when the set of strings is large.
Ghosh et al. created a classification model based on MaxSAT called IMLI [6]. The approach takes a set of classified samples, represented by vectors of numerical and categorical data, and generates a set of rules expressed in DNF or in conjunctive normal form (CNF) that correctly classifies as many samples as possible. In this work, we focus on using IMLI for learning rules in DNF. The number of rules in the set of rules can be defined similarly to SQFSAT, but IMLI does not bound the number of elements per rule. Although IMLI focuses on learning a sparse set of rules, it may obtain a combination of both large and small rules. IMLI also offers the option of defining a weight for correct classifications: as the weight increases, the accuracy of the model improves, but at the cost of an increase in the size of the generated set of rules; the smaller the weight, the lower the accuracy, but correspondingly the smaller the generated set of rules tends to be. Furthermore, IMLI uses an incremental approach to achieve better runtime performance. The incremental form consists of dividing the set of samples into partitions in order to generate a set of rules for each partition from the set of rules obtained in the previous partitions.
In this work, we aim to create a new approach for learning interpretable rules based on MaxSAT that unites SQFSAT with the incrementability of IMLI. The motivation for choosing SQFSAT is the possibility of defining the number of literals per clause, allowing us to generate smaller and more balanced rules. The choice of IMLI is motivated by its incrementability technique, which allows the method to train on large sets of samples efficiently. In addition, we propose a technique that reduces the size of the generated rules, removing possible redundancies.
This work is divided into 6 sections. In Section 2, we define general notions and notation. Since all methods presented in this paper use Boolean logic, we also define in Section 2 how these methods binarize datasets with numerical and categorical data. In Section 3, we present SQFSAT and IMLI. We present SQFSAT in the context of our work, where samples are binary vectors instead of strings and the elements of rules are not first-order sentences over strings. In Section 4, our contribution, IMLIB, is presented. In Section 5, we describe the experiments conducted and the results of the comparison of our approach against IMLI. Finally, in the last section, we present conclusions and indicate future work.
2 Preliminaries
We consider the binary classification problem where we are given a set of samples and their classifications. The set of samples is represented by a binary matrix of size $n \times m$ and their classifications by a vector of size $n$. We call the matrix X and the vector y. Each row of X is a sample of the set and we will call it $x_i$ with $1 \le i \le n$. To represent a specific value of $x_i$, we will use $x_i[j]$ with $1 \le j \le m$. Each column of X has a label representing a feature, and the label of the $j$th column is symbolized by $f_j$. To represent a specific value of y, we will use $y_i$.
To represent the opposite value of $y_i$, that is, if it is $1$ the opposite value is $0$ and vice versa, we use $\neg y_i$. Therefore, we will use the symbol $\neg y$ to represent y with all opposite values. To represent the opposite value of $x_i[j]$, we use $\neg x_i[j]$. Therefore, we will use the symbol $\neg x_i$ to represent $x_i$ with all opposite values. Each label $f_j$ also has its opposite label, which is symbolized by $\neg f_j$.
A partition of X is represented by $X^k$ with $1 \le k \le p$, where $p$ is the number of partitions. Likewise, the partitions of vector y are represented by $y^k$. Each element $y_i$ of y represents the class value of sample $x_i$. We use $X^{+} = \{x_i \in X \mid y_i = 1\}$ and $X^{-} = \{x_i \in X \mid y_i = 0\}$. To represent the size of these sets, that is, the number of samples contained in them, we use the notations $|X^{+}|$ and $|X^{-}|$.
Example 1
Let X be the set of samples

$$X=\begin{bmatrix}1&0&1\\0&1&1\\1&1&0\\0&0&1\end{bmatrix}$$

and their classifications $y = (1, 0, 1, 0)$. The samples are: $x_{1}=(1,0,1)$, $x_{2}=(0,1,1)$, $x_{3}=(1,1,0)$ and $x_{4}=(0,0,1)$. The values of sample $x_{1}$, for instance, are $x_{1}[1]=1$, $x_{1}[2]=0$ and $x_{1}[3]=1$. The class values of each sample are: $y_{1}=1$, $y_{2}=0$, $y_{3}=1$ and $y_{4}=0$. We can divide X into two partitions in several different ways, one of which is: $X^{1}=\{x_{1},x_{2}\}$, $y^{1}=(1,0)$, $X^{2}=\{x_{3},x_{4}\}$, and $y^{2}=(1,0)$.
Example 2
Let X be the set of samples from Example 1, then

$$\neg X=\begin{bmatrix}0&1&0\\1&0&0\\0&0&1\\1&1&0\end{bmatrix}$$

and $\neg y = (0, 1, 0, 1)$. The samples are: $\neg x_{1}=(0,1,0)$, $\neg x_{2}=(1,0,0)$, $\neg x_{3}=(0,0,1)$ and $\neg x_{4}=(1,1,0)$. The values of sample $\neg x_{1}$, for instance, are $\neg x_{1}[1]=0$, $\neg x_{1}[2]=1$ and $\neg x_{1}[3]=0$. The class values of each sample are: $\neg y_{1}=0$, $\neg y_{2}=1$, $\neg y_{3}=0$ and $\neg y_{4}=1$. We can divide $\neg X$ in partitions as in Example 1: $\neg X^{1}=\{\neg x_{1},\neg x_{2}\}$, $\neg y^{1}=(0,1)$, $\neg X^{2}=\{\neg x_{3},\neg x_{4}\}$, and $\neg y^{2}=(0,1)$.
We define a set of rules as a disjunction of rules, represented by R. A rule is a conjunction of one or more features. Each rule in R is represented by $r_l$ with $1 \le l \le q$, where $q$ is the number of rules. Moreover, $R(x_i)$ represents the application of R to $x_i$. The notations $|R|$ and $|r_l|$ are used to represent the number of features in R and in $r_l$, respectively.
Example 3
Let Man, Smoke, and Hike be labels of features. Let R be the set of rules $(\text{Man}) \vee (\text{Smoke} \wedge \text{Hike})$. The rules are: $r_{1} = (\text{Man})$ and $r_{2} = (\text{Smoke} \wedge \text{Hike})$. The application of R to $x_i$ is represented as follows: $R(x_{i}) = x_{i}[1] \vee (x_{i}[2] \wedge x_{i}[3])$. For example, let X be the set of samples from Example 1 with feature labels Man, Smoke, and Hike; then $R(x_{1})=1$, $R(x_{2})=1$, $R(x_{3})=1$ and $R(x_{4})=0$. Moreover, we have that $|R| = 3$, $|r_{1}| = 1$ and $|r_{2}| = 2$.
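For concreteness, the application of a set of rules in DNF to a binary sample can be computed as in the following Python sketch; this is our illustration rather than code from either model, and it represents a rule as a list of (feature index, sign) pairs:

```python
# A rule is a conjunction of literals; a literal is (feature_index, sign),
# where sign=False stands for the opposite feature. R(x) is the disjunction
# of the rules.
def apply_rules(rules, x):
    for rule in rules:
        if all(x[j] == (1 if sign else 0) for j, sign in rule):
            return 1  # some rule (conjunction) is satisfied by x
    return 0

# R = (Man) ∨ (Smoke ∧ Hike) over samples with feature order (Man, Smoke, Hike):
R = [[(0, True)], [(1, True), (2, True)]]
print(apply_rules(R, [1, 0, 1]))  # 1: the first rule fires (as for x_1 in Example 3)
print(apply_rules(R, [0, 0, 1]))  # 0: no rule fires (as for x_4)
```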
As we assume a set of binary samples, we need to perform some preprocessing. Preprocessing consists of binarizing a set of samples with numerical or categorical values. The algorithm divides the features into four types: constant, where all samples have the same value; binary, where there are only two distinct variations of the same feature among all the samples; categorical, when the feature does not fit constant or binary and its values form three or more categories; and ordinal, when the feature does not fit constant or binary and has numerical values.
When the feature type is constant, the algorithm discards that feature, since a feature common to all samples makes no difference in the generated rules. When the type is binary, one of the feature variations receives $1$ and the other $0$ as new values. If the type is categorical, we employ the widely recognized technique of one-hot encoding. Finally, for the ordinal type, a quantization is performed, that is, the variations of the feature are divided into quantiles. With this, Boolean values are assigned to each quantile according to the original value.
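A rough sketch of this preprocessing is given below; it is our reading of the description above, assuming pandas quantile binning for the quantization step, and the function and column names are illustrative:

```python
import pandas as pd

def binarize(df: pd.DataFrame, n_quantiles: int = 4) -> pd.DataFrame:
    out = {}
    for col in df.columns:
        values = df[col]
        if values.nunique() == 1:            # constant: discard the feature
            continue
        if values.nunique() == 2:            # binary: map the two variants to 1/0
            out[col] = (values == values.unique()[0]).astype(int)
        elif values.dtype.kind in "ifu":     # ordinal: one indicator per quantile
            binned = pd.qcut(values, q=n_quantiles, duplicates="drop")
            for interval in binned.cat.categories:
                out[f"{col} <= {interval.right}"] = (values <= interval.right).astype(int)
        else:                                # categorical: one-hot encoding
            for cat in values.unique():
                out[f"{col} = {cat}"] = (values == cat).astype(int)
    return pd.DataFrame(out)
```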
We use SAT and MaxSAT solvers to implement the methods presented in this work. A solver receives a formula in CNF, for example: $(x_{1}\vee\neg x_{2})\wedge(x_{2}\vee x_{3})$. Furthermore, a MaxSAT solver receives weights that are assigned to the clauses of the formula. A clause is the disjunction of one or more literals. The weights are represented by $(C;\ w)$, where $C$ is one or more clauses and $w$ represents the weight assigned to each one of them. A SAT solver tries to assign values to the literals in such a way that all clauses are satisfied. A MaxSAT solver tries to assign values to the literals in a way that the sum of the weights of the satisfied clauses is maximum. Clauses with numerical weights are considered soft: the greater the weight, the greater the priority of the clause to be satisfied. Clauses assigned a weight of $\infty$ are considered hard and must be satisfied.
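As an illustration of this interface, the following minimal sketch poses hard and soft clauses to the RC2 solver through PySAT, the solver used in Section 5; the toy clauses and weights are ours:

```python
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

wcnf = WCNF()
wcnf.append([1, 2])            # hard clause (x1 ∨ x2), weight ∞
wcnf.append([-1, 3])           # hard clause (¬x1 ∨ x3)
wcnf.append([-2], weight=5)    # soft clause (¬x2) with weight 5
wcnf.append([-3], weight=1)    # soft clause (¬x3) with weight 1

with RC2(wcnf) as solver:
    print(solver.compute())    # e.g. [1, -2, 3]: only the weight-1 clause is falsified
    print(solver.cost)         # 1: the minimum total weight of falsified soft clauses
```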
3 Rule learning with SAT and MaxSAT
3.1 SQFSAT
SQFSAT is a SAT-based approach that, given X, y, a number of rules $q$, and a maximum number of features per rule $c$, tries to find a set of rules R with $q$ rules and at most $c$ features per rule that correctly classifies all samples, that is, $R(x_i) = y_i$ for all $x_i \in X$. In general, the approach takes its parameters X, y, $q$, and $c$ and constructs a CNF formula to apply to a SAT solver, which returns an answer that is used to obtain R.
The construction of the SAT clauses is defined by propositional variables $p_{j}^{l,k}$, $n^{l,k}$, $s^{l,k}$, $z_{i}^{l,k}$, and $v_{i}^{l}$, for $1 \le i \le n$, $1 \le j \le m$, $1 \le l \le q$ and $1 \le k \le c$. If the valuation of $p_{j}^{l,k}$ is true, it means that the $j$th feature label will be the $k$th feature of the rule $r_l$. Furthermore, if $n^{l,k}$ is true, it means that the $k$th feature of the rule $r_l$ will be $f_j$, in other words, it will be positive. Otherwise, it will be negative: $\neg f_j$. If $s^{l,k}$ is true, it means that the $k$th feature is skipped in the rule $r_l$. In this case, we ignore $n^{l,k}$. If $z_{i}^{l,k}$ is true, then the $k$th feature of rule $r_l$ contributes to the correct classification of the $i$th sample. If $v_{i}^{l}$ is true, then the rule $r_l$ contributes to the correct classification of the $i$th sample. That said, below, we show the constraints formulated in the model for constructing the SAT clauses.
Conjunction of clauses that guarantees that exactly one of $s^{l,k}, p_{1}^{l,k}, \ldots, p_{m}^{l,k}$ is true for the $k$th feature of the rule $r_l$:

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(s^{l,k}\vee p_{1}^{l,k}\vee\cdots\vee p_{m}^{l,k}\right)\tag{1}$$

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\;\bigwedge_{\substack{o,o'\in O^{l,k}\\ o\neq o'}}\left(\neg o\vee\neg o'\right),\quad\text{where }O^{l,k}=\{s^{l,k},p_{1}^{l,k},\ldots,p_{m}^{l,k}\}\tag{2}$$
Conjunction of clauses that ensures that each rule has at least one feature:

$$\bigwedge_{l=1}^{q}\left(\neg s^{l,1}\vee\cdots\vee\neg s^{l,c}\right)\tag{3}$$
We will use $x_i[j]$ to represent the value of the $i$th sample at the $j$th feature label of X. If this value is $1$, the $j$th feature label placed in the $k$th position of the rule $r_l$ contributes to the correct classification of the $i$th sample exactly when that position is positive; therefore, we define the literal $b_{i}^{j,l,k} = n^{l,k}$. Otherwise, $b_{i}^{j,l,k} = \neg n^{l,k}$. That said, the following conjunction of formulas guarantees that $z_{i}^{l,k}$ is true if the $j$th feature label in the $k$th position of the $l$th rule contributes to the correct classification of the sample $x_i$, and false otherwise:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\bigwedge_{j=1}^{m}\left(p_{j}^{l,k}\rightarrow\left(z_{i}^{l,k}\leftrightarrow b_{i}^{j,l,k}\right)\right)\tag{4}$$
Conjunction of formulas guaranteeing that if the $k$th feature of a rule is skipped, then the classification of this rule is not interfered with by this feature:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(s^{l,k}\rightarrow z_{i}^{l,k}\right)\tag{5}$$
Conjunction of formulas indicating that $v_{i}^{l}$ will be set to true if and only if all the features of rule $r_l$ contribute to the correct classification of sample $x_i$:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\left(v_{i}^{l}\leftrightarrow\left(z_{i}^{l,1}\wedge\cdots\wedge z_{i}^{l,c}\right)\right)\tag{6}$$
Conjunction of clauses that guarantees that R will correctly classify all samples:

$$\bigwedge_{x_{i}\in X^{+}}\left(v_{i}^{1}\vee\cdots\vee v_{i}^{q}\right)\tag{7}$$

$$\bigwedge_{x_{i}\in X^{-}}\left(\neg v_{i}^{1}\wedge\cdots\wedge\neg v_{i}^{q}\right)\tag{8}$$
Next, the formula below is converted to CNF. Then, finally, we have the SAT query that is sent to the solver:

$$Q_{\mathrm{SQFSAT}}=(1)\wedge(2)\wedge(3)\wedge(4)\wedge(5)\wedge(6)\wedge(7)\wedge(8)\tag{9}$$
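To make the encoding concrete, the following condensed Python sketch, which is ours and not the original SQFSAT implementation, builds clauses corresponding to (1)-(8) with PySAT and decodes a set of rules, in the format of the earlier sketch, from the satisfying assignment:

```python
from itertools import combinations
from pysat.formula import IDPool
from pysat.solvers import Glucose3

def sqfsat(X, y, q, c):
    n, m = len(X), len(X[0])
    pool = IDPool()
    p = lambda l, k, j: pool.id(("p", l, k, j))   # f_j is the k-th feature of rule l
    s = lambda l, k: pool.id(("s", l, k))         # slot k of rule l is skipped
    sg = lambda l, k: pool.id(("n", l, k))        # slot k of rule l is positive
    z = lambda i, l, k: pool.id(("z", i, l, k))   # slot k of rule l accepts sample i
    v = lambda i, l: pool.id(("v", i, l))         # rule l accepts sample i
    cls = []
    for l in range(q):
        for k in range(c):
            choices = [s(l, k)] + [p(l, k, j) for j in range(m)]
            cls.append(choices)                                      # (1) at least one
            cls += [[-a, -b] for a, b in combinations(choices, 2)]   # (2) at most one
        cls.append([-s(l, k) for k in range(c)])                     # (3) >= 1 feature
    for i in range(n):
        for l in range(q):
            for k in range(c):
                for j in range(m):
                    b = sg(l, k) if X[i][j] == 1 else -sg(l, k)      # literal b
                    cls.append([-p(l, k, j), -b, z(i, l, k)])        # (4): p ∧ b -> z
                    cls.append([-p(l, k, j), b, -z(i, l, k)])        # (4): p ∧ ¬b -> ¬z
                cls.append([-s(l, k), z(i, l, k)])                   # (5): skipped -> z
                cls.append([-v(i, l), z(i, l, k)])                   # (6): v -> z_k
            cls.append([v(i, l)] + [-z(i, l, k) for k in range(c)])  # (6): all z -> v
        if y[i] == 1:
            cls.append([v(i, l) for l in range(q)])                  # (7)
        else:
            cls += [[-v(i, l)] for l in range(q)]                    # (8)
    with Glucose3(bootstrap_with=cls) as solver:                     # the query (9)
        if not solver.solve():
            return None
        model = set(solver.get_model())
        return [[(j, sg(l, k) in model)
                 for k in range(c) for j in range(m) if p(l, k, j) in model]
                for l in range(q)]
```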
3.2 IMLI
IMLI is an incremental approach based on MaxSAT for learning interpretable rules. Given X, y, a number of rules $k$, and a weight $\lambda$, the model aims to obtain the smallest set of rules M in CNF that correctly classifies as many samples as possible, penalizing classification errors with $\lambda$. In general, the method solves the optimization problem $\min_{M}\ |M| + \lambda\cdot|\{x_{i}\in X \mid M(x_{i})\neq y_{i}\}|$, where $|M|$ represents the number of features in M and $M(x_i)$ denotes the application of the set of rules M to $x_i$. Therefore, the approach takes its parameters X, y, $k$, and $\lambda$ and constructs a MaxSAT query to apply to a MaxSAT solver, which returns an answer that is used to generate M. Note that IMLI generates sets of rules in CNF, whereas our objective is to obtain sets of rules in DNF. For that, we use $\neg y$ as a parameter instead of y and negate the set of rules M to obtain a set of rules R in DNF.
The construction of the MaxSAT clauses is defined by propositional variables $b_{j}^{l}$ and $\eta_{i}$, for $1 \le i \le n$, $1 \le j \le 2m$ and $1 \le l \le k$. The index $j$ ranges from $1$ to $2m$, as it also considers opposite features. If the valuation of $b_{j}^{l}$ is true and $j \le m$, it means that feature $f_j$ will be in the rule $m_l$, where $m_l$ is the $l$th rule of M. If the valuation of $b_{j}^{l}$ is true and $j > m$, it means that feature $\neg f_{j-m}$ will be in the rule $m_l$. If the valuation of $\eta_i$ is true, it means that sample $x_i$ is not classified correctly, that is, $M(x_i) \neq y_i$. That said, below, we show the constraints for constructing the MaxSAT clauses.
Constraints that represent that the cost of a misclassification is $\lambda$:

$$\bigwedge_{i=1}^{n}\left(\neg\eta_{i};\ \lambda\right)\tag{10}$$
Constraints that represent that the model tries to insert as few features as possible in M, taking into account the weights of all clauses:

$$\bigwedge_{l=1}^{k}\bigwedge_{j=1}^{2m}\left(\neg b_{j}^{l};\ 1\right)\tag{11}$$
Even though the constraints in (11) prioritize learning sparse rules, they do so by focusing on the overall set of rules, i.e., on the total number of features in M. Hence, IMLI may generate a set of rules that comprises a combination of both large and small rules. In our approach, presented in Section 4, we address this drawback by limiting the number of features in each rule.
We will use $B^l$ to represent the vector of variables of a rule $m_l$, that is, $B^{l} = (b_{1}^{l}, \ldots, b_{2m}^{l})$, for $1 \le l \le k$. To represent the concatenation of two samples, we will use the symbol $\circ$. We also use the symbol $\odot$ to represent an operation between two vectors of the same size. The operation consists of applying a conjunction between the corresponding elements of the vectors; subsequently, a disjunction between the elements of the result is applied. The following example illustrates how these definitions will be used:
Example 4
Let $x_1$ be as in Example 1, and let $B^{1} = (b_{1}^{1}, \ldots, b_{6}^{1})$. Therefore, $(x_{1}\circ\neg x_{1})\odot B^{1} = (1,0,1,0,1,0)\odot(b_{1}^{1},\ldots,b_{6}^{1}) = b_{1}^{1}\vee b_{3}^{1}\vee b_{5}^{1}$.
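In code, the operation of Example 4 amounts to keeping exactly the variables of $B^l$ at the positions where the concatenated sample holds a $1$, as in this small sketch of ours:

```python
# (x ∘ ¬x) ⊙ B_l: keep the b-variables at positions where the concatenated
# sample x ∘ ¬x has a 1; the returned list is read as a disjunction.
def odot(x, B_l):
    concat = x + [1 - value for value in x]  # x ∘ ¬x
    return [b for b, bit in zip(B_l, concat) if bit == 1]

# With x_1 = (1, 0, 1) from Example 1 and B^1 represented by the integers 1..6:
print(odot([1, 0, 1], [1, 2, 3, 4, 5, 6]))  # [1, 3, 5], i.e. b_1 ∨ b_3 ∨ b_5
```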
The objective of this operation is to generate a disjunction of variables indicating that if any of the features associated with these variables is present in $m_l$, then sample $x_i$ will be correctly classified by $m_l$. Now, we can show the formula that guarantees that if $\eta_i$ is false, then $M(x_i) = y_i$:

$$\bigwedge_{i=1}^{n}\left(\neg\eta_{i}\rightarrow\Big(y_{i}\leftrightarrow\bigwedge_{l=1}^{k}\big((x_{i}\circ\neg x_{i})\odot B^{l}\big)\Big);\ \infty\right)\tag{12}$$
We can see that (12) is not in CNF. Therefore, the formula below must be converted to CNF. With that, finally, we have the MaxSAT query that is sent to the solver:

$$Q_{\mathrm{IMLI}}=(10)\wedge(11)\wedge(12)\tag{13}$$
The set of samples X, in IMLI, can be divided into $p$ partitions: $X^{1}, X^{2}, \ldots, X^{p}$. Every partition, except possibly the last one, contains the same number of positive and negative samples. Also, the samples are randomly distributed across the partitions. Partitioning aims to make the model perform better when generating the set of rules M. Thus, the conjunction of clauses is created from each partition in an incremental way, that is, the set of rules M obtained in the current partition is reused in the next partition. In the first partition, constraints (10), (11) and (12) are created exactly as described above. From the second partition onwards, (11) is replaced by the following constraints:

$$\bigwedge_{l=1}^{k}\bigwedge_{j=1}^{2m}\left(b_{j}^{l}\leftrightarrow\sigma(b_{j}^{l});\ 1\right)\tag{14}$$

where $\sigma(b_{j}^{l})$ is the value assigned to $b_{j}^{l}$ in the solution obtained for the previous partition.
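Schematically, the incremental training can be organized as in the following sketch (ours); build_base_query, solve, and rule_vars are placeholders for the encoding of (10) and (12) on one partition, a MaxSAT call, and the list of $b_{j}^{l}$ variables:

```python
from pysat.formula import WCNF  # only to fix the type produced by build_base_query

# previous is the set of variables assigned true in the last solution; the
# soft clauses below implement (11) on the first partition and (14) afterwards.
def train_incrementally(partitions, rule_vars, build_base_query, solve):
    previous = None
    for Xk, yk in partitions:
        wcnf: WCNF = build_base_query(Xk, yk)        # hard (12) and soft (10)
        for var in rule_vars:
            if previous is None:
                wcnf.append([-var], weight=1)        # (11): prefer few features
            else:                                    # (14): prefer keeping the
                lit = var if var in previous else -var   # previous assignment
                wcnf.append([lit], weight=1)
        previous = {lit for lit in solve(wcnf) if lit > 0}
    return previous                                  # true variables define M
```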
IMLI also has a technique for reducing the size of the generated set of rules. The technique removes possible redundancies in ordinal features, as in Example 5. In the original implementation of the model, this technique is applied at the end of each partition. In our implementation, used for the experiments in Section 5, it is applied only at the end of the last partition, for the sake of training performance.
Example 5
Let R be the following set of rules with redundancy in the same rule:

$$(\text{Age}\le 25\wedge\text{Age}\le 40\wedge\text{Smoke})$$

Then, the technique removes the redundancy and the following set of rules is obtained:

$$(\text{Age}\le 25\wedge\text{Smoke})$$
4 IMLIB
In this section, we present our method IMLIB, an incremental version of SQFSAT based on MaxSAT. IMLIB also has a technique for reducing the size of the generated set of rules. Moreover, our approach partitions the set of samples X, has one additional constraint, and places weights on all clauses. With that, our approach receives five input parameters, X, y, $q$, $c$, and $\lambda$, and tries to obtain the smallest R that correctly classifies as many samples as possible, penalizing classification errors with $\lambda$, that is, $\min_{R}\ |R| + \lambda\cdot|\{x_{i}\in X \mid R(x_{i})\neq y_{i}\}|$. That said, below, we show the constraints of our approach for constructing the MaxSAT clauses.
Constraints that guarantee that exactly one of $s^{l,k}, p_{1}^{l,k}, \ldots, p_{m}^{l,k}$ is true for the $k$th feature of the rule $r_l$:

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(s^{l,k}\vee p_{1}^{l,k}\vee\cdots\vee p_{m}^{l,k};\ \infty\right)\tag{15}$$

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\;\bigwedge_{\substack{o,o'\in O^{l,k}\\ o\neq o'}}\left(\neg o\vee\neg o';\ \infty\right),\quad\text{where }O^{l,k}\text{ is as in (2)}\tag{16}$$
Constraints representing that the model will try to insert as few features as possible in R:

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(s^{l,k};\ 1\right)\tag{17}$$
Conjunction of clauses that guarantees that each rule has at least one feature:

$$\bigwedge_{l=1}^{q}\left(\neg s^{l,1}\vee\cdots\vee\neg s^{l,c};\ \infty\right)\tag{18}$$
The following conjunction of formulas ensures that $z_{i}^{l,k}$ is true if the $j$th feature label in the $k$th position of the $l$th rule contributes to correctly classifying sample $x_i$, and false otherwise:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\bigwedge_{j=1}^{m}\left(p_{j}^{l,k}\rightarrow\left(z_{i}^{l,k}\leftrightarrow b_{i}^{j,l,k}\right);\ \infty\right)\tag{19}$$
Conjunction of formulas guaranteeing that the classification of a specific rule will not be interfered with by skipped features in the rule:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(s^{l,k}\rightarrow z_{i}^{l,k};\ \infty\right)\tag{20}$$
Conjunction of formulas indicating that the model assigns true to $v_{i}^{l}$ if and only if all the features of rule $r_l$ support the correct classification of sample $x_i$:

$$\bigwedge_{i=1}^{n}\bigwedge_{l=1}^{q}\left(v_{i}^{l}\leftrightarrow\left(z_{i}^{l,1}\wedge\cdots\wedge z_{i}^{l,c}\right);\ \infty\right)\tag{21}$$
Conjunction of clauses designed to generate a set of rules R that correctly classifies as many samples as possible:

$$\bigwedge_{x_{i}\in X^{+}}\left(v_{i}^{1}\vee\cdots\vee v_{i}^{q};\ \lambda\right)\tag{22}$$

$$\bigwedge_{x_{i}\in X^{-}}\bigwedge_{l=1}^{q}\left(\neg v_{i}^{l};\ \lambda\right)\tag{23}$$
Finally, after converting the formula below to CNF, we have the MaxSAT query that is sent to the solver:

$$Q_{\mathrm{IMLIB}}=(15)\wedge(16)\wedge(17)\wedge(18)\wedge(19)\wedge(20)\wedge(21)\wedge(22)\wedge(23)\tag{24}$$
IMLIB can also partition the set of samples X in the same way as IMLI. Therefore, all constraints described above are applied in the first partition. Starting from the second partition, the constraints in (17) are replaced by the following constraints:

$$\bigwedge_{l=1}^{q}\bigwedge_{k=1}^{c}\left(\Big(s^{l,k}\leftrightarrow\sigma(s^{l,k})\Big)\wedge\Big(n^{l,k}\leftrightarrow\sigma(n^{l,k})\Big)\wedge\bigwedge_{j=1}^{m}\Big(p_{j}^{l,k}\leftrightarrow\sigma(p_{j}^{l,k})\Big);\ 1\right)\tag{25}$$

where $\sigma(\cdot)$ again denotes the value assigned to the variable in the solution obtained for the previous partition.
IMLIB also has the technique for reducing the size of the generated set of rules demonstrated in Example 5. Moreover, we add two more cases, which are described in Example 6 and Example 7.
Example 6
Let R be the following set of rules with opposite features in the same rule:

$$(\text{Man}\wedge\neg\text{Man}\wedge\text{Hike})\vee(\text{Smoke})$$

Therefore, the technique removes rules with opposite features in the same rule, obtaining the following set of rules:

$$(\text{Smoke})$$
Example 7
Let R be the following set of rules with the same feature occurring twice in a rule:

$$(\text{Smoke}\wedge\text{Smoke}\wedge\text{Hike})\vee(\text{Man})$$

Accordingly, our technique for removing redundancies eliminates repeated features, resulting in the following set of rules:

$$(\text{Smoke}\wedge\text{Hike})\vee(\text{Man})$$
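The three reductions illustrated in Examples 5-7 can be sketched as follows; the literal format (feature, threshold, positive) is ours, with threshold None for non-ordinal features and a number $t$ for a quantized literal meaning feature $\le t$:

```python
# Simplify a DNF set of rules: drop duplicate literals (Example 7), drop rules
# containing a feature and its opposite (Example 6), and keep only the
# tightest bound per ordinal feature (Example 5).
def simplify(rules):
    simplified = []
    for rule in rules:
        literals = set(rule)                         # duplicates collapse here
        if any((f, t, not pos) in literals for f, t, pos in literals):
            continue                                 # contradictory rule: remove it
        tightest, plain = {}, []
        for f, t, pos in literals:
            if t is None:
                plain.append((f, t, pos))
            elif (f, pos) not in tightest or \
                    (t < tightest[(f, pos)] if pos else t > tightest[(f, pos)]):
                tightest[(f, pos)] = t               # f <= t (or its negation)
        simplified.append(plain + [(f, t, pos) for (f, pos), t in tightest.items()])
    return simplified

# Example 5: (Age <= 25 ∧ Age <= 40 ∧ Smoke) becomes (Age <= 25 ∧ Smoke).
print(simplify([[("Age", 25, True), ("Age", 40, True), ("Smoke", None, True)]]))
```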
5 Experiments
Table 1: Information on the datasets used in the experiments.

| Databases | Samples | $|X^{+}|$ | $|X^{-}|$ | Features |
|---|---|---|---|---|
| lung cancer | 59 | 31 | 28 | 6 |
| iris | 150 | 100 | 50 | 4 |
| parkinsons | 195 | 48 | 147 | 22 |
| ionosphere | 351 | 126 | 225 | 33 |
| wdbc | 569 | 357 | 212 | 30 |
| transfusion | 748 | 570 | 178 | 4 |
| pima | 768 | 500 | 268 | 8 |
| titanic | 1309 | 809 | 500 | 6 |
| depressed | 1429 | 1191 | 238 | 22 |
| mushroom | 8124 | 3916 | 4208 | 22 |
In this section, we present the experiments conducted to compare our method IMLIB against IMLI. The two models were implemented in Python with the MaxSAT solver RC2 [10]; the source code of IMLIB and the implementation of the tests performed can be found at https://github.com/cacajr/decision_set_models. The experiments were carried out on a machine with an Intel(R) Core(TM) i5-4460 3.20GHz processor and 12 GB of RAM. Ten databases from the UCI repository [3] were used to compare IMLI with IMLIB. Information on the datasets can be seen in Table 1. Databases with more than two classes were adapted, considering that both models are designed for binary classification. For purposes of comparison, we measure the following metrics: number of rules, size of the set of rules, size of the largest rule, accuracy on test data, and training time. The number of rules, the size of the set of rules, and the size of the largest rule can be used as interpretability metrics. For example, a set of rules with few, small rules is more interpretable than one with many large rules.
Each dataset was split into training and test sets. Both models were trained and evaluated using the same training and test sets, as well as the same random distribution. Hence, the way the experiments were conducted ensured that both models had exactly the same set of samples to learn the set of rules.
For both IMLI and IMLIB, we consider parameter configurations obtained by combining values of the number of rules $k$, the weight $\lambda$, and the number of samples per partition $lp$. Since IMLIB has the maximum number of features per rule $c$ as an extra parameter, for each parameter configuration of IMLI and its corresponding learned set of rules, we considered $c$ ranging from $1$ to one less than the size of the largest rule learned by IMLI. Thus, the value of $c$ that resulted in the best test accuracy was chosen to be compared with IMLI. Our objective is to evaluate whether IMLIB can achieve higher test accuracy compared to IMLI by employing smaller and more balanced rules. Furthermore, it should be noted that this does not exclude the possibility of our method generating sets of rules with larger sizes than IMLI.
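This selection of $c$ can be sketched as a simple loop; train and evaluate are illustrative placeholders for a full IMLIB run with a given configuration and for computing its test accuracy:

```python
# Pick the maximum rule size c with the best test accuracy, scanning from 1
# up to one less than the size of the largest rule learned by IMLI.
def select_c(config, largest_imli_rule, train, evaluate):
    best = None
    for c in range(1, largest_imli_rule):
        model = train(config, max_rule_size=c)
        accuracy = evaluate(model)
        if best is None or accuracy > best[0]:
            best = (accuracy, c, model)
    return best  # (accuracy, chosen c, trained model)
```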
For each dataset and each parameter configuration of $k$, $\lambda$ and $lp$, we conducted ten independent realizations of this experiment. For each dataset, the parameter configuration with the best average test accuracy for IMLI was chosen to be reported in Table 2, and the one with the best average test accuracy for IMLIB in Table 3. The results presented in both tables are averages over the ten realizations.
Table 2: Average results over the ten realizations with, for each dataset, the parameter configuration that yields the best average test accuracy for IMLI.

| Databases | Models | Number of rules | Size of set of rules | Size of largest rule | Accuracy | Training time |
|---|---|---|---|---|---|---|
| lung cancer | IMLI | 2.00 ± 0.00 | 3.60 ± 0.84 | 2.20 ± 0.63 | 0.93 ± 0.07 | 0.0062 ± 0.0016 |
| | IMLIB | 2.00 ± 0.00 | 2.20 ± 0.63 | 1.10 ± 0.32 | 0.93 ± 0.07 | 0.0146 ± 0.0091 |
| iris | IMLI | 2.00 ± 0.00 | 7.60 ± 1.35 | 4.50 ± 1.08 | 0.90 ± 0.08 | 0.0051 ± 0.0010 |
| | IMLIB | 2.00 ± 0.00 | 4.90 ± 1.20 | 2.50 ± 0.71 | 0.84 ± 0.12 | 0.0523 ± 0.0378 |
| parkinsons | IMLI | 2.00 ± 0.00 | 5.00 ± 2.05 | 2.90 ± 1.37 | 0.80 ± 0.07 | 0.0223 ± 0.0033 |
| | IMLIB | 2.00 ± 0.00 | 3.00 ± 1.41 | 1.60 ± 0.84 | 0.79 ± 0.06 | 0.0631 ± 0.0263 |
| ionosphere | IMLI | 2.90 ± 0.32 | 12.00 ± 1.63 | 5.20 ± 0.63 | 0.81 ± 0.05 | 0.0781 ± 0.0096 |
| | IMLIB | 3.00 ± 0.00 | 7.70 ± 3.02 | 2.70 ± 1.16 | 0.79 ± 0.04 | 0.2797 ± 0.1087 |
| wdbc | IMLI | 2.90 ± 0.32 | 8.70 ± 2.50 | 3.70 ± 1.34 | 0.89 ± 0.03 | 0.0894 ± 0.0083 |
| | IMLIB | 3.00 ± 0.00 | 5.30 ± 2.36 | 1.80 ± 0.79 | 0.86 ± 0.06 | 0.2172 ± 0.0800 |
| transfusion | IMLI | 1.00 ± 0.00 | 3.10 ± 0.88 | 3.10 ± 0.88 | 0.72 ± 0.08 | 0.0291 ± 0.0026 |
| | IMLIB | 1.00 ± 0.00 | 2.00 ± 0.82 | 2.00 ± 0.82 | 0.68 ± 0.08 | 0.5287 ± 0.3849 |
| pima | IMLI | 1.00 ± 0.00 | 5.10 ± 0.74 | 5.10 ± 0.74 | 0.68 ± 0.09 | 0.0412 ± 0.0032 |
| | IMLIB | 1.00 ± 0.00 | 1.90 ± 1.10 | 1.90 ± 1.10 | 0.74 ± 0.04 | 0.6130 ± 0.5093 |
| titanic | IMLI | 1.00 ± 0.00 | 6.90 ± 1.91 | 6.90 ± 1.91 | 0.71 ± 0.07 | 0.0684 ± 0.0040 |
| | IMLIB | 1.00 ± 0.00 | 1.70 ± 0.67 | 1.70 ± 0.67 | 0.75 ± 0.06 | 1.9630 ± 3.2705 |
| depressed | IMLI | 1.80 ± 0.42 | 7.50 ± 2.64 | 5.30 ± 1.89 | 0.74 ± 0.08 | 0.2041 ± 0.0059 |
| | IMLIB | 2.00 ± 0.00 | 6.20 ± 3.36 | 3.30 ± 1.95 | 0.79 ± 0.04 | 0.5175 ± 0.2113 |
| mushroom | IMLI | 2.90 ± 0.32 | 16.30 ± 2.91 | 8.20 ± 2.20 | 0.99 ± 0.01 | 0.3600 ± 0.0340 |
| | IMLIB | 3.00 ± 0.00 | 12.30 ± 7.24 | 4.30 ± 2.54 | 0.97 ± 0.03 | 2.3136 ± 0.6294 |
Table 3: Average results over the ten realizations with, for each dataset, the parameter configuration that yields the best average test accuracy for IMLIB.

| Databases | Models | Number of rules | Size of set of rules | Size of largest rule | Accuracy | Training time |
|---|---|---|---|---|---|---|
| lung cancer | IMLIB | 2.00 ± 0.00 | 2.20 ± 0.63 | 1.10 ± 0.32 | 0.93 ± 0.07 | 0.0146 ± 0.0091 |
| | IMLI | 2.00 ± 0.00 | 3.60 ± 0.84 | 2.20 ± 0.63 | 0.93 ± 0.07 | 0.0062 ± 0.0016 |
| iris | IMLIB | 2.90 ± 0.32 | 6.80 ± 1.48 | 2.50 ± 0.53 | 0.90 ± 0.07 | 0.0373 ± 0.0095 |
| | IMLI | 2.50 ± 0.53 | 9.10 ± 1.91 | 4.80 ± 0.92 | 0.86 ± 0.09 | 0.0062 ± 0.0011 |
| parkinsons | IMLIB | 3.00 ± 0.00 | 4.90 ± 1.66 | 1.70 ± 0.67 | 0.82 ± 0.07 | 0.0868 ± 0.0510 |
| | IMLI | 3.00 ± 0.00 | 8.40 ± 1.90 | 3.70 ± 1.06 | 0.79 ± 0.07 | 0.0295 ± 0.0064 |
| ionosphere | IMLIB | 2.00 ± 0.00 | 5.00 ± 1.70 | 2.50 ± 0.85 | 0.82 ± 0.06 | 0.2002 ± 0.0725 |
| | IMLI | 2.00 ± 0.00 | 7.90 ± 1.79 | 4.90 ± 1.45 | 0.80 ± 0.07 | 0.0531 ± 0.0106 |
| wdbc | IMLIB | 1.00 ± 0.00 | 1.20 ± 0.42 | 1.20 ± 0.42 | 0.89 ± 0.04 | 0.0532 ± 0.0159 |
| | IMLI | 1.00 ± 0.00 | 2.50 ± 0.71 | 2.50 ± 0.71 | 0.86 ± 0.09 | 0.0357 ± 0.0048 |
| transfusion | IMLIB | 1.00 ± 0.00 | 1.70 ± 0.67 | 1.70 ± 0.67 | 0.72 ± 0.03 | 0.2843 ± 0.1742 |
| | IMLI | 1.00 ± 0.00 | 3.10 ± 0.74 | 3.10 ± 0.74 | 0.71 ± 0.06 | 0.0273 ± 0.0032 |
| pima | IMLIB | 1.00 ± 0.00 | 1.90 ± 1.10 | 1.90 ± 1.10 | 0.74 ± 0.04 | 0.6130 ± 0.5093 |
| | IMLI | 1.00 ± 0.00 | 5.10 ± 0.74 | 5.10 ± 0.74 | 0.68 ± 0.09 | 0.0412 ± 0.0032 |
| titanic | IMLIB | 1.00 ± 0.00 | 1.40 ± 0.97 | 1.40 ± 0.97 | 0.76 ± 0.08 | 0.8523 ± 1.7754 |
| | IMLI | 1.00 ± 0.00 | 6.80 ± 1.87 | 6.80 ± 1.87 | 0.68 ± 0.12 | 0.0649 ± 0.0047 |
| depressed | IMLIB | 3.00 ± 0.00 | 13.30 ± 5.12 | 4.70 ± 1.89 | 0.80 ± 0.04 | 0.7263 ± 0.1692 |
| | IMLI | 2.90 ± 0.32 | 14.80 ± 2.25 | 6.70 ± 1.70 | 0.69 ± 0.08 | 0.2520 ± 0.0140 |
| mushroom | IMLIB | 1.00 ± 0.00 | 6.70 ± 0.95 | 6.70 ± 0.95 | 0.99 ± 0.00 | 1.2472 ± 0.2250 |
| | IMLI | 1.00 ± 0.00 | 8.90 ± 1.10 | 8.90 ± 1.10 | 0.99 ± 0.01 | 0.1214 ± 0.0218 |
In Table 2, when considering parameter configurations that favor IMLI, we can see that IMLIB stands out in the size of the generated set of rules and in the size of the largest rule in all datasets. Furthermore, our method achieved equal or higher accuracy compared to IMLI in four out of ten datasets. In the datasets where IMLI outperformed IMLIB in terms of accuracy, our method exhibited a modest average performance gap of only three percentage points. On the other hand, IMLI outperformed our method in terms of training time on all datasets.
In Table 3, when we consider parameter configurations that favor our method, IMLIB continues to stand out in terms of the size of the generated set of rules and the size of the largest rule in all datasets. Moreover, our method achieved equal or higher accuracy than IMLI in all datasets. Again, IMLI consistently demonstrated better training time than IMLIB across all datasets.
As an illustrative example of interpretability, we present a comparison of the sizes of the rules learned by both methods on the mushroom dataset. Table 4 shows the sizes of the rules obtained in all ten realizations of the experiment. We can observe that IMLIB consistently maintains a smaller and more balanced set of rules across the different realizations. This is relevant because unbalanced rules can affect interpretability. See realization 6, for instance. The largest rule learned by IMLI has a size of 10, nearly double the size of the remaining rules. In contrast, IMLIB learned a set of rules in which the size of the largest rule is 6 and the others have similar sizes. Thus, interpreting three rules of size at most 6 is easier than interpreting a rule of size 10. Also as illustrative examples of interpretability, some sets of rules learned by IMLIB can be seen in Table 5.
Table 4: Sizes of the rules learned by IMLI and IMLIB on the mushroom dataset in each of the ten realizations; the column $c$ gives the maximum number of features per rule selected for IMLIB (– for IMLI).

| Realizations | $c$ | Models | Rule sizes |
|---|---|---|---|
| 1 | – | IMLI | (4, 6, 5) |
| | 1 | IMLIB | (1, 1, 1) |
| 2 | – | IMLI | (6, 11, 0) |
| | 8 | IMLIB | (3, 6, 6) |
| 3 | – | IMLI | (3, 8, 5) |
| | 5 | IMLIB | (5, 5, 5) |
| 4 | – | IMLI | (6, 4, 4) |
| | 2 | IMLIB | (2, 2, 2) |
| 5 | – | IMLI | (3, 3, 4) |
| | 2 | IMLIB | (2, 2, 2) |
| 6 | – | IMLI | (5, 10, 6) |
| | 6 | IMLIB | (6, 5, 6) |
| 7 | – | IMLI | (5, 9, 3) |
| | 8 | IMLIB | (8, 8, 8) |
| 8 | – | IMLI | (4, 10, 3) |
| | 6 | IMLIB | (5, 5, 6) |
| 9 | – | IMLI | (9, 5, 4) |
| | 6 | IMLIB | (6, 6, 6) |
| 10 | – | IMLI | (4, 9, 5) |
| | 1 | IMLIB | (1, 1, 1) |
Table 5: Examples of sets of rules learned by IMLIB on the lung cancer, iris, parkinsons, wdbc, and depressed datasets.
6 Conclusion
In this work, we present a new incremental model for learning interpretable and balanced rules: IMLIB. Our method leverages the strengths of SQFSAT, which effectively constrains the size of rules, while incorporating techniques from IMLI, such as incrementability, a cost for classification errors, and minimization of the set of rules. Our experiments demonstrate that the proposed approach generates smaller and more balanced rules than IMLI, while maintaining comparable or even superior accuracy in many cases. We argue that sets of small rules with approximately the same size seem more interpretable when compared to sets with a few large rules. As future work, we plan to develop a version of IMLIB that can classify sets of samples with more than two classes, enabling us to compare this approach with multiclass interpretable rules from the literature [11, 24].
References
- [1] Biran, O., Cotton, C.: Explanation and justification in machine learning: A survey. In: IJCAI-17 workshop on explainable AI (XAI). vol. 8, pp. 8–13 (2017)
- [2] Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., Tishby, N., Vogt-Maranto, L., Zdeborová, L.: Machine learning and the physical sciences. Reviews of Modern Physics 91(4), 045002 (2019)
- [3] Dua, D., Graff, C.: UCI machine learning repository (2017), http://archive.ics.uci.edu/ml
- [4] Ghassemi, M., Oakden-Rayner, L., Beam, A.L.: The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health 3(11), e745–e750 (2021)
- [5] Ghosh, B., Malioutov, D., Meel, K.S.: Efficient learning of interpretable classification rules. Journal of Artificial Intelligence Research 74, 1823–1863 (2022)
- [6] Ghosh, B., Meel, K.S.: IMLI: An incremental framework for MaxSAT-based learning of interpretable classification rules. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. pp. 203–210 (2019)
- [7] Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., Yang, G.Z.: XAI—Explainable artificial intelligence. Science robotics 4(37), eaay7120 (2019)
- [8] Huang, H.Y., Broughton, M., Mohseni, M., Babbush, R., Boixo, S., Neven, H., McClean, J.R.: Power of data in quantum machine learning. Nature communications 12(1), 2631 (2021)
- [9] Ignatiev, A., Marques-Silva, J., Narodytska, N., Stuckey, P.J.: Reasoning-based learning of interpretable ML models. In: IJCAI. pp. 4458–4465 (2021)
- [10] Ignatiev, A., Morgado, A., Marques-Silva, J.: RC2: an efficient MaxSAT solver. Journal on Satisfiability, Boolean Modeling and Computation 11(1), 53–64 (2019)
- [11] Ignatiev, A., Pereira, F., Narodytska, N., Marques-Silva, J.: A SAT-based approach to learn explainable decision sets. In: Automated Reasoning: 9th International Joint Conference, IJCAR 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings 9. pp. 627–645. Springer (2018)
- [12] Janiesch, C., Zschech, P., Heinrich, K.: Machine learning and deep learning. Electronic Markets 31(3), 685–695 (2021)
- [13] Jiménez-Luna, J., Grisoni, F., Schneider, G.: Drug discovery with explainable artificial intelligence. Nature Machine Intelligence 2(10), 573–584 (2020)
- [14] Kwekha-Rashid, A.S., Abduljabbar, H.N., Alhayani, B.: Coronavirus disease (covid-19) cases analysis using machine-learning applications. Applied Nanoscience pp. 1–13 (2021)
- [15] Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: A joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 1675–1684 (2016)
- [16] Malioutov, D., Meel, K.S.: MLIC: A MaxSAT-based framework for learning interpretable classification rules. In: Principles and Practice of Constraint Programming: 24th International Conference, CP 2018, Lille, France, August 27-31, 2018, Proceedings. pp. 312–327. Springer (2018)
- [17] Mita, G., Papotti, P., Filippone, M., Michiardi, P.: LIBRE: Learning interpretable boolean rule ensembles. In: AISTATS. pp. 245–255. PMLR (2020)
- [18] Rocha, T.A., Martins, A.T.: Synthesis of quantifier-free first-order sentences from noisy samples of strings. In: 2019 8th Brazilian Conference on Intelligent Systems (BRACIS). pp. 12–17. IEEE (2019)
- [19] Rocha, T.A., Martins, A.T., Ferreira, F.M.: Synthesis of a DNF formula from a sample of strings using Ehrenfeucht–Fraïssé games. Theoretical Computer Science 805, 109–126 (2020)
- [20] Sharma, A., Jain, A., Gupta, P., Chowdary, V.: Machine learning applications for precision agriculture: A comprehensive review. IEEE Access 9, 4843–4873 (2020)
- [21] Tjoa, E., Guan, C.: A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE transactions on neural networks and learning systems 32(11), 4793–4813 (2020)
- [22] Vilone, G., Longo, L.: Notions of explainability and evaluation approaches for explainable artificial intelligence. Information Fusion 76, 89–106 (2021)
- [23] Yan, L., Zhang, H.T., Goncalves, J., Xiao, Y., Wang, M., Guo, Y., Sun, C., Tang, X., Jing, L., Zhang, M., et al.: An interpretable mortality prediction model for covid-19 patients. Nature machine intelligence 2(5), 283–288 (2020)
- [24] Yu, J., Ignatiev, A., Stuckey, P.J., Le Bodic, P.: Computing optimal decision sets with SAT. In: Principles and Practice of Constraint Programming: 26th International Conference, CP 2020, Louvain-la-Neuve, Belgium, September 7–11, 2020, Proceedings 26. pp. 952–970. Springer (2020)