This is my final semester project and is based on using Parallel Genetic Algorithms to predict the presence of heart disease based on some attributes and rule sets
A new hybrid knowledge-based system using PRISM classifier and parallel genetic algorithm for diagnosis of Heart disease
In the present research, a new hybrid model (integrating PRISM learner and GA) is to be introduced for effective diagnosis of heart disease, addressing the limitations of the existing systems. In particular, the model suggests here a two-level optimization strategy, where level-1 first attempts to identify parallelly an optimal proportion (Popt) for training and test sets for each Heart dataset using PRISM learner on Cluster-based HPC machine. Next, the best training set (Tbest) for Popt is to be searched again parallelly by the same learner. On the other hand, level-2 optimization aims to refine the rule-set(R) generated by the PRISM learner on Tbest by employing parallel genetic algorithm(PGA).
To build the exact model, each heart dataset (D) is first split into three sub-sets, namely T1, T2 and T3 as follows: • T1 : This set is, indeed, the best training set (containing m% examples of each class) and it is found by adopting the strategy (under level-1 optimization) as discussed in Section-3.1.2. This is used by the PRISM learner to generate a rule set (R). • T2 : 20% examples of each class are selected at random from (D-T1) and included into T2. Simultaneously, these examples are removed from (D-T1). This set is basically used by the suggested PGA to refine the rule set R. • T3 : It is now D –(T1+T2). The optimized rules set finally finds accuracy over this sub-set.
The suggested SGA accepts the rule set induced by PRISM learner. Surely, the pre-condition values of the rules are discrete ones. In other words, the suggested GA accepts here discretized data only. One may note that the value of every non-target attribute in a discretized dataset starts from 1(i.e., not from 0), whereas the starting value of target attribute is 0. Based on this concept, the following coding and decoding schemes are employed here.
From the chosen discretized heart datasets, it has been observed that the largest value of any attributes among all the datasets does not exceed 7. Hence, 3 bits are sufficient to encode each attribute. Accordingly, 3x13=39 bits in total are necessary to represent every rule of the datasets, viz., Heart(Cleveland), Heart(Hungarian) and Heart(Swiss), since each of these databases has total 13 attributes (including) the class attribute. However, this size becomes 3x14=42 bits for Heart(Statlog) because this set contains total 14 attributes (including the class attribute).
Further, tabular like representation of the rules (as discussed in Section-2.2) is considered at the time of encoding. As per this representation, if ‘*’ appears in a rule for an attribute, then this symbol is simply encoded as: 000. On the other hand, an equivalent binary coding is adopted for values other than ‘*’. This technique is undoubtedly one of the easiest implementation techniques for encoding the rules.
We know that decoding is the reverse process of encoding. The decoding strategy adopted in the present study is briefly stated below. In particular, the following steps are applied during decoding.
Step-1: Access the binary stream block by block, each of size 3 bits.
Step-2: For each block, the value v is computed as: , where bi is the i-th bit from the left in
the block.
Step-3: If v=0, then place ‘’ at the position of the attribute corresponding to its block
(except for the target attribute),
else if (v > amax), then place the scaled value at that position, where amax
denotes the maximum value yield by that attribute,
else place simply the computed value at that position.
For illustration, let us take a non-target attributer, say Ak . Suppose that it possesses maximum discrete value, say 5. Now, if the block corresponding to Ak gets bit string: 000 during decoding, then it is simply substituted as: ‘’. On the other hand, if the block gets bit string, say 101, then it is substituted as 5=(101)2 . Let the block be 110. Then, the equivalent decimal value is 6. Now, as it is greater than 5, so scaling is necessary, and it will be as:
Truly, both the operations: crossover and mutation are performed on the converted binary rule. In the present study, each binary rule of any dataset is considered to be of uniform length, assuming 3 bits for each attribute. Let the length of each rule be L. • Selection of rules: In this research, the selection of parents for performing crossover is made at random. Here, the rules in the rule set (i.e., population) are numbered as: 1, 2, . . ., n. Now, two parents say: p1and p2 are picked randomly from the current rule set with n rules, and placed them into a mating pool (say M). Thus, the probability of a selecting rule in a rule set (with n rules) is:
• Crossover (reproduction operator): There are several ways to perform crossover, e.g., single point, two-point, etc. The present study performs two-point cross-over technique, where two distinct points: xi (i = 1, 2) are again chosen randomly within 1 < xi < L. Finally, the heads and tails (the parts respectively before and after the cutting points) of the parents are swapped. Obviously, probability (pc) of selecting any crossover point is : . One may note that cross-over probability indicates a ratio of how many couples are to be picked for mating pool. According to Goldberg(1989), cross-over probability focuses on each mating (not all mattings). In the present experiment, this concept is followed. So, the cross-over probability at each mating is: , as two rules are selected randomly for each mating pool.
• Mutation As per the basic concept of mutation operation in GA, a zero (‘0’) is changed to one (‘1’) and vice versa for a binary coded solution. Here, single point mutation (selected at random within the length L) is performed after crossover. Thus, pm(probability of selecting any mutation point) is . Now, the above mentioned three steps combinedly result two new individuals: say O1 and O2.
A fitness function is essentially an objective function for any problem, and it depends upon the nature of the problem. The function gives a means of evaluating a solution for its inclusion in the resulting set. From the perspective of decision making, factors such as prediction-accuracy, error (i.e., misclassification) rate, true positive rate, etc. are essentially considered for constructing fitness function. Keeping the point in mind that the heart databases are imbalanced in nature, a new promising fitness function for measuring fitness value of each rule (r ) in rule set (I(R)) is suggested here and it is given in equ(3.1). , where m and n represent respectively the numbers of training examples correctly and incorrectly classified by r over T2. Further, k represents the number of pre-conditions present in the rule r and is the number of examples of class ci present in T2 such that the class-label of rule r is also ci. The equ.(3.1) plays an important role to resolve collision that may occur among the rules of same class in I(R) and assists to chose the high quality rules with less number of informative pre-conditions. Also, it does not ignore the refinement of rules for rare cases in the optimized rule set
Stopping criteria: When the specific stopping criteria (e.g., number of iterations, a threshold measure, etc.) is satisfied, the GA process terminates. Here, number of iteration is set for stopping the process.
Variable: R: rule set where each rules in ‘IF- THEN’ format
I(R) : Rule set (R) in tabular form
O1, O2, r1, r2 : rule of I(R) format
ctr = 0, Max_itr // Max_itr says the maximum number of iterations
Step-1 Randomly selects two rules (solutions), say r1 and r2 from I(R)
Step-2: Apply the suggested two-point crossover and then mutation points on r1
and r2 to result two new distinct offspring: O1 and O2 (in decoded form).
Step-3: Check the rule-redundancy and rule-conflict cases for each of Oi (i =1, 2) with the rules
present in R following the strategy presented in Section-3.2.2, and take action(s) appropriately.
Step-4: If Oi (i=1, 2) is correct (i.e., neither redundant nor conflict), then evaluate fitness score of
each Oi (i=1, 2).
Step-5: Chose the best rule (rbest) based on fitness score on T2.
Step-6: Replace the worst rule: rworst (with class of rbest ) with the rbest if possible.
Step-7: ctr =ctr +1. If (ctr < Max_itr), then goto Step-1.
To understand the idea on distinct rule, identical rule and conflict rule, let us take a discretized dataset of a classification problem (P) with 4 non-target attributes, say A1, A2, A3 and A4 and class (i.e., target) attribute, say C.
Two rules: r1 and r2 are identical if min (|pre(r1)|, |pre(r2)|) = match(pre(r1), pre(r2)), where |pre(ri)| results the number of pre-conditions (each with a numerical value) present in rule: ri and the function: min(m1, m2) returns the minimum between two numbers: m1, m2.
Let r1 and r2 be two decision rules for P as follows:
• r1: If (A1=4) and (A2= 2) and (A4=1), then C=1
• r2: If (A1=4) and (A4=1), then C =1
The number of pre-conditions present in r1 is 3, whereas it is 2 in r2. In fact, the rules match at two places except for (A2=2) in r1. Clearly, the number of matched pre-conditions (m) is here 2 (i.e., m =2). Again, min (|pre(r1)|, |pre(r2)|)= min(3,2) returns 2, and it equals to m.
Hence, both the rules are identical but one supersedes the other. In other words, out of these two rules, one is the super rule of the other. It is observed that |pre(r2)| =2 , and it is less than |pre(r1)|=3. So, rule r2 is here treated as the super rule of r1, and r2 (instead of r1) is well expected to be present in rule set with the aim to classify more test examples. Definitely, r1 is redundant, and it is to be removed from rule set, R.
Two rules are termed as conflict rules if their antecedent parts are identical but consequent parts (i.e., class values) are different. For better realization, let r1 and r2 be two rules of P as:
• r1: If (A1=4) and (A2= 2) and (A4=1), then C=1
• r2: If (A1=4) and (A2= 2) and (A4=1), then C=2
Clearly, these two rules present an example of conflict rules, since their antecedent parts are same but class values are different (i.e., these are C=1 and C=2 respectively).
Now, if r1 and r2 are conflict, then keep the rule with maximum prediction rate on T2 and discard the other.