Association Rule Classification (arc)

This package for R implements the Classification based on Associations algorithm (CBA):

Liu, B., Hsu, W. and Ma, Y. (1998). Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI Press, pp. 80-86.

The arules package is used for the rule generation step.

The package is also available from the CRAN repository as the Association Rule Classification (arc) package.

Citing this package

This package for R is described in an R Journal article:

Hahsler, M., Johnson, I., Kliegr, T., & Kuchar, J. (2019). Associative Classification in R: arc, arulesCBA, and rCBA. R Journal, 11(2).

Key features

Automatic discretization of predictor attributes
Automatic tuning of support and confidence thresholds
Pure R package

Installation

The package can be installed directly from CRAN using the following command executed from the R environment:

install.packages("arc")

The development version can be installed from GitHub from within the R environment using the devtools package:

devtools::install_github("kliegr/arc")

Examples

Use case 1: Building an interpretable classifier

library(arc)
set.seed(101)

# dataset setup

iris_shuffled <- datasets::iris[sample(nrow(datasets::iris)),]
train <- iris_shuffled[1:100,]
test <- iris_shuffled[101:nrow(iris_shuffled),]
classatt <- "Species"

# learn, apply and evaluate the CBA classifier
rm <- cba(train, classatt)
prediction <- predict(rm, test)
acc <- CBARuleModelAccuracy(prediction, test[[classatt]])
print(acc)

# interpret by listing the rules in the classifier
inspect(rm@rules)
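
For orientation, the accuracy printed above can be read as the fraction of test instances whose predicted class matches the true class. The sketch below illustrates that definition in base R. The vectors `predicted` and `actual` are hypothetical and are not output of arc, and it is an assumption that CBARuleModelAccuracy follows this standard definition.

```r
# Hypothetical predicted and true labels (illustration only, not arc output)
class_levels <- c("setosa", "versicolor", "virginica")
predicted <- factor(c("setosa", "virginica", "versicolor", "virginica"),
                    levels = class_levels)
actual    <- factor(c("setosa", "virginica", "virginica", "virginica"),
                    levels = class_levels)

# Compare as character so that differing factor levels cannot cause an error
accuracy <- mean(as.character(predicted) == as.character(actual))
print(accuracy)
```

Three of the four hypothetical predictions agree with the true labels, so the value is the share of matching positions.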

Use case 2: Reducing the number of association rules (pruning)

Association rule learning often generates a large number of rules. This use case shows how to use the arc package to reduce the size of the rule set.

library(arc)
data(Adult)
classitems <- c("income=small","income=large") #define target attribute (consequent)
rules <- apriori(Adult, parameter = list(supp = 0.05, conf = 0.5, target = "rules"), appearance=list(rhs=classitems, default="lhs"))
# now we have 1266 rules
pruned <- prune(rules,Adult,classitems)
inspect(pruned)
# only 174 after pruning with arc

Additional reduction of the size of the rule set can be achieved by setting greedy_pruning=TRUE.

pruned <- prune(rules, Adult, classitems, greedy_pruning=TRUE)
inspect(pruned)
# produces 141 rules

The resulting rule list can also be used as a classifier.

Use case 3: Creating a classifier with an upper bound on the number of rules in it

In some cases, pruning does not produce a sufficiently concise rule list. The topRules function lets the user set the target number of rules that will be used as input for classifier building, which serves as an upper bound on the rule count.

The arc documentation gives the following example:

data("Adult")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.5, init_conf = 0.9, minlen = 1, init_maxlen = 10)
summary(rules)

This will return exactly 100 rules. These can then be passed to CBA for pruning:

pruned <- prune(rules, Adult, classitems, greedy_pruning=TRUE)

The resulting classifier stored in pruned has 33 rules.

Use case 4: Explaining predictions

First, consider a classifier like the one learnt in Use case 1, where prediction holds the predicted class for each instance in test. Consider test instance 1:

test[1,]

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51            7.7         3.8          6.7         2.2 virginica

The prediction is

prediction[1]

[1] virginica
Levels: setosa versicolor virginica

Which rule classified a particular instance?

Consider test instance 1:

# test is the held-out iris split from Use case 1
firingRuleIDs <- predict(rm, test, outputFiringRuleIDs=TRUE)
inspect(rm@rules[firingRuleIDs[1]])

Obtaining prediction confidence

confidence_scores <- predict(rm, test, outputConfidenceScores=TRUE)

For a particular instance:

rm@rules[firingRuleIDs[1]]@quality$confidence
rm@rules[firingRuleIDs[1]]@quality$orderedConf
rm@rules[firingRuleIDs[1]]@quality$cumulativeConf

Explanation:

rule confidence is computed as $a/(a+b)$, where $a$ is the number of instances matching both the antecedent and consequent (available in slot support) and $b$ is the number of instances matching the antecedent but not matching the consequent of the given rule.

The arc package provides two alternative measures:

order-sensitive confidence is computed only from instances reaching the given rule. Note that CBA generates ordered rule lists.
cumulative confidence is an experimental measure computed as the accuracy of the rule list comprising the given rule and all higher-priority rules (rules with a lower index), with uncovered instances excluded from the computation.
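
The three measures can be worked through by hand on a toy example. The sketch below uses base R only and does not call arc. The data frame `toy`, the two rules, and the helper `confidence()` are hypothetical, and the computation follows one reading of the definitions above rather than the package's actual implementation.

```r
# Hypothetical toy data: six instances, target attribute "class"
toy <- data.frame(
  colour = c("red", "red", "red", "blue", "blue", "red"),
  size   = c("big", "big", "small", "small", "big", "small"),
  class  = c("yes", "yes", "no", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical ordered rule list (lower index = higher priority):
#   rule 1: size = big   => class = yes
#   rule 2: colour = red => class = yes
matches_r1 <- toy$size == "big"
matches_r2 <- toy$colour == "red"
is_yes     <- toy$class == "yes"

# a / (a + b): a = antecedent and consequent match, b = antecedent only
confidence <- function(antecedent, consequent) {
  a <- sum(antecedent & consequent)
  b <- sum(antecedent & !consequent)
  a / (a + b)
}

# Standard confidence uses all instances
conf_r1 <- confidence(matches_r1, is_yes)
conf_r2 <- confidence(matches_r2, is_yes)

# Order-sensitive confidence of rule 2 uses only instances that reach it,
# that is, instances not already covered by rule 1
reaches_r2      <- !matches_r1
ordered_conf_r2 <- confidence(matches_r2 & reaches_r2, is_yes)

# Cumulative confidence of rule 2: accuracy of the list (rule 1, rule 2)
# on covered instances; both rules predict "yes" here
covered            <- matches_r1 | matches_r2
cumulative_conf_r2 <- mean(is_yes[covered])

print(c(conf_r1, conf_r2, ordered_conf_r2, cumulative_conf_r2))
```

In this toy setting, rule 2 looks stronger under standard confidence than under order-sensitive confidence, because the instances it classifies best are already claimed by rule 1.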

AUC and ROC curve for binary classification

library(ROCR)
set.seed(101)
classitems <- c("income=small","income=large")
adult <- read.table('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                    sep = ',', fill = FALSE, strip.white = TRUE,
                    col.names = c('age', 'workclass', 'fnlwgt', 'education',
                                  'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex',
                                  'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income'))
# random split: roughly 75 % of rows for training, the rest for testing
split <- sample(c(TRUE, FALSE), nrow(adult), replace=TRUE, prob=c(0.75, 0.25))

trainFold <- adult[split,]
testFold <- adult[!split,]

classAtt <- "income"
positiveClass<-">50K"
rm <- cba(trainFold, classAtt, list(target_rule_count = 1000))
confidence_scores <- predict(rm, testFold, outputConfidenceScores=TRUE,positiveClass=positiveClass)

pred_cba <- ROCR::prediction(confidence_scores, factor(testFold[[classAtt]]))
roc_cba <- ROCR::performance(pred_cba, "tpr", "fpr")
ROCR::plot(roc_cba, lwd=2, colorize=TRUE)
lines(x=c(0, 1), y=c(0, 1), col="black", lwd=1)
auc <- ROCR::performance(pred_cba, "auc")
auc <- unlist(auc@y.values)
auc

> auc
[1] 0.8946532
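
To make the AUC value easier to interpret, it can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The base R sketch below computes this from ranks on made-up scores. The function `auc_from_scores` and the vectors `scores` and `is_positive` are hypothetical and are not part of arc or ROCR.

```r
# Rank-based AUC (Mann-Whitney statistic); ties receive average ranks
auc_from_scores <- function(scores, is_positive) {
  r     <- rank(scores)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical confidence scores for the positive class and true labels
scores      <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
is_positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

toy_auc <- auc_from_scores(scores, is_positive)
print(toy_auc)
```

Of the nine positive/negative pairs in this toy input, eight are ordered correctly; only the positive scored 0.35 falls below the negative scored 0.6.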

Performance tweaks

Rule learning

When invoking topRules, set the init_maxlen parameter to a low value:

data("Adult")
classitems <- c("income=small","income=large")
rules <- topRules(Adult, target_rule_count = 100, init_support = 0.05, init_conf = 0.5, minlen = 1, init_maxlen = 2, appearance=list(rhs=classitems, default="lhs"))
inspect(rules)

Rule pruning

Experiment with the value of the rule_window parameter. It affects only the speed of pruning and has no effect on the quality of the classifier.
Set greedy_pruning to TRUE. This generally has a slightly adverse impact on the quality of the classifier, but it decreases the size of the rule set and reduces the time required for pruning. Greedy pruning is not part of the CBA algorithm as published by Liu et al. (1998).
