This folder contains the source code to execute the ODM loan validation sample in an Apache Spark cluster.
You need an IBM ODM 8.9.2 or higher installation to build the application. The root of your ODM installation is referred to as <INSTALLDIR> in the instructions below. The Maven files look for the ODM jars under the <INSTALLDIR>/executionserver/lib directory.
Clone this repository.
git clone <URL of this repository>
Open a terminal where you have cloned this repository.
cd decisions-on-spark/simple-loan-validation-spark-runner
For ODM 8.10.x releases:
mvn clean install -Dodm.install=<INSTALLDIR> -Dodm.version=<VERSION>
Or for ODM 8.9.2:
mvn clean install -f pom-8.9.xml -Dodm.install=<INSTALLDIR>
INSTALLDIR is the installation directory of ODM 8.9.2 or higher. VERSION is the ODM version, for example 8.10.3.0; this number must match the version in the ODM jar file names.
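For example, assuming ODM 8.10.3.0 is installed under /opt/IBM/ODM8103 (an illustrative path, not a required one):
mvn clean install -Dodm.install=/opt/IBM/ODM8103 -Dodm.version=8.10.3.0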
Automate loan validation on a CSV applications dataset to produce a CSV decision set.
java -cp target/simpleloanvalidationsparkrunner-1.0-SNAPSHOT-withspark.jar com.ibm.decisions.spark.loanvalidation.LoanValidationSparkRunner --input ../data/loanvalidation/1K/loanvalidation-requests-1K.csv --output ../data/loanvalidation/1K/loanvalidation-decisions-1K.csv --master local[8]
Automate loan validation on a JSON applications dataset to produce a JSON decision set.
java -cp target/simpleloanvalidationsparkrunner-1.0-SNAPSHOT-withspark.jar com.ibm.decisions.spark.loanvalidation.LoanValidationSparkRunner --input ../data/loanvalidation/1K/loanvalidation-requests-1K.json --output ../data/loanvalidation/1K/loanvalidation-decisions-1K.json --master local[8]
Automate loan validation on a JSON applications dataset to produce a JSON decision set and display the rule coverage.
java -cp target/simpleloanvalidationsparkrunner-1.0-SNAPSHOT-withspark.jar com.ibm.decisions.spark.loanvalidation.LoanValidationSparkRunner --input ../data/loanvalidation/1K/loanvalidation-requests-1K.json --output ../data/loanvalidation/1K/loanvalidation-decisions-1K.json --master local[8] --rulecoverage
Rule-based automation works in a cluster with the same integration pattern and code as in standalone mode. The only differences in the application concern:
- the access to the datasets: the Spark driver and the executors run on different machines with separate local file systems, so the data has to be stored in HDFS or another shared persistence layer (see the example after this list);
- the packaging: the Spark jars are not needed in the uber jar because they are already deployed in the cluster.
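For illustration, the same application can read and write HDFS-hosted datasets simply by switching the paths passed on the command line (the HDFS locations below are hypothetical):
--input hdfs:///user/odm/loanvalidation/loanvalidation-requests-1K.csv --output hdfs:///user/odm/loanvalidation/loanvalidation-decisions-1K.csv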
The target/simpleloanvalidationsparkrunner-1.0-SNAPSHOT-withodmrt.jar contains the classes required to submit the application as a Spark job; it bundles the ODM runtime but not the Spark jars, which the cluster already provides.
The LoanValidationSparkRunner application reads the requests, or generates them in memory, then applies the loan validation decision logic to each request, and finally computes metrics and KPIs.
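To make that flow concrete, here is a minimal sketch using the Spark Java API. It is a simplified illustration, not the repository's actual code: LoanValidationSketch and executeRules are hypothetical names, and the real application delegates the per-request call to the ODM rule engine.

```java
// Minimal sketch of the integration pattern with the Spark Java API.
// LoanValidationSketch and executeRules are illustrative names only.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LoanValidationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("loan-validation").setMaster("local[8]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // 1. Read one loan application request per line.
            JavaRDD<String> requests =
                sc.textFile("../data/loanvalidation/1K/loanvalidation-requests-1K.csv");

            // 2. Apply the decision logic to each request in parallel on the executors.
            JavaRDD<String> decisions = requests.map(LoanValidationSketch::executeRules);

            // 3. Persist the decision set (saveAsTextFile writes a directory of part files)
            //    and derive a simple KPI from the decisions.
            decisions.saveAsTextFile("../data/loanvalidation/1K/loanvalidation-decisions-1K.out");
            long approved = decisions.filter(d -> d.contains("approved=true")).count();
            System.out.println("Number of approved loan applications: " + approved);
        }
    }

    // Placeholder for the call that parses a request, invokes the ODM rule
    // engine on it, and serializes the resulting decision.
    private static String executeRules(String request) {
        return request;
    }
}
```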
Below is the submit command as tested with the IBM Cloud Spark service, using random generation of the requests.
./spark-submit.sh \
--vcap ./vcap-odm123.json \
--name "loan-validation" \
--deploy-mode cluster \
--conf spark.service.spark_version=2.1 \
--class com.ibm.decisions.spark.loanvalidation.LoanValidationSparkRunner \
target/simpleloanvalidationsparkrunner-1.0-SNAPSHOT-withodmrt.jar \
--inputgen 1000 \
--output loanvalidation-decisions-1K.json
Submitting the application produces a trace similar to the one below.
Open the stdout file to check the loan approval traces and read the KPIs.
...
Loan approved=false with a yearly repayment=0.0 insurance required:false messages= [The borrower's age is not valid., The loan amount is under the maximum authorized] executed in thread Executor task launch worker for task 8
Loan approved=true with a yearly repayment=1464.7636429039499 insurance required:true messages= [ The loan amount is under the maximum authorized, Low risk loan, Congratulations! Your loan has been approved] executed in thread Executor task launch worker for task 8
Decision batch metrics
Number of loan applications processed: 1000 in 2995 ms
Number of decision per sec: 333.0
Number of approved loan applications: 291 on a 1000 total
Number of loans approved with a YearlyInterestRate > 5%: 291
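For reference, the throughput KPI follows directly from the batch timing shown above: 1000 applications / 2.995 s ≈ 333 decisions per second.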
- Running in IBM Analytics Engine & HDP: shows how to automatically approve or reject loan applications by applying business rule reasoning in a Spark map-reduce approach.
- Running in Microsoft Azure Databricks: shows how to automatically approve or reject loan applications by applying business rule reasoning in a Spark map-reduce approach.