ExeKGLib is a Python library for constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs). It features a coding interface and a CLI, and allows the user to:
- Construct an ML pipeline that gets a CSV as input and processes the data using any of the available tasks and methods.
- Save the constructed pipeline as a KG in Turtle format.
- Execute the generated KG.
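To make the idea of a pipeline stored as a KG more concrete, here is a minimal plain-Python sketch. It does not use ExeKGLib or its KG schemata: the `ex:` namespace, the `ex:Task` class, the `ex:hasNextTask` property, and the task names are all hypothetical and chosen only for illustration. It writes a few triples as Turtle text and then recovers the execution order by following the links between tasks.

```python
# Hypothetical namespace and terms, chosen for illustration only.
# They are NOT taken from ExeKGLib's KG schemata.
PREFIX = "ex"
NAMESPACE = "http://example.org/pipeline#"

# (subject, predicate, object) triples describing a three-task pipeline.
triples = [
    ("ex:LoadData", "a", "ex:Task"),
    ("ex:LoadData", "ex:hasNextTask", "ex:Normalize"),
    ("ex:Normalize", "a", "ex:Task"),
    ("ex:Normalize", "ex:hasNextTask", "ex:Plot"),
    ("ex:Plot", "a", "ex:Task"),
]


def to_turtle(triples):
    """Render the triples as Turtle text, one statement per line."""
    lines = [f"@prefix {PREFIX}: <{NAMESPACE}> .", ""]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines) + "\n"


def execution_order(triples, start):
    """Follow the (hypothetical) ex:hasNextTask links from the start task."""
    next_task = {s: o for s, p, o in triples if p == "ex:hasNextTask"}
    order = [start]
    while order[-1] in next_task:
        order.append(next_task[order[-1]])
    return order


turtle_text = to_turtle(triples)
print(turtle_text)
print(execution_order(triples, "ex:LoadData"))
```

In practice an RDF library would handle parsing and serialization; the hand-written serializer above only keeps the sketch free of dependencies.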
The coding interface is demonstrated with three sample Python files. The pipelines represented by the generated sample KGs are briefly explained below:
- ML pipeline: Loads features and labels from an input CSV dataset, splits the data, trains and tests a k-NN model, and visualizes the prediction errors.
- Statistics pipeline: Loads a feature from an input CSV dataset, normalizes it, and plots its values (before and after normalization) using a scatter plot.
- Visualization pipeline: Loads a feature from an input CSV dataset and plots its values using a line plot.
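The steps of the ML pipeline above can be sketched in plain Python as follows. This is not ExeKGLib's API and not the code of the sample files: the inline CSV values, the column names, the 75/25 split, and `k=3` are assumptions made for illustration, and the plotting step is replaced by printing the prediction errors.

```python
import csv
import io
import math
from collections import Counter

# Tiny inline CSV standing in for the input dataset (made-up values).
CSV_TEXT = """feature_1,feature_2,label
0.1,0.2,0
0.2,0.1,0
0.15,0.25,0
0.9,0.8,1
0.8,0.9,1
0.85,0.75,1
0.12,0.18,0
0.88,0.82,1
"""


def load_features_and_labels(csv_text, feature_cols, label_col):
    """Load feature rows and labels from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    X = [[float(r[c]) for c in feature_cols] for r in rows]
    y = [int(r[label_col]) for r in rows]
    return X, y


def split(X, y, train_fraction=0.75):
    """Split in order (no shuffling) so the sketch stays deterministic."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


X, y = load_features_and_labels(CSV_TEXT, ["feature_1", "feature_2"], "label")
train_X, test_X, train_y, test_y = split(X, y)
predictions = [knn_predict(train_X, train_y, q) for q in test_X]

# Per-sample prediction errors; a real pipeline would plot these.
errors = [abs(p - t) for p, t in zip(predictions, test_y)]
print(errors)
```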
Under the hood, ExeKGLib uses well-known Python libraries for data processing, visualization, and prediction, such as pandas, matplotlib, and scikit-learn.
ExeKGLib is described in the following paper published as part of ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E. ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics
Detailed information (installation, documentation, etc.) about ExeKGLib can be found on its website; basic information is shown below.
To install, run `pip install exe-kg-lib`.
For detailed installation instructions, refer to the installation page of ExeKGLib's website.
KG schema (abbreviation) | Task | Method | Properties | Input (data structure) | Output (data structure) | Implemented by Python class |
---|---|---|---|---|---|---|
Machine Learning (ml) | Train | KNNTrain | - | DataInTrainX (Matrix or Vector)<br>DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector)<br>DataOutTrainModel (SingleValue) | TrainKNNTrain |
Machine Learning (ml) | Train | MLPTrain | - | DataInTrainX (Matrix or Vector)<br>DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector)<br>DataOutTrainModel (SingleValue) | TrainMLPTrain |
Machine Learning (ml) | Train | LRTrain | - | DataInTrainX (Matrix or Vector)<br>DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector)<br>DataOutTrainModel (SingleValue) | TrainLRTrain |
Machine Learning (ml) | Test | KNNTest | - | DataInTestModel (SingleValue)<br>DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestKNNTest |
Machine Learning (ml) | Test | MLPTest | - | DataInTestModel (SingleValue)<br>DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestMLPTest |
Machine Learning (ml) | Test | LRTest | - | DataInTestModel (SingleValue)<br>DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestLRTest |
Machine Learning (ml) | PerformanceCalculation | PerformanceCalculationMethod | - | DataInTrainRealY (Matrix or Vector)<br>DataInTrainPredictedY (Matrix or Vector)<br>DataInTestPredictedY (Matrix or Vector)<br>DataInTestRealY (Matrix or Vector) | DataOutMLTestErr (Vector)<br>DataOutMLTrainErr (Vector) | PerformanceCalculationPerformanceCalculationMethod |
Machine Learning (ml) | Concatenation | ConcatenationMethod | - | DataInConcatenation (list of Vector) | DataOutConcatenatedData (Matrix) | ConcatenationConcatenationMethod |
Machine Learning (ml) | DataSplitting | DataSplittingMethod | - | DataInDataSplittingX (Matrix or Vector)<br>DataInDataSplittingY (Matrix or Vector) | DataOutSplittedTestDataX (Matrix or Vector)<br>DataOutSplittedTrainDataY (Matrix or Vector)<br>DataOutSplittedTrainDataX (Matrix or Vector)<br>DataOutSplittedTestDataY (Matrix or Vector) | DataSplittingDataSplittingMethod |
Visualization (visu) | CanvasTask | CanvasMethod | hasCanvasName (string)<br>hasLayout (string) | - | - | CanvasTaskCanvasMethod |
Visualization (visu) | PlotTask | LineplotMethod | hasLineStyle (string)<br>hasLineWidth (int)<br>hasLegendName (string) | DataInVector (Vector) | - | PlotTaskLineplotMethod |
Visualization (visu) | PlotTask | ScatterplotMethod | hasLineStyle (string)<br>hasLineWidth (int)<br>hasScatterSize (int)<br>hasLegendName (string) | DataInVector (Vector) | - | PlotTaskScatterplotMethod |
Statistics (stats) | TrendCalculationTask | TrendCalculationMethod | - | DataInTrendCalculation (Vector) | DataOutTrendCalculation (Vector) | TrendCalculationTaskTrendCalculationMethod |
Statistics (stats) | NormalizationTask | NormalizationMethod | - | DataInNormalization (Vector) | DataOutNormalization (Vector) | NormalizationTaskNormalizationMethod |
Statistics (stats) | ScatteringCalculationTask | ScatteringCalculationMethod | - | DataInScatteringCalculation (Vector) | DataOutScatteringCalculation (Vector) | ScatteringCalculationTaskScatteringCalculationMethod |
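In every row of the table, the name in the last column is the task name followed by the method name (for example, `Train` + `KNNTrain` gives `TrainKNNTrain`). The sketch below shows one way such a convention could be used to look up a class by name. The registry, the `run` method, and the min-max formula are assumptions made for illustration; the class here is a stand-in and not the library's actual implementation.

```python
class NormalizationTaskNormalizationMethod:
    """Hypothetical stand-in for illustration; not the library's class."""

    def run(self, values):
        # Min-max scaling to [0, 1]; the library's formula may differ.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]


def implementing_class_name(task, method):
    """Class name as listed in the table: task name followed by method name."""
    return task + method


# Hypothetical registry mapping class names to classes.
REGISTRY = {cls.__name__: cls for cls in (NormalizationTaskNormalizationMethod,)}

name = implementing_class_name("NormalizationTask", "NormalizationMethod")
normalized = REGISTRY[name]().run([2.0, 4.0, 6.0])
print(name, normalized)
```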
- Creating a pipeline via code: See the provided examples. To fetch them to your working directory for easy access, run `typer exe_kg_lib.cli.main run get-examples`.
- Creating a pipeline step-by-step via CLI: Run `typer exe_kg_lib.cli.main run create-pipeline`.
- Executing a pipeline via code: See example code.
- Executing a pipeline via CLI: Run `typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>`.
To extend ExeKGLib with a new task and method, there are 3 required steps:
1. Select a relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
2. Add new semantic components (entities, properties, etc.) to the selected KG schema.
3. Add a Python class to the corresponding module of the `exe_kg_lib.classes.tasks` package.
For steps 2 and 3, refer to the relevant page of ExeKGLib's website.
See the Code Reference and Development sections of ExeKGLib's website.
- Top-level: Data Science
- Bottom-level: Visualization | Statistics | Machine Learning
The above KG schemata are included in the ExeKGOntology repository.
The dataset was generated using the `sklearn.datasets.make_classification()` function of the scikit-learn Python library.
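As a rough sketch of how a comparable CSV dataset could be produced with `make_classification()`: the sample count, feature count, column names, and random seed below are assumptions, since the parameters used for the bundled dataset are not stated here.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Assumed parameters, for illustration only.
X, y = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    random_state=42,
)

# Hypothetical column names: feature_1 ... feature_5 plus a label column.
df = pd.DataFrame(X, columns=[f"feature_{i + 1}" for i in range(X.shape[1])])
df["label"] = y

# Passing a file path to to_csv() would write the dataset to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[0])
```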
ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.