Dian SUN edited this page Jul 29, 2020 · 3 revisions

Welcome to the SCALPEL-Analysis wiki!

This wiki will help you understand and make full use of the SCALPEL-Analysis library.

SCALPEL: A Scalable Pipeline

SCALPEL-Flattening and SCALPEL-Extraction perform batch operations, so they read their input data from, and write their output data to, the file system (local or HDFS). They are implemented in Scala to access Spark's low-level API and to take advantage of functional programming and static typing, which supports rigorous automated testing (94% of the Scala code is covered by unit tests). Both can be configured through text configuration files or used as libraries. SCALPEL-Analysis is a Python module built on PySpark and designed for interactive use, for instance in a Jupyter notebook. This workflow is illustrated in the following figure.

[Figure: SCALPEL workflow]
