This repository has been archived by the owner on Jan 8, 2024. It is now read-only.


# GraphSense Transformation Pipeline (Moved to graphsense-spark)

**CAUTION**: The code is now maintained in the graphsense-spark repository.

The GraphSense Transformation Pipeline reads raw block data, which is ingested into Apache Cassandra by the graphsense-blocksci / graphsense-bitcoin-etl component. The transformation pipeline computes de-normalized views using Apache Spark and stores them back in Cassandra.

Access to computed de-normalized views is subsequently provided by the GraphSense REST interface, which is used by the graphsense-dashboard component.

This component is implemented in Scala using Apache Spark.
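To illustrate the read-transform-write pattern described above, here is a minimal, hypothetical Scala sketch using the Spark Cassandra Connector. The keyspace/table names (`btc_raw`, `transaction`) match the test setup below, but the output table (`block_stats`), the column names, and the aggregation itself are assumptions for illustration, not the pipeline's actual views. Running it requires a Spark cluster and Cassandra instance, so no standalone test is possible.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Sketch only: column and output table names are assumptions,
// not the pipeline's real schema.
object TransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("GraphSense Transformation Sketch")
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()

    // Read raw transactions ingested by graphsense-blocksci /
    // graphsense-bitcoin-etl from the btc_raw keyspace.
    val tx = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "btc_raw", "table" -> "transaction"))
      .load()

    // Compute a simple de-normalized view, e.g. transaction counts
    // per block ("block_id" is an assumed column name).
    val blockStats = tx
      .groupBy("block_id")
      .agg(count("*").as("no_transactions"))

    // Write the view back to Cassandra ("block_stats" is a
    // hypothetical target table in the transformed keyspace).
    blockStats.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "btc_transformed", "table" -> "block_stats"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```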

## Local Development Environment Setup

### Prerequisites

Make sure Java 8 and sbt >= 1.0 are installed:

```
java -version
sbt about
```

Download, install, and run Apache Spark (version 3.2.1) in `$SPARK_HOME`:

```
$SPARK_HOME/sbin/start-master.sh
```

Download, install, and run Apache Cassandra (version >= 3.11) in `$CASSANDRA_HOME`:

```
$CASSANDRA_HOME/bin/cassandra -f
```

## Ingest Raw Block Data

Run the following script to ingest raw block test data:

```
./scripts/ingest_test_data.sh
```

This should create a keyspace `btc_raw` (tables `exchange_rates`, `transaction`, `block`, `block_transactions`). Verify as follows:

```
cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
```

## Execute Transformation Locally

Create the target keyspace for transformed data:

```
cqlsh -f scripts/schema_transformed.cql
```

Compile and test the implementation:

```
sbt test
```

Package the transformation pipeline:

```
sbt package
```

Run the transformation pipeline on localhost:

```
./submit.sh
```

macOS only: make sure `gnu-getopt` is installed (`brew install gnu-getopt`).

Check the running job in the local Spark UI at http://localhost:4040/jobs.

## Submit on a standalone Spark Cluster

Use the `submit.sh` script and specify the Spark master node (e.g., `-s spark://SPARK_MASTER_IP:7077`) and other options:

```
./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
                 [--currency CURRENCY]
                 [--raw_keyspace RAW_KEYSPACE]
                 [--tgt_keyspace TGT_KEYSPACE]
                 [--bucket_size BUCKET_SIZE]
                 [--bech32-prefix BECH32_PREFIX]
                 [--checkpoint-dir CHECKPOINT_DIR]
                 [--coinjoin-filtering]
```
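For example, a cluster submission might look like the following. The option names come from the usage text above; the concrete values (memory size, host names, keyspace names) are illustrative assumptions and must be adapted to your setup:

```
./submit.sh -m 4 \
            -c cassandra-host \
            -s spark://SPARK_MASTER_IP:7077 \
            --currency BTC \
            --raw_keyspace btc_raw \
            --tgt_keyspace btc_transformed
```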