
Data generation

You can generate your own data sets for LSQB. Note that their sizes may differ slightly between versions of the data generator, so for publications we recommend using the pre-generated data sets linked above.

  1. Run the LDBC Spark Datagen in raw mode with CSV output (see its README for instructions).

  2. Go to the Datagen's raw CSV output directory and store its path:

    cd out/csv/raw/composite_merge_foreign/
    export DATAGEN_DATA_DIR="$(pwd)"
  3. Go to the data converter repository and run its conversion scripts:

    ./spark-concat.sh "${DATAGEN_DATA_DIR}"
    ./load.sh "${DATAGEN_DATA_DIR}" --no-header
    ./transform.sh
    ./duckdb ldbc.duckdb < export/snb-export-only-ids-projected-fk.sql
    ./duckdb ldbc.duckdb < export/snb-export-only-ids-merged-fk.sql
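  4. Copy the generated files into the LSQB repository, setting SF to the scale factor of the generated data set and LSQB_REPOSITORY_DIRECTORY to the path of your LSQB checkout: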

    export SF=1
    cp -r data/csv-only-ids-projected-fk/ "${LSQB_REPOSITORY_DIRECTORY}/data/social-network-sf${SF}-projected-fk"
    cp -r data/csv-only-ids-merged-fk/    "${LSQB_REPOSITORY_DIRECTORY}/data/social-network-sf${SF}-merged-fk"
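The converter and copy steps above can be wrapped in two shell functions, sketched below. This is a minimal sketch under stated assumptions, not part of the LSQB or converter tooling: the function names convert_datagen_output and copy_to_lsqb and their argument order are hypothetical, and the sketch assumes the converter scripts are run from the root of the data converter repository, as in step 3. The copy function checks every source and target before copying anything, because cp -r into a target directory that already exists nests the source inside it (for example .../social-network-sf1-projected-fk/csv-only-ids-projected-fk) instead of failing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping steps 3 and 4; not part of LSQB or the converter.

# Step 3: run the converter scripts from the converter repository root.
# Usage: convert_datagen_output CONVERTER_DIR DATAGEN_DATA_DIR
convert_datagen_output() {
  if [ "$#" -ne 2 ]; then
    echo "usage: convert_datagen_output CONVERTER_DIR DATAGEN_DATA_DIR" >&2
    return 2
  fi
  local converter_dir=$1 datagen_dir=$2
  if [ ! -d "$datagen_dir" ]; then
    echo "Datagen output directory not found: $datagen_dir" >&2
    return 1
  fi
  # Subshell keeps the caller's working directory unchanged;
  # && stops at the first command that fails.
  (
    cd "$converter_dir" &&
      ./spark-concat.sh "$datagen_dir" &&
      ./load.sh "$datagen_dir" --no-header &&
      ./transform.sh &&
      ./duckdb ldbc.duckdb < export/snb-export-only-ids-projected-fk.sql &&
      ./duckdb ldbc.duckdb < export/snb-export-only-ids-merged-fk.sql
  )
}

# Step 4: copy both exported variants into the LSQB data directory.
# Usage: copy_to_lsqb CONVERTER_DIR LSQB_DIR SF
copy_to_lsqb() {
  if [ "$#" -ne 3 ]; then
    echo "usage: copy_to_lsqb CONVERTER_DIR LSQB_DIR SF" >&2
    return 2
  fi
  local converter_dir=$1 lsqb_dir=$2 sf=$3 variant src dst
  # Validate all sources and targets first, so a failure copies nothing.
  for variant in projected-fk merged-fk; do
    src="$converter_dir/data/csv-only-ids-$variant"
    dst="$lsqb_dir/data/social-network-sf${sf}-$variant"
    if [ ! -d "$src" ]; then
      echo "Missing converter output: $src" >&2
      return 1
    fi
    # An existing target would make cp -r nest the source inside it.
    if [ -e "$dst" ]; then
      echo "Target exists, refusing to overwrite: $dst" >&2
      return 1
    fi
  done
  mkdir -p "$lsqb_dir/data" || return 1
  for variant in projected-fk merged-fk; do
    cp -r "$converter_dir/data/csv-only-ids-$variant" \
      "$lsqb_dir/data/social-network-sf${sf}-$variant" || return 1
  done
}
```

For example, after sourcing the functions and running from the root of the converter repository: convert_datagen_output "$(pwd)" "${DATAGEN_DATA_DIR}" && copy_to_lsqb "$(pwd)" "${LSQB_REPOSITORY_DIRECTORY}" "${SF}". To regenerate a data set, remove the existing social-network-sf* directories first.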