A Node script that transforms DataGraft RDF mappings into ArangoDB values.
This script uses datalib (http://vega.github.io/datalib/) to parse CSV. Make sure it is installed and available before running the script.
It can be installed with, e.g., npm install datalib
or see the GitHub pages for more info.
To run the script locally, open a command window and start Node, then:
- Add/edit the RDF mapping values and the vocabulary for the RDF mapping. The RDF mapping is in the file rdfMapping.js and the vocabulary in rdfVocab.js.
- Load transformscript.js into Node. (In the Node REPL this can be done with .load transformscript.js)
- Read the CSV file with read([path_to_file]), passing the file path as input.
- Build a list of headings with build(). This is used to get the proper column when inserting data through the transformation.
- Run the mapping function with run(). This should start the entire process and output one .json file for values and one for edges, in the ArangoDB import format.
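The internals of build() and run() are not shown in this README. As a rough illustration of the idea only: once the CSV is parsed into rows with the header first, a header-name to column-index map lets each mapping entry find its column by name, and each row can then be turned into one value document plus edge documents. The mapping shape below (keyColumn, properties, edges) and the function names buildHeaderIndex and mapRow are hypothetical; this is not the rdfMapping.js format or the code in transformscript.js.

```javascript
// Hypothetical sketch: NOT the actual rdfMapping.js format or transformscript.js code.

// Build a lookup from header name to column index (the idea behind build()).
function buildHeaderIndex(headerRow) {
  const index = {};
  headerRow.forEach((name, i) => {
    index[name.trim()] = i;
  });
  return index;
}

// Map one CSV row to an ArangoDB value document plus edge documents,
// driven by a hypothetical mapping description.
function mapRow(row, headerIndex, mapping) {
  const cell = (column) => {
    const i = headerIndex[column];
    if (i === undefined) throw new Error(`Unknown column: ${column}`);
    return row[i];
  };
  const key = cell(mapping.keyColumn);
  const value = { _key: key };
  for (const [property, column] of Object.entries(mapping.properties)) {
    value[property] = cell(column);
  }
  // Edges use bare keys; the arangoimp flags --from-collection-prefix and
  // --to-collection-prefix are meant to add the collection name on import.
  const edges = mapping.edges.map((e) => ({
    _from: key,
    _to: cell(e.targetColumn),
    label: e.label,
  }));
  return { value, edges };
}

// Example usage with made-up data.
const rows = [
  ["id", "name", "knows"],
  ["p1", "Alice", "p2"],
];
const headerIndex = buildHeaderIndex(rows[0]);
const mapping = {
  keyColumn: "id",
  properties: { name: "name" },
  edges: [{ label: "knows", targetColumn: "knows" }],
};
console.log(JSON.stringify(mapRow(rows[1], headerIndex, mapping)));
// {"value":{"_key":"p1","name":"Alice"},"edges":[{"_from":"p1","_to":"p2","label":"knows"}]}
```

Looking columns up by header name rather than by position means a reordered CSV still maps correctly, and an unknown column fails loudly instead of silently inserting the wrong cell.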
The JSON files contain one object per line, so there is no file size limitation when using the ArangoDB import tool (arangoimp).
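A minimal sketch of writing documents in that one-object-per-line layout with Node's built-in fs module. The function name writeJsonLines and the sample documents are hypothetical; the file names match the ones used in the copy commands. Each document is written as it is produced, so the whole output never has to be built as one large string.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write an iterable of documents as JSON Lines: one JSON object per line.
// JSON.stringify without indentation escapes newlines inside strings,
// so every document stays on a single line.
function writeJsonLines(filePath, docs) {
  const fd = fs.openSync(filePath, "w");
  let count = 0;
  try {
    for (const doc of docs) {
      fs.writeSync(fd, JSON.stringify(doc) + "\n");
      count++;
    }
  } finally {
    fs.closeSync(fd);
  }
  return count;
}

// Example with made-up documents, written to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "arango-"));
const valueFile = path.join(dir, "arango_value.json");
const edgeFile = path.join(dir, "arango_edge.json");
writeJsonLines(valueFile, [{ _key: "p1", name: "Alice" }, { _key: "p2", name: "Bob" }]);
writeJsonLines(edgeFile, [{ _from: "p1", _to: "p2", label: "knows" }]);
console.log(fs.readFileSync(valueFile, "utf8"));
```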
Run the following commands to copy the files to the Docker container and import the values into ArangoDB.
NB! Substitute "arangodb" with the name of your Docker container.
To copy the files:
docker cp arango_value.json arangodb:/arangovalue.json
docker cp arango_edge.json arangodb:/arangoedge.json
To import the files and create new collections (exchange "test" / "test_edge" with your collection names):
docker exec -i arangodb arangoimp --file arangovalue.json --collection test --create-collection true
docker exec -i arangodb arangoimp --file arangoedge.json --collection test_edge --create-collection true --create-collection-type edge --from-collection-prefix test --to-collection-prefix test
To import a CSV file directly as its own collection (here csvdocument.csv into test_csv):
docker cp csvdocument.csv arangodb:/csvdocument.csv
docker exec -i arangodb arangoimp --file csvdocument.csv --type csv --collection test_csv --create-collection true
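If the container and collection names change often, the value and edge commands could be generated from one place. A minimal sketch using Node's child_process.execFileSync; the helper names importCommands and runImport and the options object are hypothetical, and the flags are exactly the ones in the commands above. By default it only prints the commands; passing true as the second argument executes them, which requires Docker and a running container.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical helper: builds the docker cp / arangoimp commands for the
// value and edge files, for a given container and collection name.
function importCommands({ container, collection }) {
  const edgeCollection = `${collection}_edge`;
  return [
    ["docker", ["cp", "arango_value.json", `${container}:/arangovalue.json`]],
    ["docker", ["cp", "arango_edge.json", `${container}:/arangoedge.json`]],
    ["docker", ["exec", "-i", container, "arangoimp",
      "--file", "arangovalue.json", "--collection", collection,
      "--create-collection", "true"]],
    ["docker", ["exec", "-i", container, "arangoimp",
      "--file", "arangoedge.json", "--collection", edgeCollection,
      "--create-collection", "true", "--create-collection-type", "edge",
      "--from-collection-prefix", collection, "--to-collection-prefix", collection]],
  ];
}

// Print each command; execute it as well when run is true.
function runImport(options, run = false) {
  for (const [cmd, args] of importCommands(options)) {
    console.log([cmd, ...args].join(" "));
    if (run) execFileSync(cmd, args, { stdio: "inherit" });
  }
}

runImport({ container: "arangodb", collection: "test" });
```

execFileSync passes the arguments as an array without going through a shell, so names containing spaces or shell characters are not reinterpreted.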
Ideas for further work:
- Implement a CSV parser, so that it does not break on bad formatting or on formatting that differs from what is coded.
- Run the mapping per line while reading the CSV, instead of reading everything into memory before transforming.
- Possibly insert the data into the database through the REST API on a streaming basis.
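The three ideas above could fit together roughly as follows. This is a minimal sketch using only Node built-ins (fs, readline), not part of the current script: it reads the CSV line by line, splits each line with a small quote-aware parser (double-quoted fields, "" as an escaped quote, in the style of RFC 4180), maps the row, and hands documents to a send function in batches. The names parseCsvLine, streamCsv and toDoc are hypothetical. Limitations: quoted fields that span several lines are not handled, and send is a placeholder; in practice it could, for example, POST each batch as a JSON array to ArangoDB's HTTP document endpoint (/_api/document/{collection}), which is not shown here.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const readline = require("readline");

// Split one CSV line into fields. Handles double-quoted fields, delimiters
// inside quotes and "" as an escaped quote. Multi-line quoted fields are NOT handled.
function parseCsvLine(line, delimiter = ",") {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Stream a CSV file: map each row as it is read and pass the documents to
// `send` in batches, so the whole file is never held in memory.
async function streamCsv(filePath, mapRow, send, batchSize = 1000) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let header = null;
  let batch = [];
  let total = 0;
  for await (const line of rl) {
    if (line.trim() === "") continue; // skip blank lines
    const fields = parseCsvLine(line);
    if (header === null) {
      header = fields.map((h) => h.trim());
      continue;
    }
    batch.push(mapRow(fields, header));
    if (batch.length >= batchSize) {
      await send(batch);
      total += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {
    await send(batch); // flush the last, partial batch
    total += batch.length;
  }
  return total;
}

// Example: a small made-up CSV in a temp file; batches are just logged.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "csv-")), "people.csv");
fs.writeFileSync(file, 'id,name\np1,"Smith, Alice"\np2,"Bob ""B"" Jones"\n');
const toDoc = (fields, header) => Object.fromEntries(header.map((h, i) => [h, fields[i]]));
streamCsv(file, toDoc, async (docs) => console.log(docs), 1).then((n) => console.log(`${n} rows`));
```

Awaiting send before reading further lines keeps memory bounded even when the database is slower than the file read.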