
triple_store

Kai Blumberg edited this page Oct 29, 2021 · 25 revisions

Commands to get the PM paper 3 triple store and other technology working.

Tarql

https://tarql.github.io/

https://github.com/tarql/tarql

blog posts about Tarql: https://www.bobdc.com/blog/tarql/, https://thecaglereport.com/2021/05/18/using-tarql-to-convert-excel-spreadsheets-to-rdf/, https://www.bobdc.com/blog/sparqlcsvjoin/

Requires Java 1.8 or above.

git clone https://github.com/cygri/tarql

brew install maven // on my Mac; the install differs on Linux

mvn clean install -DskipTests //Make sure to be in the tarql/ directory


// it would probably be good to add `/target/appassembler/bin/` to PATH so tarql can be used from anywhere
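A minimal sketch of that PATH change. `TARQL_HOME` is a variable name introduced here for convenience (tarql itself does not read it), and the clone location `$HOME/software/tarql` is an assumption; adjust it to wherever tarql was built.

```shell
# Hypothetical clone location; change this to wherever tarql was built.
TARQL_HOME="${TARQL_HOME:-$HOME/software/tarql}"

# Append the appassembler bin directory only if it is not already on PATH.
case ":$PATH:" in
  *":$TARQL_HOME/target/appassembler/bin:"*) ;;
  *) export PATH="$PATH:$TARQL_HOME/target/appassembler/bin" ;;
esac
```

Putting these lines in `~/.bash_profile` should make `tarql` available in every new shell.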

### testing

cd target/appassembler //get to tarql executable


sh bin/tarql --ntriples ../../examples/sample-2.sparql ../../examples/TechCrunchcontinentalUSA.csv
sh bin/tarql ../../examples/sample-2.sparql ../../examples/TechCrunchcontinentalUSA.csv
sh bin/tarql ../../examples/sample-2.sparql ../../examples/TechCrunchcontinentalUSA.csv > ../../examples/outputs/test1.ttl
sh bin/tarql --ntriples ../../examples/sample-2.sparql ../../examples/TechCrunchcontinentalUSA.csv > ../../examples/outputs/test1.rdf
sh bin/tarql ../../examples/sample-arsenal-table_2.sparql ../../examples/arsenal_table_2.csv > ../../examples/outputs/arsenal.ttl

tarql /Users/kai/Desktop/software/tarql/examples/sample-2.sparql /Users/kai/Desktop/software/tarql/examples/TechCrunchcontinentalUSA.csv

in ~/Desktop/scratch/planet_microbe/planet_microbe_functional_annotation_scripts/triples

in test1

Run tarql --tabs mini_test_go_out.sparql mini_test_go_out.tsv > mini_test_go_out.ttl //original csv version with just go term and count

in test2

tarql --tabs --dedup 100 mini_test_go_out_sample.sparql mini_test_go_out_sample.tsv > mini_test_go_out_sample.ttl

in test3

tarql -H --tabs --dedup 100 test2.sparql test_headerless.tsv > test3.ttl

in test4

tarql -H --tabs --dedup 100 go.sparql go_input.tsv > test4.ttl
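For reference, a sketch of what a headerless, tab-separated run like test3 and test4 involves. With `-H`, tarql has no header row to take names from, so it binds the columns to `?a`, `?b`, and so on. The file names, the two-column layout (GO term, count), and the `ex:` namespace below are assumptions for illustration; this is not the actual `go.sparql`.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-column TSV without a header row: GO term, count.
printf 'GO:0008150\t12\nGO:0003674\t7\n' > example_headerless.tsv

# Hypothetical mapping; with -H the columns are bound to ?a and ?b.
cat > example.sparql <<'EOF'
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?term ex:count ?count .
}
WHERE {
  BIND (IRI(CONCAT("http://purl.obolibrary.org/obo/", REPLACE(?a, ":", "_"))) AS ?term)
  BIND (xsd:integer(?b) AS ?count)
}
EOF

# Run the conversion only if tarql is on PATH.
if command -v tarql >/dev/null 2>&1; then
  tarql -H --tabs --dedup 100 example.sparql example_headerless.tsv > example.ttl
fi
```

The `REPLACE` call turns an identifier such as `GO:0008150` into the OBO-style IRI suffix `GO_0008150`; `--dedup 100` drops duplicate triples within a sliding window of 100.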

Tarql is actually built using the Jena toolkit (ARQ), which means that it has many of the same capabilities and limitations that the Jena/Fuseki2 RDF server has, and can be extended in the same way that ARQ can (see https://jena.apache.org/documentation/query/library-function.html for details about the ARQ extension library).

installing apache jena

downloaded tar.gz from https://jena.apache.org/download/index.cgi

gunzip -c apache-jena-4.2.0.tar.gz | tar xopf -

add to path

# Apache Jena
export JENA_HOME=/Users/kai/scripts/apache-jena-4.2.0/
export PATH=$PATH:$JENA_HOME/bin

Unfortunately this didn't quite work: the shell sees the Jena scripts, but the Java version is wrong. Probably like this post.

Fixed it by adding the following to my .bash_profile:

export JAVA_8_HOME=$(/usr/libexec/java_home -v1.8)
export JAVA_12_HOME=$(/usr/libexec/java_home -v12)

alias java8='export JAVA_HOME=$JAVA_8_HOME'
alias java12='export JAVA_HOME=$JAVA_12_HOME'

# default to Java 12
java12
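Jena 4.x requires Java 11 or later, which is likely why the scripts failed under Java 8. A quick way to check which Java a shell is actually using is sketched below; `java_major` is a hypothetical helper name, and the parsing assumes the usual `version "x.y.z"` line printed by `java -version`.

```shell
# Hypothetical helper: reduce a Java version string to its major version.
# Handles both the old scheme (1.8.0_292 -> 8) and the new one (12.0.2 -> 12).
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;
    *)   echo "${1%%.*}" ;;
  esac
}

echo "JAVA_HOME=${JAVA_HOME:-<unset>}"

# Only inspect the running Java if one is on PATH.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "active Java major version: $(java_major "$ver")"
fi
```

Running this after `java8` or `java12` shows whether the alias took effect in the current shell.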

For RDF queries (I'm pretty sure this is what I used in my master's).

Triple store: Peter said to use TDB2.

https://jena.apache.org/documentation/tdb2/tdb2_admin.html

https://jena.apache.org/documentation/tdb2/tdb2_cmds.html // this is super helpful

https://jena.apache.org/documentation/tdb/faqs.html
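Going by the TDB2 command documentation linked above, loading a Turtle file and checking the triple count can be sketched as follows. The database directory `DB` and the input file `test4.ttl` (the tarql output from earlier) are assumptions about where things live; the variable names are introduced here.

```shell
# Minimal sketch: load tarql output into a TDB2 database and count the triples.
# Assumes $JENA_HOME/bin is on PATH and test4.ttl is in the current directory.
DB_DIR="DB"          # TDB2 database directory
INPUT="test4.ttl"    # assumed input file

if command -v tdb2.tdbloader >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # Bulk-load the Turtle file into the database.
  tdb2.tdbloader --loc="$DB_DIR" "$INPUT"
  # Sanity check: how many triples ended up in the store?
  tdb2.tdbquery --loc="$DB_DIR" 'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
else
  echo "tdb2.tdbloader not on PATH or $INPUT missing; skipping load" >&2
fi
```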

Expose triples as a SPARQL endpoint accessible over HTTP. Peter said to use the most recent Fuseki as a front end for data management (along with TDB).

fuseki-quick-start will need Apache Tomcat for its webapp service.

https://jena.apache.org/documentation/fuseki2/fuseki-layout.html

This post has some scripts to automate these processes which might be useful.

In theory the following command should work:

fuseki-server --tdb2 --loc=DB --update /test_DB

but it gives an error saying the DB is TDB2 and that I'm not using the right version of the server, so I should use the --tdb2 flag. I am already using that flag, so maybe this is a macOS issue?
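Once the server does start, a query against the dataset can be sketched with `curl`. This assumes Fuseki's default port 3030 and the standard `sparql` query service under the dataset name `/test_DB`; `fuseki_endpoint` and `fuseki_query` are hypothetical helper names, not part of Fuseki.

```shell
# Assumes Fuseki's default port (3030); override FUSEKI_URL if it differs.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"

# Hypothetical helper: build the query endpoint URL for a dataset.
fuseki_endpoint() {
  # $1 = dataset name without the leading slash, e.g. test_DB
  printf '%s/%s/sparql\n' "$FUSEKI_URL" "$1"
}

# Hypothetical helper: send a SPARQL query and ask for JSON results.
fuseki_query() {
  # $1 = dataset name, $2 = SPARQL query string
  curl -sS -G "$(fuseki_endpoint "$1")" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$2"
}

# Usage (requires a running server):
# fuseki_query test_DB 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
```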

downloads

To install on my computer, I followed the directions in this post. It worked: the page loads when I go to http://localhost:8080/

apache-tomcat-8.5.72/bin$ ./startup.sh
