Merge pull request #147 from luigi-asprino/master
Upload FBDA query generator and executor
dachafra authored Jul 22, 2024
2 parents 3b6c8db + b451bc1 commit f368e31
Showing 42 changed files with 1,631 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -200,6 +200,8 @@ Additionally to the generator engine, that provides the data at desirable scales

Our experience testing (virtual) knowledge graph engines has revealed how difficult it is to set up an infrastructure that involves many variables and resources: databases, raw data, mappings, queries, data paths, mapping paths, database connections, etc. For that reason, and to make the benchmark easier to use for any developer or practitioner, we provide a set of [utils](https://github.com/oeg-upm/gtfs-bench/tree/master/utils) such as docker-compose templates and evaluation bash scripts that, in our opinion, can reduce the time needed to prepare the testing setup.

Moreover, the utils folder contains a series of scripts for evaluating Façade-based data access engines (e.g. [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything)); see the [FBDA benchmark README](utils/fbda-bench/README.md) for more details.

## Desirable Metrics:

We highly recommend that KG construction engines (virtualizers or materializers) tested with this benchmark provide (at least) the following metrics:
69 changes: 69 additions & 0 deletions utils/fbda-bench/README.md
@@ -0,0 +1,69 @@
# Façade-based Data Access Benchmark

This folder provides a benchmark derived from GTFS-Madrid-Bench for evaluating Façade-based Data Access (FBDA) engines, such as [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything).

The extension consists of:
- a *set of query templates* that translate the GTFS-Madrid-Bench's queries and RML mappings into FBDA queries;
- a *query executor* which fires the queries and measures the performance of the FBDA engines under four experimental regimes:
  - In-memory execution over a complete materialised view (in-memory+complete);
  - In-memory execution optimised by a triple-filtering approach (in-memory+triple-filtering);
  - In-memory execution over a sliced materialised view and optimised by triple-filtering (sliced+triple-filtering);
  - On-disk execution optimised by triple-filtering (on-disk+triple-filtering).

More details can be found in this [article](https://www.semantic-web-journal.net/content/materialisation-approaches-fa%C3%A7ade-based-data-access-sparql).


## Requirements

Java 11 (or a later version) must be installed locally.
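
The Java requirement can be checked before starting a run. The following is a minimal sketch, not part of the benchmark scripts: `java_major` is a hypothetical helper, and it assumes that `java -version` prints a banner line containing `version "<number>"`, as common JDK distributions do.

```shell
# Extract the major version from a `java -version` banner line, e.g.
#   openjdk version "17.0.2" 2022-01-18   -> 17
#   java version "1.8.0_201"              -> 8
# (java_major is a hypothetical helper, not part of the benchmark.)
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # legacy 1.x numbering
    *)   echo "${ver%%[!0-9]*}" ;;            # modern numbering
  esac
}

# Report whether the local JVM satisfies the Java 11 requirement.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes its banner to stderr, hence the redirection.
  major=$(java_major "$(java -version 2>&1 | grep 'version "' | head -n 1)")
  if [ "${major:-0}" -ge 11 ]; then
    echo "Java $major found"
  else
    echo "Java 11 or later is required" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```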

## Using FBDA Benchmark

1. Generate data using GTFS-Madrid-Bench and move the resulting folder into the experiments folder. At the moment, only the csv, json and xml formats are supported.

2. Generate the FBDA queries for the scales passed to GTFS-Madrid-Bench (e.g. 1, 10, 100):

```
./generate_queries.sh "1 10 100" "TMP_FOLDER" "xml csv json"
```

where:
- `TMP_FOLDER` is the path to a temporary folder that will be used during the experiments
- "xml csv json" are the formats passed to GTFS-Madrid-Bench

3. Download the executable jar file of the FBDA engine to evaluate (e.g. [SPARQL Anything v0.9.0](https://github.com/SPARQL-Anything/sparql.anything/releases/download/0.9.0/sparql-anything-0.9.0.jar))

4. Run the queries:

```
./execute_queries.sh /path/to/fbda_engine.jar "1 10 100" "xml csv json" "/path/to/results" "TMP_FOLDER"
```

where:
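- `/path/to/fbda_engine.jar` is the path to the executable jar file downloaded in step 3
- "1 10 100" are the scales passed to GTFS-Madrid-Bench
- "xml csv json" are the formats passed to GTFS-Madrid-Bench
- "/path/to/results" is the path to a folder where the results of the execution of the queries (i.e. measures) will be stored
- `TMP_FOLDER` is the path to a temporary folder that will be used during the experiments
- an optional sixth argument (e.g. "1 2 3") restricts the run to the listed query numbers; by default, queries 1 to 18 are executed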
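
Because a full run can take a long time, it may help to sanity-check the arguments first. The sketch below is only an illustration: `validate_args` is a hypothetical helper that is not part of the benchmark scripts, and the accepted formats simply mirror the csv, json and xml formats mentioned above.

```shell
# Hypothetical pre-flight check; not part of the benchmark scripts.
# Usage: validate_args JAR "SCALES" "FORMATS"
validate_args() {
  jar=$1 scales=$2 formats=$3

  # The engine jar must exist.
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }

  # Every scale must be a non-negative integer.
  for s in $scales; do
    case "$s" in
      ''|*[!0-9]*) echo "invalid scale: $s" >&2; return 1 ;;
    esac
  done

  # Only the formats named in this README are accepted.
  for f in $formats; do
    case "$f" in
      csv|json|xml) ;;
      *) echo "unsupported format: $f" >&2; return 1 ;;
    esac
  done
}
```

A typical use would be `validate_args /path/to/fbda_engine.jar "1 10 100" "xml csv json" && ./execute_queries.sh ...`, so that the run only starts when the check passes.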


## Analysing the results

The execution of the queries generates two TSV files for each query executed on a given format, namely `time_q<query_id>_<format>.tsv` and `mem_q<query_id>_<format>.tsv`.
These files trace the execution of the queries in terms of computational resources used by the engine (i.e. memory footprint, CPU and time).

The files are stored in the directory `/path/to/results` passed as argument of `execute_queries.sh`.

The `time_q<query_id>_<format>.tsv` file records the execution time of the queries for a given format. The table has the following structure:

| Query | InputSize | Strategy | Slice | Ondisk | MemoryLimit | Run | Time | Unit | Status | STDErr |
|-------|-----------|----------|-------|--------|-------------|-----|------|------|--------|--------|
| | | | | | | | | | | |
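
The timing file can be summarised with standard tools. The following is a minimal sketch, assuming the file is tab-separated, starts with the header row shown above, and holds numeric values in the `Time` column; `mean_time` is a hypothetical helper that is not part of the benchmark.

```shell
# Print the mean of the Time column for each (Strategy, Slice, Ondisk) group.
# Assumes a tab-separated file whose first line is the header shown above.
# (mean_time is a hypothetical helper, not part of the benchmark.)
mean_time() {
  awk -F '\t' '
    NR == 1 {                         # map header names to column positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      key = $(col["Strategy"]) "\t" $(col["Slice"]) "\t" $(col["Ondisk"])
      sum[key] += $(col["Time"])
      n[key]++
    }
    END {
      for (key in sum) printf "%s\t%.2f\n", key, sum[key] / n[key]
    }
  ' "$1" | sort
}
```

Columns are looked up by header name rather than by position, so the sketch keeps working if columns are reordered. It would be called as, for example, `mean_time /path/to/results/time_q1_csv.tsv`.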

The `mem_q<query_id>_<format>.tsv` file records the engine's CPU and memory usage during the evaluation of the queries. The table has the following structure:

| Query | InputSize | Strategy | Slice | Ondisk | MemoryLimit | Run | PID | %cpu | %mem | vsz | rss |
|-------|-----------|----------|-------|--------|-------------|-----|-----|------|------|-----|-----|
| | | | | | | | | | | | |
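
A similar sketch can extract the peak memory of each run from the resource file. It assumes that the file is tab-separated, starts with the header row shown above, and contains several samples per run with a numeric `rss` column; `peak_rss` is a hypothetical helper that is not part of the benchmark, and the unit of `rss` is whatever the monitoring script recorded.

```shell
# Print the largest rss value observed for each run.
# Assumes a tab-separated file whose first line is the header shown above.
# (peak_rss is a hypothetical helper, not part of the benchmark.)
peak_rss() {
  awk -F '\t' '
    NR == 1 {                         # map header names to column positions
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      run = $(col["Run"])
      rss = $(col["rss"]) + 0         # force numeric comparison
      if (!(run in peak) || rss > peak[run]) peak[run] = rss
    }
    END {
      for (run in peak) printf "%s\t%d\n", run, peak[run]
    }
  ' "$1" | sort -n
}
```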



64 changes: 64 additions & 0 deletions utils/fbda-bench/execute_queries.sh
@@ -0,0 +1,64 @@
#!/bin/bash
#
# Copyright (c) 2024 SPARQL Anything Contributors @ http://github.com/sparql-anything
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

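SPARQL_ANYTHING_JAR=$1
TMP_FOLDER=$5

# Resolve the results folder: keep absolute paths as they are and
# resolve relative ones against the current working directory.
case "$4" in
    /*) RESULTS_DIR=$4 ;;
    *)  RESULTS_DIR=$(pwd)/$4 ;;
esac

if [ ! -d "$RESULTS_DIR" ]; then
    mkdir -p "$RESULTS_DIR"
else
    echo "$RESULTS_DIR already exists!"
fi

if [ ! -d "$TMP_FOLDER" ]; then
    mkdir -p "$TMP_FOLDER"
else
    echo "$TMP_FOLDER already exists! Cleaning it.."
    # ${TMP_FOLDER:?} aborts if the variable is empty, so this never expands to "rm -rf /*".
    rm -rf "${TMP_FOLDER:?}"/*
fi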

source functions.sh

if [ -n "$6" ]; then
    QUERIES_TO_EXECUTE=$6
else
    QUERIES_TO_EXECUTE="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18"
fi


# $3 (formats), $2 (scales) and $QUERIES_TO_EXECUTE are space-separated lists,
# so they are deliberately left unquoted to be split into words.
for format in $3
do
    for size in $2
    do
        for query in $QUERIES_TO_EXECUTE
        do

            # In-memory runs (currently disabled)
            #echo "Monitoring q$query strategy0 no_slice size $size $format"
            #monitor-query $size "q$query" "strategy0" "no_slice" $format
            #echo "Monitoring q$query strategy1 no_slice size $size $format"
            #monitor-query $size "q$query" "strategy1" "no_slice" $format
            #echo "Monitoring q$query strategy1 slice size $size $format"
            #monitor-query $size "q$query" "strategy1" "slice" $format

            # ON_DISK
            echo "Monitoring q$query strategy1 no_slice size $size $format ondisk"
            monitor-query "$size" "q$query" "strategy1" "no_slice" "$format" "$TMP_FOLDER"

        done
    done
done