This repository contains the dataset, source code, scripts, and results of experiments on hybrid querying with Web-SPARQL. It includes:
- A query dataset covering a usage scenario about purchasing books.
- Source code for the tools and scripts used in the evaluation.
- Experimental results analyzing microdata availability and Web-SPARQL query performance.

This work explores the feasibility and efficiency of hybrid querying over disparate data sources.
To execute the queries, follow these steps:
Ensure that the input files are in the correct directories before executing the scripts:
- Place all query files inside the `working_files/dataset/complex_queries` folder.
- Ensure the `working_files/results` directory exists to store output results.
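The layout above can be prepared with a short script. This is a minimal sketch; the paths mirror those mentioned in this README:

```python
from pathlib import Path

# Create the directory layout expected by the evaluation scripts.
# Paths mirror those mentioned in this README.
for d in ["working_files/dataset/complex_queries", "working_files/results"]:
    Path(d).mkdir(parents=True, exist_ok=True)

print("directories ready")
```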
Modify the `run_complex_evaluation.py` script to set the execution parameters:
- `log_level`: Set to `logging.INFO` or `logging.DEBUG`, depending on the verbosity needed.
- `exec_mode`: Choose between `'cold'` (fresh start) and `'hot'` (reuse cached data).
- `web_service`: Select from the available web services (`BFP_CWTS`, `BFP_SWSS`, `DDG_WSS`).
- `web_extractor`: Choose an extraction method (`BFP_CWTME`, `W3_ME`).
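A configuration might look like the following sketch. The variable names and value types are assumptions; consult `run_complex_evaluation.py` for the actual parameter block:

```python
import logging

# Hypothetical parameter block; the actual variable names and types
# follow run_complex_evaluation.py in this repository.
log_level = logging.INFO     # or logging.DEBUG for more verbose output
exec_mode = "cold"           # 'cold' = fresh start, 'hot' = reuse cached data
web_service = "BFP_CWTS"     # one of: BFP_CWTS, BFP_SWSS, DDG_WSS
web_extractor = "BFP_CWTME"  # one of: BFP_CWTME, W3_ME
```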
Run the script using:

```shell
python run_complex_evaluation.py
```

This will process multiple queries from the dataset and store the results in `working_files/results/`.
For evaluating simple queries, run the following script:

```shell
python run_simple_evaluation.py
```

This will process specific query files from `working_files/dataset/simple_queries/` and generate the corresponding result files in `working_files/results/`.
Ensure that the result files are correctly placed:
- The output of the query execution should be in `working_files/results_complex/` for complex queries and `working_files/results_simple/` for simple queries.
- The `working_files/stats` directory should exist to store analysis results.
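The checks above can be sketched in a few lines of Python; the paths mirror those mentioned in this README, and the `*.json` pattern is an assumption about the result file format:

```python
from pathlib import Path

# Report how many result files are present in each input directory
# and make sure the stats output directory exists.
for d in ["working_files/results_complex", "working_files/results_simple"]:
    p = Path(d)
    n = len(list(p.glob("*.json"))) if p.is_dir() else 0
    print(f"{d}: {n} result file(s)")

Path("working_files/stats").mkdir(parents=True, exist_ok=True)
```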
Execute the analysis script:

```shell
python scripts/analyzer_main_script.py
```
This script will:
- Process and link JSON data using `MicrodataOfferLinker` and `OfferFeatureLinker`.
- Count and collect global data values.
- Compute query execution statistics.
- Analyze execution times for both simple and complex queries.
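The value-counting step above can be sketched as follows. The record fields (`currency`, `availability`) and sample offers are illustrative assumptions, not the repository's actual schema:

```python
from collections import Counter

# Illustrative records: offers extracted from microdata annotations.
# Field names and values are assumptions for this sketch.
offers = [
    {"currency": "EUR", "availability": "InStock"},
    {"currency": "USD", "availability": "InStock"},
    {"currency": "EUR", "availability": "OutOfStock"},
]

# Count how often each value appears per field, akin to the
# "count and collect global data values" step.
counts = {
    field: Counter(o[field] for o in offers)
    for field in ("currency", "availability")
}

print(counts["currency"]["EUR"])  # 2
```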
The analysis results will be stored in the `working_files/stats/` directory:
- `complex_data_counts.json`: Query data count statistics for complex queries.
- `simple_data_counts.json`: Query data count statistics for simple queries.
- `complex_data_values.json`: Extracted global values for complex queries.
- `simple_data_values.json`: Extracted global values for simple queries.
- `complex_query_stats.json`: Query execution statistics for complex queries.
- `simple_query_stats.json`: Query execution statistics for simple queries.
- `complex_time_stats.json`: Execution time analysis for complex queries.
- `simple_time_stats.json`: Execution time analysis for simple queries.
These files can be used to further analyze the efficiency and performance of hybrid querying with Web-SPARQL.
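For example, the time-stats files can be compared against each other. This is a minimal sketch: the JSON layout (per-query objects with a `mean_time` field) is an assumption, so adapt it to the actual structure of the `*_time_stats.json` files:

```python
import json

# Illustrative stand-ins for the generated stats files; in practice,
# load them with json.load, e.g.:
# with open("working_files/stats/complex_time_stats.json") as f:
#     complex_stats = json.load(f)
complex_stats = {"Q1": {"mean_time": 4.1}, "Q2": {"mean_time": 5.3}}
simple_stats = {"S1": {"mean_time": 0.9}, "S2": {"mean_time": 1.1}}

def avg(stats):
    """Average of the per-query mean execution times."""
    return sum(v["mean_time"] for v in stats.values()) / len(stats)

print(f"complex: {avg(complex_stats):.2f}s, simple: {avg(simple_stats):.2f}s")
```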