Repeating the experiments

The code for our leapfrog implementation for Apache Jena is available here

Prerequisites

- Java 11
  - apt install openjdk-11-jdk
- Any x64 Linux distribution with glib support
- Docker
- bzip2
  - On a Debian-based distro: sudo apt install bzip2

Some of the following steps can take hours to complete, so we recommend using tmux to execute them.

Getting the repo and the dataset

Clone this repository. If you use SSH keys, use:

git clone git@gitlab.com:learnedrdf/benchmark.git

Otherwise, use:

git clone https://gitlab.com/learnedrdf/benchmark.git

Download the dataset used in the experiments and move it into the benchmark folder.
Or ask the administrators for access to the full English-only Wikidata dataset.
Extract it:
```
bzip2 -d wikidata-filtered.nt.bz2
```
Alternatively, you can construct the dataset from the latest truthy Wikidata dump.
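
Whichever route produces the dataset, it can be worth checking the file before starting a load that may run for hours. The following is a minimal sketch, not part of this repository: the helper names `peek_ntriples` and `looks_like_triple` are hypothetical, and the check is deliberately rough (it only tests that a line has three whitespace-separated terms and ends with a dot).

```python
import bz2
import itertools


def peek_ntriples(path, limit=5):
    """Return the first `limit` non-empty lines of a .nt or .nt.bz2 file."""
    # bz2.open streams the archive, so the whole dump is never held in memory.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        lines = (line.rstrip("\n") for line in handle if line.strip())
        return list(itertools.islice(lines, limit))


def looks_like_triple(line):
    """Rough check: three whitespace-separated terms, terminated by a dot."""
    return line.endswith(".") and len(line.split(None, 2)) == 3
```

For example, `all(looks_like_triple(l) for l in peek_ntriples("wikidata-filtered.nt"))` should hold for a well-formed file. This is not a full N-Triples parser, so a passing check does not guarantee that the loader will accept every line.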

Create the database for Jena, Jenaclone and Leapfrog

Pull the Ubuntu image: `docker pull ubuntu`
Download the file apache-jena-3.17.0.tar.gz from the Apache Jena downloads page and move it into the jena folder
Change directory into the jena folder
Extract the archive: `tar -xf apache-jena-3.17.0.tar.gz`

Open a containerized environment

docker run -v ${PWD}/../:/srv -i -t ubuntu /bin/bash

Change directory to srv/jena
Update the package index: `apt update`
Install JDK 11: `apt install openjdk-11-jdk -y`
Install the Java runtime environment: `apt install default-jre -y`
Create the database for Jena and Jenaclone: `apache-jena-3.17.0/bin/tdbloader2 --loc=db/jena ../wikidata-filtered.nt`
Edit the file apache-jena-3.17.0/bin/tdbloader2index and go to the line following `generate_index "$K3 $K1 $K2" "$DATA_TRIPLES" OSP`

Add the following lines

generate_index "$K1 $K3 $K2" "$DATA_TRIPLES" SOP
generate_index "$K2 $K1 $K3" "$DATA_TRIPLES" PSO
generate_index "$K3 $K2 $K1" "$DATA_TRIPLES" OPS

Create the database for the leapfrog implementation: `apache-jena-3.17.0/bin/tdbloader2 --loc=db/leapfrog ../wikidata-filtered.nt`
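
The three `generate_index` lines added to tdbloader2index follow a regular pattern: each letter of the index name selects one sort key. The sketch below reproduces that pattern. It assumes that `$K1`, `$K2` and `$K3` stand for the subject, predicate and object columns, which matches the existing OSP line and the three added lines but is not confirmed by the script itself. The function names are hypothetical.

```python
# Assumption: K1 = subject, K2 = predicate, K3 = object.
COLUMN_KEYS = {"S": "$K1", "P": "$K2", "O": "$K3"}


def sort_keys_for(index_name):
    """Map an index name such as 'SOP' to its sort key order."""
    return " ".join(COLUMN_KEYS[letter] for letter in index_name)


def generate_index_line(index_name):
    """Build the shell line that would create the given index."""
    keys = sort_keys_for(index_name)
    return f'generate_index "{keys}" "$DATA_TRIPLES" {index_name}'


if __name__ == "__main__":
    # Print the three lines that this README asks you to add.
    for name in ("SOP", "PSO", "OPS"):
        print(generate_index_line(name))
```

Together with the indexes the loader already builds, these lines mean the leapfrog database is created with additional orderings of subject, predicate and object.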

Adding Jenaclone

Build the Jenaclone project by running `mvn install`, as specified in the documentation of the Jenaclone repository
From the root directory of Jenaclone, copy the file jena-fuseki2/jena-fuseki-server/target/jena-fuseki-server-3.17.0 into the benchmark/jena/jars folder of the benchmark root directory
Make sure the Jenaclone .jar file is named fuseki-jenaclone.jar

Run the benchmark

Change directory into the root directory of this repository and build the Docker image
docker build -t jenaclone_benchmark .

Run the container

docker run \
    --rm  \
    -v ${PWD}/benchmark/:/benchmark \
    jenaclone_benchmark

Now the results are available in the folders queries/*/output

Results

You can find our results in the results folder. For each query pattern there is a folder containing two files, one for the original Apache Jena and one for our modification of Apache Jena. Each line of a file contains three values separated by semicolons: queryNumber;numberOfResults;executionTimeInNanoseconds
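
The result format described above can be read with a few lines of Python. This is a minimal sketch, assuming that all three fields are plain integers and that blank lines can be skipped. The names `QueryResult`, `parse_result_line` and `parse_results` are hypothetical and do not come from the scripts in this repository.

```python
from typing import NamedTuple


class QueryResult(NamedTuple):
    query_number: int
    result_count: int
    time_ns: int


def parse_result_line(line):
    """Parse one 'queryNumber;numberOfResults;executionTimeInNanoseconds' line."""
    query_number, result_count, time_ns = line.strip().split(";")
    return QueryResult(int(query_number), int(result_count), int(time_ns))


def parse_results(lines):
    """Parse an iterable of lines, skipping blank ones."""
    return [parse_result_line(line) for line in lines if line.strip()]
```

An open file handle can be passed directly, for example `parse_results(open(path))`, where `path` points at one of the two files in a query-pattern folder.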

Analysis

After running the benchmark, the results can be found in benchmark/benchmark/queries/*/output. Move them to the corresponding folder in results, then convert the benchmark results to .csv files

python3 to_csv.py

This script requires matplotlib, which can be installed with

pip3 install matplotlib

Now, to run the analysis tool, run the command

python3 analysis.py bgps optionals existence_check

You can specify the folders containing the results you want analysed. For example, the following command only provides an analysis of the optional query benchmark results

python3 analysis.py optionals
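
For a quick comparison without the analysis tool, execution times from the two result files of a query pattern can be summarised by hand. The sketch below is an illustration only and is not what analysis.py computes. It assumes two lists of execution times in nanoseconds, one per implementation, and the function names are hypothetical.

```python
from statistics import median


def median_ms(times_ns):
    """Median execution time, converted from nanoseconds to milliseconds."""
    return median(times_ns) / 1_000_000


def speedup(baseline_ns, modified_ns):
    """Median baseline time divided by median modified time.

    A value above 1 means the modified implementation was faster.
    """
    return median(baseline_ns) / median(modified_ns)
```

The median is used here because a few slow queries can dominate the mean. Other summary statistics may suit a given query pattern better.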

To print the benchmark running time of each query together with its intermediate result set sizes, run the command

python3 result_size_analysis.py <RDF_DATASET> <RESULT_FOLDER_1> {RESULT_FOLDER_N}

<RDF_DATASET> is the dataset file used to compute intermediate result set sizes. Make sure it is the same dataset file that was loaded into the Jena triplestore for the benchmark results. The remaining arguments are benchmark result folders, as for the analysis.py tool. At least one folder is required.

Common Issues

Command tdbloader2 Loads TDB2 Data

Despite its name, the tdbloader2 command loads the data into TDB1, not TDB2. Therefore, make sure not to add the --tdb2 flag when running the server: with that flag the server returns empty result sets because there is no TDB2 data. According to the Apache Jena documentation, tdbloader translates into tdb1.xloader, where the prefix determines which version of TDB to use.
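
If it is unclear which format a database directory is in, its on-disk layout gives a hint. The sketch below relies on an assumption about that layout, not on an official Jena API: a TDB2 location contains a subdirectory whose name starts with `Data-`, while a TDB1 location holds files such as `nodes.dat` directly at the top level. The function name `guess_tdb_version` is hypothetical.

```python
from pathlib import Path


def guess_tdb_version(db_dir):
    """Guess whether a directory holds a TDB1 or TDB2 database.

    Heuristic only: based on the assumed directory layout, not on Jena itself.
    """
    db = Path(db_dir)
    # Assumed TDB2 layout: data lives in a generation folder such as Data-0001.
    if any(c.is_dir() and c.name.startswith("Data-") for c in db.iterdir()):
        return "TDB2"
    # Assumed TDB1 layout: node table files sit directly in the location.
    if (db / "nodes.dat").exists():
        return "TDB1"
    return "unknown"
```

Under these assumptions, `guess_tdb_version("db/jena")` and `guess_tdb_version("db/leapfrog")` should both report TDB1 for databases created with the steps in this README.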
