Tim L edited this page Jul 31, 2014 · 21 revisions

What is first

What we will cover

This page describes the scripts that help load RDF data into Virtuoso.

Let's get to it!

vload is a shell script that makes it a bit easier to load local RDF files into a Virtuoso triple store. It was originally created by Zhenning Shangguan for RPI's data.gov effort, but I've since adopted it and added a few bells and whistles.

vload is part of the csv2rdf4lod-automation repository, so you get it with a [git clone](Installing csv2rdf4lod automation). After it is on your path, running it without arguments will show its usage:

usage: vload [--target] {rdf, ttl, nt, nq} <data_file> <graph_uri> [-v | --verbose]

Note that this script works independently of the rest of the csv2rdf4lod conversion process, so it can be used with any Virtuoso server and any RDF file (not just those produced by csv2rdf4lod-automation). However, if you are using csv2rdf4lod-automation, you never need to run vload yourself: the scripts created in your conversion cockpit's publish/ directory give vload the right parameters and load the converted data into consistently organized named graphs. If that is your situation, see Conversion process phase: publish instead of calling vload directly (i.e., you don't need the rest of this page).

Knowing where the load will go

Running vload with the --target flag shows the underlying Virtuoso isql (or isql-v) command that vload uses, along with the port and username it will use to connect to the Virtuoso server. It also shows where the log will be stored. The essential parts of how vload works are controlled by CSV2RDF4LOD environment variables, which are described in the "CSV2RDF4LOD environment variables used" section below.

vload --target

/opt/virtuoso
/opt/virtuoso/bin/isql 1111 dba
dba
/opt/csv2rdf4lod-automation/tmp/vload/input-files/load_2012-06-09T05_19_25-04-00_13450.log

The rest of the parameters should be self-explanatory:

  • rdf, ttl, nt, or nq is the format of the <data_file> RDF file,
  • <graph_uri> is the name of the graph that the triples will be loaded into, and
  • the -v or --verbose flag shows a bit more output (including the path to the log file and the contents of the log).
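
As a rough illustration of how these parameters fit together, the sketch below picks the format argument from a file's extension and prints the vload command it would run, without running it. The helper names vload_format_for and vload_dry_run, the file name, and the graph URI are all hypothetical, and the extension-to-format mapping is an assumption made for the example rather than something vload is documented to do.

```shell
#!/bin/sh
# Hypothetical helper: guess vload's format argument from a file extension.
# The mapping is an assumption for illustration; vload itself expects you
# to name the format explicitly.
vload_format_for() {
    case "$1" in
        *.rdf) echo rdf ;;
        *.ttl) echo ttl ;;
        *.nt)  echo nt ;;
        *.nq)  echo nq ;;
        *)     echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print (but do not run) the vload command
# for a given data file and graph URI.
vload_dry_run() {
    file="$1"
    graph="$2"
    format=$(vload_format_for "$file") || return 1
    echo "vload $format $file $graph --verbose"
}

# Example with a made-up file and graph name.
vload_dry_run data/example.ttl http://example.org/graph/example
```

Printing the command first makes it easy to check the format and graph name before anything is loaded; dropping the echo in vload_dry_run would turn it into a real call, assuming vload is on your path.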

CSV2RDF4LOD environment variables used

vload's behavior is controlled by the following variables; when they are not set, it falls back to the defaults noted:

  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME is "/opt/virtuoso" by default. It is used to find "bin/isql" if the next variable is not set.
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH defaults to $virtuoso_home/bin/isql.
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT defaults to 1111.
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD defaults to dba.
  • CSV2RDF4LOD_HOME determines the directory for logs, which is CSV2RDF4LOD_HOME/tmp/vload/input-files.
  • CSV2RDF4LOD_CONVERT_DATA_ROOT is used to avoid needless file copies if Virtuoso already has permissions for the directory containing the RDF file being loaded.
  • CSV2RDF4LOD_CONCURRENCY is passed to Virtuoso when loading.
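
The fallback behavior listed above can be sketched with ordinary shell parameter expansion. This is only an illustration of the documented defaults, not vload's actual source; the function name vload_settings and the /usr/local/virtuoso path are hypothetical.

```shell
#!/bin/sh
# Sketch of the documented defaults, NOT vload's actual source.
# Each value falls back to its listed default when the variable is unset.
vload_settings() {
    home="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME:-/opt/virtuoso}"
    isql="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH:-$home/bin/isql}"
    port="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111}"
    password="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}"
    echo "home=$home"
    echo "isql=$isql"
    echo "port=$port"
    echo "password=$password"
}

# Example: override the install location and port for one call only.
# The subshell keeps the exports from leaking into the calling shell.
# /usr/local/virtuoso is a made-up path for illustration.
(
    export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME=/usr/local/virtuoso
    export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT=1112
    vload_settings
)
```

Because the isql path is derived from the home directory only when CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH is unset, setting the home variable alone is enough for a standard install layout. To see what vload itself will actually use, run vload --target.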

pvload.sh

Script: pvload.sh

pvload.sh wraps vload to include provenance of the RDF download and the named graph load.

Provenance

What is next?
