vload

What is first

http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoader - Virtuoso's own bulk RDF loader
Publishing conversion results with a Virtuoso triplestore describes how to configure csv2rdf4lod-automation to talk to a Virtuoso triple store. Installing Virtuoso and configuring it to work with csv2rdf4lod-automation is handled automatically by our follow-on project Prizms.

What we will cover

This page describes the scripts that help load RDF data into Virtuoso.

Let's get to it!

vload is a shell script that makes it a bit easier to load local RDF files into a Virtuoso triple store. Zhenning Shangguan originally created it for RPI's data.gov effort, and I have since adopted it and added a few bells and whistles.

vload is part of the csv2rdf4lod-automation repository, so you get it with a git clone (see Installing csv2rdf4lod automation). Once it is on your path, running it without arguments shows its usage:

usage: vload [--target] {rdf, ttl, nt, nq} <data_file> <graph_uri> [-v | --verbose]

Note that this script works independently of the rest of the csv2rdf4lod conversion process, so it can be used with any Virtuoso server and any RDF file, not just those produced by csv2rdf4lod-automation. However, if you are using csv2rdf4lod-automation, you never need to run vload yourself: the scripts created in your conversion cockpit's publish/ directory pass vload the right parameters and load the converted data into consistently organized named graphs. If you are using csv2rdf4lod-automation and want to load converted data into a Virtuoso server, see Conversion process phase: publish instead of calling vload directly (you don't need the rest of this page).

Knowing where the load will go

Running vload with the --target flag shows the underlying Virtuoso isql (or isql-v) command that vload uses, along with the port and username it will use to connect to the Virtuoso server. It also shows where it will store its log. The essential parts of vload's behavior are controlled by CSV2RDF4LOD environment variables, which are described in the following section.

vload --target

/opt/virtuoso
/opt/virtuoso/bin/isql 1111 dba
dba
/opt/csv2rdf4lod-automation/tmp/vload/input-files/load_2012-06-09T05_19_25-04-00_13450.log

The rest of the parameters should be self-explanatory:

rdf, ttl, nt, or nq is the format of the <data_file> RDF file (RDF/XML, Turtle, N-Triples, or N-Quads),
<graph_uri> is the graph name that the triples will be loaded into, and the
-v or --verbose flag will show a bit more output (including a path to the log file, and the contents of the log).
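If you load many files, choosing the format keyword by hand gets tedious. The sketch below is a hypothetical helper (guess_vload_format is not part of csv2rdf4lod-automation) that maps a file's extension to the keyword vload expects as its format argument. The extension-to-keyword mapping is an assumption; files with other extensions are rejected rather than guessed.

```shell
# Hypothetical helper (not shipped with vload): print the vload format
# keyword (rdf, ttl, nt, or nq) for a file, based only on its extension.
# Assumption: files are named with these conventional extensions.
guess_vload_format() {
  case "$1" in
    *.rdf) echo rdf ;;
    *.ttl) echo ttl ;;
    *.nt)  echo nt ;;
    *.nq)  echo nq ;;
    *)     echo "unrecognized RDF file extension: $1" >&2
           return 1 ;;
  esac
}
```

For example, with a Turtle file in $file and a graph name in $graph (both placeholders), you could run vload "$(guess_vload_format "$file")" "$file" "$graph".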

CSV2RDF4LOD environment variables used

vload changes its behavior when the following variables are changed, and does its best when these are not set:

CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME is "/opt/virtuoso" by default. This is used to get to "bin/isql" if the following is not set.
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH defaults to $CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME/bin/isql.
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT defaults to 1111.
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD defaults to dba.
CSV2RDF4LOD_HOME/tmp/vload/input-files is the directory for logs.
CSV2RDF4LOD_CONVERT_DATA_ROOT is used to avoid needless file copies when Virtuoso already has permission to read the directory containing the RDF file being loaded.
CSV2RDF4LOD_CONCURRENCY is fed to Virtuoso when loading.
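The fallback rules above map naturally onto POSIX parameter expansion. The sketch below is not vload's source: it assumes only the defaults listed above, and the function names vload_settings and under_data_root are hypothetical. The first prints the settings that would be used, similar in spirit to vload --target. The second illustrates, as an assumption about how the CSV2RDF4LOD_CONVERT_DATA_ROOT check could work, a test for whether an absolute file path lies under the data root.

```shell
# Sketch only: resolve the settings vload reads, applying the documented
# defaults when a variable is unset or empty.
vload_settings() {
  virtuoso_home="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME:-/opt/virtuoso}"
  # isql falls back to bin/isql under the Virtuoso home.
  isql="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH:-$virtuoso_home/bin/isql}"
  port="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111}"
  # The password (default dba) resolves the same way; it is not echoed here.
  log_dir="$CSV2RDF4LOD_HOME/tmp/vload/input-files"
  echo "virtuoso_home=$virtuoso_home"
  echo "isql=$isql"
  echo "port=$port"
  echo "log_dir=$log_dir"
}

# Hypothetical check: succeed if absolute path $1 is inside
# CSV2RDF4LOD_CONVERT_DATA_ROOT, i.e. a case where copying the file
# could be skipped if Virtuoso can already read that directory.
under_data_root() {
  root="${CSV2RDF4LOD_CONVERT_DATA_ROOT%/}"   # drop one trailing slash
  [ -n "$root" ] || return 1
  case "$1" in
    "$root"/*) return 0 ;;
    *)         return 1 ;;
  esac
}
```

With none of the Virtuoso variables set, vload_settings reports isql=/opt/virtuoso/bin/isql and port=1111, consistent with the --target output shown earlier.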

pvload.sh

pvload.sh wraps vload to record the provenance of the RDF download and of the named graph load (see Script: pvload.sh).

Provenance

What is next?

Script: pvload.sh - pvload loads an RDF file from the web into a named graph in Virtuoso, while recording the provenance of both the file retrieval (pcurl) and the load into the graph itself.
Named graphs that know where they came from
