vload
- http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoader - Virtuoso's tool
- Publishing conversion results with a Virtuoso triplestore describes how to configure csv2rdf4lod-automation to talk to a Virtuoso triple store. Installing Virtuoso and configuring it to work with csv2rdf4lod-automation is handled automatically by our follow-on project Prizms.
This page describes the scripts that help load RDF data into Virtuoso.
vload is a shell script that makes it a bit easier to load local RDF files into a Virtuoso triple store. It was originally created by Zhenning Shangguan for RPI's data.gov effort, but I've adopted it and added some whistles.
vload is part of the csv2rdf4lod-automation repository, so you get it with a [git clone](Installing csv2rdf4lod automation). After it is on your path, running it without arguments will show its usage:
usage: vload [--target] {rdf, ttl, nt, nq} <data_file> <graph_uri> [-v | --verbose]
Note that this script works independently of the rest of the csv2rdf4lod conversion process, so it can be used for any Virtuoso server and any RDF file (not just those produced by csv2rdf4lod-automation). However, if you are using csv2rdf4lod-automation, you never need to run vload on your own: the scripts created for you in your conversion cockpit's publish/ directory give vload the right parameters and load the converted data into consistently organized named graphs. If you are using csv2rdf4lod-automation and are trying to load converted data into a Virtuoso server, see Conversion process phase: publish instead of calling vload yourself (i.e., you don't need the rest of this page).
Running vload with the --target flag shows the underlying Virtuoso isql (or isql-v) command that vload uses, along with the port and username it will use to connect to the Virtuoso server. It also shows where it will store a log. The essential parts of how vload works are controlled by setting CSV2RDF4LOD environment variables, which are described below.
vload --target
/opt/virtuoso
/opt/virtuoso/bin/isql 1111 dba
dba
/opt/csv2rdf4lod-automation/tmp/vload/input-files/load_2012-06-09T05_19_25-04-00_13450.log
The rest of the parameters should be self-explanatory:
- rdf, ttl, nt, nq is the format of the <data_file> RDF file,
- <graph_uri> is the graph name that the triples will be loaded into, and
- the -v or --verbose flag will show a bit more output (including a path to the log file, and the contents of the log).
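To illustrate the usage line above, here is a minimal sketch of a wrapper that picks the format argument from a file's extension and prints the vload call it would make. The function names vload_format_for and vload_command_for, the file path, and the example graph URI are hypothetical and are not part of vload; the sketch also assumes that the file extension matches one of the four format tokens.

```shell
# Hypothetical helper (not part of vload): map a file name to one of the
# four format tokens that vload's usage line accepts.
vload_format_for() {
    case "$1" in
        *.rdf|*.ttl|*.nt|*.nq) echo "${1##*.}" ;;   # keep only the extension
        *) echo "unsupported extension: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print (rather than run) the vload call for a
# data file and a target graph URI.
vload_command_for() {
    local file="$1" graph="$2" format
    format=$(vload_format_for "$file") || return 1
    echo "vload $format $file $graph"
}

# Example: show the call for a Turtle file and an example graph name.
vload_command_for data/example.ttl http://example.org/graph/demo
```

Printing the command instead of running it makes it easy to check the arguments first; once they look right, the printed line can be run as-is on a machine where vload is on the path.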
vload changes its behavior when the following variables are changed, and does its best when these are not set:
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME is "/opt/virtuoso" by default. This is used to get to "bin/isql" if the following is not set.
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH defaults to $virtuoso_home/bin/isql.
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT defaults to 1111.
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD defaults to dba.
- CSV2RDF4LOD_HOME/tmp/vload/input-files is the directory for logs.
- CSV2RDF4LOD_CONVERT_DATA_ROOT is used to avoid needless file copies if Virtuoso already has permissions for the directory that the loading RDF file is in.
- CSV2RDF4LOD_CONCURRENCY is fed to Virtuoso when loading.
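The fallback behavior described in the list above can be sketched with shell parameter expansion. This is only an illustration of the documented defaults, not vload's actual source; the function name show_vload_settings and its local variable names are hypothetical.

```shell
# Hypothetical sketch (not vload's source): resolve the isql path and port
# as the list above describes, falling back to the documented defaults
# when a variable is unset or empty.
show_vload_settings() {
    local virtuoso_home="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME:-/opt/virtuoso}"
    # An explicit isql path wins; otherwise derive it from the Virtuoso home.
    local isql="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH:-$virtuoso_home/bin/isql}"
    local port="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111}"
    echo "isql=$isql"
    echo "port=$port"
}

# Print the settings that would apply in the current environment.
show_vload_settings
```

To compare this against what vload itself resolves, export the variables you care about and run vload --target, which prints the isql command and port it will actually use.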
Script: pvload.sh - pvload.sh wraps vload to include provenance of the RDF download and named graph load.
- Script: pvload.sh - pvload loads an RDF file from the web into a named graph in Virtuoso, while recording the provenance of the file retrieval (pcurl) and the load into the graph itself.
- Named graphs that know where they came from