
Example: linking animal surveillance from fludb.org

Timothy Lebo edited this page Feb 14, 2012 · 46 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

See Examples.

The website fludb.org offers data about animal surveillance via a search user interface. "Animal surveillance" involves some person going out, finding a bird, and swabbing it at "both ends" to take a biological sample (i.e., some "goop") for further analysis.

Although the search user interface allows some filtering of data stored in an Oracle RDBMS, it does not allow for programmatic access. The site does offer search results as a tab-delimited file that a human can manually download, but the references within this "export" are limited in scope to their own dataset and isolated from others. This prevents more elaborate search and analysis of the data provided.

One of the challenges of working with their data in its original form is the steep learning curve for some of the references and identifiers given. Without domain expertise or familiarity with the data, a curious novice is hard pressed to find it useful.

Although the site maintainers are not focusing on developing a web API, we can help them by doing one better and exposing their data on the semantic web as RDF so that others may access it however they wish (in its original tab-delimited form, as an RDF dump file, via query to a SPARQL endpoint, or as dereferenceable linked data).

Besides the typical restructuring (normalization) of the tabular data, casting values to datatypes, and promoting values to URIs, the conversion:links_via enhancement is used to assert owl:sameAs links from the states named within the animal surveillance dataset to the same states as named in DBpedia, GeoNames, and GovTrack. Because other datasets reference these same popular URIs, the RDF version of the animal surveillance dataset now connects to a wealth of other datasets.
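
The linking step can be pictured with a small Python sketch. This is not the csv2rdf4lod implementation; it only illustrates the idea of promoting a state name to a local URI and asserting owl:sameAs to an external URI for the same state. The base URI `LOCAL_BASE`, the `LINKS_VIA` table, and both function names are assumptions made up for this illustration.

```python
# Illustrative sketch only: not how csv2rdf4lod implements conversion:links_via.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical base URI for the locally minted state resources.
LOCAL_BASE = "http://example.org/source/fludb-org/state/"

# Hypothetical link set: state label -> external URIs naming the same state.
LINKS_VIA = {
    "Alaska": ["http://dbpedia.org/resource/Alaska"],
    "Minnesota": ["http://dbpedia.org/resource/Minnesota"],
}


def promote(label):
    """Promote a cell value to a local URI (spaces become underscores)."""
    return LOCAL_BASE + label.strip().replace(" ", "_")


def same_as_triples(labels):
    """Return N-Triples lines linking each known label to its external URIs."""
    triples = []
    for label in sorted(set(value.strip() for value in labels)):
        # Labels missing from the link set are skipped, not guessed.
        for target in LINKS_VIA.get(label, []):
            triples.append("<%s> <%s> <%s> ." % (promote(label), OWL_SAME_AS, target))
    return triples


if __name__ == "__main__":
    for line in same_as_triples(["Alaska", "Minnesota", "Alaska", "Unknown"]):
        print(line)
```

Labels that are absent from the link set produce no triple at all, so an unrecognized spelling leaves a gap rather than a wrong link.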

Aggregating subsets of converted datasets shows how to query for other datasets containing geonames:parentFeature.

http://logd.tw.rpi.edu/source/fludb-org/dataset/animal-surveillance/version/2011-Jan-28 identifies 675 surveillance events whose State/Province does not match the location field in the Strain Name.
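
A check of this kind might look like the following Python sketch. It assumes animal strain names follow the five-part influenza naming form type/host/location/isolate/year; the dictionary keys `state` and `strain_name` and the sample rows are hypothetical stand-ins, not the export's real column names or data.

```python
def strain_location(strain_name):
    """Return the location field of an animal strain name, or None.

    Assumes the five-part form type/host/location/isolate/year,
    for example 'A/mallard/Alaska/44/2006(H3N8)'.
    """
    parts = strain_name.split("/")
    if len(parts) != 5:
        return None  # Not the assumed form; do not guess a location.
    return parts[2].strip()


def mismatches(rows):
    """Return the rows whose state differs from the strain-name location."""
    found = []
    for row in rows:
        location = strain_location(row["strain_name"])
        if location is not None and location.lower() != row["state"].strip().lower():
            found.append(row)
    return found


# Hypothetical sample rows for illustration only.
SAMPLE_ROWS = [
    {"state": "Alaska", "strain_name": "A/mallard/Alaska/44/2006(H3N8)"},
    {"state": "Minnesota", "strain_name": "A/mallard/Alaska/45/2006(H3N8)"},
]

if __name__ == "__main__":
    for row in mismatches(SAMPLE_ROWS):
        print(row["state"], "!=", strain_location(row["strain_name"]))
```

Strain names that do not split into five fields are skipped, so the sketch reports only clear disagreements.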

The existing datasets contain more than 235,000 geonames:parentFeature relations:

prefix geonames:   <http://www.geonames.org/ontology#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>

select (count(*) as ?count)
where {
  graph ?dataset {
    ?some geonames:parentFeature ?link
  }
}

The following query shows 61 datasets that reference the states within the animal surveillance dataset:

PREFIX owl:        <http://www.w3.org/2002/07/owl#>
PREFIX dct:        <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

SELECT distinct ?dataset ?link
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/fludb-org/dataset/animal-surveillance/version/2010-Nov-30>  {
    ?region owl:sameAs ?link .
  }
  GRAPH ?dataset  {
    ?region2 owl:sameAs ?link .
  }
  FILTER(?region != ?region2)
}
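
To make the join in this query concrete, here is a minimal in-memory sketch in Python. It represents named-graph data as (graph, subject, predicate, object) tuples and applies the same matching logic; the function name and the `urn:` identifiers in the sample data are invented for illustration and do not come from the real endpoint.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def datasets_sharing_links(quads, source_graph):
    """Return sorted (dataset, link) pairs, mirroring the query above.

    A pair qualifies when some resource in `dataset` is owl:sameAs a link
    that a different resource in `source_graph` is also owl:sameAs.
    """
    # First graph pattern: link -> regions asserted in the source graph.
    source_regions = defaultdict(set)
    for graph, subject, predicate, obj in quads:
        if graph == source_graph and predicate == OWL_SAME_AS:
            source_regions[obj].add(subject)

    # Second graph pattern plus FILTER(?region != ?region2).
    results = set()
    for graph, subject, predicate, obj in quads:
        if predicate == OWL_SAME_AS and obj in source_regions:
            if any(region != subject for region in source_regions[obj]):
                results.add((graph, obj))
    return sorted(results)


# Hypothetical sample data for illustration only.
SAMPLE_QUADS = [
    ("urn:g:flu", "urn:flu:Alaska", OWL_SAME_AS, "http://dbpedia.org/resource/Alaska"),
    ("urn:g:census", "urn:census:AK", OWL_SAME_AS, "http://dbpedia.org/resource/Alaska"),
    ("urn:g:census", "urn:census:TX", OWL_SAME_AS, "http://dbpedia.org/resource/Texas"),
]

if __name__ == "__main__":
    for dataset, link in datasets_sharing_links(SAMPLE_QUADS, "urn:g:flu"):
        print(dataset, link)
```

The filter keeps a resource from matching itself, so the source graph contributes a row only if it holds two distinct resources linked to the same target.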

The following query finds 6 unique geographic parent features across 10 datasets:

PREFIX geonames:   <http://www.geonames.org/ontology#>
PREFIX owl:        <http://www.w3.org/2002/07/owl#>
PREFIX dct:        <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

SELECT DISTINCT ?parent
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/fludb-org/dataset/animal-surveillance/version/2010-Nov-30>  {
    ?region owl:sameAs ?link .
  }
  GRAPH ?dataset  {
    ?region2 owl:sameAs ?link;
             geonames:parentFeature ?parent .
  }
  FILTER(?region != ?region2)
}

The GeoNames child features of every region in the fludb dataset:

PREFIX owl:      <http://www.w3.org/2002/07/owl#>
PREFIX geonames: <http://www.geonames.org/ontology#>

SELECT DISTINCT ?child
WHERE {
  GRAPH <http://logd.tw.rpi.edu/source/fludb-org/dataset/animal-surveillance/version/2010-Nov-30> {
    ?region owl:sameAs ?link . 
  }
  GRAPH ?dataset {
    ?child geonames:parentFeature [ owl:sameAs ?link ]
  }
}

See frbr: AMIA 148 T2011 Lebo FLU.
