
Example: SEAD DataNet project

timrdf edited this page Nov 29, 2012 · 83 revisions

SEAD People: The latest RDF results will always be listed here (and the list of known issues is here):

Timeline

2012-11-28:

  • Tim implemented conversion:object_label_property
  • Tim modified the enhancement parameters to handle the "Name:URI" string that Medici needs, and published the dump file on LOGD (the "metadata" dump file linked at the top of this page).
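The note does not spell out the exact "Name:URI" format Medici expects. Here is a minimal sketch, assuming the format is just a display name, one colon, and the full URI. It shows the one pitfall when reading such strings back: you must split on the first colon only, because the URI has colons of its own. The helper names and sample values are hypothetical.

```python
from typing import Tuple

# Hypothetical helpers. Assumption: Medici's "Name:URI" string is a display
# name, a single colon, then the full URI, and names never contain a colon.


def to_name_uri(name: str, uri: str) -> str:
    """Join a display name and a URI into one "Name:URI" string."""
    if ":" in name:
        raise ValueError("name must not contain ':' under the assumed format")
    return f"{name}:{uri}"


def from_name_uri(value: str) -> Tuple[str, str]:
    """Split on the FIRST colon only, since the URI itself contains colons."""
    name, sep, uri = value.partition(":")
    if not sep:
        raise ValueError(f"not a Name:URI string: {value!r}")
    return name, uri


# Placeholder values, not real Medici data.
packed = to_name_uri("Example Person", "http://example.org/person/1")
print(packed)                 # Example Person:http://example.org/person/1
print(from_name_uri(packed))  # ('Example Person', 'http://example.org/person/1')
```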

2012-10-07:

2012-09-26:

Jim, Ram, Robin, and Tim met at Winslow.

2012-09-16:

Jim responded to Tim's comments on the Google Doc.

2012-09-14:

Tim responded to Jim's comments on the Google Doc.

2012-09-12:

Ram set up a Google Doc to maintain a list of outstanding "RDF issues" that Tim and Robin will review and respond to.

2012-09-11:

Email from Jim re: tagging the top-level collections with “nced_dataset”

2012-09-07:

Email from Jim re: VIVO person mapping, with an xlsx attachment and a link to tag:medici@uiuc.edu,2009:data_Ar1RvuuqefqIVlS7um1XOA

  • Tim showed Robin and Ram how to set up version 2012-Sep-06
  • Added a link to the people dataset dump file in the notes below, in the section for "2012-Aug-27". These four dump files remain the latest RDF conversions of the SEAD data.

2012-Aug-28:

Putting it all together (against http://logd.tw.rpi.edu/sparql):

prefix dce:   <http://purl.org/dc/elements/1.1/>
prefix nfo:   <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
prefix droid: <http://logd.tw.rpi.edu/source/nced/dataset/droid-output/vocab/enhancement/1/> 
prefix roots: <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/nced-dataset-filepaths/vocab/enhancement/1/>

select ?dataset ?title ?root ?child
where {
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/metadata/version/2012-Aug-21-v4> {
     ?dataset dce:title ?title .
  }
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/nced-dataset-filepaths/version/2012-Aug-27> {
     ?dataset roots:has_file_root ?root .
  }
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/droid-output/version/2012-Aug-15-v2> {
     ?root a nfo:Folder .
     ?child droid:parent_id ?root .
  }
} order by ?title
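All three graphs in the query above follow one naming pattern (source, dataset, version). Each enhancement vocabulary follows a similar pattern. A small sketch that rebuilds these URIs from their parts makes mismatches easier to spot. Note that the droid: prefix uses the source identifier nced, while the graphs use nced-umn-edu, so the sketch passes the source in rather than hard-coding it. The function names are hypothetical; the patterns are copied from the URIs on this page.

```python
# Sketch: rebuild the named-graph and vocabulary URIs used in the query above
# from their parts. Helper names are hypothetical; patterns come from this page.
BASE = "http://logd.tw.rpi.edu"


def version_graph(source: str, dataset: str, version: str) -> str:
    """Named graph holding one version of a converted dataset."""
    return f"{BASE}/source/{source}/dataset/{dataset}/version/{version}"


def enhancement_vocab(source: str, dataset: str, layer: int = 1) -> str:
    """Namespace for a dataset's enhancement properties (e.g. droid:, roots:)."""
    return f"{BASE}/source/{source}/dataset/{dataset}/vocab/enhancement/{layer}/"


graphs = [
    version_graph("nced-umn-edu", "metadata", "2012-Aug-21-v4"),
    version_graph("nced-umn-edu", "nced-dataset-filepaths", "2012-Aug-27"),
    version_graph("nced-umn-edu", "droid-output", "2012-Aug-15-v2"),
]
for g in graphs:
    print(g)

# The droid: prefix uses source "nced", not "nced-umn-edu".
print(enhancement_vocab("nced", "droid-output"))
```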

2012-Aug-27:

I can distribute as:

prefix dcat: <http://www.w3.org/ns/dcat#>
prefix dce: <http://purl.org/dc/elements/1.1/>

select ?title ?dataset
where {
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/metadata/version/2012-Aug-21-v4> {
    ?dataset 
       a dcat:Dataset; 
       dce:title ?title
    .
  }
} order by ?title
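These queries are meant to run against http://logd.tw.rpi.edu/sparql. Here is a minimal sketch of how a script could submit the dataset-listing query. It relies on the SPARQL 1.1 Protocol, which lets a client send the query text as a URL-encoded `query` parameter of an HTTP GET. The constant and function names are hypothetical. The sketch only builds the URL and does not send the request (urllib.request.urlopen could do that).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Sketch: build (but do not send) a SPARQL 1.1 Protocol GET URL for the
# dataset-listing query. Names here are hypothetical.
ENDPOINT = "http://logd.tw.rpi.edu/sparql"

QUERY = """prefix dcat: <http://www.w3.org/ns/dcat#>
prefix dce: <http://purl.org/dc/elements/1.1/>

select ?title ?dataset
where {
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/metadata/version/2012-Aug-21-v4> {
    ?dataset a dcat:Dataset; dce:title ?title .
  }
} order by ?title"""


def query_url(endpoint: str, query: str) -> str:
    """Percent-encode the query text into the endpoint's query string."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
print(url)
# Decoding the URL recovers the exact query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)  # True
```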

prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
prefix e1: <http://logd.tw.rpi.edu/source/nced/dataset/droid-output/vocab/enhancement/1/> 

select ?directory ?parent_path
where {
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/droid-output/version/2012-Aug-15-v2> {
     ?parent e1:file_path ?parent_path .

     ?directory
        a nfo:Folder;
        e1:file_path "/home/scratch/nced_repo/long_term_dynamics";
        e1:parent_id ?parent 
  }
}
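The second query asks for the path of the folder one level above long_term_dynamics. Suppose parent_id links each DROID entry to the folder that directly contains it (an assumption, not something the page states). Then the answer can be predicted from the path alone, which gives a quick sanity check on the converted data. Here is a sketch; the helper name is hypothetical.

```python
import posixpath

# Sketch, assuming DROID's parent_id points at the folder that directly
# contains an entry: the parent's file_path is then the dirname of the
# child's file_path. The helper name is hypothetical.


def expected_parent_path(file_path: str) -> str:
    """Return the containing folder's path, ignoring any trailing slash."""
    return posixpath.dirname(file_path.rstrip("/"))


print(expected_parent_path("/home/scratch/nced_repo/long_term_dynamics"))
# /home/scratch/nced_repo
```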

prefix dce: <http://purl.org/dc/elements/1.1/>

select ?dataset ?title
where {
  graph <http://logd.tw.rpi.edu/source/nced-umn-edu/dataset/nced-dataset-filepaths/version/2012-Aug-27> {
     # TODO (URIs need to align)
  }
}

  • dump files (metadata, droid-output, nced-dataset-filepaths, people) hosted on LOGD
    • an update would be a new dump file URL.
    • you can have it in 20 minutes.
    • you can load it into Medici/Tupelo.
  • a file manually posted to Medici.
    • an update would be a new "dataset" with a new UUID (and a dcterms:replaces annotation).
    • you can have it 30 seconds after you give me upload privileges.
    • you can load it into Medici/Tupelo.
  • an SVN URL of the enhancement parameters.
    • an update would be an "svn update".
    • it would require a converter installed on your side.
    • you can load it into Medici/Tupelo.
  • I'd rather not email this stuff.
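Under the second option, an update is a new Medici "dataset" that points back at the old one. Here is a sketch of that annotation, with its assumptions stated up front. The ID shape is only guessed from the IDs quoted on this page, which look like 22-character URL-safe base64 strings; in practice Medici would presumably assign the ID itself. The previous-dataset URI is a placeholder, and all names are hypothetical.

```python
import base64
import uuid

# Hypothetical sketch of the "new dataset with a new UUID" update path.
# Assumption: Medici IDs such as data_Ar1RvuuqefqIVlS7um1XOA look like a
# URL-safe base64 encoding of a 16-byte UUID; this only mimics that shape.
PREFIX = "tag:medici@uiuc.edu,2009:data_"


def mint_id() -> str:
    """Mint a Medici-shaped ID from a random UUID (22 chars, no padding)."""
    raw = base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip(b"=")
    return PREFIX + raw.decode("ascii")


def replacement_turtle(new_uri: str, previous_uri: str) -> str:
    """Turtle stating that the new upload replaces the previous one."""
    return (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{new_uri}> dcterms:replaces <{previous_uri}> .\n"
    )


new_id = mint_id()
# Placeholder for the dataset being replaced.
print(replacement_turtle(new_id, PREFIX + "PREVIOUS"))
```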

metadata:dataset_5 rdfs:label "dataset_5" ;
   rdfs:label "Angelo Basic GIS Coverages" ;

Done: needed to tie it to:

droid-output:node_178012 rdfs:label "node_178012" ;
   dcterms:identifier "node_178012" ;
   coin:slug "node_178012" ;
   a droid-output_vocab:Node ;
   dcterms:identifier "178012" ;
   e1:parent_id droid-output:node_177805 ;
   e1:uri "file:/home/scratch/nced_repo/angelo_reserve/ACRR%20Basic%20GIS%20coverages/" ;
   e1:file_path "/home/scratch/nced_repo/angelo_reserve/ACRR Basic GIS coverages" ;

  • done: domain_template in metadata (metadata:dataset_5 -> metadata:angelo-basic-gis-coverages)
  • skipped: domain_template in droid (droid-output:node_178012 -> droid-output:nced_repo/angelo_reserve/ACRR Basic GIS coverages); kept the node numbers and used links_via to look them up.
  • skipped: domain_template and range_template in nced-dataset-filepaths (both of the above); instead, used links_via to find the node ID from the file path when converting the nced-dataset-filepaths Google spreadsheet, which associates each dataset with its file path.
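Here is a sketch of what the "done" and "skipped" items amount to. It builds a slug-style local name for the metadata dataset, plus a path-to-node lookup in the spirit of links_via. This is not csv2rdf4lod's own code. The slug rule (lowercase, with every run of non-alphanumerics collapsed to a hyphen) is an assumption chosen to reproduce the one example given. The function and table names are hypothetical.

```python
import re
from typing import Optional

# Sketch only; neither function is csv2rdf4lod's implementation.

# 1) domain_template: derive a readable local name from the dataset title.
#    Assumption: lowercase, then collapse non-alphanumeric runs to "-".
def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# 2) links_via: keep DROID node numbers and look the node up by file path.
#    A real index would be built from droid-output; one entry is shown.
DROID_NODE_BY_PATH = {
    "/home/scratch/nced_repo/angelo_reserve/ACRR Basic GIS coverages":
        "droid-output:node_178012",
}


def file_root(file_path: str) -> Optional[str]:
    """Resolve a spreadsheet file path to its DROID node (None if unknown)."""
    return DROID_NODE_BY_PATH.get(file_path.rstrip("/"))


dataset = "metadata:" + slugify("Angelo Basic GIS Coverages")
root = file_root("/home/scratch/nced_repo/angelo_reserve/ACRR Basic GIS coverages")
print(dataset, root)
# metadata:angelo-basic-gis-coverages droid-output:node_178012
```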

http://www.nced.umn.edu/

2012-Aug-22:

  • Luigi: Regarding adding the triples, the easiest approach might be to use Tupelo to add the triples that Tim is creating. The ingestion code is easy to understand and should have everything needed to write to the repository: https://opensource.ncsa.illinois.edu/svn/mmdb/trunk/medici-ingest/
  • Luigi: There is a URIQA-based Tupelo endpoint available, but it might be a lot more difficult to use (unless you use the Java Tupelo client, but at that point you might as well write directly to MySQL).
  • Jim suggests two FRBR layers, linked by the triple:
tag:medici@uiuc.edu,2009:data_jqjmawlRELNygOBq7gYE6A (the collection ID)
     [FRBR:isembodiment/isrealization/isexemplar of]  (whatever level is best)
          tag:sead.ncsa.uiuc.edu,2008:/nced/AggregateDataSet/{dataset_name} (the logical NCED aggregate dataset)
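Here is a sketch of Jim's suggested link written out as one N-Triples line. It assumes the FRBR Core vocabulary's embodimentOf, realizationOf, and exemplarOf properties stand in for "isembodiment/isrealization/isexemplar of". The note leaves the level open, so the sketch takes it as a parameter. The dataset name used below is a placeholder, and the function name is hypothetical.

```python
from urllib.parse import quote

# Sketch of the suggested collection -> aggregate-dataset link as N-Triples.
# Assumption: FRBR Core properties stand in for Jim's "is...of" wording.
FRBR = "http://purl.org/vocab/frbr/core#"
LEVELS = {"embodimentOf", "realizationOf", "exemplarOf"}
AGGREGATE = "tag:sead.ncsa.uiuc.edu,2008:/nced/AggregateDataSet/{dataset_name}"


def frbr_link(collection_id: str, dataset_name: str, level: str = "embodimentOf") -> str:
    if level not in LEVELS:
        raise ValueError(f"unknown FRBR level: {level}")
    # Percent-encode the name so spaces etc. cannot break the IRI.
    aggregate = AGGREGATE.format(dataset_name=quote(dataset_name))
    return f"<{collection_id}> <{FRBR}{level}> <{aggregate}> ."


print(frbr_link("tag:medici@uiuc.edu,2009:data_jqjmawlRELNygOBq7gYE6A", "example_dataset"))
```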

2012-Aug-21:

  • Jim emailed a notification that nced_repo_datasets_v4.xlsx is available at http://sead.ncsa.illinois.edu.
  • Downgraded the priority of the DROID dataset; the focus is now exclusively on "nced_repo".
  • @waiting for Jim to hear from NCSA about when the IDs for the 20 datasets are assigned and available.
  • Created a first cut of the enhancement parameters in projects/sead/data/source/nced/metadata/version/2012-Aug-21-v4
  • @waiting to find out how to post to Medici
  • Played with the DROID output and got a first-cut enhancement:
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

droid-output:node_633883
   rdfs:label "node_633883" ;
   dcterms:identifier "node_633883" ;
   coin:slug "node_633883" ;
   dcterms:isReferencedBy <http://localhost/source/nced/dataset/droid-output/version/2012-Aug-15-v2> ;
   void:inDataset <http://localhost/source/nced/dataset/droid-output/version/2012-Aug-15-v2> ;
   a droid-output_vocab:Node ;
   dcterms:identifier "633883" ;
   e1:parent_id droid-output:node_633850 ;
   e1:uri "file:/home/scratch/nced_repo/XES_Archive/Consortium%20Meetings/consortium%202005/original%20materials/fidele/figs1-2.ppt" ;
   e1:file_path "/home/scratch/nced_repo/XES_Archive/Consortium Meetings/consortium 2005/original materials/fidele/figs1-2.ppt" ;
   nfo:fileName "figs1-2.ppt" ;
   e1:method "Extension" ;
   e1:status "Done" ;
   nfo:fileSize "22528"^^xsd:integer ;
   a nfo:Document ;
   e1:ext "ppt" ;
   dcterms:modified "2005-10-24T18:00:00-04:00"^^xsd:dateTime ;
   e1:extension_mismatch "false"^^xsd:boolean ;
   e1:format_count "4"^^xsd:integer ;
   dcterms:format <http://localhost/source/nced/dataset/droid-output/fmt/179> , 
                  <http://localhost/source/nced/dataset/droid-output/fmt/180> , 
                  <http://localhost/source/nced/dataset/droid-output/fmt/181> ;
   ov:csvRow "128452"^^xsd:integer .


<http://localhost/source/nced/dataset/droid-output/fmt/179> 
   dcterms:identifier "fmt/179" ;
   a droid-output_vocab:Format ;
   rdfs:label "fmt/179" ;
   dcterms:title "Microsoft PowerPoint for Macintosh" ;
   e1:version "4" .

<http://localhost/source/nced/dataset/droid-output/fmt/180> 
   dcterms:identifier "fmt/180" ;
   a droid-output_vocab:Format ;
   rdfs:label "fmt/180" ;
   dcterms:title "Microsoft PowerPoint for Macintosh" ;
   e1:version "98" .

<http://localhost/source/nced/dataset/droid-output/fmt/181>
   dcterms:identifier "fmt/181" ;
   a droid-output_vocab:Format ;
   rdfs:label "fmt/181" .
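Each DROID entry carries its location twice, as e1:uri and as e1:file_path. The folder entry node_178012 also writes its URI with a trailing slash. Here is a sketch that decodes one form into the other and derives the nfo:fileName and e1:ext values from the path. It makes a cheap consistency check on the conversion, but it is not how the converter itself produces those values. The helper name is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Sketch: check that e1:uri and e1:file_path agree, and derive the file name
# and extension from the path. The helper name is hypothetical.


def path_from_file_uri(uri: str) -> str:
    """Decode a file: URI (e.g. %20 -> space) and drop any trailing slash."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file: URI: {uri}")
    return unquote(parts.path).rstrip("/") or "/"


uri = ("file:/home/scratch/nced_repo/XES_Archive/Consortium%20Meetings/"
       "consortium%202005/original%20materials/fidele/figs1-2.ppt")
path = path_from_file_uri(uri)
print(path)
print(posixpath.basename(path))                 # figs1-2.ppt
print(posixpath.splitext(path)[1].lstrip("."))  # ppt
```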

2012-Aug-15:

Jim emailed three Excel files:

  • nced_repo_datasets_v3.xlsx has data on the first sheet (ignore the rest)
    • the second sheet is a list of draft terms we’ve been thinking of using
  • NCED Projects and Associated People..xlsx
    • maps people's names to URIs; this was going to require a SPARQL query, but the folks at Indiana created the spreadsheet.
  • nced_droid_nozip_stats_v2.xlsx has data on the first sheet (ignore the rest) (note: the priority for this was downgraded)

2012-Aug-01:

Met with Jim and received printouts of two spreadsheets:

  • NCED example instance data (dataset descriptions: { name: "3D Maps", contact_name : "Karen Campbell" ...})
  • Draft mappings (e.g. "all_authors" : dcterms:creator )

Agreed to get access to http://sead.ncsa.illinois.edu to retrieve:

  • the NCED DataSet Metadata spreadsheet
  • the IRBO DROID directory/file analysis spreadsheet

2012-Jun-22:

Having received four files by email, we want to nail down the source, dataset, and version identifiers according to Conversion process phase: name. We'll use the date we received the data files by email as the version identifier.

  • complex_amoriflux.csv - ornl-gov / ameriflux / 2012-Jun-22

    • based on "ARM USDA UNL OSU Woodward Switchgrass 1 California USA"
    • based on "Data Policy -- The AmeriFlux data provided on this site "
    • based on "File Origin - This file was created at Oak Ridge National Laboratory by the AmeriFlux and FLUXNET data management groups."
    • based on "Questions about these standardized files should be addressed to Tom Boden (bodenta@ornl.gov)."
  • GHCN1.csv - ncdc-noaa-gov / global-historical-climant-network-ghcn / 2012-Jun-22

  • USGS_ODM.CSV - usgs-gov / odm / 2012-Jun-22

    • based on file name and google search.
  • USGS_raw.xlsx
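Here is a sketch of the naming convention. The version identifier is the date the files arrived. The conversion directory nests source, dataset, and version the same way as the projects/sead/data/source/nced/metadata/version/2012-Aug-21-v4 path recorded on 2012-Aug-21. The function names are hypothetical. The optional "-v4"-style suffix is modeled on the version identifiers used elsewhere on this page.

```python
from datetime import date

# Sketch of the naming convention: version = receipt date (e.g. 2012-Jun-22),
# optionally suffixed (e.g. "v4"). Month names are spelled out so the result
# does not depend on the process locale. Function names are hypothetical.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def version_id(received: date, suffix: str = "") -> str:
    base = f"{received.year}-{MONTHS[received.month - 1]}-{received.day:02d}"
    return f"{base}-{suffix}" if suffix else base


def conversion_dir(source: str, dataset: str, version: str) -> str:
    """Directory layout seen in this project: source/SOURCE/DATASET/version/VERSION."""
    return f"source/{source}/{dataset}/version/{version}"


v = version_id(date(2012, 6, 22))
print(conversion_dir("ornl-gov", "ameriflux", v))
# source/ornl-gov/ameriflux/version/2012-Jun-22
```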
