Home
This project supports http://inference-web.org by providing version-controlled pointers to PML instances, along with some shell-script automation to gather them for yourself. See the instance analysis effort for details, and see "Are my PML instances listed?" to check whether your PML gets slurped up in our crawl.
PML is accessible in a variety of forms, and to gather it all we need to account for each:

- Nested web directories
- Web directory of documents (Li/Cynthia/eScience/linked proofs) -> Document
- SPARQL query for list of documents (Tim's LOGD) -> Document
- Subject and Predicate URIs -> Document: SPARQL DESCRIBE to get PML (Jim's granite)
- some input -> Document: SPARQL (UTEP)
http://inference-web.org/proofs/wino/ is both a nested web directory and a web directory of documents; so is http://inference-web.org/proofs/tonys.moto.stanford.edu/
wget will do this with its --mirror option, but you have to tell it to ignore robots.txt, since inference-web.org explicitly prohibits web crawlers:

```
wget --mirror -e robots=off -A owl,rdf,ttl,nt --no-parent http://inference-web.org/proofs/tonys.moto.stanford.edu/
```
logd.rq contains the query; logd.ep names the endpoint to which the query is submitted.
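As a rough sketch of how that pair of files could be used (the retrieval command below is an assumption, not this project's script; cache-queries.sh from csv2rdf4lod could play this role instead):

```bash
# Sketch only: POST the SPARQL query in logd.rq to the endpoint named in logd.ep,
# asking for RDF/XML back. The output file name is illustrative.
curl --data-urlencode "query@logd.rq" \
     -H "Accept: application/rdf+xml" \
     "$(cat logd.ep)" > logd-documents.rdf
```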
(Some turmoil here: should the result be encoded in PML 'as the file' or 'separate from the file'? RDF-style referencing for p:Information, PML-style referencing with p:hasURL, or both?)
Since PML is RDF, we can use VoID to describe the gathered instances: use void:vocabulary http://inference-web.org/2.0/pml if I don't know whether the instances reference PML-P, PML-J, etc.
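For example, a minimal VoID description might look like the following sketch (the dataset URI is hypothetical; only the void:vocabulary value comes from this page):

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

# Hypothetical dataset URI describing one batch of gathered PML instances.
<http://example.org/pml-instances/wino> a void:Dataset ;
    void:vocabulary <http://inference-web.org/2.0/pml> .
```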
manual vs. automatic: directly asserted vs. programmatically generated (just like in csv2rdf4lod). Note: the direction of flow is twisted differently from the data aggregation use case: there, manual went to automatic because it was processing source into something, while here manual goes to source because the manual content is used to determine what to retrieve. A possible layout is sketched below.
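The directory names in this sketch are assumptions for illustration; only instances/web-directories and 2source.sh appear elsewhere on this page.

```bash
# Hypothetical layout following the manual-vs-source convention described above.
ls plunk/instances/web-directories
# manual/     - hand-maintained pointers that determine what to retrieve
# source/     - what was actually retrieved (e.g. wget mirrors written by 2source.sh)
# 2source.sh  - reads manual/ and mirrors each listed web directory into source/
```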
The distinction: stuff I produced with what I already had vs. stuff I got from somewhere else. Once you have it, you're just doing stuff with it; getting it makes contact with the external world and should be considered more carefully than just internal processing.
The implementation requires pcurl.sh, md5.sh, and cache-queries.sh from csv2rdf4lod. It also requires wget.
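A quick sanity check along these lines may help before running anything (the helper names come from the list above; where they live depends on your csv2rdf4lod checkout):

```bash
# Sketch: confirm each required helper is on the PATH before running 2source.sh.
for tool in pcurl.sh md5.sh cache-queries.sh wget; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing dependency: $tool"
done
```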
Initial DRAFT of workflow diagram:
```
bash-3.2$ cd plunk/instances/web-directories
bash-3.2$ ./2source.sh
wget --mirror -e robots=off -A owl,rdf,ttl,nt --no-parent http://inference-web.org/proofs/wino/
wget --mirror -e robots=off -A owl,rdf,ttl,nt --no-parent http://inference-web.org/proofs/tonys.moto.stanford.edu/
wget --mirror -e robots=off -A owl,rdf,ttl,nt --no-parent http://escience.rpi.edu/2010/mlso/PML/
wget --mirror -e robots=off -A owl,rdf,ttl,nt --no-parent http://www.rpi.edu/~michaj6/escience/pml/
```
(run with -w or --write to invoke mirroring)
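For example (the --write flag is per the note above; the dry-run behavior is inferred from the transcript, which only prints the wget commands):

```bash
./2source.sh          # dry run: print the wget mirror commands without fetching
./2source.sh --write  # actually mirror the listed web directories
```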