Add ckanext-dcat custom profiles
- Add profiles for DCAT-AP, GeoDCAT-AP and NTI-RISP/DCAT (Spanish context).

- Add a new codelist generator/downloader to improve DCAT-AP mapping values.
mjanez committed Aug 22, 2024
1 parent 34aedfb commit d949571
Showing 34 changed files with 4,160 additions and 12 deletions.
97 changes: 85 additions & 12 deletions README.md
@@ -7,22 +7,24 @@
<a href="#configuration">Configuration</a> •
<a href="#schemas">Schemas</a> •
<a href="#harvesters">Harvesters</a> •
<a href="#dcat-profiles">DCAT Profiles</a> •
<a href="#running-the-tests">Running the Tests</a>
</p>

## Overview
This CKAN extension provides functions and templates specifically designed to extend `ckanext-scheming` and includes DCAT and Harvest enhancements to adapt CKAN Schema to [GeoDCAT-AP](./ckanext/schemingdcat/schemas/geodcat_ap/es_geodcat_ap_2.yaml).
This CKAN extension provides functions and templates specifically designed to extend `ckanext-scheming` and `ckanext-dcat`, and includes RDF profiles and harvest enhancements to adapt the CKAN schema to multiple metadata profiles, such as [GeoDCAT-AP](./ckanext/schemingdcat/schemas/geodcat_ap/eu_geodcat_ap_2.yaml) or [DCAT-AP](./ckanext/schemingdcat/schemas/dcat_ap/eu_dcat_ap_2.1.yaml).

> [!WARNING]
> Requires [mjanez/ckanext-dcat](/~https://github.com/mjanez/ckanext-dcat), [ckan/ckanext-scheming](/~https://github.com/ckan/ckanext-scheming) and [ckan/ckanext-spatial](/~https://github.com/ckan/ckanext-spatial) to work properly.
> Requires [mjanez/ckanext-dcat](/~https://github.com/mjanez/ckanext-dcat) (newer releases) or [ckan/ckanext-dcat](/~https://github.com/ckan/ckanext-dcat) (stable releases), [ckan/ckanext-scheming](/~https://github.com/ckan/ckanext-scheming) and [ckan/ckanext-spatial](/~https://github.com/ckan/ckanext-spatial) to work properly. To use custom schemas with multilingual fields, `ckanext-fluent` is also required; a version with fixes is available at [mjanez/ckanext-fluent](/~https://github.com/mjanez/ckanext-fluent).

> [!TIP]
> It is **recommended to use it with** a [`ckan-docker`](/~https://github.com/mjanez/ckan-docker) deployment, or to use only [`ckan-pycsw`](/~https://github.com/mjanez/ckan-pycsw) to deploy a CSW catalog.

![image](/~https://github.com/mjanez/ckanext-schemingdcat/assets/96422458/6b3d6fd4-7119-4307-8be7-5e17d41292fe)

Enhancements:
- Could use schemas for `ckanext-scheming` in the plugin like [CKAN GeoDCAT-AP custom schemas](ckanext/schemingdcat/schemas#readme)
- Custom schemas for `ckanext-scheming` in the plugin, such as the [CKAN GeoDCAT-AP custom schemas](ckanext/schemingdcat/schemas#readme).
- [`ckanext-dcat` profiles](#dcat-profiles) for RDF serialization according to metadata profiles such as DCAT, DCAT-AP, GeoDCAT-AP and, in the Spanish context, NTI-RISP.
- Improve metadata management forms to include tabs that make it easier to search metadata categories and simplify metadata editing.
- Improve the search functionality in CKAN for custom schemas. It uses the fields defined in a scheming file to provide a set of tools to use these fields for scheming, and a way to include icons in their labels when displaying them. More info: [`ckanext-schemingdcat`](/~https://github.com/mjanez/ckanext-schemingdcat)
- Add improved harvesters for custom metadata schemas integrated with `ckanext-harvest` in CKAN using [`mjanez/ckan-ogc`](/~https://github.com/mjanez/ckan-ogc).
- Add metadata downloads for Linked Open Data formats ([`mjanez/ckanext-dcat`](/~https://github.com/mjanez/ckanext-dcat)) and geospatial metadata (ISO 19139, Dublin Core, etc. with [`mjanez/ckan-pycsw`](/~https://github.com/mjanez/ckan-pycsw)).
@@ -40,17 +42,20 @@ This plugin is compatible with CKAN 2.9 or later and needs the following plugins
## ckan/ckanext-scheming: /~https://github.com/ckan/ckanext-scheming/tags (e.g. release-3.0.0)
pip install -e git+/~https://github.com/ckan/ckanext-scheming.git@release-3.0.0#egg=ckanext-scheming

## mjanez/ckanext-dcat: /~https://github.com/mjanez/ckanext-dcat/tags (e.g. 1.2.0-geodcatap)
pip install -e git+/~https://github.com/mjanez/ckanext-dcat.git@1.2.0-geodcatap#egg=ckanext-dcat
## mjanez/ckanext-dcat: /~https://github.com/mjanez/ckanext-dcat/tags (e.g. 1.8.0)
pip install -e git+/~https://github.com/mjanez/ckanext-dcat.git@1.8.0#egg=ckanext-dcat
pip install -r https://raw.githubusercontent.com/mjanez/ckanext-dcat/master/requirements.txt

## ckan/ckckanext-spatial: /~https://github.com/ckan/ckanext-spatial/tags (e.g. v2.1.1)
## ckan/ckanext-spatial: /~https://github.com/ckan/ckanext-spatial/tags (e.g. v2.1.1)
pip install -e git++/~https://github.com/ckan/ckanext-spatial.git@v2.1.1/#egg=ckanext-spatial#egg=ckanext-spatial
pip install -r https://raw.githubusercontent.com/ckan/ckanext-spatial/v2.1.1/requirements.txt

## ckan/ckckanext-harvest: /~https://github.com/ckan/ckanext-harvest/tags (e.g. v1.5.6)
## ckan/ckanext-harvest: /~https://github.com/ckan/ckanext-harvest/tags (e.g. v1.5.6)
pip install -e git++/~https://github.com/ckan/ckanext-harvest.git@v1.5.6#egg=ckanext-spatial
pip install -r https://raw.githubusercontent.com/ckan/ckanext-harvest/v1.5.6/requirements.txt

## mjanez/ckanext-fluent: /~https://github.com/mjanez/ckanext-fluent/tags (e.g. v1.0.1)
pip install -e git+/~https://github.com/mjanez/ckanext-fluent.git@v1.0.1#egg=ckanext-fluent
```

## Installation
@@ -119,13 +124,13 @@ Examples:

* LOD endpoint: A Linked Open Data endpoint is a DCAT endpoint that provides access to RDF data. For more information about the catalog endpoint and how to use it (e.g. `https://{ckan-instance-host}/catalog.{format}?[page={page}]&[modified_since={date}]&[profiles={profile1},{profile2}]&[q={query}]&[fq={filter query}]`), see [`ckanext-dcat`](/~https://github.com/mjanez/ckanext-dcat?tab=readme-ov-file#catalog-endpoint).
```yaml
- name: euro_dcat_ap_2_rdf
- name: eu_dcat_ap_2_rdf
display_name: RDF DCAT-AP
type: lod
format: rdf
image_display_url: /images/icons/endpoints/euro_dcat_ap_2.svg
image_display_url: /images/icons/endpoints/eu_dcat_ap_2.svg
description: RDF DCAT-AP Endpoint for european data portals.
profile: euro_dcat_ap_2
profile: eu_dcat_ap_2
profile_label: DCAT-AP
version: null
```
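As an illustration of the catalog endpoint URL template above, the following sketch builds a query URL using only the standard library. The helper name `build_catalog_url` and the host `ckan.example.org` are hypothetical; the parameter names come from the template, and optional parameters are only added when they are set.

```python
from urllib.parse import urlencode


def build_catalog_url(host, fmt="rdf", page=None, modified_since=None,
                      profiles=None, q=None, fq=None):
    """Build a catalog endpoint URL (hypothetical helper).

    Follows https://{host}/catalog.{format}?[page=...]&[modified_since=...]...
    Parameters left as None are omitted from the query string.
    """
    params = {
        "page": page,
        "modified_since": modified_since,
        # Several profiles are passed as one comma-separated value
        "profiles": ",".join(profiles) if profiles else None,
        "q": q,
        "fq": fq,
    }
    # Keep commas unescaped so the profile list stays readable
    query = urlencode({k: v for k, v in params.items() if v is not None},
                      safe=",")
    base = f"https://{host}/catalog.{fmt}"
    return f"{base}?{query}" if query else base


print(build_catalog_url("ckan.example.org", fmt="ttl", page=2,
                        profiles=["eu_dcat_ap_2", "es_dcat"]))
# https://ckan.example.org/catalog.ttl?page=2&profiles=eu_dcat_ap_2,es_dcat
```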
@@ -138,7 +143,7 @@ Examples:
format: xml
image_display_url: /images/icons/endpoints/csw_inspire.svg
description: OGC-INSPIRE Endpoint for spatial metadata.
profile: spain_dcat
profile: es_dcat
profile_label: INSPIRE
version: 2.0.2
```
@@ -769,6 +774,74 @@ The `ckan schemingdcat` command offers utilites:

ckan schemingdcat download-rdf-eu-vocabs


## DCAT Profiles
This plugin also contains custom [`ckanext-dcat` profiles](./ckanext/schemingdcat/profiles) to serialize a CKAN dataset according to the following specifications:

**European context**:
* [DCAT-AP v2.1.1](https://semiceu.github.io/DCAT-AP/releases/2.1.1/) (default): `eu_dcat_ap_2`
* [GeoDCAT-AP v2.0.0](https://semiceu.github.io/GeoDCAT-AP/releases/2.0.0/): `eu_geodcat_ap_2`
* [GeoDCAT-AP v3.0.0](https://semiceu.github.io/GeoDCAT-AP/releases/3.0.0/): `eu_geodcat_ap_3`

**Spanish context**:
* Spain [NTI-RISP v1.0.0](https://datos.gob.es/es/documentacion/normativa-de-ambito-nacional): `es_dcat`
* Spain [DCAT-AP v2.1.1](https://semiceu.github.io/DCAT-AP/releases/2.1.1/): `es_dcat_ap_2`
* Spain [GeoDCAT-AP v2.0.0](https://semiceu.github.io/GeoDCAT-AP/releases/2.0.0/): `es_geodcat_ap_2`

To define which profiles to use, you can:

1. Set the `ckanext.dcat.rdf.profiles` configuration option on your CKAN configuration file:

   ```ini
   ckanext.dcat.rdf.profiles = eu_dcat_ap_2 es_dcat eu_geodcat_ap_2
   ```

2. When initializing a parser or serializer class, pass the profiles to be used as a parameter, e.g.:

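```python
from ckanext.dcat.processors import RDFParser, RDFSerializer

# Profiles are run in the order in which they are listed
parser = RDFParser(profiles=['eu_dcat_ap_2', 'es_dcat', 'eu_geodcat_ap_2'])
serializer = RDFSerializer(profiles=['eu_dcat_ap_2', 'es_dcat', 'eu_geodcat_ap_2'])
```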

Note that in both cases the order in which you define the profiles matters, as it determines the order in which they are run.
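Why the order matters can be illustrated with a toy model: each profile writes into the same output, so a later profile can override or extend what an earlier one set. This is not `ckanext-dcat` code; the `toy_*` functions, the `TOY_PROFILES` registry and the field values are hypothetical, and only mimic how a space-separated `ckanext.dcat.rdf.profiles` value is split and applied in sequence.

```python
def toy_eu_dcat_ap_2(record):
    # Toy base profile: sets a generic theme and a conformance statement
    record["theme"] = "EU data theme"
    record["conformsTo"] = "DCAT-AP 2.1.1"


def toy_es_dcat(record):
    # Toy national profile: overrides the theme set by the base profile
    record["theme"] = "NTI-RISP sector"


# Hypothetical registry mapping profile names to toy implementations
TOY_PROFILES = {"eu_dcat_ap_2": toy_eu_dcat_ap_2, "es_dcat": toy_es_dcat}


def run_profiles(config_value):
    """Split a space-separated profile list and apply each profile in order."""
    record = {}
    for name in config_value.split():
        TOY_PROFILES[name](record)
    return record


print(run_profiles("eu_dcat_ap_2 es_dcat")["theme"])  # NTI-RISP sector
print(run_profiles("es_dcat eu_dcat_ap_2")["theme"])  # EU data theme
```

With the national profile listed last, its values win; swapping the order lets the base profile overwrite them.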

### Multilingual RDF support
To add multilingual values from CKAN to RDF, the [`_object_value`](./ckanext/schemingdcat/profiles/base.py) method of `SchemingDCATRDFProfile` can be called with the optional parameter `multilang=True` (defaults to `False`).
If `_object_value` is called with `multilang=True` but no language attribute is found, the value is added as a `Literal` with the default language (`en`).

> [!TIP]
> The custom `ckanext-dcat` profiles support multilingual values; see the `ckanext-dcat` documentation for more information on [writing custom profiles](/~https://github.com/ckan/ckanext-dcat?tab=readme-ov-file#writing-custom-profiles).

Example RDF:
```xml
<dct:title xml:lang="en">Dataset Title (EN)</dct:title>
<dct:title xml:lang="de">Dataset Title (DE)</dct:title>
<dct:title xml:lang="fr">Dataset Title (FR)</dct:title>
```
```json
{
  "title": {
    "en": "Dataset Title (EN)",
    "de": "Dataset Title (DE)",
    "fr": "Dataset Title (FR)"
  }
}
```

Example with missing language in RDF:
```xml
<dct:title>Dataset Title</dct:title>
```
```json
{
  "title": {
    "en": "Dataset Title"
  }
}
```
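The RDF-to-JSON mapping in the examples above can be sketched in plain Python: each `(value, lang)` pair, as it might come from an RDF literal, becomes an entry in a language-keyed dict, and a literal without a language tag falls back to the default language `en`. This is not the plugin's `_object_value` implementation; the function name `literals_to_multilang` and the tuple input format are assumptions for illustration.

```python
DEFAULT_LANG = "en"  # default language described in this section


def literals_to_multilang(literals, default_lang=DEFAULT_LANG):
    """Group (value, lang) pairs into a {lang: value} dict.

    A pair whose lang is None or empty (an untagged literal) is stored
    under default_lang. If a language appears twice, the last value wins.
    """
    result = {}
    for value, lang in literals:
        result[lang or default_lang] = value
    return result


tagged = [("Dataset Title (EN)", "en"),
          ("Dataset Title (DE)", "de"),
          ("Dataset Title (FR)", "fr")]
print({"title": literals_to_multilang(tagged)})
print({"title": literals_to_multilang([("Dataset Title", None)])})
# {'title': {'en': 'Dataset Title'}}
```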

## Running the Tests
To run the tests:

176 changes: 176 additions & 0 deletions ckanext/schemingdcat/codelists.py
@@ -0,0 +1,176 @@
import csv
import logging
from xml.etree import ElementTree as ET

# third-party libraries
import requests
from rdflib import Graph, URIRef, Literal

# RDF and SKOS namespaces, as used by the ckanext-dcat profiles
from ckanext.dcat.profiles.base import (
    RDF,
    SKOS
)

from ckanext.schemingdcat.profiles.dcat_config import (
    EU_VOCABS_DIR,
    INSPIRE_CODELISTS_DIR,
    EUROVOC
)

log = logging.getLogger(__name__)


def load_inspire_csv_codelists():
    """Load the INSPIRE CSV codelists as lists of row dicts keyed by file name."""
    # Prefer the "csv" subdirectory if it exists, else read the codelists root
    csv_subdir = INSPIRE_CODELISTS_DIR.joinpath("csv")
    if csv_subdir.exists() and csv_subdir.is_dir():
        codelist_paths = list(csv_subdir.glob("*.csv"))
    else:
        codelist_paths = list(INSPIRE_CODELISTS_DIR.glob("*.csv"))

    codelists_dfs = {}

    log.debug('INSPIRE_CODELISTS_DIR: %s', INSPIRE_CODELISTS_DIR)

    # Read each CSV file; the lowercased file stem is the codelist key
    for file_path in codelist_paths:
        with file_path.open("r", newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            codelists_dfs[file_path.stem.lower()] = list(reader)

    # INSPIRE Codelists: rows from every codelist, flattened into one list
    MD_INSPIRE_REGISTER = [item for df in codelists_dfs.values() for item in df]

    return {
        'MD_INSPIRE_REGISTER': MD_INSPIRE_REGISTER,
        'MD_FORMAT': codelists_dfs.get('file-type'),
        'MD_ES_THEMES': codelists_dfs.get('theme_es'),
        'MD_EU_THEMES': codelists_dfs.get('theme-dcat_ap'),
        'MD_EU_LANGUAGES': codelists_dfs.get('languages'),
        'MD_ES_FORMATS': codelists_dfs.get('format_es')
    }

class RdfFile:
    """Base class for a remote vocabulary file that is downloaded and converted."""

    def __init__(self, name, url, description, title):
        self.name = name
        self.url = url
        self.title = title
        self.description = description

    def extract_description(self, rdf_content, rdf_url):
        # Implemented by subclasses; returns a set of (uri, label[, extra]) tuples
        raise NotImplementedError

    def parse_graph(self, rdf_content):
        return Graph().parse(data=rdf_content, format='xml')

    def get_label_from_uri(self, uri):
        # Use the last path segment of the URI as its label
        return uri.split('/')[-1]

    def save_to_csv(self, data, filename):
        file_path = EU_VOCABS_DIR / 'csv' / filename
        # Make sure the output directory exists
        file_path.parent.mkdir(parents=True, exist_ok=True)
        # Remove any None elements from the data list
        data = [d for d in data if d is not None]
        sorted_data = sorted(data, key=lambda x: x[1])  # Sort by label (2nd column)
        # Open the file in write mode, which will overwrite the file if it exists
        with open(file_path, 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerows(sorted_data)
        log.info(f"Data extracted and saved to {file_path}")

    def download_rdf(self, rdf_url, timeout=60):
        try:
            # A timeout prevents the download from hanging indefinitely
            response = requests.get(rdf_url, timeout=timeout)
            response.raise_for_status()
            log.info(f"Successfully downloaded RDF from {rdf_url}")
            return response.content
        except requests.RequestException as e:
            log.error(f"Failed to download RDF from {rdf_url}: {e}")
            return None

    def save_to_rdf(self, data, filename):
        graph = Graph()
        for item in data:
            uri = URIRef(item[0])
            label = Literal(item[1])
            graph.add((uri, RDF.type, SKOS.Concept))
            graph.add((uri, SKOS.prefLabel, label))
            # An optional third column is written as skos:exactMatch
            if len(item) > 2:
                eu_uri = URIRef(item[2])
                graph.add((uri, SKOS.exactMatch, eu_uri))

        # The path is relative to the current working directory
        file_path = f"{filename}.rdf"
        graph.serialize(destination=file_path, format='xml')
        log.info(f"Data saved to RDF file at {file_path}")

class BasicRdfFile(RdfFile):
    def extract_description(self, rdf_content, rdf_url):
        graph = self.parse_graph(rdf_content)
        data = set()

        # Every subject except the vocabulary itself becomes a (uri, label) row
        for concept in graph.subjects():
            uri = str(concept)
            label = self.get_label_from_uri(uri)
            if uri != rdf_url and label != self.get_label_from_uri(rdf_url):
                data.add((uri, label))

        return data

class LicenseRdfFile(RdfFile):
    def extract_description(self, rdf_content, rdf_url):
        graph = self.parse_graph(rdf_content)
        data = set()

        for concept in graph.subjects(RDF.type, SKOS.Concept):
            label = self.get_label_from_uri(concept)
            eu_uri = str(concept)
            # Prefer the skos:exactMatch URI, falling back to the EU concept URI
            uri = str(graph.value(concept, SKOS.exactMatch, default=concept))
            # Compare as strings: an rdflib URIRef never equals a plain str
            if eu_uri != rdf_url and label != self.get_label_from_uri(rdf_url):
                data.add((uri, label, eu_uri))

        return data

class FileTypesRdfFile(RdfFile):
    def extract_description(self, rdf_content, rdf_url):
        graph = self.parse_graph(rdf_content)
        data = set()
        non_proprietary_data = set()
        machine_readable_data = set()

        for concept in graph.subjects(RDF.type, EUROVOC.FileType):
            uri = str(concept)
            label = self.get_label_from_uri(uri)
            # Defaults to "false" when the concept has no nonPropExt value
            non_prop_ext = str(graph.value(concept, EUROVOC.nonPropExt, default="false"))

            if uri != rdf_url and label != self.get_label_from_uri(rdf_url):
                data.add((uri, label, non_prop_ext))
                machine_readable_data.add((uri, label))
                if non_prop_ext == "true":
                    non_proprietary_data.add((uri, label))

        # Side effect: also write the derived codelists to CSV
        self.save_to_csv(non_proprietary_data, "non-propietary.csv")
        self.save_to_csv(machine_readable_data, "machine-readable.csv")

        return data

class MediaTypesRdfFile(RdfFile):
    def extract_description(self, xml_content, rdf_url):
        # The IANA media-types registry is XML, not RDF, so parse it with ElementTree
        data = set()
        root = ET.fromstring(xml_content)

        for record in root.findall(".//{http://www.iana.org/assignments}record"):
            # The <file> element is used both in the URI and as the label
            file_elem = record.find("{http://www.iana.org/assignments}file")
            name = file_elem.text if file_elem is not None else ""
            label = name

            if name != self.get_label_from_uri(rdf_url):
                uri = f"http://www.iana.org/assignments/media-types/{name}"
                data.add((uri, label))

        return data
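`MediaTypesRdfFile.extract_description` above parses the IANA media-types registry as XML rather than RDF. The standard-library sketch below runs the same namespace-qualified `record`/`file` lookup on a tiny inline sample. The sample document and the function name `extract_media_types` are hypothetical; only the namespace, the element names and the URI prefix come from the code above. Unlike the original method, the sketch skips records without a `<file>` element instead of adding an empty entry, and it does not compare against a source URL.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.iana.org/assignments}"

# Hypothetical, trimmed sample in the shape the code above expects
SAMPLE = b"""<registry xmlns="http://www.iana.org/assignments" id="media-types">
  <registry id="application">
    <record><name>json</name><file type="template">application/json</file></record>
    <record><name>xml</name><file type="template">application/xml</file></record>
    <record><name>no-file</name></record>
  </registry>
</registry>"""


def extract_media_types(xml_content):
    """Return a set of (uri, label) pairs, one per <record> with a <file>."""
    data = set()
    root = ET.fromstring(xml_content)
    # ".//" searches all descendants, so nested sub-registries are included
    for record in root.findall(f".//{NS}record"):
        file_elem = record.find(f"{NS}file")
        if file_elem is None or not file_elem.text:
            continue  # skip records without a template file
        media_type = file_elem.text
        uri = f"http://www.iana.org/assignments/media-types/{media_type}"
        data.add((uri, media_type))
    return data


for uri, label in sorted(extract_media_types(SAMPLE)):
    print(label, uri)
```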
7 changes: 7 additions & 0 deletions ckanext/schemingdcat/codelists/inspire/csv/IACSData.es.csv
@@ -0,0 +1,7 @@
id,label
http://inspire.ec.europa.eu/metadata-codelist/IACSData/lpis,lpis
http://inspire.ec.europa.eu/metadata-codelist/IACSData/gsaa,gsaa
http://inspire.ec.europa.eu/metadata-codelist/IACSData/iacs,iacs
http://inspire.ec.europa.eu/metadata-codelist/IACSData/referenceParcel,referenceParcel
http://inspire.ec.europa.eu/metadata-codelist/IACSData/agriculturalArea,agriculturalArea
http://inspire.ec.europa.eu/metadata-codelist/IACSData/ecologicalFocusArea,ecologicalFocusArea
@@ -0,0 +1,13 @@
id,label
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/continual,continual
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/daily,daily
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/weekly,weekly
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/fortnightly,fortnightly
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/monthly,monthly
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/quarterly,quarterly
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/biannually,biannually
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/annually,annually
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/asNeeded,asNeeded
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/irregular,irregular
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/notPlanned,notPlanned
http://inspire.ec.europa.eu/metadata-codelist/MaintenanceFrequency/unknown,unknown
@@ -0,0 +1,3 @@
id,label
http://inspire.ec.europa.eu/metadata-codelist/OnLineDescriptionCode/accessPoint,accessPoint
http://inspire.ec.europa.eu/metadata-codelist/OnLineDescriptionCode/endPoint,endPoint
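The codelist CSV files above share a simple `id,label` layout, which `load_inspire_csv_codelists` reads with `csv.DictReader`. As a sketch of how such rows could be turned into lookups (for example, mapping a label back to its INSPIRE URI), the snippet below uses an inline copy of the `OnLineDescriptionCode` rows. The helper `build_lookups` is hypothetical and not part of the plugin.

```python
import csv
import io

# Inline copy of the OnLineDescriptionCode codelist rows shown above
CSV_TEXT = """id,label
http://inspire.ec.europa.eu/metadata-codelist/OnLineDescriptionCode/accessPoint,accessPoint
http://inspire.ec.europa.eu/metadata-codelist/OnLineDescriptionCode/endPoint,endPoint
"""


def build_lookups(csv_file):
    """Return (label -> id, id -> label) dicts from an id,label CSV file."""
    rows = list(csv.DictReader(csv_file))
    by_label = {row["label"]: row["id"] for row in rows}
    by_id = {row["id"]: row["label"] for row in rows}
    return by_label, by_id


by_label, by_id = build_lookups(io.StringIO(CSV_TEXT))
print(by_label["endPoint"])
# http://inspire.ec.europa.eu/metadata-codelist/OnLineDescriptionCode/endPoint
```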