Enforce Namespaced UUID and check if landing page is present and correct-ish #124

Closed
johtoblan opened this issue Feb 6, 2023 · 15 comments

@johtoblan
Collaborator

No description provided.

@johtoblan johtoblan added the Epic label Feb 6, 2023
@charlienegri
Contributor

How is this different from #126? And is the "Enforce Namespaced UUID" part still relevant after ab90485?

@johtoblan
Collaborator Author

This is the epic, so just the collection of issues, but yeah the name is maybe misleading

@charlienegri
Contributor

ah I see

@magnarem
Contributor

We should have a namespace URI prefix for each environment. I have tested this with solr-drupal, and it works on the portals.

So the URI namespace and landing page mapping should be as follows:

  • no.met.dev: <-> data-test.met.no for the dev environment
  • no.met.staging: <-> data-staging.met.no for the staging environment
  • no.met: <-> data.met.no for the production environment

This applies to both metadata_identifier and related_dataset.
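
The mapping above could be captured in a small lookup table. This is a minimal sketch, not dmci's or the portal's actual code: the names NAMESPACE_TO_HOST and landing_host are hypothetical, and only the three namespace/host pairs are taken from the list above.

```python
# Hypothetical lookup table; only the namespace/host pairs come from the thread.
NAMESPACE_TO_HOST = {
    "no.met.dev": "data-test.met.no",
    "no.met.staging": "data-staging.met.no",
    "no.met": "data.met.no",
}


def landing_host(identifier: str) -> str:
    """Return the portal host for an identifier of the form namespace:uuid."""
    # Split on the first colon: everything before it is the namespace.
    namespace, sep, _uuid = identifier.partition(":")
    if not sep or namespace not in NAMESPACE_TO_HOST:
        raise ValueError(f"Unknown or missing namespace in {identifier!r}")
    return NAMESPACE_TO_HOST[namespace]
```

The same function would serve for both metadata_identifier and related_dataset values, since both carry the namespace prefix.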

@charlienegri
Contributor

To get the right dataset landing page URL in dmci, the parameter added to the config is catalog_url, similar to csw_service_url (#146):

dmci:
  distributors:
    - file
    - pycsw
  distributor_cache: null
  rejected_jobs_path: null
  max_permitted_size: 100000
  mmd_xsl_path: null
  mmd_xsd_path: null
  path_to_parent_list: null

pycsw:
  csw_service_url: http://localhost

web_catalog:
  catalog_url: http://localhost

file:
  file_archive_path: null

For the namespace we need something similar, like an environment_string, so that dmci can rewrite the current namespace as {namespace}.{environment_string}... maybe an

environment: 
  environment_url

in the config... does this sound ok or does anyone see a better way?
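
One way the proposed {namespace}.{environment_string} rewrite might look is sketched below. The function names are hypothetical, and the sketch assumes that an empty or missing environment string means production, where the namespace stays unchanged; the thread does not settle that detail.

```python
from typing import Optional


def namespace_for_env(namespace: str, environment_string: Optional[str]) -> str:
    """Join namespace and environment as {namespace}.{environment_string}.

    Assumption: an empty or missing environment string means production,
    so the namespace is returned unchanged.
    """
    if not environment_string:
        return namespace
    return f"{namespace}.{environment_string}"


def rewrite_identifier(identifier: str, environment_string: Optional[str]) -> str:
    """Rewrite namespace:uuid so the namespace carries the environment suffix."""
    namespace, sep, uuid_part = identifier.partition(":")
    if not sep:
        raise ValueError(f"Identifier {identifier!r} has no namespace")
    return f"{namespace_for_env(namespace, environment_string)}:{uuid_part}"
```

With environment_string set to "dev", an identifier starting with no.met: would be rewritten to start with no.met.dev:, matching the dev mapping listed earlier in the thread.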

@magnarem
Contributor

MMD files that are sent to dmci now have no.met: in the metadata_identifier.
But the MMDs that are a part of a collection (children), do not have no.met: in the related_dataset relation="parent" element. So we have to make sure of this. Should dmci fix this if missing, or should it then reject the incoming mmd?

@charlienegri
Contributor

@magnarem I think we can fix that in dmci but see my question in the chat to see if I got it right

@mortenwh
Collaborator

It should be enough to update the ids of the parents with the namespace included. This can be done manually. But it means we also need to update all the MMD files that refer to the parents.

DMCI should reject all MMD files without namespaced ids, as far as I have understood.
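
A rejection check of this kind could be sketched as follows. This is not dmci's actual validation: the function name is hypothetical, and the pattern assumes a namespace made of dot-separated lowercase labels followed by a colon and a UUID in the canonical 8-4-4-4-12 hexadecimal layout.

```python
import re

# Assumed shape: dot-separated lowercase labels, a colon, then a canonical UUID.
_NAMESPACED_ID = re.compile(
    r"(?P<namespace>[a-z0-9-]+(?:\.[a-z0-9-]+)*):"
    r"(?P<uuid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)


def is_namespaced_id(identifier: str) -> bool:
    """Return True only if the whole string is namespace:uuid."""
    # fullmatch rejects leading or trailing text, including a trailing newline.
    return _NAMESPACED_ID.fullmatch(identifier) is not None
```

A caller could apply the same check to both metadata_identifier and any related_dataset value, and reject the MMD file if either fails.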

@charlienegri
Contributor

we just need to merge this now https://gitlab.met.no/tjenester/s-enda/-/merge_requests/2758/
and then we are done unless we want to do something about the related_dataset issue in this same epic

@charlienegri
Contributor

this Epic, at least with the design choices made so far (i.e. doing most of the stuff in dmci), has been completed, as far as I can tell

@mortenwh
Collaborator

> MMD files that are sent to dmci now have no.met: in the metadata_identifier.
> But the MMDs that are a part of a collection (children), do not have no.met: in the related_dataset relation="parent" element. So we have to make sure of this. Should dmci fix this if missing, or should it then reject the incoming mmd?

@magnarem - is this required by solr? In principle, I think we should just naively use what we get. It shouldn't matter where the parent is present. In principle, it could also be on another "data center", couldn't it?

@magnarem
Contributor

magnarem commented Apr 11, 2023

@mortenwh - it will be required by both solr and csw.

We have now defined that the metadata_identifier element in the MMD file has identifiers of the form naming_authority:uuid. Since child datasets have a related_dataset element which holds an identifier pointing to a parent, this id also needs to be of the form naming_authority:uuid, or else the children will point to a non-existing parent.

The parent-child relationship in csw uses, as far as I know, standard database relations, so the identifiers will have to match.

For solr it is required so that the search interface knows about parent-child relationships. In solr, which is not a relational database, we have two flags that help solr figure out whether a dataset is a parent and then find its children.

Example: a parent dataset in solr gets the flag isParent=true. When a search hits a dataset with isParent=true, solr looks for all other datasets in the index whose related_dataset id points to the metadata_identifier of that parent.

So the parent dataset does not know about its children, but the children know which dataset is their parent.

So if we have a parent dataset with metadata_identifier=no.met:1234abcd-aa33-ffaa-aaff-fffffffff, and then we get a new dataset with metadata_identifier=no.niva:uuid that points to the parent, then this child needs to have related_dataset=no.met:1234abcd-aa33-ffaa-aaff-fffffffff.
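
The lookup described above can be illustrated with a plain in-memory list. This is only a sketch of the matching rule, not the solr query or schema used by the portal: find_children and the placeholder identifiers are hypothetical.

```python
from typing import Dict, List


def find_children(parent_id: str, index: List[Dict[str, str]]) -> List[str]:
    """Return identifiers of documents whose related_dataset equals parent_id."""
    return [
        doc["metadata_identifier"]
        for doc in index
        if doc.get("related_dataset") == parent_id
    ]


# Placeholder documents; the identifiers are illustrative, not real UUIDs.
index = [
    {"metadata_identifier": "no.met:parent-1", "isParent": "true"},
    {"metadata_identifier": "no.niva:child-1", "related_dataset": "no.met:parent-1"},
    # This child lacks the namespace in related_dataset, so it is not matched.
    {"metadata_identifier": "no.met:child-2", "related_dataset": "parent-1"},
]

children = find_children("no.met:parent-1", index)  # ["no.niva:child-1"]
```

Because the match is an exact string comparison, a child whose related_dataset omits the naming authority is silently orphaned, which is the failure mode this comment warns about.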

At least in ADC, I have not come across parent-child relationships that span different "data centers"/naming_authorities, for either internal or external datasets.

Maybe Lara or Øystein can answer that.

@mortenwh
Collaborator

Ok. Then we need another issue under this epic. I'll create it now, then it would be good if you could review if I understand correctly :)

@johtoblan
Collaborator Author

#152
