Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update metadata documentation #1602

Merged
merged 8 commits into from
Sep 14, 2023
Merged

Conversation

pabloarosado
Copy link
Contributor

Update metadata documentation in schemas.

@pabloarosado pabloarosado marked this pull request as ready for review September 13, 2023 16:27
Copy link
Member

@lucasrodes lucasrodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Pablo! LGTM in general, added minor comments.

"description": "Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset (follow the guidelines of the origin's `data_product_title`). NOTE: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing `dataset.title`.",
"requirement_level": "required (often automatic)"
"title": "Title of the ETL dataset",
"description": "Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset. NOTE: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing `dataset.title`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to manually edit the dataset title, it can be done by editing dataset.title.

I find it strange to refer to the field itself with the explicit name. I'd go for

to manually edit the dataset title, it can be done by editing this field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the description seems long enough that it would benefit from some line breaks?

Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset.\n\nNote: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten it a bit.

"description": "Description of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the dataset (follow the guidelines of the origin's `description`). NOTE: Dataset descriptions should be propagated automatically from snapshots. By default, the description of the dataset will be the description of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset description, it can be done by editing `dataset.description`.",
"requirement_level": "recommended (often automatic)"
"title": "Description of the ETL dataset",
"description": "Description of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the content of the tables. NOTE: Dataset descriptions should be propagated automatically from snapshots. By default, the description of the dataset will be the description of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset description, it can be done by editing `dataset.description`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem comments as in dataset.title. Would probably add a line break and make it dataset.descriptionthis field.

"guidelines": [
["Must start with a capital letter."],
["Must end with a period."],
["Must not mention other metadata fields like `producer`.", {"type": "exceptions", "value": ["Other metadata fields are crucial in the description of the data product."]}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would phrase it like

"Must not mention other metadata fields (e.g. producer or ...)." (add another metadata field)

and then

"The other metadata fields are crucial in describing the data product."

"description": "Title of the table (mostly for internal purposes, or for users of our data catalog) which is a few words description of the table (follow the guidelines as origin's `title`). NOTE: Table titles should be propagated automatically from snapshots (from origin's `title`). But, if there are multiple tables, or if the user wants to manually edit the title, it can be done by editing `table.title`.",
"requirement_level": "required (often automatic)"
"title": "Title of the ETL table",
"description": "Title of the table (mostly for internal purposes, or for users of our data catalog) which is a few words description of the table. NOTE: Table titles should be propagated automatically from snapshots (from origin's `title`). But, if there are multiple tables, or if the user wants to manually edit the title, it can be done by editing `table.title`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as in dataset.title and dataset.description (e.g. breakline and referring to the field as this field).

},
"description": {
"type": "string",
"title": "Description of this ETL table",
"description": "Description of the table (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the table (follow the guidelines of origin's `description`). NOTE: Table descriptions should be propagated automatically from snapshots (from origin's `description`). But, if there are multiple tables, or if the user wants to manually edit the description, it can be done by editing `table.description`.",
"requirement_level": "recommended (often automatic)"
"description": "Description of the table (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the table. NOTE: Table descriptions should be propagated automatically from snapshots (from origin's `description`). But, if there are multiple tables, or if the user wants to manually edit the description, it can be done by editing `table.description`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as in tables.title (e.g. breakline and referring to the field as this field).

@@ -462,8 +490,16 @@
"properties": {
"title_public": {
"type": "string",
"description": "Indicator title to be shown in data pages (follow the guidelines of indicator's `title`). NOTE: This may be needed, for example, when the indicator comes from a big dataset where titles can't easily be curated manually. For those cases, `title_public` will override the indicator's `title`.",
"requirement_level": "optional"
"description": "Indicator title to be shown in data pages. NOTE: This may be needed, for example, when the indicator comes from a big dataset where titles can't easily be curated manually. For those cases, `title_public` will override the indicator's `title`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in dataset.title (linebreaks)

@@ -488,22 +524,29 @@
"attribution": {
"type": "string",
"title": "Indicator's attribution",
"description": "",
"description": "Citation of the indicator's origins, to be used when the automatic format `producer1 (year1); producer2 (year2)` needs to be overridden.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rephrase it as

Citation of the indicator's origins. Use it only when you want to overwrite the automatic format producer1 (year1); producer2 (year2).

but not really important

@@ -269,43 +272,52 @@
"guidelines": [
["Must start with a capital letter."],
["Must end with a period."],
["Should describe the data product, not the snapshot (i.e. the subset of data we extract from the data product)."]
["Must not mention other metadata fields like `producer` or `version_producer`.", {"type": "exceptions", "value": ["Other metadata fields are crucial in the description of the data product."]}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would modify

"Other metadata fields are crucial in the description of the data product."

to

"These other metadata fields are crucial in the description of the data product."

Comment on lines 286 to 297
"examples": [
"Global Carbon Budget - Fossil fuels",
"Neutron star mergers"
],
"examples_bad": [
[
"Global Carbon Budget",
"Neutron star mergers (NASA, 2023)",
"Data on neutron star mergers",
"Neutron star mergers dataset"
]
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To correctly render the list of examples in the docs or wizard we should either:

  • not have any examples or examples_bad
  • just have examples
  • have the same number of examples and examples_bad.

Here I'd do sth like

"examples": [
  "Global Carbon Budget - Fossil fuels",
  "Neutron star mergers"
],
"examples_bad": [
  [
    "Global Carbon Budget",
  ],
  [
    "Neutron star mergers (NASA, 2023)",
    "Data on neutron star mergers",
    "Neutron star mergers dataset"
  ]
],

(see how it is done for attribution_short)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I may have other examples that don't fulfil this rule.

["Should only be used if the automatic attribution format `producer (year)`is incomplete. For example, when the dataset title is well known and should be cited along with the producer, or when the original dataset version should also be mentioned."],
["Should be constructed using placeholders, e.g. `{title}, {producer} ({year})`."]
["Should only be used if the automatic attribution format `producer (year)` is considered uninformative. For example, when the title of the data product is well known and should be cited along with the producer, or when the original version of the data product should also be mentioned."],
["Should follow the format `{producer} - {title} {version_producer} ({year})`."],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused with this one. Thought attribution was free, but from this guideline it seems that there is a specific style we should follow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rephrased it. This is a "should" not a "must". My point here is that, if you are simply including the data product, then that should be the preferred way to do so. If you want to do something completely different, you are free.

@pabloarosado pabloarosado merged commit c7a8ae8 into master Sep 14, 2023
@pabloarosado pabloarosado deleted the update-metadata-documentation branch September 14, 2023 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants