-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update metadata documentation #1602
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Pablo! LGTM in general, added minor comments.
schemas/dataset-schema.json
Outdated
"description": "Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset (follow the guidelines of the origin's `data_product_title`). NOTE: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing `dataset.title`.", | ||
"requirement_level": "required (often automatic)" | ||
"title": "Title of the ETL dataset", | ||
"description": "Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset. NOTE: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing `dataset.title`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to manually edit the dataset title, it can be done by editing
dataset.title
.
I find it strange to refer to the field itself with the explicit name. I'd go for
to manually edit the dataset title, it can be done by editing this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the description seems long enough that it would benefit from some line breaks?
Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset.\n\nNote: Dataset titles should be propagated automatically from snapshots. By default, the title of the dataset will be the title of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset title, it can be done by editing this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten it a bit.
schemas/dataset-schema.json
Outdated
"description": "Description of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the dataset (follow the guidelines of the origin's `description`). NOTE: Dataset descriptions should be propagated automatically from snapshots. By default, the description of the dataset will be the description of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset description, it can be done by editing `dataset.description`.", | ||
"requirement_level": "recommended (often automatic)" | ||
"title": "Description of the ETL dataset", | ||
"description": "Description of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the content of the tables. NOTE: Dataset descriptions should be propagated automatically from snapshots. By default, the description of the dataset will be the description of the containing table. But, if there are multiple tables, or if the user wants to manually edit the dataset description, it can be done by editing `dataset.description`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem comments as in dataset.title
. Would probably add a line break and make it dataset.description
→ this field.
schemas/dataset-schema.json
Outdated
"guidelines": [ | ||
["Must start with a capital letter."], | ||
["Must end with a period."], | ||
["Must not mention other metadata fields like `producer`.", {"type": "exceptions", "value": ["Other metadata fields are crucial in the description of the data product."]}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would phrase it like
"Must not mention other metadata fields (e.g. producer
or ...)." (add another metadata field)
and then
"The other metadata fields are crucial in describing the data product."
schemas/dataset-schema.json
Outdated
"description": "Title of the table (mostly for internal purposes, or for users of our data catalog) which is a few words description of the table (follow the guidelines as origin's `title`). NOTE: Table titles should be propagated automatically from snapshots (from origin's `title`). But, if there are multiple tables, or if the user wants to manually edit the title, it can be done by editing `table.title`.", | ||
"requirement_level": "required (often automatic)" | ||
"title": "Title of the ETL table", | ||
"description": "Title of the table (mostly for internal purposes, or for users of our data catalog) which is a few words description of the table. NOTE: Table titles should be propagated automatically from snapshots (from origin's `title`). But, if there are multiple tables, or if the user wants to manually edit the title, it can be done by editing `table.title`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as in dataset.title
and dataset.description
(e.g. breakline and referring to the field as this field).
schemas/dataset-schema.json
Outdated
}, | ||
"description": { | ||
"type": "string", | ||
"title": "Description of this ETL table", | ||
"description": "Description of the table (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the table (follow the guidelines of origin's `description`). NOTE: Table descriptions should be propagated automatically from snapshots (from origin's `description`). But, if there are multiple tables, or if the user wants to manually edit the description, it can be done by editing `table.description`.", | ||
"requirement_level": "recommended (often automatic)" | ||
"description": "Description of the table (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the table. NOTE: Table descriptions should be propagated automatically from snapshots (from origin's `description`). But, if there are multiple tables, or if the user wants to manually edit the description, it can be done by editing `table.description`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments as in tables.title
(e.g. breakline and referring to the field as this field).
schemas/dataset-schema.json
Outdated
@@ -462,8 +490,16 @@ | |||
"properties": { | |||
"title_public": { | |||
"type": "string", | |||
"description": "Indicator title to be shown in data pages (follow the guidelines of indicator's `title`). NOTE: This may be needed, for example, when the indicator comes from a big dataset where titles can't easily be curated manually. For those cases, `title_public` will override the indicator's `title`.", | |||
"requirement_level": "optional" | |||
"description": "Indicator title to be shown in data pages. NOTE: This may be needed, for example, when the indicator comes from a big dataset where titles can't easily be curated manually. For those cases, `title_public` will override the indicator's `title`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as in dataset.title
(linebreaks)
schemas/dataset-schema.json
Outdated
@@ -488,22 +524,29 @@ | |||
"attribution": { | |||
"type": "string", | |||
"title": "Indicator's attribution", | |||
"description": "", | |||
"description": "Citation of the indicator's origins, to be used when the automatic format `producer1 (year1); producer2 (year2)` needs to be overridden.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rephrase it as
Citation of the indicator's origins. Use it only when you want to overwrite the automatic format
producer1 (year1); producer2 (year2)
.
but not really important
schemas/definitions.json
Outdated
@@ -269,43 +272,52 @@ | |||
"guidelines": [ | |||
["Must start with a capital letter."], | |||
["Must end with a period."], | |||
["Should describe the data product, not the snapshot (i.e. the subset of data we extract from the data product)."] | |||
["Must not mention other metadata fields like `producer` or `version_producer`.", {"type": "exceptions", "value": ["Other metadata fields are crucial in the description of the data product."]}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would modify
"Other metadata fields are crucial in the description of the data product."
to
"These other metadata fields are crucial in the description of the data product."
schemas/definitions.json
Outdated
"examples": [ | ||
"Global Carbon Budget - Fossil fuels", | ||
"Neutron star mergers" | ||
], | ||
"examples_bad": [ | ||
[ | ||
"Global Carbon Budget", | ||
"Neutron star mergers (NASA, 2023)", | ||
"Data on neutron star mergers", | ||
"Neutron star mergers dataset" | ||
] | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To correctly render the list of examples in the docs or wizard we should either:
- not have any
examples
orexamples_bad
- just have
examples
- have the same number of
examples
andexamples_bad
.
Here I'd do sth like
"examples": [
"Global Carbon Budget - Fossil fuels",
"Neutron star mergers"
],
"examples_bad": [
[
"Global Carbon Budget",
],
[
"Neutron star mergers (NASA, 2023)",
"Data on neutron star mergers",
"Neutron star mergers dataset"
]
],
(see how it is done for attribution_short
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I may have other examples that don't fulfil this rule.
schemas/definitions.json
Outdated
["Should only be used if the automatic attribution format `producer (year)`is incomplete. For example, when the dataset title is well known and should be cited along with the producer, or when the original dataset version should also be mentioned."], | ||
["Should be constructed using placeholders, e.g. `{title}, {producer} ({year})`."] | ||
["Should only be used if the automatic attribution format `producer (year)` is considered uninformative. For example, when the title of the data product is well known and should be cited along with the producer, or when the original version of the data product should also be mentioned."], | ||
["Should follow the format `{producer} - {title} {version_producer} ({year})`."], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused with this one. Thought attribution
was free, but from this guideline it seems that there is a specific style we should follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rephrased it. This is a "should" not a "must". My point here is that, if you are simply including the data product, then that should be the preferred way to do so. If you want to do something completely different, you are free.
Update metadata documentation in schemas.