Type specific validating formats (stringFormat, numberFormat) #1391
This is unnecessary API churn, and moreover churn that sends a poor message about the vocabulary system, in my opinion.
Can you elaborate? I believe I addressed all of those points, specifically that the existing "format" keyword is not reliable enough for many uses. It doesn't deprecate any existing functionality, so I'm not sure how "churn" is a concern.
This isn't correct, as I've mentioned previously when you brought this up, and as Karen did recently as well. The only core formats are defined over strings, but we explicitly allow extra formats to be defined outside the spec, and for them to be defined on any primitive type. Folks almost certainly have done so. By churn I mean "you are proposing changing the way something is done in a way that might be clearer, but isn't sufficiently better to overcome the cost of retraining". I don't know what more to elaborate on, unfortunately; I simply don't see value in this kind of change personally. I think
I don't see any relationship, FWIW. If people somehow agree this is useful, then of course keywords defined by the spec are not unknown and wouldn't be annotation keywords.
Right, I was just pointing out that there are no core formats except string formats, and if others have defined custom non-string formats, this proposal doesn't affect those. And if you want to move to a typed format, you just have to pick the correct type.
Not quite. I believe what I'm proposing has no current equivalent: there is no way to require "format" to validate. It is allowed to be inconsistent, and there's no way to tell whether an annotation or validation was intended. Typed formats neatly solve this.
The benefit of the typed formats is that they validate, and won't be ignored if the format is unknown, which is the current situation. But this depends on "unknown keywords prohibited" being written in first.
The current way is to declare that your schema uses a dialect with the format assertion vocabulary (rather than the format annotation vocabulary, which the default dialect uses).
JSON Schema cannot validate the text format in which a value was encoded, because it operates on the JSON data model. By the time JSON Schema sees the value, the text has already been parsed into a number. The best we can do is detect whether the number has a fractional part. This is akin to trying to get JSON Schema to validate that the input JSON had no line breaks or extra whitespace (i.e. was minified).
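To illustrate, here is a minimal sketch using Python's standard json module as the parser. Once the text is parsed, a check that works the same for every parser can only ask whether the value has a fractional part, so 1 and 1.0 are treated alike. The helper name is_integral is hypothetical.

```python
import json


def is_integral(value) -> bool:
    """Return True if a parsed JSON number has no fractional part.

    This looks only at the data model: the original text ("1" vs "1.0")
    is gone by the time this runs.
    """
    # bool subclasses int in Python, but JSON true/false are not numbers.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return value.is_integer()
    return False  # not a number at all


for text in ["1", "1.0", "1e5", "1.5"]:
    print(text, is_integral(json.loads(text)))
```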
This isn't a blocker anymore. We've decided to do this; it's just not in the document yet.
When it comes to "numberFormat", what I'm proposing may be a bit of a change to the data model, but I don't think it's completely unreasonable: there are parsers that distinguish It's definitely not the same as whitespace, which is unambiguously not important. And if this actually is unpopular or a bad idea, then we can omit integer/float/exponential from "numberFormat" and only permit numberFormat to distinguish the mathematical (scalar) values.
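As a sketch of the kind of parser meant here: Python's json.loads accepts parse_int and parse_float hooks that receive each number's original text, which is enough to classify it with the "integer", "float", and "exponential" names from the proposal. The helper names and the (value, class) tuple representation are assumptions for illustration; most built-in parsers expose no such hook.

```python
import json


def classify_number_literal(text: str) -> str:
    """Classify a JSON number literal by its textual form, using the
    proposal's names: "integer", "float", or "exponential"."""
    if "e" in text or "E" in text:
        return "exponential"
    if "." in text:
        return "float"
    return "integer"


def load_with_literals(doc: str):
    # json.loads passes each number's original text to parse_int/parse_float,
    # so the literal's form can be kept next to the parsed value.
    return json.loads(
        doc,
        parse_int=lambda s: (int(s), classify_number_literal(s)),
        parse_float=lambda s: (float(s), classify_number_literal(s)),
    )


data = load_with_literals('{"a": 1, "b": 1.0, "c": 1e5}')
print(data)
```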
The devil may be in the details, and what I mean is that this can't be written in until "prohibit unknown keywords" is written in.
Unless all parsers/models do this, we can't require it. Supporting only the subset of parsers that result in a data model that makes this distinction is not interoperable.
And the tests we have for this support are optional for this reason.
I'm not opposed to these formats. I'm opposed to validating textual encoding when this is not within the capability of all parsers, many of which are built into the language/framework. We can't require the ability to differentiate between the text encodings. Essentially, if we want
{
  "type": "number",
  "format": "integer"
}
that's fine, but it needs to pass validation for both
This does work as expected. We have tests that ensure this (and I had to make code changes to pass those tests). Formats are already typed, in that they only respond to a particular type, like other validation keywords. However, to allow multiple formats, you do have to do the
I'm not suggesting this should be required, in the same way that supporting big numbers isn't required, but for the validators that do make a distinction between
Hm, I remember this now. I'm going to have to rethink this, then.
Yes, you're right, though this is a lot of extra work; I don't think I've seen it in the wild. You have to ship two schemas as separate documents instead of one. That seems like much more overhead than most people are willing to accept.
I don't know what you mean; it's no work on the part of the schema author once someone (one person; undoubtedly someone has already done this) publishes some dialect with As I say, I'm pretty -1 on this kind of idea personally, but obviously you may find someone else who sympathizes. (I'm ignoring all the discussion above on integer formats. I don't agree with some of the back and forth, but I don't think it's central to what you're proposing anyhow.)
Let's try an exercise. I have an object like I want "isbn" to annotate, and I want "published" to validate against an RFC 3339 date. How do I do that? If you can't do that without looking up the right keywords, then I'd suggest it's too complicated.
That's different from what you said previously, but it's also not an issue today; you use either:
or
Or an
@Julian Neither of these is a standard solution that will work across validators, since you're referring to a $schema value that doesn't exist yet and has to be written. Or, if the custom schema does work across validators, then presumably you omitted its contents because they're lengthy or difficult to write. Right?
What I wrote will work in any implementation with support for vocabularies (and for the format assertion vocabulary). Yes, I didn't Google for whoever has previously written the trivial meta-schema enabling format assertion; as I say, undoubtedly someone has done so and published it at a URI anyone can use. I also honestly think, with all due respect, that I've both spent enough time thinking about this idea and explained why I don't see value in it personally, so I'll probably bow out of the issue at this point.
Chiming in at this point: I've speculated, and others have agreed it should be viable, that you can bundle the meta-schema with the schema. If the vocabularies are known (such as the format assertion vocabulary) and supported by the implementation, that should be enough. Do implementations support that today? I haven't verified it.
Is that really the best solution to this problem, though? So far nobody has been able to provide a schema that would work reliably across validators. If you can't post it here, nobody's going to understand it on Stack Overflow. Is the argument seriously that separate annotation and validation keywords are inferior?
I think we also need to consider the reason we all hate Personally, I'd prefer a different solution that doesn't allow custom values, but the only thing I can think of is a bunch of (e.g.) Regarding the format-assertion meta-schema, we had a discussion somewhere about creating and publishing one, but we decided against doing so since it's trivial to make one. (Take the standard meta-schema and change "format-annotation" to "format-assertion", and probably change the meta-schema
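A minimal sketch of that derivation, operating on an abbreviated stand-in for the 2020-12 meta-schema (only $id, $vocabulary and allOf are shown). Besides swapping the vocabulary, it first makes the relative allOf references absolute, because references like "meta/core" resolve against $id and would otherwise follow the new one. The vocabulary and meta-schema URIs are assumed to be the 2020-12 ones; the function name and the example.com $id are hypothetical.

```python
import copy
from urllib.parse import urljoin

FORMAT_ANNOTATION_VOCAB = "https://json-schema.org/draft/2020-12/vocab/format-annotation"
FORMAT_ASSERTION_VOCAB = "https://json-schema.org/draft/2020-12/vocab/format-assertion"


def derive_assertion_metaschema(base: dict, new_id: str) -> dict:
    """Copy a meta-schema, swapping format-annotation for format-assertion."""
    derived = copy.deepcopy(base)
    base_id = base["$id"]

    vocab = derived.setdefault("$vocabulary", {})
    vocab.pop(FORMAT_ANNOTATION_VOCAB, None)
    # true: implementations that don't support this vocabulary must refuse
    # to process schemas that use this dialect.
    vocab[FORMAT_ASSERTION_VOCAB] = True

    # Relative $refs resolve against $id, so make them absolute before the
    # $id changes; also point at the (assumed) format-assertion meta-schema.
    for sub in derived.get("allOf", []):
        if "$ref" in sub:
            ref = urljoin(base_id, sub["$ref"])
            sub["$ref"] = ref.replace("/meta/format-annotation", "/meta/format-assertion")

    derived["$id"] = new_id
    return derived


# Abbreviated stand-in for the standard meta-schema (only the parts touched here).
BASE = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://json-schema.org/draft/2020-12/schema",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        FORMAT_ANNOTATION_VOCAB: True,
    },
    "allOf": [{"$ref": "meta/core"}, {"$ref": "meta/format-annotation"}],
}

derived = derive_assertion_metaschema(BASE, "https://example.com/format-assertion-dialect")
```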
So that I understand: this is the only (major) problem, and you think having separate keywords for annotation format and validation format could work, if not for this? Afaik, using My solution would make this guarantee by treating the keyword as unrecognized when the format name is unrecognized. (More specifically: only known format names would be in the range of valid values for the keyword.)
The spec is very clear on what
and (for custom formats):
Taken together, those two requirements mean that if an implementation processes the schema, it supports validation of every format present within it.
Ok, this is essentially what I'm proposing for the typed/assertion "format" keywords. You said you'd prefer something like
FWIW, I'm using a I'm pretty happy with the
@karenetheridge This is a good insight, but I think typed format keywords would still be an improvement in situations where you support multiple types and want a different format for each. For example, to specify either an RFC 3339 date-time or a Unix timestamp, you could write
I don't see how that's any better than what we'd do today:
format: date-time
type: [string, integer]
Or even, if we deprecated
type: [string, number]
anyOf:
  - format: date-time
  - format: integer
or
anyOf:
  - type: string
    format: date-time
  - type: number
    format: integer
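A minimal sketch of how the first and last schemas in the comment above evaluate, assuming "format" is asserted (as under the format assertion vocabulary) and that each format only constrains instances of its own type, as other validation keywords do. The tiny evaluator, the simplified date-time pattern, and treating "integer" as a number format are all illustrative assumptions.

```python
import re

# Simplified RFC 3339 date-time pattern: checks shape only, not value
# ranges (e.g. month 13 would still match).
DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def is_number(x) -> bool:
    # JSON true/false are not numbers, even though bool subclasses int in Python.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_integral(x) -> bool:
    return isinstance(x, int) or x.is_integer()


TYPES = {
    "string": lambda x: isinstance(x, str),
    "number": is_number,
    "integer": lambda x: is_number(x) and is_integral(x),
}

# Each format maps to (the type it applies to, the check itself).
# Instances of any other type pass through untouched.
FORMATS = {
    "date-time": (lambda x: isinstance(x, str), lambda s: DATE_TIME.fullmatch(s) is not None),
    # A numeric "integer" format is the hypothetical one discussed in this thread.
    "integer": (is_number, is_integral),
}


def is_valid(instance, schema: dict) -> bool:
    """Evaluate only "type", "format" (asserted) and "anyOf"."""
    if "type" in schema:
        types = schema["type"] if isinstance(schema["type"], list) else [schema["type"]]
        if not any(TYPES[t](instance) for t in types):
            return False
    if "format" in schema:
        applies_to, check = FORMATS[schema["format"]]
        if applies_to(instance) and not check(instance):
            return False
    if "anyOf" in schema:
        if not any(is_valid(instance, sub) for sub in schema["anyOf"]):
            return False
    return True


SCALAR_TYPED = {"format": "date-time", "type": ["string", "integer"]}
BRANCHED = {
    "anyOf": [
        {"type": "string", "format": "date-time"},
        {"type": "number", "format": "integer"},
    ]
}
```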
I would expect usage to be somewhat esoteric (as most validators use a parser that can't make that distinction), but JSON itself doesn't appear to forbid making that distinction in general.
Necessitating "anyOf" is what I'm trying to avoid.
I agree with @karenetheridge. I don't think this is a significant improvement. You might be able to make a case for an array-form
{
  "type": [ "string", "number" ],
  "format": [ "date-time", "integer" ]
}
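One possible reading of the array form, sketched under the assumption (not stated in the comment) that the two arrays pair up by position, so the i-th format constrains only instances matching the i-th type. The evaluator, the simplified date-time pattern, and the numeric "integer" format are illustrative.

```python
import re

# Simplified RFC 3339 date-time shape check (no range validation).
DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def is_number(x) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


TYPE_CHECKS = {
    "string": lambda x: isinstance(x, str),
    "number": is_number,
}

# Each format check is only reached for an instance of its paired type,
# because `and` short-circuits below.
FORMAT_CHECKS = {
    "date-time": lambda s: DATE_TIME.fullmatch(s) is not None,
    "integer": lambda n: isinstance(n, int) or n.is_integer(),
}


def is_valid_paired(instance, schema: dict) -> bool:
    """Hypothetical array-form semantics: format[i] applies to type[i]."""
    types, formats = schema["type"], schema["format"]
    if len(types) != len(formats):
        raise ValueError('"type" and "format" arrays must have the same length')
    return any(
        TYPE_CHECKS[t](instance) and FORMAT_CHECKS[f](instance)
        for t, f in zip(types, formats)
    )


SCHEMA = {"type": ["string", "number"], "format": ["date-time", "integer"]}
```

Positional pairing avoids anyOf, at the cost of coupling two keywords that are otherwise independent.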
My assessment so far is that the The vocabulary system works, but it isn't well designed to be used by schema authors in their everyday work. It's better designed for organizations (like OpenAPI) to create a custom dialect that will be used as part of a domain-specific system. So, I don't think relying on the vocabulary system for the everyday decision of whether or not While I'm not in favor of introducing type-specific format keywords, I would be in favor of introducing one new keyword that would allow users to use annotation-format and assertion-format in the same schema using the standard dialect. Personally, I think
I would be open to creating a single new validation-dedicated "format" keyword, leaving I looked up synonyms for "format" and "form" to see what we could use, and there's not much. Maybe "model," but that's not quite right. I'm going to split off the
As mentioned in #1520 (comment), I'd like to move forward with this by creating a proposal document for several
The "format" keyword's behavior has changed over time: it has gone back and forth from being a validation keyword, to an annotation, and back to a validation keyword if you specify (out of band) that it's a validation keyword. Because this is specified out of band, it's impossible to determine the right way to "upgrade" the keyword between dialects of a schema.
This ambiguous behavior doesn't make a lot of sense; it seems to me there ought to be one keyword for annotation and one keyword for validation.
Validation keywords usually only apply when the instance is of a particular type (e.g. "minimum" doesn't do anything if the input is a boolean). If I'm using "format" as a validation keyword and I want it to apply only to strings, I have to use more sophisticated logic. This won't work as expected:
Number inputs will always fail. Instead, I have to write:
But this is complicated. I should be able to do something like:
Then, these new type-specific formats could be validation keywords, leaving "format" to be an annotation-only keyword:
1.0), "float", or "exponential" (e.g. 1e5).
Some additional features:
The keyword would only be defined for values that the validator knows. That is, if an unknown value were provided for a typed-format keyword, it would fail the same way an unknown keyword would.
A URI could be provided, to allow for one-off, user-defined formats that bypass standardization. For example, if I want to represent an ISO 8601 period, I could write down
{"stringFormat": "http://example.com/format/period"}
Formats could refer not just to standard syntaxes but also to outside validators or nonstandard sets. For example, I could write
{"numberFormat": "https://example.org/numberFormat/A000045"}
to refer to all numbers in the Fibonacci sequence.
(As an idea) In the event a format is renamed, or a URI format is standardized, the typed-format keywords could accept a space-delimited list of format names or URIs; this would mean "all of these formats are the same, use any one that you understand." For example, if the above period format gets standardized as "period", then you could write
{ "stringFormat": "period http://example.com/format/period" }
to indicate that using either definition is OK; they're the same thing.
Blockers: This depends on "unknown keywords prohibited" being a feature; otherwise these proposed keywords will just be annotation keywords.
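To make the proposed semantics concrete, here is a minimal sketch of how a validator might evaluate "stringFormat" and "numberFormat", assuming the behavior described in the list above: an unknown format value is a schema error (modelled by the hypothetical UnknownFormatError), URIs are accepted as format names, and a space-delimited value means "use any one you understand". The registries, the simplified date and ISO 8601 period patterns, and the function names are illustrative assumptions; the URIs are the placeholders from the proposal.

```python
import math
import re


class UnknownFormatError(Exception):
    """Schema error: no listed format is known (treated like an unknown keyword)."""


def is_fibonacci(n) -> bool:
    # A non-negative integer n is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4
    # is a perfect square.
    if isinstance(n, bool) or not isinstance(n, (int, float)):
        return False
    if isinstance(n, float):
        if not n.is_integer():
            return False
        n = int(n)
    if n < 0:
        return False

    def is_square(m: int) -> bool:
        return m >= 0 and math.isqrt(m) ** 2 == m

    return is_square(5 * n * n + 4) or is_square(5 * n * n - 4)


# Deliberately simplified patterns, for illustration only.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
PERIOD = re.compile(
    r"P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?"
)

STRING_FORMATS = {
    "date": lambda s: DATE.fullmatch(s) is not None,
    # This validator knows the period format only by its URI, not as "period".
    "http://example.com/format/period": lambda s: PERIOD.fullmatch(s) is not None,
}
NUMBER_FORMATS = {
    "https://example.org/numberFormat/A000045": is_fibonacci,
}

TYPED_KEYWORDS = {
    "stringFormat": (STRING_FORMATS, lambda x: isinstance(x, str)),
    "numberFormat": (
        NUMBER_FORMATS,
        lambda x: isinstance(x, (int, float)) and not isinstance(x, bool),
    ),
}


def check_typed_format(keyword: str, value: str, instance) -> bool:
    registry, applies_to = TYPED_KEYWORDS[keyword]
    # Space-delimited alternatives: use the first name or URI this validator knows.
    check = next((registry[name] for name in value.split() if name in registry), None)
    if check is None:
        # Unknown formats are a schema error, whatever the instance is.
        raise UnknownFormatError(f"{keyword}: no known format in {value!r}")
    # Instances of other types are not constrained by a typed keyword.
    return check(instance) if applies_to(instance) else True
```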
Related: #1383, #1284