-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disallow upper characters (/A-F/) in hex-encoded portion #34
Conversation
algorithm.go
Outdated
@@ -164,3 +173,20 @@ func (a Algorithm) FromBytes(p []byte) Digest { | |||
func (a Algorithm) FromString(s string) Digest { | |||
return a.FromBytes([]byte(s)) | |||
} | |||
|
|||
// ValidateEncoded validates the encoded portion string | |||
func (a Algorithm) ValidateEncoded(encoded string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fine as Validate
: it is implied that it is validating the encoded portion.
@@ -109,18 +109,11 @@ func (d Digest) Validate() error { | |||
return ErrDigestInvalidFormat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regexp check is no longer needed, the second regexp will do all the needed validation and the algorithm lookup validates the algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regexp check is no longer needed…
I think it is. What if someone tries to make a digest with foobar
? Or foobar:!!!
? I'd rather get ErrDigestInvalidFormat
for those than ErrDigestUnsupported
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only talking about the regexp check, foobar
would get caught for lacking :
. And who knows, !!!
could be a valid encoding for foobar
. It is better for the algorithm to make that distinction since we are only talking here whether ErrDigestUnsupported
or ErrDigestInvalidFormat
is returned, both of which indicate invalid input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And who knows,
!!!
could be a valid encoding forfoobar
.
Dropping the digest regexp is going to make this package much more permissive than image spec (which has a number of charset restrictions). That's fine, go-digest is free to be more permissive, but it would be nice if there was a way to ask “is this a valid image-spec digest?”. “Does go-digest have any personal beef with this?” is a useful question, but potentially less useful for folks using go-digest to handle image-spec digests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not suggesting we drop the drop the regexp from the package. I am pointing out that running this regexp is redundant when the input is valid. If returning ErrDigestInvalidFormat
for foobar:!!!
is seen a more useful error than ErrDigestUnsupported
, which I am not convinced that either is more actionable than the other, then only do this regexp match when the algorithm is not recognized to return the right error. Digests are frequently parsed and validated and we should optimize for reducing the number of operations for valid input, if additional checks are done on invalid data I am fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If returning
ErrDigestInvalidFormat
forfoobar:!!!
is seen a more useful error thanErrDigestUnsupported
, which I am not convinced that either is more actionable than the other…
The distinction is that ErrDigestUnsupported
is “hmm, I haven't heard of that one”, while ErrDigestInvalidFormat
is “that is wrong! Nobody should accept it”. The latter you can reject in middleman software (e.g. registries) regardless of who the end consumers will eventually be.
… then only do this regexp match when the algorithm is not recognized to return the right error.
Yeah, I'm fine with that.
algorithm.go
Outdated
|
||
// anchoredRegexps contains anchored regular expressions for hex-encoded digests. | ||
// Note that /A-F/ disallowed. | ||
anchoredRegexps = map[Algorithm]*regexp.Regexp{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably anchoredEncodedRegexps
in case someone wants to anchor algorithm-identifier regexps, etc. in the future.
algorithm.go
Outdated
func (a Algorithm) ValidateEncoded(encoded string) error { | ||
r, ok := anchoredRegexps[a] | ||
if !ok { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still do some validation in this case. How about:
encodedRegexp = regexp.MustCompile(`[a-zA-Z0-9=_-]+`)
encodedRegexpAnchored = regexp.MustCompile(`^` + encodedRegexp.String() + `$`)
up where you define the per-algo regexps, and then down here:
if !ok {
r = encodedRegexpAnchored
}
You can stay DRY by updating digest.go
:
var DigestRegexp = regexp.MustCompile(`[a-z0-9]+(?:[.+_-][a-z0-9]+)*:` + encodedRegexp.String())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context, that regexp has already been run from the top-level validate.
updated PR |
func (a Algorithm) Validate(encoded string) error { | ||
r, ok := anchoredEncodedRegexps[a] | ||
if !ok { | ||
return ErrDigestUnsupported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safer than before, but we can still do better (assuming go-digest is ok with the spec restrictions on the encoded charset). @stevvooe felt that had already been handled in “the top-level validate”, but Algorithm.Validate
is a public method, so you can hit it without going through Digest.Validate
. I expect we want to handle the following cases:
- Algorithm is in
anchoredEncodedRegexps
, which you already handle correctly. - Algorithm is not in
anchoredEncodedRegexps
and:- The encoded part validates vs. the generic spec charset, in which case you should return
ErrDigestUnsupported
like you're doing here. - The encoded part does not validate vs. the generic spec charset, in which case we can return
ErrDigestInvalidFormat
even without recognizing the algorithm.
- The encoded part validates vs. the generic spec charset, in which case you should return
I'm not clear which case zero-length encoded parts fall into; they should be either ErrDigestInvalidFormat
(which you've chosen below) or ErrDigestInvalidLength
. Either way, the comment for this function should document which applies to that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price. Please stop trying to expand the scope of things.
This is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price.
But if your Algorithm
is not listed in anchoredEncodedRegexps
, we can still validate the encoded portion of the algorithm using the generic spec charset and distinguish between “I don't know about your algorithm; maybe the encoded part here is ok” and “I don't know about your algorithm but I know your encoded part is wrong”.
algorithm.go
Outdated
if !ok { | ||
return ErrDigestUnsupported | ||
} | ||
if encoded == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Covered by length check.
digest.go
Outdated
|
||
// validate i then run through regexp | ||
if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) { | ||
if i < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i+1 == len(s)
should remain
digest.go
Outdated
return ErrDigestInvalidFormat | ||
} | ||
algorithm, encoded := Algorithm(s[:i]), s[i+1:] | ||
if algorithm == "" || encoded == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion was instead of doing this here, when the algorithm is not available, do !DigestRegexpAnchored.MatchString(s)
and return ErrDigestInvalidFormat
if it does not match instead of ErrDigestUnsupported
updated PR |
digest.go
Outdated
|
||
// validate i then run through regexp | ||
if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) { | ||
if i < 0 || i+1 == len(s) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first condition should be i <= 0
to catch the empty-algorithm-string case, which will let you drop the algorithm == ""
block below. Unit tests for Digest("")
, Digest(":")
(which we already have), Digest(":foo")
, and Digest("foo:")
(we already have "sha256:"
) would ensure we'd handled all the empty-component cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
// Digests much always be hex-encoded, ensuring that their hex portion will | ||
// always be size*2 | ||
if a.Size()*2 != len(encoded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop this block. For algorithms in anchoredEncodedRegexps
we're already checking the length (via the anchored regexp). And I'm not sure what would happen here if the user failed to import crypto/sha512
or similar. Would Algorithm("sha512:abc…").Size()
return zero and then erroneously trigger this condition? Better to drop the redundant check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, also removed ErrDigestInvalidLength
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, also removed ErrDigestInvalidLength
While I think we may want to keep the ErrDigestInvalidLength
constant so old consumers still compile, I think we want to re-drop this block (unwinding part of your recent 902cd80 → 4ca1301. Using .Size() * 2
assumes a hex encoding, and I don't see why callers would need to distinguish between hash portions that are the wrong length (traditionally handled by ErrDigestInvalidLength
) and hash portions that contain invalid characters (traditionally handled by ErrDigestInvalidFormat
). Is there a reason that distinction is useful?
algorithm.go
Outdated
SHA256 Algorithm = "sha256" // sha256 with hex encoding | ||
SHA384 Algorithm = "sha384" // sha384 with hex encoding | ||
SHA512 Algorithm = "sha512" // sha512 with hex encoding | ||
SHA256 Algorithm = "sha256" // sha256 with hex encoding (without upper hex characters. i.e. /A-F/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 😅
digest.go
Outdated
@@ -68,9 +68,6 @@ var ( | |||
// ErrDigestInvalidFormat returned when digest format invalid. | |||
ErrDigestInvalidFormat = fmt.Errorf("invalid checksum digest format") | |||
|
|||
// ErrDigestInvalidLength returned when digest has invalid length. | |||
ErrDigestInvalidLength = fmt.Errorf("invalid checksum digest length") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep this (and just deprecate it) to satisfy any backwards-compat requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to remain. Are we no longer throwing this error?
digest.go
Outdated
return ErrDigestInvalidFormat | ||
} | ||
algorithm, encoded := Algorithm(s[:i]), s[i+1:] | ||
if algorithm == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Revived ErrInvalidLength. |
1 similar comment
…rithms In some cases (e.g. Algorithm("foo").Validate("!")) we know the encoded portion is invalid even if we do not have an entry for foo in anchoredEncodedRegexps. Whatever algorithm-specific additional restrictions are applied via anchoredEncodedRegexps, the encoded portion must satisfy the generic encoded-portion restrictions from the DigestRegexp. This commit adjusts the Algorithm("foo").Validate("!") result to match the current Digest("foo:!").Validate() result, and factors the encoded portion of DigestRegexp out into encodedRegexp and encodedRegexpAnchored. It also guards the Size()-based length check with Available(), because Size() returns zero for unavailable algorithms. Now that we have a check for algorithms that aren't in anchoredEncodedRegexps, we can't rely on lookup-misses there to protect us from running the Size() check on unavailable algorithms. And equating anchoredEncodedRegexps hits with Available was dicey anyway, since it depended on what packages get loaded, etc. Adding Algorithm.ValidateIdentifier() allows us to avoid doubling up on the encoded check in Digest.Validate(), since both DigestRegexpAnchored and algorithm.Validate() were covering the encoded portion. The new tests are based on digest_test.go's TestParseDigest. I've also dropped the Available check from Digest.Valid, because we can tell: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is a valid digest even if Algorithm("sha256") happens to be unavailable (e.g. because the consumer didn't import crypto/sha256). Available-ness just affects whether you're capable of verifying content against the digest, not digest-validity. Running Algorithm.Validate before Algorithm.ValidateIdentifier uses the presence of an identifier in anchoredEncodedRegexps as a hint to skip the identifier validation. This saves some time (ad Derek's suggestion [1]) and is safe as long as we don't add any invalid identifiers to that map. [1]: opencontainers#34 (comment) Signed-off-by: W. Trevor King <wking@tremily.us>
For opencontainers/image-spec#667
Signed-off-by: Akihiro Suda suda.akihiro@lab.ntt.co.jp