Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disallow upper characters (/A-F/) in hex-encoded portion #34

Merged
merged 1 commit into from
Jun 7, 2017

Conversation

AkihiroSuda
Copy link
Member

algorithm.go Outdated
@@ -164,3 +173,20 @@ func (a Algorithm) FromBytes(p []byte) Digest {
func (a Algorithm) FromString(s string) Digest {
return a.FromBytes([]byte(s))
}

// ValidateEncoded validates the encoded portion string
func (a Algorithm) ValidateEncoded(encoded string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fine as Validate: it is implied that it is validating the encoded portion.

@stevvooe
Copy link
Contributor

A few nits.

@vbatts @dmcgowan PTAL

@@ -109,18 +109,11 @@ func (d Digest) Validate() error {
return ErrDigestInvalidFormat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp check is no longer needed, the second regexp will do all the needed validation and the algorithm lookup validates the algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp check is no longer needed…

I think it is. What if someone tries to make a digest with foobar? Or foobar:!!!? I'd rather get ErrDigestInvalidFormat for those than ErrDigestUnsupported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only talking about the regexp check, foobar would get caught for lacking :. And who knows, !!! could be a valid encoding for foobar. It is better for the algorithm to make that distinction since we are only talking here whether ErrDigestUnsupported or ErrDigestInvalidFormat is returned, both of which indicate invalid input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And who knows, !!! could be a valid encoding for foobar.

Dropping the digest regexp is going to make this package much more permissive than image spec (which has a number of charset restrictions). That's fine, go-digest is free to be more permissive, but it would be nice if there was a way to ask “is this a valid image-spec digest?”. “Does go-digest have any personal beef with this?” is a useful question, but potentially less useful for folks using go-digest to handle image-spec digests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not suggesting we drop the drop the regexp from the package. I am pointing out that running this regexp is redundant when the input is valid. If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other, then only do this regexp match when the algorithm is not recognized to return the right error. Digests are frequently parsed and validated and we should optimize for reducing the number of operations for valid input, if additional checks are done on invalid data I am fine with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other…

The distinction is that ErrDigestUnsupported is “hmm, I haven't heard of that one”, while ErrDigestInvalidFormat is “that is wrong! Nobody should accept it”. The latter you can reject in middleman software (e.g. registries) regardless of who the end consumers will eventually be.

… then only do this regexp match when the algorithm is not recognized to return the right error.

Yeah, I'm fine with that.

algorithm.go Outdated

// anchoredRegexps contains anchored regular expressions for hex-encoded digests.
// Note that /A-F/ disallowed.
anchoredRegexps = map[Algorithm]*regexp.Regexp{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably anchoredEncodedRegexps in case someone wants to anchor algorithm-identifier regexps, etc. in the future.

algorithm.go Outdated
func (a Algorithm) ValidateEncoded(encoded string) error {
r, ok := anchoredRegexps[a]
if !ok {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still do some validation in this case. How about:

encodedRegexp = regexp.MustCompile(`[a-zA-Z0-9=_-]+`)
encodedRegexpAnchored = regexp.MustCompile(`^` + encodedRegexp.String() + `$`)

up where you define the per-algo regexps, and then down here:

if !ok {
    r = encodedRegexpAnchored
}

You can stay DRY by updating digest.go:

var DigestRegexp = regexp.MustCompile(`[a-z0-9]+(?:[.+_-][a-z0-9]+)*:` + encodedRegexp.String())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, that regexp has already been run from the top-level validate.

@AkihiroSuda
Copy link
Member Author

updated PR

func (a Algorithm) Validate(encoded string) error {
r, ok := anchoredEncodedRegexps[a]
if !ok {
return ErrDigestUnsupported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safer than before, but we can still do better (assuming go-digest is ok with the spec restrictions on the encoded charset). @stevvooe felt that had already been handled in “the top-level validate”, but Algorithm.Validate is a public method, so you can hit it without going through Digest.Validate. I expect we want to handle the following cases:

  • Algorithm is in anchoredEncodedRegexps, which you already handle correctly.
  • Algorithm is not in anchoredEncodedRegexps and:
    • The encoded part validates vs. the generic spec charset, in which case you should return ErrDigestUnsupported like you're doing here.
    • The encoded part does not validate vs. the generic spec charset, in which case we can return ErrDigestInvalidFormat even without recognizing the algorithm.

I'm not clear which case zero-length encoded parts fall into; they should be either ErrDigestInvalidFormat (which you've chosen below) or ErrDigestInvalidLength. Either way, the comment for this function should document which applies to that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price. Please stop trying to expand the scope of things.

This is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price.

But if your Algorithm is not listed in anchoredEncodedRegexps, we can still validate the encoded portion of the algorithm using the generic spec charset and distinguish between “I don't know about your algorithm; maybe the encoded part here is ok” and “I don't know about your algorithm but I know your encoded part is wrong”.

algorithm.go Outdated
if !ok {
return ErrDigestUnsupported
}
if encoded == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered by length check.

digest.go Outdated

// validate i then run through regexp
if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) {
if i < 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i+1 == len(s) should remain

digest.go Outdated
return ErrDigestInvalidFormat
}
algorithm, encoded := Algorithm(s[:i]), s[i+1:]
if algorithm == "" || encoded == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion was instead of doing this here, when the algorithm is not available, do !DigestRegexpAnchored.MatchString(s) and return ErrDigestInvalidFormat if it does not match instead of ErrDigestUnsupported

@AkihiroSuda
Copy link
Member Author

updated PR

digest.go Outdated

// validate i then run through regexp
if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) {
if i < 0 || i+1 == len(s) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first condition should be i <= 0 to catch the empty-algorithm-string case, which will let you drop the algorithm == "" block below. Unit tests for Digest(""), Digest(":") (which we already have), Digest(":foo"), and Digest("foo:") (we already have "sha256:") would ensure we'd handled all the empty-component cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
// Digests much always be hex-encoded, ensuring that their hex portion will
// always be size*2
if a.Size()*2 != len(encoded) {
Copy link
Contributor

@wking wking May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop this block. For algorithms in anchoredEncodedRegexps we're already checking the length (via the anchored regexp). And I'm not sure what would happen here if the user failed to import crypto/sha512 or similar. Would Algorithm("sha512:abc…").Size() return zero and then erroneously trigger this condition? Better to drop the redundant check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also removed ErrDigestInvalidLength

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also removed ErrDigestInvalidLength

While I think we may want to keep the ErrDigestInvalidLength constant so old consumers still compile, I think we want to re-drop this block (unwinding part of your recent 902cd804ca1301. Using .Size() * 2 assumes a hex encoding, and I don't see why callers would need to distinguish between hash portions that are the wrong length (traditionally handled by ErrDigestInvalidLength) and hash portions that contain invalid characters (traditionally handled by ErrDigestInvalidFormat). Is there a reason that distinction is useful?

algorithm.go Outdated
SHA256 Algorithm = "sha256" // sha256 with hex encoding
SHA384 Algorithm = "sha384" // sha384 with hex encoding
SHA512 Algorithm = "sha512" // sha512 with hex encoding
SHA256 Algorithm = "sha256" // sha256 with hex encoding (without upper hex characters. i.e. /A-F/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just put (lower case only) on each, the terminology ditto probably isn't the best...

...it makes me think of this 😭

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 😅

digest.go Outdated
@@ -68,9 +68,6 @@ var (
// ErrDigestInvalidFormat returned when digest format invalid.
ErrDigestInvalidFormat = fmt.Errorf("invalid checksum digest format")

// ErrDigestInvalidLength returned when digest has invalid length.
ErrDigestInvalidLength = fmt.Errorf("invalid checksum digest length")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this (and just deprecate it) to satisfy any backwards-compat requirements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to remain. Are we no longer throwing this error?

digest.go Outdated
return ErrDigestInvalidFormat
}
algorithm, encoded := Algorithm(s[:i]), s[i+1:]
if algorithm == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
@AkihiroSuda
Copy link
Member Author

Revived ErrInvalidLength.
Sorry for delay

@dmcgowan
Copy link
Member

dmcgowan commented Jun 7, 2017

LGTM

Approved with PullApprove

1 similar comment
@stevvooe
Copy link
Contributor

stevvooe commented Jun 7, 2017

LGTM

Approved with PullApprove

@stevvooe stevvooe merged commit 279bed9 into opencontainers:master Jun 7, 2017
wking added a commit to wking/go-digest that referenced this pull request Jun 9, 2017
…rithms

In some cases (e.g. Algorithm("foo").Validate("!")) we know the
encoded portion is invalid even if we do not have an entry for foo in
anchoredEncodedRegexps.  Whatever algorithm-specific additional
restrictions are applied via anchoredEncodedRegexps, the encoded
portion must satisfy the generic encoded-portion restrictions from the
DigestRegexp.  This commit adjusts the Algorithm("foo").Validate("!")
result to match the current Digest("foo:!").Validate() result, and
factors the encoded portion of DigestRegexp out into encodedRegexp and
encodedRegexpAnchored.

It also guards the Size()-based length check with Available(), because
Size() returns zero for unavailable algorithms.  Now that we have a
check for algorithms that aren't in anchoredEncodedRegexps, we can't
rely on lookup-misses there to protect us from running the Size()
check on unavailable algorithms.  And equating anchoredEncodedRegexps
hits with Available was dicey anyway, since it depended on what
packages get loaded, etc.

Adding Algorithm.ValidateIdentifier() allows us to avoid doubling up
on the encoded check in Digest.Validate(), since both
DigestRegexpAnchored and algorithm.Validate() were covering the
encoded portion.  The new tests are based on digest_test.go's
TestParseDigest.

I've also dropped the Available check from Digest.Valid, because we
can tell:

  sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

is a valid digest even if Algorithm("sha256") happens to be
unavailable (e.g. because the consumer didn't import crypto/sha256).
Available-ness just affects whether you're capable of verifying
content against the digest, not digest-validity.

Running Algorithm.Validate before Algorithm.ValidateIdentifier uses
the presence of an identifier in anchoredEncodedRegexps as a hint to
skip the identifier validation.  This saves some time (ad Derek's
suggestion [1]) and is safe as long as we don't add any invalid
identifiers to that map.

[1]: opencontainers#34 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants