Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disallow upper characters (/A-F/) in hex-encoded portion #34

Merged
merged 1 commit into from
Jun 7, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 29 additions & 3 deletions algorithm.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (
"fmt"
"hash"
"io"
"regexp"
)

// Algorithm identifies and implementation of a digester by an identifier.
Expand All @@ -28,9 +29,9 @@ type Algorithm string

// supported digest types
const (
SHA256 Algorithm = "sha256" // sha256 with hex encoding
SHA384 Algorithm = "sha384" // sha384 with hex encoding
SHA512 Algorithm = "sha512" // sha512 with hex encoding
SHA256 Algorithm = "sha256" // sha256 with hex encoding (lower case only)
SHA384 Algorithm = "sha384" // sha384 with hex encoding (lower case only)
SHA512 Algorithm = "sha512" // sha512 with hex encoding (lower case only)

// Canonical is the primary digest algorithm used with the distribution
// project. Other digests may be used but this one is the primary storage
Expand All @@ -50,6 +51,14 @@ var (
SHA384: crypto.SHA384,
SHA512: crypto.SHA512,
}

// anchoredEncodedRegexps contains anchored regular expressions for hex-encoded digests.
// Note that /A-F/ disallowed.
anchoredEncodedRegexps = map[Algorithm]*regexp.Regexp{
SHA256: regexp.MustCompile(`^[a-f0-9]{64}$`),
SHA384: regexp.MustCompile(`^[a-f0-9]{96}$`),
SHA512: regexp.MustCompile(`^[a-f0-9]{128}$`),
}
)

// Available returns true if the digest type is available for use. If this
Expand Down Expand Up @@ -164,3 +173,20 @@ func (a Algorithm) FromBytes(p []byte) Digest {
func (a Algorithm) FromString(s string) Digest {
return a.FromBytes([]byte(s))
}

// Validate validates the encoded portion string
func (a Algorithm) Validate(encoded string) error {
r, ok := anchoredEncodedRegexps[a]
if !ok {
return ErrDigestUnsupported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safer than before, but we can still do better (assuming go-digest is ok with the spec restrictions on the encoded charset). @stevvooe felt that had already been handled in “the top-level validate”, but Algorithm.Validate is a public method, so you can hit it without going through Digest.Validate. I expect we want to handle the following cases:

  • Algorithm is in anchoredEncodedRegexps, which you already handle correctly.
  • Algorithm is not in anchoredEncodedRegexps and:
    • The encoded part validates vs. the generic spec charset, in which case you should return ErrDigestUnsupported like you're doing here.
    • The encoded part does not validate vs. the generic spec charset, in which case we can return ErrDigestInvalidFormat even without recognizing the algorithm.

I'm not clear which case zero-length encoded parts fall into; they should be either ErrDigestInvalidFormat (which you've chosen below) or ErrDigestInvalidLength. Either way, the comment for this function should document which applies to that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price. Please stop trying to expand the scope of things.

This is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validates the character set for the encoded portion for the algorithm. Nothing less, nothing more. If you use a function incorrectly, you pay the price.

But if your Algorithm is not listed in anchoredEncodedRegexps, we can still validate the encoded portion of the algorithm using the generic spec charset and distinguish between “I don't know about your algorithm; maybe the encoded part here is ok” and “I don't know about your algorithm but I know your encoded part is wrong”.

}
// Digests much always be hex-encoded, ensuring that their hex portion will
// always be size*2
if a.Size()*2 != len(encoded) {
Copy link
Contributor

@wking wking May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop this block. For algorithms in anchoredEncodedRegexps we're already checking the length (via the anchored regexp). And I'm not sure what would happen here if the user failed to import crypto/sha512 or similar. Would Algorithm("sha512:abc…").Size() return zero and then erroneously trigger this condition? Better to drop the redundant check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also removed ErrDigestInvalidLength

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also removed ErrDigestInvalidLength

While I think we may want to keep the ErrDigestInvalidLength constant so old consumers still compile, I think we want to re-drop this block (unwinding part of your recent 902cd804ca1301. Using .Size() * 2 assumes a hex encoding, and I don't see why callers would need to distinguish between hash portions that are the wrong length (traditionally handled by ErrDigestInvalidLength) and hash portions that contain invalid characters (traditionally handled by ErrDigestInvalidFormat). Is there a reason that distinction is useful?

return ErrDigestInvalidLength
}
if r.MatchString(encoded) {
return nil
}
return ErrDigestInvalidFormat
}
20 changes: 6 additions & 14 deletions digest.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,26 +101,18 @@ func FromString(s string) Digest {
// error if not.
func (d Digest) Validate() error {
s := string(d)

i := strings.Index(s, ":")

// validate i then run through regexp
if i < 0 || i+1 == len(s) || !DigestRegexpAnchored.MatchString(s) {
if i <= 0 || i+1 == len(s) {
return ErrDigestInvalidFormat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp check is no longer needed, the second regexp will do all the needed validation and the algorithm lookup validates the algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp check is no longer needed…

I think it is. What if someone tries to make a digest with foobar? Or foobar:!!!? I'd rather get ErrDigestInvalidFormat for those than ErrDigestUnsupported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only talking about the regexp check, foobar would get caught for lacking :. And who knows, !!! could be a valid encoding for foobar. It is better for the algorithm to make that distinction since we are only talking here whether ErrDigestUnsupported or ErrDigestInvalidFormat is returned, both of which indicate invalid input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And who knows, !!! could be a valid encoding for foobar.

Dropping the digest regexp is going to make this package much more permissive than image spec (which has a number of charset restrictions). That's fine, go-digest is free to be more permissive, but it would be nice if there was a way to ask “is this a valid image-spec digest?”. “Does go-digest have any personal beef with this?” is a useful question, but potentially less useful for folks using go-digest to handle image-spec digests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not suggesting we drop the drop the regexp from the package. I am pointing out that running this regexp is redundant when the input is valid. If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other, then only do this regexp match when the algorithm is not recognized to return the right error. Digests are frequently parsed and validated and we should optimize for reducing the number of operations for valid input, if additional checks are done on invalid data I am fine with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If returning ErrDigestInvalidFormat for foobar:!!! is seen a more useful error than ErrDigestUnsupported, which I am not convinced that either is more actionable than the other…

The distinction is that ErrDigestUnsupported is “hmm, I haven't heard of that one”, while ErrDigestInvalidFormat is “that is wrong! Nobody should accept it”. The latter you can reject in middleman software (e.g. registries) regardless of who the end consumers will eventually be.

… then only do this regexp match when the algorithm is not recognized to return the right error.

Yeah, I'm fine with that.

}

algorithm := Algorithm(s[:i])
algorithm, encoded := Algorithm(s[:i]), s[i+1:]
if !algorithm.Available() {
if !DigestRegexpAnchored.MatchString(s) {
return ErrDigestInvalidFormat
}
return ErrDigestUnsupported
}

// Digests much always be hex-encoded, ensuring that their hex portion will
// always be size*2
if algorithm.Size()*2 != len(s[i+1:]) {
return ErrDigestInvalidLength
}

return nil
return algorithm.Validate(encoded)
}

// Algorithm returns the algorithm portion of the digest. This will panic if
Expand Down
4 changes: 4 additions & 0 deletions digest_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,10 @@ func TestParseDigest(t *testing.T) {
encoded: "LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564",
err: ErrDigestUnsupported,
},
{
input: "sha256:E58FCF7418D4390DEC8E8FB69D88C06EC07039D651FEDD3AA72AF9972E7D046B",
err: ErrDigestInvalidFormat,
},
} {
digest, err := Parse(testcase.input)
if err != testcase.err {
Expand Down