-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index.json: improve implementors note to better represent algorithms necessary #588
Comments
Based on my research over the last week, I think we can address this by pulling up the platform descriptions into the descriptor to get rid of the type hierarchy of descriptors which is awkward. This is a Go type change and a slight reorganization of the types definition which is completely compatible, but it would be nice to have that correct for 1.0. Let me know what you think about this or if more details are needed. |
I agree with moving platform to descriptor annotations (now that we have them). Though we still would need to add some implementor notes about how they are expected to handle the hierarchy of annotations that you could run into and that the recommended referencing implementation would involve effectively tag matching (more of a search engine and less of a simple dereferencer). I've been working on this on the |
I think we'd have to support both platform field and annotations, in practice, but the simplification seems straightforward.
I'll see if I can look deeper into platform qualified pull to give some better guidance here, as promised. |
@stevvooe Did you mean adding And could you please provide more details about why it can address this issue (e.g., how to handle duplicated |
Could you please put up some possible examples of
If it means the contents(config+layers) in one manifest might be not operable? Instead, manifests in one index can be combined and run a container according to customized order? |
Yes.
This can help with the issue by avoid having to deal with different descriptor types, which was never intended by this specification. As far as an example of a ux that deals with multiple reference, let's say we have a tool that lists the references in a image layout:
Then we can unpack with something like this:
Now, just qualify the reference:
Or:
|
Thanks @stevvooe for the example! And I am thinking another case:
In this case, what will be the expected behavior of |
I think you would need to unroll the references in the targeted index. Perhaps, we could print that as a tree? |
Do we really need to allow multiple descriptors with the name but with different media types in an image layout? I think it is definitely OK if there are multiple manifest descriptors with the same name but for different platforms. But if there is already an index descriptor which is for multiple platforms, why should there be a manifest descriptor with the same name for a specific platform? I think the later should be part of the former rather than part of |
@qianzhangxa @cyphar Here is my visitor implementation for containerd: /~https://github.com/docker/containerd/pull/638. It let's you arbitrarily walk image resources and process them in the vein discussed above. Let me know if you have questions. |
Thanks @stevvooe I took a quick look at that PR, but it seems that its implementation is still based on the old image layout ( |
In fact, it is not based either. As I've said multiple times, I think the focus on the archive format has greatly confused things. "Fetching" an index.json file makes absolutely no sense. That PR shows the simple method of walking a tree with a visitor pattern to process image related resources. It also provides the ability to filter at pull, push or assembly. This can be generalized to deal with the exact problems described here. Adding support for new types is straightforward. |
Is metadata meant to apply to all children of a particular descriptor? If so, how are conflicts handled (the lowest statement wins?), and does that mean that the only way to implement a reference resolver is to walk everything that a descriptor points to and then collate those objects and their metadata? Surely there needs to be some clarification on this point. I mean, I'm not even sure how you would expose the concept of a hierarchy of metadata to users, but I'm trying to figure it out. |
On Tue, Mar 21, 2017 at 03:25:45AM -0700, Aleksa Sarai wrote:
… and does that mean that the only way to implement a reference
resolver is to walk everything that a descriptor points to and then
collate those objects and their metadata?
I don't think we want to require descendant walking. Each node in the
DAG should carry whatever metadata it wants walkers to use locally.
That is why we copy platform information [1] up from the config [2]
now, although the index platform information grew into something more
richly typed.
[1]: /~https://github.com/opencontainers/image-spec/blame/v1.0.0-rc5/image-index.md#L41
[2]: /~https://github.com/opencontainers/image-spec/blame/v1.0.0-rc5/config.md#L60
|
There is an equivalence between trees and a linearization of a tree based on the traversal order. Exploit this and you can do all these things.
As I've said, you can do whatever you want but having the lowest statement win would be lossy. I would just collect these references under a key and display them together. You can maintain a stack to recover the reference path or just have a list for each reference that you add descriptors to ( I'll add some more examples to containerd and consider contributing them back here, but I ask that you at least try out some algorithms before dismissing the approach. |
It's a lossy transformation. Given the following in-order traversal, which of the following trees did it come from?
I could make more but you get the idea. The same problem exists for preorder and postorder traversals. IIRC from my discrete maths classes, there is not a bijection between a traversal and the arbitrary tree that can produce it. While it is true that in binary trees there is a bijection if you have both inorder and postorder traversals, the OCI images aren't binary trees. Not to mention that now users have to describe an arbitrary tree whenever they want to create a reference. I mean, in some aspects it's better than the original system but I don't see any practical way of doing this without some graphical assistance. |
I'm not dismissing the approach, I'm asking for clarification. As it stands, it's ambiguous what should be done. I do know how to implement a traversal of a tree, what I'm not sure of is whether it's a good idea for everyone to expose a completely different UX and concepts when handling references. I mean, you've said that's not an issue in the past and I begrudgingly agreed, but the whole point of this issue was to provide guidance about the UX. Currently (or at least a few weeks ago) I'm discussing this issue with @mtrmac (one of the maintainers of skopeo) and I'm unsure if Thank god we don't need to resolve this in |
(I haven’t been following any of the previous conversations, so I apologize if this rehashes long-settled discussions.) Yeah, from my POV the index format prioritizes the wrong thing: it wants to be abstract enough to be able to represent any possible situation, but it gives absolutely no guidance how to consume the content. If two different implementations running on the same computer, given the same input, can choose two different images from the same index as “best”, the spec is not interoperable. It should be practical and easy to write clients which follow the spec, are interoperable, and behave consistently. It seems to me it is much better to start with a narrow but clearly working and interoperable spec (which has a clear promise, and is obviously possible to extend/improve) rather than a wide and unimplementable spec (which fails in its very purpose from the start, so is likely to be abandoned or replaced). AFAICS the index specification at /~https://github.com/opencontainers/image-spec/blob/master/image-index.md utterly fails in providing any “how to consume” guidance; though perhaps I am just missing the existence of another document which describes this. But then this ticket seems to advocate for explicitly documenting that there is no guidance; I suppose that’s better than saying nothing, because it saves readers time, but just giving up does not give us an interoperable standard. A clear example is the The references to a human UI which lists all of the index contents and expect the user to choose are, for me, not at all helpful with trying to decide how to consume such an index by software. Neither is a reference to SQL (or, with nesting, Prolog or something?); if the field values are not standardized, the SQL can’t be autogenerated/hard-coded in an application, the user has to specify it; but the user is similarly hampered by lack of specification of the values, so the user has to specify the selector also manually by individually inspecting a specific manifest or by blindly assuming that the index has been created using a specific implementation. The user is asked to predict future contents of indexes which have not yet been generated, without any guidance. I suppose in practice this will play out the way overly-complex interoperability specs usually do: by most implementation settling on implementing a small subset, perhaps a formally specified and interoperable “profile”, perhaps each implementation making its own incompatible choice. Right now my preference, as someone very ignorant about the wider ecosystem concerns and past practice, would be to standardize only the Future versions of the index (with a new |
@stevvooe @cyphar Any updates for this issue? I think for this issue, we have 4 cases need clarification:
My idea is, for all the above 4 cases, the implementation should report an error since it can not find a unique manifest for the ref. And since platform is an optional property in the descriptor, if there is any manifest in the Comments? |
@mtrmac If you think there is an area that is under-specified, please make an attempt to specify those fields. As it is, the referenced fields are mostly owned by the given platform. We can create a registry of those platform fields and their meanings, if that helps, but it is a big effort. Either way, I think that has little to do with this issue.
@cyphar The classic rules of a lack of bijection for binary trees and their linearizations don't apply here. The actual nodes also contain the links, so if you linearize, you can still recover the original tree. In fact, given any node of an OCI tree, the subtree can be reconstructed because it is itself the vertex and the edges.
I'm not sure what to say here other than UX decisions were made that may need to be reconsidered. If they want to take an opinionated view of images, that is perfectly acceptable, but I don't think that should be projected on this specification. @qianzhangxa Thank you for providing concrete questions! This will allow us to keep the discussion technical and moving forward.
Why couldn't all of these descriptors be linearized under that same name? The children have both type and order. The result is equivalent to a single index with a unique name.
This is up to the implementation. I would say err on taking the first match, but I am not sure we should specify this.
Same as case one. All of these index under the same name when trying to build the ref.
You can err on taking the first entry.
I think this is vastly outside the scope of the specification. We define the data structures and their meaning. If something is possible in the specification, but not useful, it should be up to the implementer to decide that the case doesn't work for them. To tell you the truth, none of these cases fall outside the original presented dichotomy of erring out versus presenting all the options. If you don't want your images to be ambiguous, don't create ambiguous images for your tools and it is perfectly acceptable for your tools to reject them. |
This verges towards “you can produce anything and call itself a compliant OCI producer, and you can refuse to consume anything and call yourself a compliant OCi consumer”, and the poor users are left with… pretty much no interoperability promise, i.e. no standard to speak of. (Or have these principles been already decided in the past, so that OCI is satisfied with documenting “data structures” and not ensuring interoperabity? If so, I apologize.) If the interoperable subset of values is only the non-ambiguous images, it would be far better for the spec to actually define this subset instead of the vague undefined universe of names and values. Then producers would know what to target, and consumers would know what they are required to consume, and both would be (more) likely to be interoperable. (Specifing the interoperable well-defined subset of doesn’t prevent any implementation from defining non-standard extensions, only they would be clearly non-standard instead of using “OCI-compliant data structures with not-prohibited values and undefined semantics”.)
How does answering “this is up to the implementation” benefit interoperability? Or if interoperability between independent OCI implementations is not the primary goal, what is the primary goal of this spec? If the behavior is unspecified, producers can’t produce interoperable indexes with such duplicate name/platform pairs. So they shouldn’t do that. If the behavior is specified to be “use the first match”, producing indexes with duplicates is clearly useless. So the producers shouldn’t do that either. Really I can see no situation, except for a producer which is knowingly producing an index to be consumed by a specific consumer, relying on a data format defined beyond the OCI spec (which is of course possible, but in that case OCI “allowing” such data, or not, is by definition irrelevant), when a producer would ever want to create such duplicates legitimately. If the duplicates are not legitimate, consumers should always reject them; hence the spec should describe duplicates as “forbidden in this version, may have defined semantics in the future” so that producers don’t create them. |
@mtrmac I can see my response was lost on you. When I said it would be nice if you could avoid histrionics, I meant that you should avoid statements like there is "pretty much no interoperability promise, i.e. no standard to speak of". Come on. This isn't even close to being true and you know it. I am wasting no more time on you. Seeing as this conversation is going absolutely no where, I am closing this issue. If you would like to have me consult for you on proper implementation, you can join the OCI call. |
@stevvooe We still haven't added implementor's notes, which is what this issue was about in the first place. FWIW, I do echo @mtrmac's concern about the abundance of statements that basically amount to "it's up to the implementation if they support / how they interpret " -- mainly because I'd like to make From my side, I'm just going to make the current UX of
Okay, but that assumes that all of the ancestors have already been created. What if they haven't? I mean, ultimately this UX is going to have to look like "here's a tree I want you to construct from this root" -- which isn't intuitive to users who aren't familiar with the spec / don't care about the spec. As an aside, is there any other @opencontainers/image-spec-maintainers who wants to speak up on this topic? |
@cyphar There two choices here:
To run all compliant images, you probably have to go with option 1 but this is a rare case, so option 2 is sufficient in practice. We can make implementer's notes that describe this trade off. |
On Thu, Mar 30, 2017 at 12:34PM -0700, Stephen Day wrote [1]:
> In index.json, there are one index and one manifest which have the
> same ref name, and inside the index, there is a manifest which has
> the same platform (which is the platform of the current machine)
> with the manifest in the index.json.
Same as case one. All of these index under the same name when trying
to build the ref.
This sounds like:
When resolving a reference by org.opencontainers.image.ref.name,
compliant implementation MUST:
1. Walk all application/vnd.oci.image.index.v1+json recursively,
inheriting ancestor annotations down the DAG (resolving conflicts
in favor of children?).
2. Consider the list of non-index descriptors compiled from the walk
in step 1, decide if there is any ambiguity in a platform-match
for application/vnd.oci.image.index.v1+json (although the
platform-matching algorithm is left unspecified).
3. Either error out or ask the user to resolve any ambiguity
detected in step 2 [2].
If that is the intention, I think the spec should definitely spell out
those requirements. If that is not the intention (e.g. I'd rather not
require decendant-walking in step 1 [3]), I think the spec should
clearly “implementations MAY recursively walk ancestors…” or some such
to make it clear that we do not expect all implementations to do that.
But as it stands, it's very unclear (to me, anyway) how similar two
platform declarations [4] can be before you get into the “ambiguous
overlap” space. Same arch and different os seems clearly separate, as
does same arch and os but different variant. But same arch, same os,
and different version? Same arch, os, and version and different
features? If I have a two amd64, Linux manifests, and one needs
feature A while another needs feature B, are they ambiguous on a host
that supports both A and B but unambigous on a host that supports only
A? If a potential outcome is erroring on ambiguity, I think the spec
should clearly explain when it considers platform declarations
ambiguously similar so that image producers can try to stay on the
unambiguous side of that line.
[1]: #588 (comment)
[2]: #588 (comment)
[3]: #588 (comment)
|
@stevvooe Please have more empathy for our community. If issues are being brought up repeatedly by different parties -- including a number of contributors actively working on implementations and attempting to use this specification -- it is more likely because the issues have not been adequately explained or resolved by us. Not because said contributors are stupid or seeking attention. This GitHub issue was opened in the first place at your suggestion as a follow up to #581 - wherein once again multiple independent contributors raised similar concerns which we have not yet satisfactorily addressed.
Broadly speaking, I share your and @mtrmac's concerns. |
I'm trying to follow this discussion, does this capture the basic idea/concerns:
I feel like there might be more, but hopefully I'm not too far off. If I'm totally wrong then ignore the rest of this ramble :-) My view:
So, given the above (and ignoring number 3 for a sec), what about something like:
As for 3, I don't have spec-ish text yet but I'm thinking of something like:
Anyway, hopefully my reading of this issue isn't too far off. |
On Fri, Apr 07, 2017 at 06:44:53AM -0700, Doug Davis wrote:
An implementation MAY choose to use whatever imaging matching
algorithm it deems necessary to make this choice. Likewise, an
implementation MAY choose to simple generate an error and stop
processing in the face of an ambiguous choice.
This doesn't give image authors much guidance about what sorts of
images will be unambiguous or not [1,2]. Are there specific things
they should avoid? Or is there no subset of the allowed image space
that is guaranteed to be unpackable by all compliant implementations?
[1]: #588 (comment)
[2]: #588 (comment)
|
I'm not sure we can hand-hold people enough to prevent them from doing silly things. If someone wants to define two images under one index.json such that all of the properties are the exact same then they get what they asked for - ambiguity. And people will stop using that image. This, IMO, is about interop not preventing silliness. If we have cases where we can look at an index.json and two different impls claim that w/o any ambiguity resolver they come up with different results then I agree the spec might need more, but I'm not sure that's the case. Do we have examples of that? |
On Fri, Apr 07, 2017 at 10:04:23AM -0700, Doug Davis wrote:
If we have cases where we can look at an index.json and two
different impls claim that **w/o any ambiguity resolver** they come
up with different results then I agree the spec might need more, but
I'm not sure that's the case. Do we have examples of that?
I tried to lay some out in the final paragraph of [1]. Do all of
those cases sound unambiguous to you? There may be less ambiguity
here after #631 / #632, but I don't see either of those helping with
the “one needs feature A while another needs feature B” case. If that
case is a “silly thing”, then I think we need to make that clear in
the spec.
[1]: #588 (comment)
|
So, let me ask this... is the real issue here around the relationship (e.g. inheritance) of metadata as you walk the tree? If its just about the last case you mentioned (the A and B example), then I don't think there's much we can do. In the case where both set of features are supported then yes its ambiguous and the platform will decide how to deal with it - error, toss a coin, etc... I'll admit that part of me would love to mandate that we error out on all ambiguous cases but that doesn't seem fair to implementations/users that are comfortable with picking either one. After all, if they're all part of the same grouping then the author is saying (to me) they're the same just with some variant. And if that variant wasn't important enough to include as part of the query then it shouldn't matter which one is picked. |
On Fri, Apr 07, 2017 at 10:39:59AM -0700, Doug Davis wrote:
So, let me ask this... is the real issue here around the
relationship (e.g. inheritance) of metadata as you walk the tree?
That's one of the things that's confusing me, but not the only thing
;).
If its just about the last case you mentioned (the A and B example),
then I don't think there's much we can do. In the case where both
set of features are supported then yes its ambiguous and the
platform will decide how to deal with it - error, toss a coin,
etc...
Then I think we need to say something about that in the spec. Either:
If an index references multiple manifests with the same
org.opencontainers.image.ref.name, platform.architecture, and
platform.os, portable indexes SHOULD avoid setting platform.features
unless each value is a strict superset of the platform.features
values for the other otherwise-matching references. Otherwise the
appropriate manifest for a host that supports both sets of features
MAY be ambiguous.
Or:
If an index references multiple manifests with the same
org.opencontainers.image.ref.name, platform.architecture, and
platform.os, implementations MUST select one as the best match,
although the algorithm for that selection is implementation-defined.
The former assists interop by restricting the set of portable images.
The latter assists interop by requiring implementations to support
this particular ambiguous case somehow. And both are still not quite
right, because the stated (org.opencontainers.image.ref.name,
platform.architecture, platform.os) tuple ignores other
platform-matching fields (like platform.os.version) which might make
an otherwise ambigous entry clear (or might not, I'm not sure what the
semantics for platform.os.version are expected to be).
In order to make the second approach work, we'd need to be very
clear about how much matching you'd need before further distinctions
were “not important enough” to be worth erroring on.
|
@duglin I'm going to throw my hat at this one again. God help me.
Those three concerns are things that we've discussed, and I will discuss them in a second. But first, I would argue that with #634 we should revisit some of the arguments we originally had about how flexible the spec is (all of this discussion comes from not wanting to make tl;dr: Can someone show me an example (with #634 applied) of an image where making However, if we assume we keep the current state of affairs with regards to
I'm not sure I can put my finger on it, but I don't like that Yes, I get that they're all "just annotations" but annotations do have different semantic meanings (and especially since some of the annotations are "sourced" [in an unspecified way] from the lower-level configuration).
I think that it's fine if this will just cause errors and require the user to either resolve the ambiguity or add more specificity. This doesn't seem particularly controversial, my main concern is that the only instance of ambiguity I understand the argument for is when the platform-specific variants and features are being compared.
Well, currently the entire meaning of conflicting keys and what the metadata means and recommendations how it should be used are all quite unspecified at the moment. To be honest, I'm not sure what I would even recommend here. We'd have to discuss it more.
My main concern with UX is that by not specifying a way to handle references in any meaningful way, then you're going to end up with different opinions on how the UX should look that make the user experience much worse than it needs to be. I'm sure |
@duglin Thanks for your analysis on this issue! I've also filed #634, which greatly simplifies the problem.
We need to be very careful about the direction of what an implementation must accept or reject, as we risk rendering otherwise operable implementations non-compliant if we do this haphazardly.
This was a concern I brought up multiple times during the introductions of extra annotations on descriptors and manifests. However, I think the meaning of the annotation is really incumbent on the annotation itself. Many annotations apply to the target, rather than the object itself, but things like Either way, after some thought, we need to focus on general meaning of the data structures and that will reconcile the issues with developing working algorithms. Since we are working with such a de-normalization, assigning a particular component as the source of truth will help to eliminate ambiguity. #634 does this for the |
Hopefully my opinion is welcome, but I really really think some of these
properties should be first class attributes; if we're always going to have
them, why not define them that way?
I think it simplifies this argument significantly and reduces complexity of
the spec. Some things, like name, will never go out of style. :)
…On Fri, Apr 7, 2017 at 11:19 AM Stephen Day ***@***.***> wrote:
@duglin </~https://github.com/duglin> Thanks for your analysis on this
issue!
I've also filed #634
<#634>, which greatly
simplifies the problem.
If we have cases where we can look at an index.json and two different
impls claim that w/o any ambiguity resolver they come up with different
results then I agree the spec might need more, but I'm not sure that's the
case. Do we have examples of that?
We need to be very careful about the direction of what an implementation
must accept or reject, as we risk rendering otherwise operable
implementations non-compliant if we do this haphazardly.
is the real issue here around the relationship (e.g. inheritance) of
metadata as you walk the tree?
This was a concern I brought up multiple times during the introductions of
extra annotations on descriptors and manifests. However, I think the
meaning of the annotation is really incumbent on the annotation itself.
Many annotations apply to the target, rather than the object itself, but
things like Config.Labels apply directly to the object. Since we are
working with what are effectively tagged pointers, this seems to work well.
Generically, we are lifting facts from the leaves and bringing to the
branches to make for more efficient processing.
Either way, after some thought, we need to focus on general meaning of the
data structures and that will reconcile the issues with developing working
algorithms. Since we are working with such a de-normalization, assigning a
particular component as the source of truth will help to eliminate
ambiguity. #634 <#634>
does this for the ref.name annotation. If we need this in other places,
such as the platform, that may help, as well.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#588 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AABJ67OCSgBt3OgK-IIKk0oefl_9RZbeks5rtn4zgaJpZM4MQWN8>
.
|
@erikh The main problem here is that naming is hard. Even worse, embedded naming was tried in docker schema1 and it was not a good design. It presents a number of problems, in practice, including authority delegation. The addition of Note that this annotation is not |
Ok fair enough; thanks for the explanation!
…On Fri, Apr 7, 2017 at 1:28 PM Stephen Day ***@***.***> wrote:
@erikh </~https://github.com/erikh> The main problem here is that naming is
hard. Even worse, embedded naming was tried in docker schema1 and it was
not a good design. It presents a number of problems, in practice, including
authority delegation. The addition of ref.name is a narrow annotation use
to allow an image layout to reproduce the behavior of the docker save/load
format. Most of the complaints of ambiguity here are really about a wide
interpretation of the annotation, which doesn't seem valid in practice.
Note that this annotation is not name, it is ref.name. It has nothing to
do with the image name. It is effectively the analog of the tag. If you
interpret it in this narrow context, rather than trying to make it do 1000
things, it works well, for purpose. When you try to expand its scope, as
happens when you expand scope in many cases, the meaning and behavior
breaks down and that is what we are seeing here.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#588 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AABJ6xcSxnGLQMCUbcCQ43Fuk_-ILRsfks5rtpxcgaJpZM4MQWN8>
.
|
I completely agree with you on this point 😉. All of my concerns come from a place of "why do we need to make tag resolution ambiguous in certain cases"? Since I'm fairly sure that there would be no loss of generality if we made |
@stevvooe to comment from dockercon |
At this time, the original complaints of this issue has mostly been addressed.
If there are still items to be addressed, please open specific issues pertaining to that area. |
So what if the digest is not specified? Should the action fail and report an error to complain there are multiple refs with the same name? |
As discussed in the OCI call, currently the implementors note for
index.json
doesn't fully encapsulate the reason for the more complexIndex
data structure and the ways in which implementations should handle then
-tall tree that it can represent.The main points that need to be clarified are:
That the UX for selecting what manifest is being operated on is not defined by the spec, and that implementations are free to either fail when they can't uniquely resolve a manifest, or provide some sort of indication to the user that
The platform matching / annotation matching algorithm necessary to match and flatten the tree structure of descriptors and then reduce the set of descriptors to those that match a particular query. The SQL analogy would probably be something we should mention.
@stevvooe Stepped up to work on this one.
The text was updated successfully, but these errors were encountered: