-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config-linux: Specify relationships for new namespaces #795
Conversation
3ee4b06
to
26d592c
Compare
Previously we had no MUST-level runtime requirements for namespace entries in valid configs. This commit attempts to pin those down. I think we want more wording about new namespace creation (what namespace is the seed/parent? Which user namespace owns a runtime namespace? For more background on hierarchical namespaces, see [1]. For more background on the owning user namespace idea, see [2,3,4]), but that wording proved contentious [5,6], so I punted it to [7]. The "'path' not associated with a namespace of type 'type'" condition ensures that runtimes don't blindly call setns(2) on the path without setting nstype nonzero. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7306ed8d94af729ecef8b6e37506a1c6fc14788 nsfs: add ioctl to get a parent namespace, 2016-09-06 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6786741dbf99e44fb0c0ed85a37582b8a26f1c3b nsfs: add ioctl to get owning user namespace for ns file descriptor, 2016-09-06 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5ff5ce6e20ee22511398bb31fb912466cf82a36 nsfs: Add an ioctl() to return the namespace type, 2017-01-25 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d95fa3c76a66b6d76b1e109ea505c55e66360f3c nsfs: Add an ioctl() to return owner UID of a userns, 2017-01-25 [5]: opencontainers#767 (comment) [6]: opencontainers#767 (comment) [7]: opencontainers#795 Signed-off-by: W. Trevor King <wking@tremily.us>
26d592c
to
79ee80a
Compare
79ee80a
to
c9b6118
Compare
This got kicked around some in today's meeting (although my notes are thin). My impression was that there is concern that we are documenting the kernel (which I'm trying to avoid) and that these requirements are somewhat tautological with the source of the
I think we want to land something like the wording in this PR (as of c9b6118) with RFC 2119 teeth, and then update the glossary entry to be something like:
Which embraces the overlap by trimming the incorrect glossary entry and referencing the RFC 2119 teeth here. I don't think we should leave these relationships alone with an argument like “namespaces can be confusing, so let's not say obvious things like this”. If namespaces are confusing, I think that's a good reason to say obvious things, because “obvious” is in the eye of the beholder. And it's especially good if we can phrase them specifically enough for compliance testing, since that removes the eye-of-the-beholder ambiguity. |
I feel like this is already implied (given that it's how the kernel itself works), and that there isn't a lot of value added by explicitly mentioning it or attempting to validate runtime behavior around it. I'd vote to either close or defer discussion to post-1.0. |
For more background on hierarchical namespaces, see [1]. For more background on the owning user namespace idea, see [2,3,4]). These were contentious [5,6], so they weren't part of the previous commit. I still think we want to say something about these relationships. We already have some of "runtime namespace" conditions (e.g. when a type is not listed in linux.namespaces[]), so runtimes should already have implementation-specific wording around what the runtime namespaces are (we don't explicitly make them implementation-defined, although we probably should). Anyhow, that's not a new concept added by this commit. # Seeded namespaces For example, if you ask for a new uts namespace but do not set the optional hostname, having the seed defined means that the hostname in the container UTS namespace is well-defined (it will be whatever the hostname was in the runtime UTS namespace). This is less of an issue for the mount namespace, because with root.path REQUIRED, there's no way to avoid clobbering whatever mounts you got from your seed (which makes not asking for a new mount namespace exciting ;). # Hierarchical namespaces I think "I want this container to run in a new user/pid namespace that is a child of the runtime user/pid namespace" should be something that has a portable config expression. Otherwise it becomes very unclear what to put in the hostID field for (u|g)idMappings, because you don't know what namespace will be used to interpret the hostIDs. # Namespace ownership This is another case where I think specified clarity is essential. A new network namespace will not be very useful if you don't know who owns it. # Glossary changes In review, Mrunal and others pointed out that the new config-linux entries overlapped with the existing glossary entry [7]. I've addressed that in this commit by trimming the glossary entries down to fix them (namespaces are not all hierarchical and processes aren't always in leaves). I've also dropped the handwavy and incorrect "children" sentence in favor of a link back to config-linux.md and the more specific RFC 2119 language from this commit. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7306ed8d94af729ecef8b6e37506a1c6fc14788 nsfs: add ioctl to get a parent namespace, 2016-09-06 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6786741dbf99e44fb0c0ed85a37582b8a26f1c3b nsfs: add ioctl to get owning user namespace for ns file descriptor, 2016-09-06 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5ff5ce6e20ee22511398bb31fb912466cf82a36 nsfs: Add an ioctl() to return the namespace type, 2017-01-25 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d95fa3c76a66b6d76b1e109ea505c55e66360f3c nsfs: Add an ioctl() to return owner UID of a userns, 2017-01-25 [5]: opencontainers#767 (comment) [6]: opencontainers#767 (comment) [7]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-05-10-21.03.log.html#l-115 But I was talking so my log entries are sparse :/. Signed-off-by: W. Trevor King <wking@tremily.us>
c9b6118
to
cf8306e
Compare
On Wed, May 10, 2017 at 09:39:10PM -0700, Tianon Gravi wrote:
I feel like this is already implied (given that it's how the kernel
itself works)…
That's how the kernel works if you're calling unshare(2) or clone(2)
from the runtime namespaces to create new container namespaces. But
there's nothing in the spec that says you have to do it that way. You
could have a namespace-generating process that creates new namespaces
and pipes file descriptors for them into the container process on a
Unix socket which the container process setns(2)s. That's how the
kernel works too, but I doubt it's what most callers will expect.
… and that there isn't a lot of value added by explicitly mentioning
it or attempting to validate runtime behavior around it.
If my feeling is that this is a useful addition and yours is “it
doesn't add much”, I'd recommend erring on the side of adding three
lines to config-linux.md and trimming a line from the glossary (I've
included my proposal from [1] in a reroll from c9b6118 → cf8306e).
I'm also happy to PR runtime-tools tests if that helps. If there is
anything else I can do to reduce the maintenance cost of this change
(which I expect to be small), please let me know.
[1]: #795 (comment)
|
On Linux, a leaf in the [namespace][namespaces.7] hierarchy from which the [runtime](#runtime) process is executed. | ||
New container namespaces will be created as children of the runtime namespaces. | ||
|
||
On Linux, the namespaces from which new [container namespaces](config-linux.md#namespaces) are created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not link to the glossary container namespace above? Looks more suitable to me.
Agreed.
I'm +1 on this. |
@@ -40,6 +40,9 @@ The following parameters can be specified to setup namespaces: | |||
The runtime MUST [generate an error](runtime.md#errors) if `path` is not associated with a namespace of type `type`. | |||
|
|||
If `path` is not specified, the runtime MUST create a new [container namespace](glossary.md#container-namespace) of type `type`. | |||
For hierarchical namespaces (e.g. `pid`, `user`), the new container namespace MUST be a child of the [runtime namespace](glossary.md#runtime-namespace) of that type. | |||
For seeded namespaces (e.g. `mount`, `uts`), the new container namespace MUST be seeded by the runtime namespace of that type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NACK on this language.
A couple of the clarifying sentences are okay, but the seeding and hierarchical namespace language is too prescriptive and not okay to add. closing. add prs for the simpler sentences if still desired |
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
On Wed, May 24, 2017 at 08:55:44AM -0700, v1.0.0.batts wrote:
A couple of the clarifying sentences are okay, but the seeding and
hierarchical namespace language is too prescriptive and not okay to
add. closing.
Without these statements, a config including [1] could lead to a new
container PID namespace that is *not* a child of the runtime
namespace. With a PID namespace hierarchy like:
A
|-- B
`-- C
where B is the runtime PID namespace, the container PID namespace D
could be created in any of the following positions:
A
|-- B
|-- C
`-- D'
A
|-- B
| `-- D''
`-- C
A
|-- B
`-- C
`-- D'''
A
|-- B
`-- C
`-- E
`-- D'''
etc.
But I'd like to restrict runtimes to only D''. Do you feel that is
too perscriptive? Do you want to allow runtimes to use the D', D''',
D''', etc., positions? If so, can you explain why? If not, why allow
them?
add prs for the simpler sentences if still desired
|
If the runtime can implement all of the relevant lifecycle stuff with those semantics, they can go for it. I don't understand why it is a necessity that Not to mention there might be some very odd usecases where you might actually want the functionality of joining namespaces you couldn't have created. Without having something as awful as |
On Wed, May 24, 2017 at 11:02:24AM -0700, Aleksa Sarai wrote:
If the runtime can implement all of the relevant lifecycle stuff
with those semantics, they can go for it.
Lifecycle semantics are not the problem, especially if we address
#731. If we flip this over to user namespaces (also hierarchical),
one problem is “what do I put in hostID?” [1]. If the goal is for the
new container namespace to map B's UID 1000 (which happens to be to
UID 2000 in A and unmapped in C) to D's PID 0, do I put 1000 in
linux.uidMappings.hostID? Do I put 2000 in there? The correct value
depends on whether the runtime will create D', D'', D''', etc.
I don't understand why it is a necessity that `ps aux` in the
caller's namespace must container the container PIDs. That's not
required anywhere and would actually break any daemon-based
container runtimes (at least it would make their jobs harder for no
obviously beneficial reason).
Perhaps you want the container killed when PID namespace B's init
exits. The way to support daemon-based container runtimes is to have
those runtimes clearly document their runtime namespace. Calling a
‘create’ action from a shell in namespace B doesn't mean that B is the
runtime namespace. The runtime namespace doesn't even have to be on
the same host. But whatever the runtime namespace is (which should be
implementation-defined), you need the container namespaces to be
children/seeded from the runtime namespaces in order to know what to
put in linux.uidMappings.hostID and similar or to how to have the
runtime create nested PID namespaces.
[1]: /~https://github.com/opencontainers/runtime-spec/blob/v1.0.0-rc5/config-linux.md#user-namespace-mappings
|
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. Examples of resources retrieved from this namespace include linux.namespaces[].path and the resctrl psuedo-filesystem used for `linux.intelRdt`, but Mrunal and Michael didn't want me to include the examples in the glossary entry (probably because they could go stale). [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
Spun off from #767, since these are contentious. I still think we want to say something about these relationships.
We already have some of “runtime namespace” conditions (e.g. when a type is not listed in
linux.namespaces[]
), so runtimes should already have implementation-specific wording around what the runtime namespaces are (we don't explicitly make them implementation-defined, although we probably should). Anyhow, that's not a new concept added by this commit.Seeded namespaces
For example, if you ask for a new
uts
namespace but do not set the optionalhostname
, having the seed defined means that the hostname in the container UTS namespace is well-defined (it will be whatever the hostname was in the runtime UTS namespace).This is less of an issue for the mount namespace, because with
root.path
REQUIRED, there's no way to avoid clobbering whatever mounts you got from your seed (which makes not asking for a new mount namespace exciting ;).Hierarchical namespaces
I think “I want this container to run in a new user/pid namespace that is a child of the runtime user/pid namespace” should be something that has a portable config expression. Otherwise it becomes very unclear what to put in the
hostID
field for(u|g)idMappings
, because you don't know what namespace will be used to interpret the hostIDs.Namespace ownership
This is another case where I think specified clarity is essential. A new network namespace will not be very useful if you don't know who owns it.