config-linux: Specify relationships for new namespaces #795

wking · 2017-05-09T22:13:48Z

Spun off from #767, since these are contentious. I still think we want to say something about these relationships.

We already have some of “runtime namespace” conditions (e.g. when a type is not listed in linux.namespaces[]), so runtimes should already have implementation-specific wording around what the runtime namespaces are (we don't explicitly make them implementation-defined, although we probably should). Anyhow, that's not a new concept added by this commit.

Seeded namespaces

For example, if you ask for a new uts namespace but do not set the optional hostname, having the seed defined means that the hostname in the container UTS namespace is well-defined (it will be whatever the hostname was in the runtime UTS namespace).

This is less of an issue for the mount namespace, because with root.path REQUIRED, there's no way to avoid clobbering whatever mounts you got from your seed (which makes not asking for a new mount namespace exciting ;).

Hierarchical namespaces

I think “I want this container to run in a new user/pid namespace that is a child of the runtime user/pid namespace” should be something that has a portable config expression. Otherwise it becomes very unclear what to put in the hostID field for (u|g)idMappings, because you don't know what namespace will be used to interpret the hostIDs.

Namespace ownership

This is another case where I think specified clarity is essential. A new network namespace will not be very useful if you don't know who owns it.

Previously we had no MUST-level runtime requirements for namespace entries in valid configs. This commit attempts to pin those down. I think we want more wording about new namespace creation (what namespace is the seed/parent? Which user namespace owns a runtime namespace? For more background on hierarchical namespaces, see [1]. For more background on the owning user namespace idea, see [2,3,4]), but that wording proved contentious [5,6], so I punted it to [7]. The "'path' not associated with a namespace of type 'type'" condition ensures that runtimes don't blindly call setns(2) on the path without setting nstype nonzero. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7306ed8d94af729ecef8b6e37506a1c6fc14788 nsfs: add ioctl to get a parent namespace, 2016-09-06 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6786741dbf99e44fb0c0ed85a37582b8a26f1c3b nsfs: add ioctl to get owning user namespace for ns file descriptor, 2016-09-06 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5ff5ce6e20ee22511398bb31fb912466cf82a36 nsfs: Add an ioctl() to return the namespace type, 2017-01-25 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d95fa3c76a66b6d76b1e109ea505c55e66360f3c nsfs: Add an ioctl() to return owner UID of a userns, 2017-01-25 [5]: opencontainers#767 (comment) [6]: opencontainers#767 (comment) [7]: opencontainers#795 Signed-off-by: W. Trevor King <wking@tremily.us>

wking · 2017-05-10T23:03:05Z

This got kicked around some in today's meeting (although my notes are thin). My impression was that there is concern that we are documenting the kernel (which I'm trying to avoid) and that these requirements are somewhat tautological with the source of the unshare(2) or clone(2) being the “runtime namespace” by definition (which I agree with). It's been a while since I looked at the glossary entry, and now that I do I think:

The current glossary entry makes all namespaces sound hierarchical, when they aren't.
The current glossary entry has no RFC 2119 language for compliance testing.

I think we want to land something like the wording in this PR (as of c9b6118) with RFC 2119 teeth, and then update the glossary entry to be something like:

On Linux, a the namespace from which the container namespaces [link to config-linux.md#namespaces] are created.

Which embraces the overlap by trimming the incorrect glossary entry and referencing the RFC 2119 teeth here.

I don't think we should leave these relationships alone with an argument like “namespaces can be confusing, so let's not say obvious things like this”. If namespaces are confusing, I think that's a good reason to say obvious things, because “obvious” is in the eye of the beholder. And it's especially good if we can phrase them specifically enough for compliance testing, since that removes the eye-of-the-beholder ambiguity.

tianon · 2017-05-11T04:39:09Z

I feel like this is already implied (given that it's how the kernel itself works), and that there isn't a lot of value added by explicitly mentioning it or attempting to validate runtime behavior around it.

I'd vote to either close or defer discussion to post-1.0.

For more background on hierarchical namespaces, see [1]. For more background on the owning user namespace idea, see [2,3,4]). These were contentious [5,6], so they weren't part of the previous commit. I still think we want to say something about these relationships. We already have some of "runtime namespace" conditions (e.g. when a type is not listed in linux.namespaces[]), so runtimes should already have implementation-specific wording around what the runtime namespaces are (we don't explicitly make them implementation-defined, although we probably should). Anyhow, that's not a new concept added by this commit. # Seeded namespaces For example, if you ask for a new uts namespace but do not set the optional hostname, having the seed defined means that the hostname in the container UTS namespace is well-defined (it will be whatever the hostname was in the runtime UTS namespace). This is less of an issue for the mount namespace, because with root.path REQUIRED, there's no way to avoid clobbering whatever mounts you got from your seed (which makes not asking for a new mount namespace exciting ;). # Hierarchical namespaces I think "I want this container to run in a new user/pid namespace that is a child of the runtime user/pid namespace" should be something that has a portable config expression. Otherwise it becomes very unclear what to put in the hostID field for (u|g)idMappings, because you don't know what namespace will be used to interpret the hostIDs. # Namespace ownership This is another case where I think specified clarity is essential. A new network namespace will not be very useful if you don't know who owns it. # Glossary changes In review, Mrunal and others pointed out that the new config-linux entries overlapped with the existing glossary entry [7]. I've addressed that in this commit by trimming the glossary entries down to fix them (namespaces are not all hierarchical and processes aren't always in leaves). I've also dropped the handwavy and incorrect "children" sentence in favor of a link back to config-linux.md and the more specific RFC 2119 language from this commit. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7306ed8d94af729ecef8b6e37506a1c6fc14788 nsfs: add ioctl to get a parent namespace, 2016-09-06 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6786741dbf99e44fb0c0ed85a37582b8a26f1c3b nsfs: add ioctl to get owning user namespace for ns file descriptor, 2016-09-06 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5ff5ce6e20ee22511398bb31fb912466cf82a36 nsfs: Add an ioctl() to return the namespace type, 2017-01-25 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d95fa3c76a66b6d76b1e109ea505c55e66360f3c nsfs: Add an ioctl() to return owner UID of a userns, 2017-01-25 [5]: opencontainers#767 (comment) [6]: opencontainers#767 (comment) [7]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-05-10-21.03.log.html#l-115 But I was talking so my log entries are sparse :/. Signed-off-by: W. Trevor King <wking@tremily.us>

wking · 2017-05-11T07:28:05Z

On Wed, May 10, 2017 at 09:39:10PM -0700, Tianon Gravi wrote: I feel like this is already implied (given that it's how the kernel itself works)…

That's how the kernel works if you're calling unshare(2) or clone(2) from the runtime namespaces to create new container namespaces. But there's nothing in the spec that says you have to do it that way. You could have a namespace-generating process that creates new namespaces and pipes file descriptors for them into the container process on a Unix socket which the container process setns(2)s. That's how the kernel works too, but I doubt it's what most callers will expect.

… and that there isn't a lot of value added by explicitly mentioning it or attempting to validate runtime behavior around it.

If my feeling is that this is a useful addition and yours is “it doesn't add much”, I'd recommend erring on the side of adding three lines to config-linux.md and trimming a line from the glossary (I've included my proposal from [1] in a reroll from c9b6118 → cf8306e). I'm also happy to PR runtime-tools tests if that helps. If there is anything else I can do to reduce the maintenance cost of this change (which I expect to be small), please let me know. [1]: #795 (comment)

hqhq · 2017-05-24T13:33:37Z

glossary.md

-On Linux, a leaf in the [namespace][namespaces.7] hierarchy from which the [runtime](#runtime) process is executed.
-New container namespaces will be created as children of the runtime namespaces.
-
+On Linux, the namespaces from which new [container namespaces](config-linux.md#namespaces) are created.


Why not link to the glossary container namespace above? Looks more suitable to me.

hqhq · 2017-05-24T13:35:17Z

I feel like this is already implied (given that it's how the kernel itself works), and that there isn't a lot of value added by explicitly mentioning it or attempting to validate runtime behavior around it.
I'd vote to either close or defer discussion to post-1.0.

Agreed.

I'd recommend erring on the side of adding three
lines to config-linux.md and trimming a line from the glossary

I'm +1 on this.

vbatts · 2017-05-24T15:54:23Z

config-linux.md

@@ -40,6 +40,9 @@ The following parameters can be specified to setup namespaces:
    The runtime MUST [generate an error](runtime.md#errors) if `path` is not associated with a namespace of type `type`.

    If `path` is not specified, the runtime MUST create a new [container namespace](glossary.md#container-namespace) of type `type`.
+    For hierarchical namespaces (e.g. `pid`, `user`), the new container namespace MUST be a child of the [runtime namespace](glossary.md#runtime-namespace) of that type.
+    For seeded namespaces (e.g. `mount`, `uts`), the new container namespace MUST be seeded by the runtime namespace of that type.


NACK on this language.

vbatts · 2017-05-24T15:55:39Z

A couple of the clarifying sentences are okay, but the seeding and hierarchical namespace language is too prescriptive and not okay to add. closing. add prs for the simpler sentences if still desired

…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>

wking · 2017-05-24T16:53:37Z

On Wed, May 24, 2017 at 08:55:44AM -0700, v1.0.0.batts wrote: A couple of the clarifying sentences are okay, but the seeding and hierarchical namespace language is too prescriptive and not okay to add. closing.

Without these statements, a config including [1] could lead to a new container PID namespace that is *not* a child of the runtime namespace. With a PID namespace hierarchy like: A |-- B `-- C where B is the runtime PID namespace, the container PID namespace D could be created in any of the following positions: A |-- B |-- C `-- D' A |-- B | `-- D'' `-- C A |-- B `-- C `-- D''' A |-- B `-- C `-- E `-- D''' etc. But I'd like to restrict runtimes to only D''. Do you feel that is too perscriptive? Do you want to allow runtimes to use the D', D''', D''', etc., positions? If so, can you explain why? If not, why allow them?

add prs for the simpler sentences if still desired

I've spun the glossary changes out into #852 and #853.

cyphar · 2017-05-24T18:02:23Z

Do you want to allow runtimes to use the D', D''',
D''', etc., positions? If so, can you explain why? If not, why allow
them?

If the runtime can implement all of the relevant lifecycle stuff with those semantics, they can go for it. I don't understand why it is a necessity that ps aux in the caller's namespace must container the container PIDs. That's not required anywhere and would actually break any daemon-based container runtimes (at least it would make their jobs harder for no obviously beneficial reason).

Not to mention there might be some very odd usecases where you might actually want the functionality of joining namespaces you couldn't have created.

Without having something as awful as cgroupsPath for namespaces, mandating any rules on this that aren't extensible on this topic will break implementations and make people very unhappy. Please don't make a PR for nsPath, it will be a waste of time to review (such a path doesn't have any kernel parallel and not every hierarchy is a VFS).

wking · 2017-05-24T18:39:31Z

On Wed, May 24, 2017 at 11:02:24AM -0700, Aleksa Sarai wrote: If the runtime can implement all of the relevant lifecycle stuff with those semantics, they can go for it.

Lifecycle semantics are not the problem, especially if we address #731. If we flip this over to user namespaces (also hierarchical), one problem is “what do I put in hostID?” [1]. If the goal is for the new container namespace to map B's UID 1000 (which happens to be to UID 2000 in A and unmapped in C) to D's PID 0, do I put 1000 in linux.uidMappings.hostID? Do I put 2000 in there? The correct value depends on whether the runtime will create D', D'', D''', etc.

I don't understand why it is a necessity that `ps aux` in the caller's namespace must container the container PIDs. That's not required anywhere and would actually break any daemon-based container runtimes (at least it would make their jobs harder for no obviously beneficial reason).

Perhaps you want the container killed when PID namespace B's init exits. The way to support daemon-based container runtimes is to have those runtimes clearly document their runtime namespace. Calling a ‘create’ action from a shell in namespace B doesn't mean that B is the runtime namespace. The runtime namespace doesn't even have to be on the same host. But whatever the runtime namespace is (which should be implementation-defined), you need the container namespaces to be children/seeded from the runtime namespaces in order to know what to put in linux.uidMappings.hostID and similar or to how to have the runtime create nested PID namespaces. [1]: /~https://github.com/opencontainers/runtime-spec/blob/v1.0.0-rc5/config-linux.md#user-namespace-mappings

…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>

…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. Examples of resources retrieved from this namespace include linux.namespaces[].path and the resctrl psuedo-filesystem used for `linux.intelRdt`, but Mrunal and Michael didn't want me to include the examples in the glossary entry (probably because they could go stale). [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>

wking force-pushed the rfc2119-namespace-relationships branch from 3ee4b06 to 26d592c Compare May 9, 2017 22:15

wking force-pushed the rfc2119-namespace-relationships branch from 26d592c to 79ee80a Compare May 9, 2017 22:19

This was referenced May 9, 2017

config-linux: RFC 2119 tightening for namespaces #767

Merged

config-linux: Many config properties without RFC 2119 language #746

Closed

wking force-pushed the rfc2119-namespace-relationships branch from 79ee80a to c9b6118 Compare May 10, 2017 22:27

wking mentioned this pull request May 11, 2017

config: Clarify mounts[].source relative path anchor #735

Merged

wking force-pushed the rfc2119-namespace-relationships branch from c9b6118 to cf8306e Compare May 11, 2017 07:23

hqhq reviewed May 24, 2017

View reviewed changes

vbatts reviewed May 24, 2017

View reviewed changes

vbatts closed this May 24, 2017

wking mentioned this pull request May 24, 2017

glossary: Remove "leaf" and "hierarchy" from container namespace definition #852

Merged

wking mentioned this pull request May 24, 2017

glossary: Remove "leaf" and "hierarchy" from runtime namespace definition #853

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

config-linux: Specify relationships for new namespaces #795

config-linux: Specify relationships for new namespaces #795

wking commented May 9, 2017 •

edited

Loading

wking commented May 10, 2017

tianon commented May 11, 2017

wking commented May 11, 2017 via email

hqhq May 24, 2017

hqhq commented May 24, 2017

vbatts May 24, 2017

vbatts commented May 24, 2017

wking commented May 24, 2017 via email

cyphar commented May 24, 2017

wking commented May 24, 2017 via email

config-linux: Specify relationships for new namespaces #795

config-linux: Specify relationships for new namespaces #795

Conversation

wking commented May 9, 2017 • edited Loading

Seeded namespaces

Hierarchical namespaces

Namespace ownership

wking commented May 10, 2017

tianon commented May 11, 2017

wking commented May 11, 2017 via email

hqhq May 24, 2017

Choose a reason for hiding this comment

hqhq commented May 24, 2017

vbatts May 24, 2017

Choose a reason for hiding this comment

vbatts commented May 24, 2017

wking commented May 24, 2017 via email

cyphar commented May 24, 2017

wking commented May 24, 2017 via email

wking commented May 9, 2017 •

edited

Loading