Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix various SELinux policy issues #1729

Merged
merged 8 commits into from
Sep 9, 2021

Conversation

bcressey
Copy link
Contributor

@bcressey bcressey commented Sep 2, 2021

Issue number:
#1617, #1651, #1712

Description of changes:
This includes some SELinux policy fixes, as well as some refactoring to pave the way for MCS isolation. More details are available in the commit messages, but at a high level, here's what's changing.

A new proc_t object type is added to fix an AVC denial that processes could encounter when writing to /proc. This didn't appear to cause any real trouble; as far as I can tell, this wasn't surfaced to programs as an error.

The unlabeled_t and external_t object types are removed, and aliases are added for local_t. The distinction between different types of local files was never very useful and caused problems when creating new filesystems via bootstrap containers, as discussed in #1617. This should also help with #1651.

While cleaning up the local_t mount options that are no longer needed, I fixed an oversight with the /usr/src/kernels mount, which is not intended to be writable by unprivileged containers.

Finally there are the changes that lay the groundwork for MCS isolation. Adding a distinct type for container files - data_t vs. local_t - enables us to assert that data_t must always follow the isolation rules. Running privileged and host containers with the full MCS range clarifies the policy goal that they are allowed to bypass isolation.

Specifying the default range transition fixes two problems, both caused by using the source context. Privileged containers would create files with the wrong level - either too weak (s0) or too strong (s0-s0:c0.c1023). Unprivileged containers would add their MCS pairs to object types that aren't meant to be isolated, such as any_t.

Testing done:
Verified that nodes came up and sonobuoy passed with no SELinux denials. Previously, nodes would sometimes log a denial for httpd because of the "associate" issue that's now resolved.

Confirmed that the directories under /var/lib/{host,}containerd had the expected labels.

Verified that only container_t and data_t types use MCS pairs:

# find -context '*:s0:c*,c*' ! -context '*:data_t:*' ! -context '*:container_t:*'
<no output>

Verified that only control_t and super_t types use the full MCS range:

# find -context '*:s0-s0:c*.c*' ! -context '*:control_t:*' ! -context '*:super_t:*'
<no output>

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Previosly, an unprivileged process trying to write to its own files
under `/proc/self` could trigger an "associate" denial since `/proc`
has a filesystem context of `any_t`.

Giving `/proc` its own label lets subject labels be associated
without also letting them be created on filesystems like `/run` and
`/tmp`.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
These types were mostly treated like `any_t` and `local_t`, where all
processes could freely write to the files. The special case was for
trusted processes, where directories created on an `unlabeled_t` path
would end up with the `local_t` label.

In combination with some mount options, this caused files on `/local`
to end up with the right labels, even if the filesystem was created
on a system without SELinux enabled. This might happen when using a
custom disk image as the source for the secondary storage volume.

However, a filesystem that's created by a bootstrap container won't
necessarily be mounted with the right options, and the `unlabeled_t`
label would continue to propagate. That would prevent the named file
transitions used to label Docker and containerd directories from
taking place, which would make them less secure.

We can simplify the policy and avoid this problem by treating unknown
or unrecognized types as already having the `local_t` label.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
The mount options are no longer needed, now that objects with missing
or invalid labels are treated as `local_t`.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Loading a kernel module is a privileged operation, so writing to the
location where build files like `objtool` are stored should also be
privileged.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Previously we had two use cases for `local_t`. It was the label used
for most files and directories on `/local`, and therefore the label
that most hostPath mounts would have. It was also the label of the
container root filesystem, and therefore the label that external
volumes, emptyDir mounts, and other private storage would have.

For MCS isolation, it's useful to have distinct types to assert that
one type always has a level with categories, and another type never
does. That way, the constraints can be applied only to the files that
are meant to be private to a pod or container.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
If `defaultrange` is not specified in the policy, the lower part of
the range from the source process is applied to all new files.

Unprivileged containers will run with a process label that includes
two category pairs, so the files get the label we expect.

Privileged containers, on the other hand, may run with these labels:
* `system_u:system_r:control_t:s0-s0:c0.c1023`
* `system_u:system_r:super_t:s0`

In both cases, the lower range of the process is just `s0`, and files
would end up with that. This would allow unprivileged containers to
also modify the files.

We can avoid this by using the target's range instead, since Docker
and containerd CRI will ensure that volume mounts are labeled with
the appropriate range.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Conceptually, anything with the `control_t` label has access to all
categories. Setting the range makes this explicit in the output of
tools like `ps`.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Setting the level gives our host and bootstrap containers the same
range of categories as all other privileged containers, and means
that all containers will run with some categories specified.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
packages/selinux-policy/fs.cil Show resolved Hide resolved
@samuelkarp
Copy link
Contributor

Privileged containers would create files with the wrong level - either too weak (s0) or too strong (s0-s0:c0.c1023). Unprivileged containers would add their MCS pairs to object types that aren't meant to be isolated, such as any_t.

Are these labels persisted through updates? If an existing Bottlerocket instance has files with these labels from old containers and is updated to a new OS image that includes these policy changes, do those old files need to be relabeled?

@bcressey
Copy link
Contributor Author

bcressey commented Sep 3, 2021

I've confirmed that this fixes #1651.

@bcressey
Copy link
Contributor Author

bcressey commented Sep 3, 2021

Are these labels persisted through updates? If an existing Bottlerocket instance has files with these labels from old containers and is updated to a new OS image that includes these policy changes, do those old files need to be relabeled?

Yes, the labels can be persisted - for example, the VPC CNI plugin writes these files:

# ls -1Z /run/aws-node/ipam.json /var/log/aws-routed-eni/ipamd.log
system_u:object_r:any_t:s0:c210,c729 /run/aws-node/ipam.json
system_u:object_r:local_t:s0:c210,c729 /var/log/aws-routed-eni/ipamd.log

Adding the new data_t type will let us say that only data_t objects should have MCS isolation enforced. After an upgrade, any files with local_t:s0:cX,cY labels will be treated as though they were just local_t. On downgrade, any files with a persisted data_t label will be treated as unlabeled_t, since that label didn't exist in older policies. In both cases, reads and writes will be allowed for any subject.

So I don't think we need any explicit relabel operations. It's something I'd like to avoid if possible since it's rather expensive.

Copy link
Contributor

@arnaldo2792 arnaldo2792 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👢 ⛵

@bcressey bcressey merged commit a6f6262 into bottlerocket-os:develop Sep 9, 2021
@bcressey bcressey deleted the policy-fixes branch September 9, 2021 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants