-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix various SELinux policy issues #1729
Conversation
Previosly, an unprivileged process trying to write to its own files under `/proc/self` could trigger an "associate" denial since `/proc` has a filesystem context of `any_t`. Giving `/proc` its own label lets subject labels be associated without also letting them be created on filesystems like `/run` and `/tmp`. Signed-off-by: Ben Cressey <bcressey@amazon.com>
These types were mostly treated like `any_t` and `local_t`, where all processes could freely write to the files. The special case was for trusted processes, where directories created on an `unlabeled_t` path would end up with the `local_t` label. In combination with some mount options, this caused files on `/local` to end up with the right labels, even if the filesystem was created on a system without SELinux enabled. This might happen when using a custom disk image as the source for the secondary storage volume. However, a filesystem that's created by a bootstrap container won't necessarily be mounted with the right options, and the `unlabeled_t` label would continue to propagate. That would prevent the named file transitions used to label Docker and containerd directories from taking place, which would make them less secure. We can simplify the policy and avoid this problem by treating unknown or unrecognized types as already having the `local_t` label. Signed-off-by: Ben Cressey <bcressey@amazon.com>
The mount options are no longer needed, now that objects with missing or invalid labels are treated as `local_t`. Signed-off-by: Ben Cressey <bcressey@amazon.com>
Loading a kernel module is a privileged operation, so writing to the location where build files like `objtool` are stored should also be privileged. Signed-off-by: Ben Cressey <bcressey@amazon.com>
Previously we had two use cases for `local_t`. It was the label used for most files and directories on `/local`, and therefore the label that most hostPath mounts would have. It was also the label of the container root filesystem, and therefore the label that external volumes, emptyDir mounts, and other private storage would have. For MCS isolation, it's useful to have distinct types to assert that one type always has a level with categories, and another type never does. That way, the constraints can be applied only to the files that are meant to be private to a pod or container. Signed-off-by: Ben Cressey <bcressey@amazon.com>
If `defaultrange` is not specified in the policy, the lower part of the range from the source process is applied to all new files. Unprivileged containers will run with a process label that includes two category pairs, so the files get the label we expect. Privileged containers, on the other hand, may run with these labels: * `system_u:system_r:control_t:s0-s0:c0.c1023` * `system_u:system_r:super_t:s0` In both cases, the lower range of the process is just `s0`, and files would end up with that. This would allow unprivileged containers to also modify the files. We can avoid this by using the target's range instead, since Docker and containerd CRI will ensure that volume mounts are labeled with the appropriate range. Signed-off-by: Ben Cressey <bcressey@amazon.com>
Conceptually, anything with the `control_t` label has access to all categories. Setting the range makes this explicit in the output of tools like `ps`. Signed-off-by: Ben Cressey <bcressey@amazon.com>
Setting the level gives our host and bootstrap containers the same range of categories as all other privileged containers, and means that all containers will run with some categories specified. Signed-off-by: Ben Cressey <bcressey@amazon.com>
Are these labels persisted through updates? If an existing Bottlerocket instance has files with these labels from old containers and is updated to a new OS image that includes these policy changes, do those old files need to be relabeled? |
I've confirmed that this fixes #1651. |
Yes, the labels can be persisted - for example, the VPC CNI plugin writes these files:
Adding the new So I don't think we need any explicit relabel operations. It's something I'd like to avoid if possible since it's rather expensive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👢 ⛵
Issue number:
#1617, #1651, #1712
Description of changes:
This includes some SELinux policy fixes, as well as some refactoring to pave the way for MCS isolation. More details are available in the commit messages, but at a high level, here's what's changing.
A new
proc_t
object type is added to fix an AVC denial that processes could encounter when writing to/proc
. This didn't appear to cause any real trouble; as far as I can tell, this wasn't surfaced to programs as an error.The
unlabeled_t
andexternal_t
object types are removed, and aliases are added forlocal_t
. The distinction between different types of local files was never very useful and caused problems when creating new filesystems via bootstrap containers, as discussed in #1617. This should also help with #1651.While cleaning up the
local_t
mount options that are no longer needed, I fixed an oversight with the/usr/src/kernels
mount, which is not intended to be writable by unprivileged containers.Finally there are the changes that lay the groundwork for MCS isolation. Adding a distinct type for container files -
data_t
vs.local_t
- enables us to assert thatdata_t
must always follow the isolation rules. Running privileged and host containers with the full MCS range clarifies the policy goal that they are allowed to bypass isolation.Specifying the default range transition fixes two problems, both caused by using the source context. Privileged containers would create files with the wrong level - either too weak (
s0
) or too strong (s0-s0:c0.c1023
). Unprivileged containers would add their MCS pairs to object types that aren't meant to be isolated, such asany_t
.Testing done:
Verified that nodes came up and sonobuoy passed with no SELinux denials. Previously, nodes would sometimes log a denial for
httpd
because of the "associate" issue that's now resolved.Confirmed that the directories under
/var/lib/{host,}containerd
had the expected labels.Verified that only
container_t
anddata_t
types use MCS pairs:Verified that only
control_t
andsuper_t
types use the full MCS range:Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.