Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add using include/exclude kinds and namespaces #395

Merged
merged 10 commits into from
Jan 14, 2025

Conversation

afdesk
Copy link
Contributor

@afdesk afdesk commented Aug 26, 2024

Problem

After v0.51.0 to successfully scan a Kubernetes cluster, Trivy must be executed under a role that has read permissions at the cluster scope.

If a user tries to run trivy k8s under a limited account there will be a lot of error message (ex here).

I can reproduce it for limiteduser with the role:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
trivy k8s --report summary --disable-node-collector --kubeconfig ./k8s/configs/limiteduser-kubeconfig --include-namespaces default --include-kinds pod 
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: apps/v1, Resource=deployments - deployments.apps is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"deployments\" in API group \"apps\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=pods - pods is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: apps/v1, Resource=replicasets - replicasets.apps is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"replicasets\" in API group \"apps\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=replicationcontrollers - replicationcontrollers is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"replicationcontrollers\" in API group \"\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: apps/v1, Resource=statefulsets - statefulsets.apps is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: apps/v1, Resource=daemonsets - daemonsets.apps is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"daemonsets\" in API group \"apps\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: batch/v1, Resource=cronjobs - cronjobs.batch is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"cronjobs\" in API group \"batch\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: batch/v1, Resource=jobs - jobs.batch is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=services - services is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"services\" in API group \"\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=serviceaccounts - serviceaccounts is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"serviceaccounts\" in API group \"\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=configmaps - configmaps is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=roles - roles.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"roles\" in API group \"rbac.authorization.k8s.io\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=rolebindings - rolebindings.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"rolebindings\" in API group \"rbac.authorization.k8s.io\" at the cluster scope"
2024-08-27T00:25:22+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: networking.k8s.io/v1, Resource=networkpolicies - networkpolicies.networking.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"networkpolicies\" in API group \"networking.k8s.io\" at the cluster scope"
2024-08-27T00:25:23+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: networking.k8s.io/v1, Resource=ingresses - ingresses.networking.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"ingresses\" in API group \"networking.k8s.io\" at the cluster scope"
2024-08-27T00:25:23+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=resourcequotas - resourcequotas is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"resourcequotas\" in API group \"\" at the cluster scope"
2024-08-27T00:25:23+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=limitranges - limitranges is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"limitranges\" in API group \"\" at the cluster scope"
2024-08-27T00:25:23+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=clusterroles - clusterroles.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"clusterroles\" in API group \"rbac.authorization.k8s.io\" at the cluster scope"
2024-08-27T00:25:23+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=clusterrolebindings - clusterrolebindings.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"clusterrolebindings\" in API group \"rbac.authorization.k8s.io\" at the cluster scope"
2024-08-27T00:25:24+06:00	ERROR	Unable to list resources	error="failed listing resources for gvr: /v1, Resource=nodes - nodes is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"
2024-08-27T00:25:24+06:00	ERROR	Unable to list node resources	error="nodes is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"

Reason

Trivy must be able to access information about all cluster resources, including pods, deployments etc.

for _, gvr := range grvs {
dclient := c.getDynamicClient(gvr)
resources, err := dclient.List(ctx, v1.ListOptions{})
if err != nil {
lerr := fmt.Errorf("failed listing resources for gvr: %v - %w", gvr, err)
if errors.IsNotFound(err) || errors.IsForbidden(err) {
slog.Error("Unable to list resources", "error", lerr)
continue
}
return nil, lerr
}

Flags include/exclude kinds and namespaces are used only for the result filter.
for _, resource := range resources.Items {
if c.ignoreResource(resource) {
continue
}
// if excludeOwned is enabled and the resource is owned by built-in workload, then we skip it
if c.excludeOwned && c.hasOwner(resource) {
continue
}
// filter resources by kind
if FilterResources(c.includeKinds, c.excludeKinds, resource.GetKind()) {
continue
}
// filter resources by namespace
if FilterResources(c.includeNamespaces, c.excludeNamespaces, resource.GetNamespace()) {
continue
}

Solution

This PR suggests next solution:

  1. Collect scannable resources based on --include-kinds/--exclude-kinds flags.
  2. Collect scannable namespaces based on --include-namespaces/--exclude-namespaces flags. If the namespaces are not provided.
  3. Run Trivy for resources under each namespace.
  4. If resources and namespaces are not provided, Trivy will run as currently at the cluster scope".
  5. For exclude-namespaces a user has to have a cluster role to receive all namespaces and filter excluded ones.

Result:

$ ./trivy k8s --report summary --disable-node-collector --kubeconfig ./k8s/configs/limiteduser-kubeconfig --include-namespaces default --include-kinds pod

1 / 1 [----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 100.00% 0 p/s

Summary Report for default


Workload Assessment
┌───────────┬────────────────────────────────────┬───────────────────────────────┬────────────────────┬───────────────────┐
│ Namespace │              Resource              │        Vulnerabilities        │ Misconfigurations  │      Secrets      │
│           │                                    ├─────┬──────┬──────┬──────┬────┼───┬───┬───┬────┬───┼───┬───┬───┬───┬───┤
│           │                                    │  C  │  H   │  M   │  L   │ U  │ C │ H │ M │ L  │ U │ C │ H │ M │ L │ U │
├───────────┼────────────────────────────────────┼─────┼──────┼──────┼──────┼────┼───┼───┼───┼────┼───┼───┼───┼───┼───┼───┤
│ default   │ Pod/my-web-deploy-6dbcdb8c54-kbptl │ 157 │ 1336 │ 3108 │ 1529 │ 85 │   │ 1 │ 4 │ 10 │   │   │   │   │   │   │
└───────────┴────────────────────────────────────┴─────┴──────┴──────┴──────┴────┴───┴───┴───┴────┴───┴───┴───┴───┴───┴───┘
Severities: C=CRITICAL H=HIGH M=MEDIUM L=LOW U=UNKNOWN


Infra Assessment
┌───────────┬──────────┬───────────────────┬───────────────────┬───────────────────┐
│ Namespace │ Resource │  Vulnerabilities  │ Misconfigurations │      Secrets      │
│           │          ├───┬───┬───┬───┬───┼───┬───┬───┬───┬───┼───┬───┬───┬───┬───┤
│           │          │ C │ H │ M │ L │ U │ C │ H │ M │ L │ U │ C │ H │ M │ L │ U │
└───────────┴──────────┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
Severities: C=CRITICAL H=HIGH M=MEDIUM L=LOW U=UNKNOWN

@afdesk afdesk marked this pull request as ready for review August 26, 2024 18:45
@afdesk afdesk marked this pull request as draft October 14, 2024 06:02
@simar7
Copy link
Member

simar7 commented Nov 2, 2024

FYI this might be related aquasecurity/trivy#7107

@afdesk
Copy link
Contributor Author

afdesk commented Nov 4, 2024

FYI this might be related aquasecurity/trivy#7107

thanks. I saw it
this fix just skips the permission error: /~https://github.com/aquasecurity/trivy-kubernetes/pull/374/files

@afdesk afdesk marked this pull request as ready for review November 27, 2024 09:20
@afdesk afdesk requested a review from simar7 November 27, 2024 09:20
@itaysk
Copy link

itaysk commented Nov 28, 2024

Reason Trivy must be able to access information about all cluster resources

so today in order to run any scan, the user needs cluster permissions? if so, indeed sounds like a bug, since we should support the use case of scanning only specific namespaced workloads.

Flags include/exclude kinds and namespaces are used only for the result filter.

needless to say, when the user includes something, we should scan only that, and when they exclude something we should scan everything but this (we can discuss what "everything" means, if we want to never scan some resources).
In either case, Trivy should not error on not-included or excluded resources for any reason (including permissoins). Also the user is responsible for scoping the scan to something they have permissions to.
the steps you describe seem to align with this.

Then there is the question of if Trivy should error and fail when attempting to scan unauthorised resources. I think that for included resources it should - similar to how Trivy would fail for scanning unreachable container image. But for non-excluded resources, maybe it shouldn't fail (since the user isn't aware of everything), and just report it. If supporting this behavior is out of scope for this PR, it's fine I'm just documenting my thoughts.

@afdesk
Copy link
Contributor Author

afdesk commented Nov 28, 2024

so today in order to run any scan, the user needs cluster permissions?

yes, after v0.51.0 there is no way to set up a specific namespace, so all request run at the cluster scope.
so all service account with any Role (not ClusterRole) receives nothing for every resources.

we can discuss what "everything" means, if we want to never scan some resources

right now, "everything" for resources means next namespaced items: Deployments, Pods, ReplicaSets, ReplicationControllers, StatefulSets, DaemonSets, CronJobs, Jobs, Services, ServiceAccounts, ConfigMaps, Roles, RoleBindings, NetworkPolicies, Ingresses, ResourceQuotas, LimitRanges.
and also cluster resources ClusterRoles, ClusterRoleBindings and Nodes.

so a user can exclude any kind of this list.

there is a problem for exclude namespaces. I found out only one way to do it: get all namespaces and exclude specified ones. but such request requires a cluster role again.

@itaysk
Copy link

itaysk commented Nov 28, 2024

after v0.51.0 there is no way to set up a specific namespace

do you mean there is no way to include a specific namespace? meaning to scan just a specific namespace?

there is a problem for exclude namespaces

do you mean there's a problem in our specific implementation, or a general problem supporting such behavior?

@afdesk
Copy link
Contributor Author

afdesk commented Nov 28, 2024

after v0.51.0 there is no way to set up a specific namespace

do you mean there is no way to include a specific namespace? meaning to scan just a specific namespace?

yes, a user can't specify a specific namespace to scan now, --include-namespaces works only as a filter.

there is a problem for exclude namespaces

do you mean there's a problem in our specific implementation, or a general problem supporting such behavior?

it's a good question, I'll take a look at another implementation

@afdesk
Copy link
Contributor Author

afdesk commented Nov 28, 2024

do you mean there's a problem in our specific implementation, or a general problem supporting such behavior?

my thought is next:
to exclude some namespace, we need to get a list of namespaces, butnamespaces isn't a namespaced resource, so we need a cluster role to get a namespaces list.
or I missed something

UPD: so it seems It's a general problem such behavior.
@simar7 @itaysk wdyt?

@afdesk afdesk requested a review from simar7 November 28, 2024 14:05
@itaysk
Copy link

itaysk commented Nov 28, 2024

I understand.. in order to exclude namespaces you need to know all available namespaces. can you please point me to the relevant piece of code where this happens?

@afdesk
Copy link
Contributor Author

afdesk commented Nov 28, 2024

I understand.. in order to exclude namespaces you need to know all available namespaces. can you please point me to the relevant piece of code where this happens?

sure, in the current code:

for _, resource := range resources.Items {
if c.ignoreResource(resource) {
continue
}
// if excludeOwned is enabled and the resource is owned by built-in workload, then we skip it
if c.excludeOwned && c.hasOwner(resource) {
continue
}
// filter resources by kind
if FilterResources(c.includeKinds, c.excludeKinds, resource.GetKind()) {
continue
}
// filter resources by namespace
if FilterResources(c.includeNamespaces, c.excludeNamespaces, resource.GetNamespace()) {
continue
}

@afdesk
Copy link
Contributor Author

afdesk commented Nov 28, 2024

@simar7
Copy link
Member

simar7 commented Dec 2, 2024

@afdesk
Copy link
Contributor Author

afdesk commented Dec 3, 2024

my suggestion:
/~https://github.com/afdesk/trivy-kubernetes/blob/3cee92f75e565d7ea68352319cd001a4bad720d1/pkg/trivyk8s/trivyk8s.go#L200-L215

Wouldn't all namespaces be cluster level resources anyway? Maybe I don't understand this logic /~https://github.com/afdesk/trivy-kubernetes/blob/3cee92f75e565d7ea68352319cd001a4bad720d1/pkg/trivyk8s/trivyk8s.go#L542-L546

yes, you're right. a customer needs a cluster role to get all namespaces.

@itaysk
Copy link

itaysk commented Dec 3, 2024

I think there's no way around it, so let's just make the experience clear:

  1. if the user uses include command, no need for elevated permissions
  2. if the user uses exclude command, require only necessary permissions (e.g list namespaces)
  3. document the required permissions
  4. when failing for this reason, explain the cause and solution (e.g list namespaces permissions required when using the --exclude-namespaces flag)

@simar7
Copy link
Member

simar7 commented Dec 3, 2024

To summarize,

  1. If user doesn't ensure that they have enough permissions and tries to run Trivy operator, we should fail out and show to the user why we failed. It's the users responsibility to ensure proper permissions are obtained prior to scanning.
  2. If a user has enough permissions to scan but only requests a subset of resources, in such case we should only scan the subset. I believe this is different from today's behaviour where we scan everything & filter.

I believe this PR addresses point 2, but we should address point 1 if we don't already.

@afdesk
Copy link
Contributor Author

afdesk commented Dec 3, 2024

  1. If user doesn't ensure that they have enough permissions and tries to run Trivy operator, we should fail out and show to the user why we failed. It's the users responsibility to ensure proper permissions are obtained prior to scanning.

I've added a specific log message if there is no permission to get a namespaces list:
/~https://github.com/afdesk/trivy-kubernetes/blob/fix/perms/pkg/trivyk8s/trivyk8s.go#L206-L210

For example, if a limited user tries to exclude kube-system namespace:

$ ./trivy k8s --report summary --kubeconfig myconfig --exclude-namespaces kube-system
2024-12-03T16:09:23+06:00       FATAL   Fatal error     get k8s artifacts with node info error: 'exclude namespaces' option requires a cluster role with permissions to list namespaces

If a customer uses a limited role without some permissions, there will be shown an error message.
For example, if list isn't grant for "rbac.authorization.k8s.io":

2024-12-03T15:48:57+06:00       ERROR   Unable to list resources        error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=roles - roles.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"roles\" in API group \"rbac.authorization.k8s.io\" in the namespace \"dev\""
2024-12-03T15:48:57+06:00       ERROR   Unable to list resources        error="failed listing resources for gvr: rbac.authorization.k8s.io/v1, Resource=rolebindings - rolebindings.rbac.authorization.k8s.io is forbidden: User \"system:serviceaccount:default:limiteduser\" cannot list resource \"rolebindings\" in API group \"rbac.authorization.k8s.io\" in the namespace \"dev\""

@afdesk
Copy link
Contributor Author

afdesk commented Dec 3, 2024

Just FYI.
I wrote some scripts for tests and auto-deploy /~https://github.com/afdesk/k8s-samples/tree/main/limited-user
It may be useful for integration tests in Trivy in the future.

@afdesk
Copy link
Contributor Author

afdesk commented Dec 3, 2024

@simar7 PTAL, when you have time. thanks!

}

// getNamespaces collects scannable namespaces
func (c *client) getNamespaces() ([]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems untested, can we add some tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 75fd906

Comment on lines +208 to +210
if errors.IsForbidden(err) {
return result, fmt.Errorf("'exclude namespaces' option requires a cluster role with permissions to list namespaces")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can occur if a user passes in --include-namespaces and they don't have cluster level permissions as well right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can occur if a user passes in --include-namespaces and they don't have cluster level permissions as well right?

no, it can occur only if len(c.excludeNamespaces) != 0
also a cluster role isn't required for --include-namespaces

}
return result, nil
}

// ListArtifacts returns kubernetes scannable artifacs.
func (c *client) ListArtifacts(ctx context.Context) ([]*artifacts.Artifact, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it.
ListArtifacts consists from several functions, which already tested: c.initResourceList() and c.getNamespaces().

c.ListSpecificArtifacts(ctx) had no tests, and I suggest to do it in a separated PR. wdyt?


// collect only included kinds
if len(c.includeKinds) != 0 {
// `includeKinds` are already low cased.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment was improved eed731d

@afdesk afdesk requested a review from simar7 December 4, 2024 10:39
@afdesk afdesk requested a review from simar7 January 10, 2025 07:24
@simar7 simar7 merged commit 0d5396d into aquasecurity:main Jan 14, 2025
4 checks passed
@afdesk afdesk deleted the fix/perms branch January 14, 2025 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants