
cli: "trace" command #744

Closed
suskin opened this issue Sep 4, 2019 · 6 comments · Fixed by #978

@suskin (Member) commented Sep 4, 2019

What seems to be the problem?

During problem solving and troubleshooting of issues (especially with workloads), I often end up doing a kubectl get on the relevant resource, then a bunch of gets on related resources, to see the statuses of everything related to what I was trying to do.

How can Crossplane help?

It seems like it'd be useful to have a trace command which would walk the related objects and print out statuses for me. I imagine it working something like kubectl crossplane trace mysqlinstance wordpressdatabase. I believe the crossplane portion of the invocation is not necessary, but it may make sense to start with it. I could also imagine a --maxdepth type option.

Because grabbing information on objects related to a particular object seems like a common troubleshooting use-case, it is possible that a tool to do this already exists or is in development elsewhere.

suskin added the enhancement (New feature or request) and troubleshooting labels on Sep 4, 2019
suskin self-assigned this on Sep 20, 2019
@suskin (Member, Author) commented Sep 20, 2019

Assigning to myself for tracking purposes, but I am not starting work on this right now.

@turkenh (Member) commented Oct 14, 2019

Here is my proposal:

kubectl crossplane trace RESOURCE_TYPE RESOURCE_NAME [-n|--namespace NAMESPACE]

This command will provide debugging information for Crossplane resources by tracing the related objects and printing helpful information about each of them. To achieve this, we need a way to find the next/related object(s) for a given one. There are different types of relations between Crossplane resources, and each needs to be handled separately. The command can be invoked for any object in the chain, and information for every resource from that object down to the last one in the chain will be shown as output. (A quick sketch of resolving a couple of these links with kubectl follows the relation chain below.)

Relation Chain:

Stack Instance -> KubernetesApplication: using owner reference
Stack Instance -> Other Possible Resources (namespaces, pods, secrets, etc.): using owner reference? too costly if we need to check all types...
KubernetesApplication -> KubernetesCluster: .spec.clusterSelector.matchLabels or .status.clusterRef
KubernetesApplication -> Other Resource Claims: .spec.resourceSelector.matchLabels
KubernetesApplication -> KubernetesApplicationResources: .spec.resourceTemplates[*].metadata.name
Resource Claim -> Resource: .spec.resourceRef
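
For illustration only, here is a minimal bash sketch of resolving two of these links with plain kubectl and -o jsonpath. The object names are taken from the sample output further down; the exact sub-fields (.name, .kind) on clusterRef/resourceRef and the omission of namespace handling are assumptions.

# Minimal sketch (bash), assuming the field paths listed above.
APP="wordpress-app-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45"

# KubernetesApplication -> KubernetesCluster via .status.clusterRef
# (assumes clusterRef has a .name sub-field)
CLUSTER=$(kubectl get kubernetesapplication "$APP" \
  -o jsonpath='{.status.clusterRef.name}')

# Resource claim -> managed resource via .spec.resourceRef
# (assumes resourceRef has .kind and .name sub-fields)
NEXT_KIND=$(kubectl get kubernetescluster "$CLUSTER" -o jsonpath='{.spec.resourceRef.kind}')
NEXT_NAME=$(kubectl get kubernetescluster "$CLUSTER" -o jsonpath='{.spec.resourceRef.name}')

echo "next object in the chain: ${NEXT_KIND}/${NEXT_NAME}"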

Questions:

  • We could enhance the user experience with formatted and colored text (e.g. errors in bold/red); would this work for all terminals?
  • Shall we include last-transition/update times from the conditions info? Do they add value?

Implementation:

Just like the existing Crossplane CLI commands, the kubectl plugin mechanism will be used. The existing commands are implemented in bash, and in general it makes sense to switch to a more powerful language like Go or Python. However, bash seems to be good enough for this command as well, thanks to kubectl output format options like -o jsonpath.

We can consider implementing two different flavours, as follows:
kubectl crossplane trace ... => prints conditions only if there is a failed condition
kubectl crossplane trace ... --all => prints all conditions for all resources
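
A rough sketch of how the default flavour could filter for failed conditions with a jsonpath filter expression. Treating "failed" as a condition whose status is "False", and the KIND/NAME placeholders (filled in by the chain-walking logic), are assumptions.

# Sketch: print conditions only when at least one condition has status "False".
failed=$(kubectl get "$KIND" "$NAME" \
  -o jsonpath='{range .status.conditions[?(@.status=="False")]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}')

if [ -n "$failed" ]; then
  printf 'TYPE\tREASON\tMESSAGE\n%s\n' "$failed"
fi

# The --all flavour would simply drop the filter:
#   -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'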

Possible output for a fictitious state:

$ kubectl crossplane trace kubernetesapplication wordpress-app-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45

KUBERNETES APPLICATIONS
---
NAME                                                 CLUSTER                                                  STATUS      DESIRED   SUBMITTED
wordpress-app-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   wordpress-cluster-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Submitted   3         3

Status Conditions:
TYPE	STATUS  	REASON
Synced	True  		Successfully reconciled managed resource
-----------------------
KUBERNETES APPLICATION RESOURCES

NAME                                                             TEMPLATE-KIND   TEMPLATE-NAME   CLUSTER                                                  STATUS
wordpress-demo-deployment-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Deployment      wordpress       wordpress-cluster-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Submitted

Status Conditions:
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource

Remote:
AVAILABLE-REPLICAS	READY-REPLICAS	REPLICAS	UPDATED-REPLICAS
1					1				1			1
Remote Conditions:
TYPE		STATUS  	REASON										MESSAGE
Progressing	True  		NewReplicaSetAvailable						ReplicaSet "wordpress-68d579bcb4" has successfully progressed.
Available	True  		MinimumReplicasAvailable				 	Deployment has minimum availability.
---
NAME                                                             TEMPLATE-KIND   TEMPLATE-NAME   CLUSTER                                                  STATUS
wordpress-demo-namespace-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45    Namespace       wordpress       wordpress-cluster-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Submitted

Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource

Remote:
PHASE
Active
---
NAME                                                             TEMPLATE-KIND   TEMPLATE-NAME   CLUSTER                                                  STATUS
wordpress-demo-service-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45      Service         wordpress       wordpress-cluster-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Submitted

Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource

Remote:
LOADBALANCER-INGRESS-IP
35.222.234.64
--------------------------------
KUBERNETES CLUSTER
---
NAME                                                     STATUS   CLUSTER-CLASS      CLUSTER-REF   AGE
wordpress-cluster-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Bound    standard-cluster                 24h

Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource
Ready	True 	2019-10-10T06:55:55Z 	Managed resource is available for use

|||

NAME                                                     STATUS   STATE     CLUSTER-NAME                               ENDPOINT        CLUSTER-CLASS   LOCATION        RECLAIM-POLICY   AGE
kubernetescluster-5c843147-069e-4a94-81d3-188c9e0fbd9c   Bound    ERROR   gke-0b9ec875-4ee9-409d-886a-94751fb1a32e   35.193.166.52   standard-gke    us-central1-b   Delete           24h

Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON										MESSAGE
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource
Ready	False 	2019-10-10T06:55:55Z 	Managed resource is not available for use	gke cluster is in ERROR state with message: Retry budget exhausted (10 attempts): Services range "services" in network "example-network", subnetwork "example-subnetwork" is already used by another cluster.
-----------------
MANAGED RESOURCES
---
MySQLInstance
NAME                                                   STATUS   CLASS            VERSION   AGE
wordpress-mysql-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45   Bound    standard-mysql   5.7       25h

Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource
Ready	True 	2019-10-10T06:55:55Z 	Managed resource is available for use

|||

NAME                                                 STATUS   STATE      CLASS               VERSION     AGE
mysqlinstance-e3460958-271d-4cf7-a468-fa21f360b334   Bound    RUNNABLE   standard-cloudsql   MYSQL_5_7   25h
Status Conditions:
TYPE	STATUS  LAST-TRANSITION-TIME 	REASON
Synced	True 	2019-10-10T06:52:15Z 	Successfully reconciled managed resource
Ready	True 	2019-10-10T06:55:55Z 	Managed resource is available for use

@suskin (Member, Author) commented Oct 14, 2019

Awesome! I'm thinking there are a couple more types of chains not covered by the existing list that we might want to follow:

  • ResourceClaim -> ResourceClass, Provider, for debugging configuration issues
  • Resource -> ResourceClaim, for debugging resources and to look for orphans
  • Policy or Provider -> Resource and ResourceClaim objects
  • Provider -> everything using that provider

For the question of:

We could enhance the user experience with formatted and colored text (e.g. errors in bold/red); would this work for all terminals?

I wouldn't worry about this for now, since the main improvement offered by trace would be the core chain-following functionality. But if we wanted to add it, the simplest way to make it work for all terminals would be to add an option to disable it.

For the question of:

Shall we include last-transition/update times from the conditions info? Do they add value?

I would say yes, these definitely add value when debugging.

Possible output for a fictitious state

I am imagining that the output of this tool could be input for another tool (in the spirit of the Unix Philosophy), so I would try to put all relevant information on individual lines where possible. That said, we can always start with something we think is pretty close, and change things in the future : ).

Based on the sample output, it sounds like you are imagining output which feels very similar to kubectl get. I agree with this, and it'd be great if we supported some of the same familiar options. I suppose by options I especially mean a default concise view (like kubectl get with no formatting arguments), and a full view (like kubectl get -o yaml). One way to do this would be to structure the trace command as a thing which walks the chains and feeds the objects to kubectl get commands, passing through any extra arguments the user specified.
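
For example, a rough sketch of that shape in bash (resolve_next is a hypothetical helper that prints the kind and name of the next related object, or nothing when the chain ends; everything after the kind and name is forwarded to kubectl get untouched):

# Sketch only: walk the chain and hand each object to kubectl get, forwarding
# any extra user-supplied arguments (e.g. -o yaml).
trace() {
  local kind="$1" name="$2"
  shift 2

  while [ -n "$kind" ] && [ -n "$name" ]; do
    kubectl get "$kind" "$name" "$@"
    read -r kind name < <(resolve_next "$kind" "$name")
  done
}

# Usage: trace kubernetesapplication wordpress-app-64edc6f9-7c70-43ed-bd1d-1c26e09e0a45 -o yaml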

bash seems to be good enough for this command as well

You may have a better idea of what a bash implementation may look like at this point than I do, but I am thinking the chain-walking logic may be tricky to do cleanly in bash. But maybe not!

too costly if we need to check all types...

I was imagining that we could probably design a labeling scheme that made this walk much simpler to follow from trace, but I stopped short of designing such a scheme. Keep in mind that if we come up with a labeling scheme (or some other change) which makes trace a lot more effective, we can also consider getting support for the labeling scheme into the core crossplane and infrastructure stacks code.

@turkenh (Member) commented Oct 15, 2019

Awesome! I'm thinking there are a couple more types of chains not covered by the existing list that we might want to follow:

  • ResourceClaim -> ResourceClass, Provider, for debugging configuration issues
  • Resource -> ResourceClaim, for debugging resources and to look for orphans
  • Policy or Provider -> Resource and ResourceClaim objects
  • Provider -> everything using that provider

I understand you are suggesting a more general-purpose debugging tool rather than a simple "find the next related object and print some info" tool, which definitely makes sense. In that case, it would be hard to scale with bash, and it would be better to implement in Go.

Based on the sample output, it sounds like you are imagining output which feels very similar to kubectl get. I agree with this, and it'd be great if we supported some of the same familiar options. I suppose by options I especially mean a default concise view (like kubectl get with no formatting arguments), and a full view (like kubectl get -o yaml). One way to do this would be to structure the trace command as a thing which walks the chains and feeds the objects to kubectl get commands, passing through any extra arguments the user specified.

Yeah, this is something that I also considered, but apart from formatting, not all kubectl get options make sense here. I am thinking of representing the relations with some sort of data structure (e.g. a graph) and having different formatting options supporting YAML or JSON output, as you suggested. By default, we can start with some kind of basic text output and later consider even some visualization depicting the relations between objects, which could help new users understand Crossplane resources and the relations between them.

You may have a better idea of what a bash implementation may look like at this point than I do, but I am thinking the chain-walking logic may be tricky to do cleanly in bash. But maybe not!

Representing the resource chain/relations with a data structure makes more sense considering the scalability of this tool. So I agree bash is not a good option here; I will go with Go :)

@suskin (Member, Author) commented Oct 15, 2019

Representing the resource chain/relations with a data structure makes more sense

In general I agree with this, BUT, I am wondering whether the data structure could be embedded in the Kubernetes objects. For example: what if all objects related to a particular workload had a label on them, like workload: pretend-this-is-a-uuid? The graph would be represented by labels, and the tool would only need to know how to find the nodes in the graph and walk them. In other words, it seems like we can decouple the representation of the graph from the process of walking the graph. And maybe we don't need both of them to be in the tool. Does that help at all?
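
For illustration, a rough sketch of what that walk could collapse into, assuming a hypothetical workload label; the list of kinds is just an example:

# Sketch: with a hypothetical "workload: <uuid>" label on every related object,
# trace would not need to know individual reference fields at all.
WORKLOAD_ID="pretend-this-is-a-uuid"

kubectl get \
  kubernetesapplications,kubernetesapplicationresources,kubernetesclusters,mysqlinstances \
  --all-namespaces \
  -l "workload=${WORKLOAD_ID}"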

@suskin (Member, Author) commented Oct 15, 2019

Also, I do want to clarify that it is totally fine to make a simple and very limited tool now, play with it to learn more about what we want, and then improve it in the future to support other use-cases. I would much rather have a simple tool that we can use now than wait longer for a more sophisticated tool.

For example, a use-case that came up multiple times in the last couple of days is debugging a resource claim, so making something quickly that helps with debugging claims seems like a good place to start. It's even fine for that to be in bash if that makes things easier, even if we think Go will be the direction we want to take in the future, because our vision of the future will change as people use the tool and we learn more about the use-cases.
