
Distributed tracing #89

Closed
kazimuth opened this issue Jun 13, 2019 · 14 comments
Labels
help wanted (Extra attention is needed), kind/feature (New feature or request), needs/design (Additional design discussion is required)

Comments

@kazimuth
Contributor

kazimuth commented Jun 13, 2019

I saw a little bit of discussion of this in tokio-rs/tokio#561, but there aren't currently any open issues for it, so I figured I'd start one.

Distributed tracing is like tokio-trace but extended for distributed systems; instead of tracing only within a single process, you can trace code execution across process boundaries, machines, data centers, continents... There are a number of systems for this, e.g. Jaeger, Zipkin, and a bunch of others. Tokio-trace is perfectly set up to support distributed tracing; only the actual glue code needs to be written.

  1. Trace propagation -- in a distributed tracing system, incoming and outgoing requests to a process are annotated with some form of ID, so that they can be collated later. In the past, tracing systems defined their own bespoke propagation formats, but going forward it looks like people are standardizing on the W3C Trace Context Recommendation.

  2. Trace export -- after traces are recorded, they need to be sent to a central location for viewing. This can be done push-style (having your application connect somewhere and send data) or pull-style (exposing data on some port that can later be scraped). There are open APIs for doing this -- OpenTracing and OpenCensus, which are currently merging into OpenTelemetry.

I think the simplest path forward on this is to build:

  • A parser for the W3C TraceContext format (a rough sketch follows this list).
  • Integrations of that parser with various ecosystem libraries (http, hyper, grpc, actix, rocket, reqwest, ...).
  • An exporter for the OpenCensus service. That service acts basically as a multiplexer: you run a little daemon on your server and send traces to it, and it can then forward them to Jaeger, Zipkin, etc. for you. OpenTelemetry will be backwards-compatible with this.
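
A minimal sketch of a version-00 traceparent parser might look something like this (the TraceContext struct and function names are illustrative, not from any existing crate):

pub struct TraceContext {
    pub trace_id: String,   // 32 hex chars
    pub parent_id: String,  // 16 hex chars
    pub sampled: bool,
}

fn is_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_hexdigit())
}

// Parse a `traceparent` header of the form
// "00-{trace-id}-{parent-id}-{trace-flags}".
pub fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let mut parts = header.trim().split('-');
    if parts.next()? != "00" {
        return None; // only version 00 is handled in this sketch
    }
    let trace_id = parts.next().filter(|s| is_hex(s, 32))?;
    let parent_id = parts.next().filter(|s| is_hex(s, 16))?;
    let flags = parts.next().filter(|s| is_hex(s, 2))?;
    let sampled = u8::from_str_radix(flags, 16).ok()? & 0x01 == 0x01;
    Some(TraceContext {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled,
    })
}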

Ideally end users should be able to do:

fn main() {
    opencensus::export();
    actix_web::App::new()
       .middleware(TraceContextMiddleware::new())
       /* ... */
       .finish();
}

and have things just work ✨

@hawkw
Member

hawkw commented Jun 13, 2019

Thanks for opening this issue! Supporting integration with distributed tracing systems is definitely a goal for tokio-trace, and we've tried to design the core primitives to make integrating with distributed tracing easy. However, nobody has actually written any such integrations yet.

I've done some thinking about how one might want to go about writing a distributed tracing integration for tokio-trace in the past. I think we would start by writing a subscriber that consumes tokio-trace spans and events, and translates them into a format suitable for exporting to the distributed tracing system. Then, we would write middleware/glue for various web frameworks and libraries, as you suggested, that would parse incoming trace contexts and associate them with tokio-trace spans.

A potential way to associate the trace contexts with spans is to use the Subscriber downcasting API. I discussed how this could be done in the PR that added support for downcasting subscribers: tokio-rs/tokio#974.

Alternatively, constructing the subscriber could return both the subscriber and a handle type that allows sending trace context IDs to it. The trace context middlewares could then be constructed using that handle. This might be more efficient than getting the current subscriber, but it would require users to thread that handle through from where the subscriber is created to wherever the middlewares are constructed.
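
A rough sketch of that second shape, with entirely hypothetical types (none of this is an existing tokio-trace API):

use std::sync::{Arc, Mutex};

// Hypothetical subscriber-side storage for remote trace contexts.
#[derive(Default)]
struct ContextStore {
    contexts: Vec<String>,
}

pub struct DistributedSubscriber {
    store: Arc<Mutex<ContextStore>>,
    // ...plus whatever the subscriber needs to record spans/events
}

// Handle given to middleware so it can feed incoming trace context IDs to the
// subscriber without having to downcast the current dispatcher.
#[derive(Clone)]
pub struct ContextHandle {
    store: Arc<Mutex<ContextStore>>,
}

impl ContextHandle {
    pub fn record_incoming_context(&self, traceparent: String) {
        self.store.lock().unwrap().contexts.push(traceparent);
    }
}

pub fn subscriber_with_handle() -> (DistributedSubscriber, ContextHandle) {
    let store = Arc::new(Mutex::new(ContextStore::default()));
    (
        DistributedSubscriber { store: store.clone() },
        ContextHandle { store },
    )
}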

@kazimuth are you interested in writing a tokio-trace/OpenTelemetry integration? If so, I'd be happy to provide guidance. Regardless, thanks for opening this issue, as it'll provide a place for folks interested in seeing this to discuss how it ought to be implemented.

@kazimuth
Contributor Author

kazimuth commented Jun 14, 2019

I'm about to start grad school so I don't want to commit to any intense projects right now. Thanks for the offer of help, though :) I may poke around on this at some point.

I think ideally the trace context ID should be ambiently available, rather than having to be explicitly threaded through control flow. Explicitly threading anything is a pain, lol -- the system should require as little of it as possible if we want people to try the system out. Downcasting subscribers would definitely work for an initial implementation.

Longer-term, it might make sense to add something like "baggage" to the tokio-trace API - values attached to a span tree that the subscriber is required to keep around. That complicates subscriber impls, but it makes it super easy to propagate metadata without having to worry about the underlying subscriber implementation.

@hawkw
Member

hawkw commented Jun 14, 2019

I think ideally the trace context ID should be ambiently available, rather than having to be explicitly threaded through control flow.

I totally agree. My expectation is that when a request with a trace context ID is received, the middleware informs the subscriber, which stores the context associated with the current span. We could then have a free function to get it by downcasting the subscriber to the expected type and asking it for the trace ID.
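
Roughly like this, using the dispatcher API as it exists in today's crate (`dispatcher::get_default` and `Dispatch::downcast_ref` are real; the DistributedSubscriber type and its method are hypothetical):

use tracing::dispatcher;

pub struct DistributedSubscriber { /* span -> trace context storage */ }

impl DistributedSubscriber {
    fn trace_id_for_current_span(&self) -> Option<String> {
        // look up the context the middleware stored for the current span
        None
    }
}

// Free function callable from anywhere: grab the default dispatcher, try to
// downcast it to our subscriber type, and ask it for the current trace ID.
pub fn current_trace_id() -> Option<String> {
    dispatcher::get_default(|dispatch| {
        dispatch
            .downcast_ref::<DistributedSubscriber>()
            .and_then(|s| s.trace_id_for_current_span())
    })
}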

I'm not opposed to the idea you brought up around arbitrary metadata, though it seems like a lot of complexity in a system where we already have a notion of "fields". Seems worthy of more thought.

@kazimuth
Contributor Author

kazimuth commented Jun 14, 2019

Yeah, it's sorta a pain. Sometimes I wish Rust had something like Go's context.Context for misc stuff like this.

Could also do something like making the distributed subscriber wrap another subscriber and just handle the distributed stuff... But then you wouldn't be able to downcast it. Unless you stored the other subscriber as a dyn Subscriber, I guess.
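
Something shaped like this, say (hypothetical type, not an existing API):

use tracing::Subscriber;

// Wraps an arbitrary inner subscriber and layers distributed-tracing state
// (trace IDs, remote parent contexts) on top of it.
pub struct DistributedSubscriber {
    inner: Box<dyn Subscriber + Send + Sync>,
    // ...distributed-tracing bookkeeping goes here
}

impl DistributedSubscriber {
    pub fn new(inner: impl Subscriber + Send + Sync + 'static) -> Self {
        Self { inner: Box::new(inner) }
    }
}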

@carllerche carllerche transferred this issue from tokio-rs/tokio Jun 24, 2019
@hawkw hawkw added the help wanted label Jul 3, 2019
@hawkw hawkw changed the title from "Distributed tracing in tokio-trace" to "Distributed tracing" Aug 14, 2019
@hawkw hawkw added the kind/feature and needs/design labels Aug 14, 2019
@anton-dutov
Contributor

Have you looked at the rustracing / rustracing_jaeger crates for impl ideas?

@inanna-malick
Contributor

I've been playing around with a tracing honeycomb subscriber and one of the things I've had to spend some time thinking about is how trace ids are generated/discovered.

My initial impl just generates a trace id for all top-level spans, but it might make more sense to make this explicit instead of implicit. For example, in a gRPC service I'm building, I think I want to either generate per-request tracing ids or pick up external tracing ids at the request handler level.

@thedodd

thedodd commented Nov 9, 2019

OT/Jaeger support would definitely be a pretty big win!

@bbigras

bbigras commented Jan 29, 2020

Any progress on this?

@hawkw
Member

hawkw commented Jan 29, 2020

@bbigras depending on what distributed tracing system you're using, there are some (work in progress) implementations available: tracing-opentelemetry for OpenTelemetry, and the honeycomb-tracing crate @pkinsky mentions in #89 (comment) for Honeycomb users.

@inanna-malick
Contributor

@bbigras I've updated the honeycomb-tracing crate to support arbitrary backends (not just honeycomb) in a branch. Currently all tests are passing; I'll be publishing this new version sometime in the next week.

@thedodd

thedodd commented Feb 5, 2020

@pkinsky nice! Does it support OpenTelemetry / Jaeger and such? I'm not a big fan of all the ceremony required to get the tracing-opentelemetry crate set up, especially because it uses the rustracing_jaeger crate under the hood, and setting that up is even more painful. The setup for your honeycomb crate looks pretty terse. I like.

@inanna-malick
Contributor

inanna-malick commented Feb 6, 2020

@thedodd

In theory it should support pretty much any tracing backend; all you need to do is implement this trait with whatever backing logic you prefer:

pub trait Telemetry {
    type Visitor: Default + tracing::field::Visit;

    fn report_span<'a>(&self, span: Span<'a, Self::Visitor>);
    fn report_event<'a>(&self, event: Event<'a, Self::Visitor>);
}
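
For example, a minimal Visitor implementation satisfying that bound could just collect fields into a map (the backend-specific report_span/report_event logic is the interesting part and is omitted here):

use std::collections::HashMap;
use std::fmt;
use tracing::field::{Field, Visit};

#[derive(Default)]
pub struct FieldCollector {
    fields: HashMap<String, String>,
}

impl Visit for FieldCollector {
    // `record_debug` is the only required method; the typed `record_*`
    // methods fall back to it by default.
    fn record_debug(&mut self, field: &Field, value: &dyn fmt::Debug) {
        self.fields
            .insert(field.name().to_string(), format!("{:?}", value));
    }
}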

(also, tldr, I'm trans, I'm using this library release as an opportunity to do some much-needed identity refactoring)

@velvia

velvia commented Mar 12, 2022

Hi folks, I found this old issue and I'm trying to get tracing-opentelemetry and Jaeger to work together. I am able to get spans into Jaeger, but I cannot find a way to tie the spans together with trace propagation. I'm trying to use code like the following:

    let context = global::get_text_map_propagator(|propagator| propagator.extract(&carrier));
    span.set_parent(context);

Does it matter what is in the hashmap in carrier?

Thanks.

@davidbarsky
Member

@velvia: I've opened a discussion #1991 to address your question.


Since this functionality already exists through libraries such as tracing-opentelemetry and tracing-honeycomb, I'll close this issue.
