From 832833b60083e17224e742bfe6c207ad67fbd454 Mon Sep 17 00:00:00 2001 From: Mark Marron Date: Wed, 10 Jan 2018 13:51:53 -0800 Subject: [PATCH] Pass to fixup definitions and add some content. (#1) * Clean up for 80 column lines. * Clean up wording and setup in intro. * Pass over context and event definitions. * Update to definitions of async call graph. * Fixups based on PR comments. * Adding text around motivating examples. * writing about metadata * quick stash * writing * Additional event info --- Async-Context-Definitions.md | 353 ++++++++++++++++++++++------------- 1 file changed, 225 insertions(+), 128 deletions(-) diff --git a/Async-Context-Definitions.md b/Async-Context-Definitions.md index 0f816fa..f31385b 100644 --- a/Async-Context-Definitions.md +++ b/Async-Context-Definitions.md @@ -4,84 +4,89 @@ ## 1. Overview Node.js uses an event-driven non-blocking I/O model of execution instead of -multi-threading. This model greatly simplifies reasoning about many aspects of -program execution but requires the use of _callback_ based code to ensure -applications remain responsive. Thus, understanding the behavior of an -application may require understanding the execution of several blocks of code -and how their executions are related via asynchronous callback interleavings and -dependencies. +multi-threading. This model simplifies reasoning about many aspects of +program execution but requires the use of _callback_ or _promise_ based code +to ensure applications remain responsive. Thus, understanding the behavior of +an application requires understanding the execution of several blocks of code +and how their executions' are related via asynchronous callback interleavings +and execution dependencies. This document provides a specification for asynchronous execution in Node.js and a model for reasoning about the relationships between asynchronous calls -in an application. Colloquially, such relationships are called **Asynchronous Context**. -A formal definition of Asynchronous Context is given in section 2. We model state -transitions in asynchronous context as a series of discrete events that occur -during program execution. These events are described in detail in section 3. The -event streams can be used to build a directed acyclic graph, that we call the -**Asynchronous Call Graph**. How the events are used to construct the Asynchronous -Call Graph is defined in section 4. Section 5 gives a number of example problems -that are easily reasoned about given the terminology defined herein. Lastly, section 6 -shows a number of common asynchronous code patterns in Node.js, and shows the event -stream produced from such code executing, as well as some example pictures of the -Asynchronous Call Graphs produced. +in an application. Colloquially, such relationships are called +**Asynchronous Context**. A formal definition of Asynchronous Context is given +in section 2. We model state transitions in asynchronous context as a series +of discrete events that occur during program execution. These events are +described in detail in section 3. The event streams can be used to build a +directed acyclic graph, that we call the **Asynchronous Call Graph**. How the +events are used to construct the Asynchronous Call Graph is defined in +section 4. Section 5 gives a number of example problems that are easily +reasoned about given the terminology defined herein. Lastly, section 6 shows a +number of common asynchronous code patterns in Node.js, and shows the event +stream produced from such code executing, as well as some examples of +the resulting Asynchronous Call Graphs. ## 2. Definition of an Asynchronous API A fundamental challenge for asynchronous execution tracking is that, in the Node.js model, asynchronous behavior is defined both at different layers of the code stack and through implicit API usage conventions. Thus, our definitions and mechanisms for tracking asynchronous execution rely on: - 1. A notion of `client` code which sees an _asynchronous execution_ based on - a series of executed JavaScript functions from different logical asynchronous - contexts. - 2. A notion of `host` code, which can be native C++, JavaScript in Node - core, or even other user code that is surfaces an asynchronous API to the - `client` even if the underlying implementations mixes the execution of code - from different logical asynchronous client contexts in a single synchronous - execution. + 1. A convention based notion of `client` code which sees + _asynchronous execution_ based on a sequence of executed JavaScript functions + from logical asynchronous contexts. + 2. A convention based notion of `host` code, which can be native C++, + JavaScript in Node core, or even other userland code that surfaces an + asynchronous API to the `client` -- even if the underlying implementations + mixes the execution of code from different logical asynchronous client + contexts in a single synchronous execution. 3. A set of explicit tagging API's that will notify us of which API's are involved in these boundaries, the contexts they are associated with, and which relations they affect. These issues are illustrated by the Node `timers` API which, as described [here](https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/), -is implemented in JavaScript code and uses a list of timer callbacks to -track _every_ function to be executed when a given timespan has elapsed. -When the timespan expires a single synchronous function is called from the -LibUV library, `listOnTimeout`, that iterates over the list and executes -every function. Thus, from a runtime viewpoint all of the functions are -part of the same asynchronous context regardless of which (logically +is implemented using JavaScript in Node core and uses a list of timer +callbacks to track _every_ function to be executed when a given timespan has +elapsed. When the timespan expires a single synchronous function is called +from the LibUV library, `listOnTimeout`, that iterates over the list and +executes every function. Thus, from a runtime viewpoint all of the functions +are part of the same asynchronous context regardless of which (logically different) client asynchronous contexts added them. ### Definition of Context and Ordering A single JavaScript function may be passed to multiple asynchronous API's and, in order to track the state of each of these asynchronous executions -independently, we must be able to distinguish between these instances. Thus, +independently, we must be able to distinguish between uses of the same +function from multiple logically different asynchronous executions. Thus, we begin by defining an _asynchronous function context_ (or context) as a -unique identifier that is associated with any function that is passed to an -asynchronous API. In general we only require that fresh instances of these -values can be generated on demand and compared for identity. In practice +unique identifier. We only require that fresh instances of these +values can be generated on demand and compared for equality. In practice monotonically increasing integer values provide a suitable representation. For a given function _f_ we define the asynchronous context representation of _f_ in context _i_ as _fi_. -Our definitions of asynchronous executions are based on three +Our definitions of asynchronous executions are based on four binary relations over the executions of logically asynchronous JavaScript functions: + - **execution** -- when a function _f_ is executed we create a unique + fresh context for it _c_ and use this as the `execution` context for + asynchronous events that happen during the execution of _f_. - **link** -- when the execution of function _f_ in context _i_ stores a second function _g_ in context _j_ for later asynchronous execution we say _fi_ `links` _gj_. - - **causal** -- when the execution of a function _f_ in context _i_ is the - `client` code that is logically responsible (according to the `host` API) - for causing the execution of a previously **linked** _gj_ we say - _fi_ `causes` _gj_. - - **happens before** -- when a function _f_ in context is asynchronously executed - before a second function _gj_ we say _fi_ `happens before` + - **causal** -- when the execution of a function _f_ in context _i_ is + logically responsible (according to the `host` API) + for causing the execution of a previously **linked** _g_ from context _j_ + we say _fi_ `causes` _gj_. + - **happens before** -- when a function _f_ with execution context _i_ + is asynchronously executed before a second function _g_ with execution + context _j_ then _i_ < _j_ and we say _fi_ `happens before` _gj_. -We define the following module code that provides the needed explicit marking -of API's that are exposing asynchronous behavior from a `host` component to +We define the following module code that provides the required functions to +explicitly mark API's that expose asynchronous behavior from `host` code to `client` code and which enable the tracking of the core asynchronous execution -chain concepts. +concepts. ``` let globalCtxCtr = 0; generateFreshContext() { @@ -95,7 +100,7 @@ generateNextTime() { let currentExecutingContext = "root"; -link(ctxf) { +link(f) { const linkCtx = generateFreshContext(); emit("link", linkCtx, generateNextTime()); @@ -110,15 +115,22 @@ cause(ctxf) { } execute(ctxf) { - currentExecutingContext = { link: ctxf.link, cause: ctxf.cause, execution: generateFreshContext() }; + const origCtx = currentExecutingContext; + currentExecutingContext = { + link: ctxf.link, + cause: ctxf.cause, + execution: generateFreshContext() + }; + emit("executeBegin", currentExecutingContext, generateNextTime()); + let res = undefined; try { - ctxf.call(null); + res = ctxf.function.call(null); } finally { emit("executeEnd", currentExecutingContext, generateNextTime()); - currentExecutingContext = undefined; + currentExecutingContext = origCtx; } return res; @@ -135,10 +147,10 @@ An event trace for an asynchronous execution must satisfy the following ordering and identify requirements: 1) For any context the events must be ordered in the form: link < cause < beginExecute < endExecute - 2) A context may only appear in _one_ link event. - 3) A context may appear in multiple cause events. - 4) A context may appear in multiple beginExecute/endExecute events but these - must have different cause and execution context values. + 2) A single link context, cause context, or execute context may only be + introduced in a single event. + 3) A link context may appear in multiple `cause` and `execute` events. + 4) a cause context may appear in multiple `execute` events. The emit events must also satisfy the grammar constraints of the language: ``` @@ -151,6 +163,10 @@ PartialExecution := beginExecution AsyncOp* AsyncOp := link | cause ``` +**TODO:** I would like to make this stronger as any invariants here help +users of this output as well as provide acceptance criteria for an +implementation. + ## Asynchronous Annotations Examples To illustrate how the asynchronous annotation code can be used to convert a `host` API into one that tracks asynchronous events for client code we look @@ -228,8 +244,10 @@ We will see the asynchronous trace: ``` ### Promise API -Similarly we can provide a basic promise API that supports asynchronous context tracking by modifying the real promise implementation as follows: **TODO** this -is super rough. +Similarly we can provide a basic promise API that supports asynchronous +context tracking by modifying the real promise implementation as follows: +**TODO** this is super rough. + ``` function then(onFulfilled, onRejected) { cfFulfilled = contextify(onFulfilled); @@ -302,87 +320,167 @@ The events described above can be used to construct and maintain an `Asynchronous Call Graph`, which is a directed acyclic graph. Nodes are defined as follows: - * An "execution context" node is defined when an "executeBegin" event is received. - * A "linking context" node is defined when a "link" event is received. + * An "execution context" node is defined when an "executeBegin" event + is received and all linking/causal events will create child edges from + this node until the matching "executeEnd" event is seen. + * A "linking context" node is defined when a "link" event is received and + is associated with the data from the link event. -Edges are defined as follows: - * A `link edge`, directed from a given "execution context" node - _c1_ to second "linking context" node, _c2_ - exists, if the `asynchronous event trace` contains the entries - _{event: "executeBegin", current: c1, time: t}_ and - _{event: "link", linkCtx: c2, time: t'}_ where t < t' and, - if the trace contains an event - _{event: "executeEnd", current: c1, time: t''}_, where t < t'' then t' < t''. - In this case, a `link edge` exists directed from _c1_ to _c2_. - - * A `causal edge`, directed from a given "execution context" node, _c1_, - to a - second "execution context" node, _c2_, if the `asynchronous event - trace` contains the entries - _{event: "executeBegin", current: c1, time: t}_ and - _{event: "cause", causeCtx: c2, time: t'}_ where t < t' and, - if the trace contains an event - _{event: "executeEnd", current: c1, time: t''}_, where t < t'' then t' < t''. - - * An `execution edge`, directed from a given "linking context" node, _c1_, - to a second `linking context` node _c2_, if the `asynchronous event trace` - contains the entries - _{event: "link", linkCtx: c1, time: t}_ and - _{event: "executeBegin", current: c2, time: t''}_, where t < t'' then t' < t''. + * A "causal context" node is defined when a "cause" event is received and + is associated with the data from the cause event. - **TODO** - need better definition of above - insuficient info in the event trace to know - that c1 is the linking node for c2 +Edges are defined as follows: + * A "link" edge is added from the current "execution context" node + to a newly created "linking context" node when a "link" event is + received. + * A "causal" edge is added from the current "execution context" node + to a newly created "causal context" node when a "cause" event is + received. An additional "linked by" cross-edge is added to the + "linking context" node with the corresponding link context value. + * An "execution" edge is added into the newly created "execution context" + node from the "causal context" node with the corresponding causal context + value. **TODO** use these definitions to see trees for trace examples. -### Long Call-Stacks from Traces and Trees -**TODO** fill this in - ### Resource Use Monitoring with Traces -**TODO** fill this in +A common task for a developer of a web-service is to understand how +long it took to service a user request as well as how much time was +spent in various sub-tasks of the processing -- in synchronous JavaScript +code or waiting in various asynchronous queues. The first question can +be easily answered by recording the time when HTTP request is received +and when the response end is written (as provided by many frameworks +today). However, to answer the second question we need to know how much +time the callback executed in response to the HTTP event + the time of +all asynchronous callbacks it causes. The time in the synchronous JS +code is the total time minus this sum while the time spent blocked in +asynchronous queues is simply the sum of this time. + +Computing the total time spent in callbacks that are relevant to a single HTTP +request while ignoring the time spent in other callbacks requires tracking the +`causal` relation. Conceptually, we can identify a root node in the +`asynchronous call graph` that we want to compute the inclusive execution time +for, in our case the HTTP callback handler, and traverse all of the +`causal context` edge -> `execution context` edge to reach a child execution +node, summing up the times for each of the child nodes as we go (contained in +the _Asynchronous Operation Metadata_ defined below). The resulting value is +the total blocking execution time spent handling the HTTP request. + +**TODO** show a sample of code + async graph with multiple simultaneous +http handlers and highlight the 1 we are looking at. + +### Long Call-Stacks from Graphs +Another common task is constructing _long call stacks_. These call stacks +connect the lexical call stacks that surround it, but were torn due to +asynchronous execution, into a simple linear structure. To construct a +long call stack from a set of lexical call-stacks for a given callback +execution we need to know which of them contain variables/values that were +live, and potentially relevant, when the callback was created. + +Using the `asynchronous call graph` definitions we can construct long call +stacks using the _causal_ -> _linked-by_ ->_link_ parent path in the graph. +The bottom of the call-stack can be extracted from the current call stack +up to the frame that is a `host` asynchronous API as usual. At this point we +then look at the _causal_ parent and follow the _linked_by_ parent to find +the link node which is the point in execution when this asynchronous call +with the scope where the callback was linked. From here if we utilize +additional _Asynchronous Operation Metadata_ (defined below) containing +the line number & call stack at the time of the link operation we can get +the next set of frames to stitch together to build the long call-stack. +If this process is continued we will eventually reach the `root` context +and the top of the desired long call-stack. + +**TODO** add an example with sample code etc. + +## Asynchronous Operation Metadata +In the previous sections we focused solely on tracking and emitting +information on the structure of the asynchronous call graph. However, most +applications, including our samples above, are interested in more than just the raw structure of this graph. While we cannot, and would not want to, identify +all possible data that could be needed and write it out to our log we can +(1) select core of commonly useful information to include and (2) add a timestamp +that is shared with user logging code to allow the correlation of custom user +log data with the asynchronous event data we write. + +In our definitions all events are emitted with a timestamp generated by +`generateNextTime`. To allow correlation between our emitted events and any +user logging we expose this method to user code so they can include correlated +timestamps in their logging. + +The other core metadata we track split it into two classes `standard` and +`detailed`. The `standard` data is intended to include information that is +nearly universally useful and low cost to gather while the `detailed` class +is for less universaly relevant data or data that is expensive to capture. + * Standard: + - Source/Line info for applicable events. + - **TODO** other info? + * Detailed: + - Callstack info the applicable events. + - **TODO** other info? ## Enriched Terminology -The definitions in the "Terminology" section provide basic asynchronous +The definitions in the _Terminology_ section provide basic asynchronous lifecycle events but do not capture many important features including, -canceled or failed rejections and, in cases of asynchronous events that +canceled events or failed rejections and, in cases of asynchronous events that depend on environmental interaction, what external events may be relevant. -To address these +To support scenarios that require this type of information we extend the vocabulary +of events recorded in the asynchronous execution trace with the following hooks: + ``` cancel(ctxf) { - emit("cancel", ctxf.ctx, generateNextTime()); + emit("cancel", ctxf.linkCtx, generateNextTime()); } externalCause(ctxf, data) { - emit("externalCause", ctxf.ctx, generateNextTime(), data); + emit("externalCause", ctxf.causeCtx, generateNextTime(), data); } -completed(ctxf) { - emit("completed", ctxf.ctx, generateNextTime()); +failed(ctxf, reason) { + emit("fail", reason, generateNextTime()); } -failed(ctxf) { - emit("fail", ctxf.ctx, generateNextTime()); +rejected(ctxf, reason, isException) { + ctxf.rejection = currentExecutingContext; + emit("rejected", reason, isException, generateNextTime()); } -rejected(ctxf) { - emit("rejected", ctxf.ctx, generateNextTime()); +unhandledReject(ctxf) { + emit("rejected", ctxf.rejection, generateNextTime()); } ``` -A `completed asynchronous execution` is one in which **TODO** +A `cancel` event is emitted when a cancellable callback, such as one +registered with SetInterval is explicitly canceled, or when a listener such +as on a file stream is implicitly canceled via channel termination or GC +collection. The `cancel` event notifies us that we will never observe +another event related to the `linkCtx` of the associated callback. -A `canceled asynchronous execution` is one in which **TODO** +An `externalCause` event is emitted as part of a handling a callback that +depends on external inputs, in addition to internal execution, to be caused. +Examples include File and Network IO callbacks that need to be both `caused` +by registering them **and** by external data arriving to trigger their +execution. These entries provide additional context on this external data, +included in the `data` component of the message. -A `failed asynchronous execution` is one in which **TODO** +A `failed` event is emitted when a callback throws an uncaught exception +during its execution which results in an unhandled exception. -A `rejected asynchronous execution` is one in which **TODO** +A `rejected` event is emitted when an [asynchronously executed]? promise +rejects and includes if the rejection was the result of an exception or +explicit reject. -A `rejected and unhandled asynchronous execution` is one in which **TODO** +An `unhandled rejection` event is emitted when an [asynchronously executed]? +promise rejects and is not handled within a turn of the event loop as per Node [semantics](https://nodejs.org/api/process.html#process_event_unhandledrejection). +**TODO** I think promise handling described here will miss something like a +naked `Promise.reject("nothing")` call which is an unhandled rejection but +just doesn't involve any asynchronous behavior. So, we may want to include +synchronous promise creation/execution somehow if this is important to us. +I think this just gets kicked back to sequential execution analysis but +may be worth discussing. Using the these definitions we can define the following states of an asynchronous execution: @@ -391,67 +489,66 @@ asynchronous execution: * A (sub)tree has `retired asynchronous execution` when all child nodes, link of causal, are in the completed state. -## Asynchronous Operation Metadata -**TODO** define the metadata associated with each async execution node. e.g., time stamps associated w/ each state transition? - ## 5. Use Cases Use cases for Async Context can be broken into two categories, **post-mortem** and **online**. ### Post-Mortem Use Cases** **Post-Mortem Use Cases** are program analysis tasks that happen after a -program has completed execution. They require reconstruction of an Async Call Graph -up to some point in time, and is achievable via an accurate event stream that -describes all state transitions of nodes & edges in the Async Call Graph. +program has completed execution. They require reconstruction of an Async Call +Graph up to some point in time, and is achievable via an accurate event stream +that describes all state transitions of nodes & edges in the Async Call Graph. #### Execution Timing Analysis **Execution Timing Analysis** - A user wants to understand timing details of -specific HTTP request. Since the HTTP request's processing consists of multiple -Async Executions, a thorough timing analysis needs to understand each node -in the path from the end of the request, to the start of the request, and for -each node, specific timing details around each state transition. Such data +specific HTTP request. Since the HTTP request's processing consists of +multiple Async Executions, a thorough timing analysis needs to understand each +node in the path from the end of the request, to the start of the request, and +for each node, specific timing details around each state transition. Such data can tell us how long a request was blocked in an execution queue, or waiting for some event, or actually executing. - 2. Understand parent/child relationship in async call paths - **TODO** add text + 2. Understand parent/child relationship in async call paths - **TODO** add + text - 3. Reconstruct async call tree to some point in time given an event stream - **TODO** add text + 3. Reconstruct async call tree to some point in time given an event stream + - **TODO** add text ### Online Use Cases **Online Use Cases** are use cases where the Asynchronous Context needs to be -examined dynamically while a program is executing. Meeting requirements of +examined dynamically while a program is executing. Meeting requirements of online use cases requires runtime and/or module support to keep an accurate -representation of the Async Call Graph, as well as APIs to navigate the graph. +representation of the Async Call Graph, as well as APIs to navigate the graph. In particular, garbage collection passes must occur on retired sub-trees. Examples of Online Use Cases include: #### Continuation Local Storage -Analagous to thread-local-storage, but for asynchronous continuations. +Analagous to thread-local-storage, but for asynchronous continuations. Continuation Local Storage provides the ability to store key/value pairs -in a storage container associated with the current Async Execution. Clients +in a storage container associated with the current Async Execution. Clients can lookup values for a given key, and the lookup will walk a path on the -Async Call Grapp until a key is found, or it reaches the root. Continuation +Async Call Grapp until a key is found, or it reaches the root. Continuation local storage is useful when code in some Asynchronous Execution needs to know -values associated with some parent Asynchronous Execution. For example, APM +values associated with some parent Asynchronous Execution. For example, APM vendors often need to associate code execution events with a specific HTTP request. #### Async exception handling Traditional (i.e., synchronous) exception -handling is a multi-frame stack jump. Asynchronous Exception Handling +handling is a multi-frame stack jump. Asynchronous Exception Handling can be described as a when a synchronous exception handler wishes to -notify intersted observers about an exception. The set of interested +notify intersted observers about an exception. The set of interested observers can be succinctly described as observers on some path through -the Async Call Graph. For example, one trivial strategy would be to +the Async Call Graph. For example, one trivial strategy would be to traverse all linked edges from the current Async Excecution to the root, and see if any registered observers are present. #### Long Call Stacks -A **long call stack** is a list of call-stacks that span asynchronous callback -operations. Analagous to a synchronous callstack, "Long call stacks" are useful -for programmers to answer the question of "what was the call path to a specific -point in program execution. For example, the following code +A **long call stack** is a list of call-stacks that span asynchronous callback +operations. Analagous to a synchronous callstack, "Long call stacks" are +useful for programmers to answer the question of "what was the call path to a +specific point in program execution. For example, the following code ``` function f1() {