
[sketch] Timeline redux #1098

Merged: 15 commits into snabbco:master, Apr 7, 2023
Conversation

lukego (Member) commented Feb 9, 2017

TL;DR: scroll down for the pretty screenshot.

I have a new revision of the timeline code, intended to come upstream very soon. This branch is a preview for early feedback, especially on the interface. I have rebased and split it into logical commits; I marked the branch [sketch] to reserve the right to rebase again.

Timeline is a circular event log. Calling the logger is cheap (~5 cycles) and events are timestamped with sub-nanosecond precision. The log wraps around after one million events. Instrumenting Snabb with timeline logging makes it possible to do detailed offline profiling and performance diagnostics.
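For intuition: "circular" means that entry number n simply lands in slot n mod 1,000,000, overwriting the oldest entry. A minimal Lua sketch of the idea (illustrative only, not the actual implementation):

    -- Illustrative ring log: a fixed number of slots that wrap around,
    -- so new entries silently overwrite the oldest ones.
    local ring = { size = 1e6, n = 0, slots = {} }
    function ring:log (entry)
       self.slots[self.n % self.size] = entry
       self.n = self.n + 1
    end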

Here is how it works:

Define your events

First you define an "events file" like this excerpt from core/engine.events:

    4|sleep_Hz: usec Hz
    The engine requests that the kernel suspend this process for a period of
    microseconds in order to reduce CPU utilization and achieve a fixed
    frequency of breaths per second (Hz).

    4|sleep_on_idle: usec
    The engine requests that the kernel suspend this process for a period
    of microseconds in order to reduce CPU utilization because idleness
    has been detected (a breath in which no packets were processed.)

    4|wakeup_from_sleep:
    The engine resumes operation after sleeping voluntarily.

This example defines three events. They all have log level 4 (medium), which determines how frequently they are logged to the ring. Each event can have up to six 64-bit integer arguments. The long textual descriptions are stored only once in the log file (i.e. interned.)

Instrument your code

First you create a timeline log in shared memory and load the relevant event definitions:

    timeline_log = timeline.new("engine/timeline")
    events = timeline.load_events(timeline_log, "core.engine")

Then you instrument your code with calls to log events as they happen:

    events.sleep_on_idle(sleep) -- Log message before sleeping
    C.usleep(sleep)
    events.wakeup_from_sleep()  -- Log message after sleeping

These events are timestamped with sub-nanosecond precision using the RDTSCP instruction. Here the log will show both the number of microseconds you requested to sleep (the value of the sleep variable) and the number of cycles you actually slept (the timestamp delta between the two log messages.)
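Events with multiple arguments are logged the same way, passing one value per argument in the order declared in the events file. A minimal sketch using the two-argument sleep_Hz event from the excerpt above (the sleep and target_hz variables here are illustrative):

    -- sleep_Hz was declared with two arguments (usec Hz): the requested
    -- sleep in microseconds and the target breath rate in Hz.
    events.sleep_Hz(sleep, target_hz)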

Generate your data

Now when you run Snabb the file /var/run/snabb/$pid/engine/timeline will contain your events. You can copy this file for offline inspection, either from a running process or from one that has terminated. It is updated in real time as file-backed shared memory.

The engine samples the events by randomizing the log level at the start of each breath. For 99.9% of breaths there is no logging at all. 0.1% are logged with a high level (e.g. start/end of breath), 0.01% with a medium level (e.g. individual push/pull calls), and 0.001% with a low level (e.g. logs within apps and libraries.) This is intended to produce roughly 1000 events per second and wrap the million-element log roughly every 15 minutes.

Calling the log function costs around 5 cycles on average. Most of the time logging is disabled and the call costs ~4 cycles; occasionally logging is enabled and the call costs ~44 cycles. The logger is written in assembler and looks like a simple FFI call to the JIT.

Analyze the data

The log contains one million entries. That's too many to read by hand. The idea is to perform statistical analysis of the logs and be able to compare different sets of logs (e.g. test lab data vs production environment data.) Separate tooling will be available for this.

Here is an example analysis of the cycles per packet cost depending on which app is being called, how many packets are on its input links, and which NUMA node it is scheduled on. This data was generated from a directory containing 10 timeline files from Snabb processes with varied workloads and NUMA affinities (no special preparation needed.)

[Screenshot: timeline-profile-cycles-per-packet]

Graphs like this (and many others over time) will be easy to produce from a directory of timeline files.

RDTSCP is a very useful instruction for self-benchmarking programs that
want to read the CPU timestamp counter efficiently.

See the Intel whitepaper for details:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

Use 'double' instead of 'uint64_t' for values in the timeline file.

This change is motivated by making timeline files easier to process in
R. In the future we may switch back to uint64_t for the TSC counter
and/or argument values for improved precision. The major_version file
header field can be used to avoid confusion.

The obvious downside to using doubles is that the TSC value will lose
precision as the server uptime increases (the TSC starts at zero and
increases at the base frequency of the CPU, e.g. 2GHz.) The impact
seems modest though. For example, a 2GHz CPU would only start rounding
TSC values to the nearest 128 cycles (likely quite acceptable in
practice) after approximately 18 years of operation (2^53 * 128 cycles.)
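
A quick sanity check of that figure in Lua (assuming the 2GHz base
frequency above; doubles are exact up to 2^53, and between 2^60 and
2^61 adjacent representable values are 128 apart):

    -- First TSC value rounded to the nearest 128: 2^53 * 128 = 2^60.
    local cycles  = 2^53 * 128
    local seconds = cycles / 2e9           -- at a 2GHz base frequency
    print(seconds / (365 * 24 * 3600))     --> about 18.3 years of uptime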

So storing the TSC as a double-float is definitely a kludge, but it is
unlikely to cause any real harm and is expedient for the short-term
goal of putting this code to use without getting blocked by e.g. my
lack of sophistication as an R hacker.

Resolved conflict in app.lua between adding timeline events and the new
breath topological-sort machinery.

Simplify the code and eliminate unwanted branches from the engine loop
by drawing a random timeline level from a log-uniform distribution that
mathematically favors higher log levels over lower ones.

Plucked log5() out of the air, i.e. each log level should be enabled
for 5x more breaths than the one below.

Here is how the distribution of log level choice looks in practice using
this algorithm:

    > t = {0,0,0,0,0,0,0,0,0}
    > for i = 1, 1e8 do
         local n = math.max(1,math.ceil(math.log(math.random(5^9))/math.log(5)))
         t[n] = t[n]+1
      end
    > for i,n in ipairs(t) do print(i,n) end
    1       560
    2       2151
    3       10886
    4       55149
    5       273376
    6       1367410
    7       6844261
    8       34228143
    9       171120244

Note: Lua provides only a natural-logarithm function but it is easy to
derive other bases from it (google "log change of base formula").
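For example, a log5 helper via the change-of-base formula:

    -- Change of base: log_5(x) = ln(x) / ln(5).
    local function log5 (x) return math.log(x) / math.log(5) end
    print(log5(125)) --> 3 (up to floating-point rounding)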

I suspect that it is a misfeature for the timeline to sample the
contents of packets. Do we really want user data potentially appearing
in debug logs? Removed for now.
benagricola pushed a commit to benagricola/snabb that referenced this pull request on Jun 14, 2018: "Fix incorrect status sent as alarm-notification"
eugeneia merged commit 596d6b5 into snabbco:master on Apr 7, 2023