[sketch] Timeline redux #1098
Merged
Conversation
This is a very useful instruction for self-benchmarking programs that want to read the CPU timestamp counter efficiently. See Intel whitepaper for details: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
Use 'double' instead of 'uint64_t' for values in the timeline file. This change is motivated by making timeline files easier to process with R. In the future we may switch back to uint64_t for the TSC counter and/or argument values for improved precision; the major_version file header field can be used to avoid confusion.

The obvious downside to using doubles is that the TSC value will lose precision as server uptime increases (the TSC starts at zero and increases at the base frequency of the CPU, e.g. 2GHz.) The impact seems modest though: a 2GHz CPU would only start rounding TSC values to the nearest 128 cycles (likely quite acceptable in practice) after roughly 18 years of operation (2^53 * 128 = 2^60 cycles.) So storing the TSC as a double-float is definitely a kludge, but unlikely to cause any real harm, and expedient for the short-term goal of putting this code to use without getting blocked by e.g. my lack of sophistication as an R hacker.
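To make the precision argument concrete, the gap between adjacent doubles ("ulp") at various TSC magnitudes can be computed directly. A standalone Python sketch for illustration only; the 2 GHz clock is the example frequency from the commit message:

```python
import math

# A double has a 53-bit significand: integers up to 2**53 are exact, and
# beyond that the gap between adjacent representable values doubles with
# every further power of two.
assert math.ulp(2.0**53) == 2.0     # first rounding, to the nearest 2 cycles
assert math.ulp(2.0**60) == 256.0   # values now spaced 256 cycles apart

hz = 2e9  # example TSC frequency: 2 GHz
for exponent in (53, 57, 60):
    cycles = 2.0**exponent
    years = cycles / hz / (365.25 * 24 * 3600)
    print("2^%d cycles: gap %3.0f, reached after %5.1f years of uptime"
          % (exponent, math.ulp(cycles), years))
```

The middle row shows why the impact is modest: even after a couple of years of uptime the rounding error is only a few tens of cycles.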
Resolved conflict in app.lua between adding timeline events and the new breath topological-sort machinery.
Simplify the code and eliminate unwanted branches from the engine loop by drawing a random timeline level from a log-uniform distribution that mathematically favors higher log levels over lower ones. Plucked log5() out of the air, i.e. each log level should be enabled for 5x more breaths than the one below. Here is how the distribution of log level choice looks in practice using this algorithm:

```lua
> t = {0,0,0,0,0,0,0,0,0}
> for i = 1, 1e8 do
    local n = math.max(1, math.ceil(math.log(math.random(5^9)) / math.log(5)))
    t[n] = t[n] + 1
  end
> for i,n in ipairs(t) do print(i,n) end
1       560
2       2151
3       10886
4       55149
5       273376
6       1367410
7       6844261
8       34228143
9       171120244
```

Note: Lua provides only a natural logarithm function but it is easy to derive other bases from it (google "log change of base formula").
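The same draw can be sketched in standalone Python, with the analytic probabilities alongside (illustrative only; the engine itself does this in Lua):

```python
import math
import random

# Draw a log level in 1..9 where each level is chosen ~5x as often as the
# one below it, from a single uniform draw (the "log5" scheme above).
def draw_level(rng=random):
    r = rng.randint(1, 5**9)
    # min() guards against floating-point rounding pushing ceil() to 10.
    return min(9, max(1, math.ceil(math.log(r) / math.log(5))))

# Analytically: P(n) = (5**n - 5**(n-1)) / 5**9 for n >= 2, and level 1
# absorbs the remaining 5/5**9, so adjacent levels differ by a factor of 5.
p = {n: (5**n - 5**(n - 1)) / 5**9 for n in range(2, 10)}
assert abs(p[9] / p[8] - 5.0) < 1e-9

random.seed(1)
counts = [0] * 10
for _ in range(100_000):
    counts[draw_level()] += 1
print(counts[1:])  # heavily skewed toward level 9, as in the transcript above
```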
I suspect that it is a misfeature for the timeline to sample the contents of packets. Do we really want user data potentially appearing in debug logs? Removed for now.
benagricola pushed a commit to benagricola/snabb that referenced this pull request on Jun 14, 2018: …fication Fix incorrect status sent as alarm-notification
TL;DR: scroll down for pretty screenshot.
I have a new revision of the timeline code. This one is intended to come upstream very soon. This branch is a preview for early feedback, especially on the interface. I have rebased and split into logical commits. I marked this branch as [sketch] to reserve the right to do so again.

Timeline is a circular event log. Calling the logger is cheap (~5 cycles) and events are timestamped with sub-nanosecond precision. The log wraps around after one million events. Instrumenting Snabb with timeline logging makes it possible to do detailed offline profiling and performance diagnostics.
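The circular-log behaviour can be illustrated with a minimal fixed-capacity ring. This is a Python sketch for illustration only; the real log is file-backed shared memory written by a Lua/assembler logger:

```python
import time

class RingLog:
    """Fixed-capacity event log: the newest entries overwrite the oldest."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity
        self.entries = [None] * capacity
        self.cursor = 0  # total number of events ever logged

    def log(self, event_id, *args):
        # Store a timestamped record in the next slot, wrapping modulo capacity.
        slot = self.cursor % self.capacity
        self.entries[slot] = (time.perf_counter_ns(), event_id, args)
        self.cursor += 1

    def recent(self):
        """Return the surviving events, oldest first."""
        if self.cursor <= self.capacity:
            return self.entries[:self.cursor]
        i = self.cursor % self.capacity
        return self.entries[i:] + self.entries[:i]

log = RingLog(capacity=4)
for n in range(6):
    log.log("tick", n)
print([e[2][0] for e in log.recent()])  # → [2, 3, 4, 5]: oldest two overwritten
```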
Here is how it works:
Define your events
First you define an "events file" like this excerpt from core/engine.events:

This example defines three events. They all have log level 4 (medium), which determines how frequently they will be logged to the ring. Each event can have up to six 64-bit integer arguments. The long textual descriptions will be stored only once in the log file (i.e. interned.)
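The excerpt itself did not survive the page extraction. As a purely hypothetical illustration — the event names and the exact file syntax below are assumptions, not the real core/engine.events contents — an events file in the spirit described (a log level, argument names, and an interned description per event) might look like:

```
4|pulled: npackets nbytes
  The engine has called all "pull" callbacks for this breath.
  'npackets' and 'nbytes' count the packets that were inducted.

4|pushed: npackets nbytes
  The engine has called all "push" callbacks for this breath.

4|breath_end: breath
  The engine has finished breath number 'breath'.
```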
Instrument your code
First you create a timeline log in shared memory and load the relevant event definitions:
Then you instrument your code with calls to log events as they happen:
These events are timestamped with sub-nanosecond precision using the RDTSCP instruction. Here the log will show both the number of microseconds you requested to sleep (value of the sleep variable) and also the number of cycles that you really did sleep (timestamp delta between log messages.)

Generate your data
Now when you run Snabb the file /var/run/snabb/$pid/engine/timeline will contain your events. You can copy this file for offline inspection either from a running process or from one that has terminated. It is updated in real time as file-backed shared memory.

The engine samples the events by randomizing the log level at the start of each breath. For 99.9% of breaths there is no logging at all. 0.1% are logged with a high level (e.g. start/end of breath), 0.01% with a medium level (e.g. individual push/pull calls), and 0.001% with a low level (e.g. logs within apps and libraries.) This is intended to produce roughly 1000 events per second and wrap the million-element log roughly every 15 minutes.
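The wrap-time figure follows directly from the target event rate. A quick back-of-the-envelope check, using the 1000 events/s target stated above:

```python
log_size = 1_000_000      # entries in the ring before it wraps
events_per_second = 1000  # target logging rate from the sampling scheme

wrap_seconds = log_size / events_per_second
print(f"log wraps every {wrap_seconds / 60:.1f} minutes")  # → every 16.7 minutes
```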
Calling the log function costs around 5 cycles on average. Most frequently logging will be disabled and the call will cost ~4 cycles. Occasionally logging will be enabled and the call will cost ~44 cycles. The logger is written in assembler and looks like a simple FFI call to the JIT.
Analyze the data
The log contains one million entries. That's too many to read by hand. The idea is to perform statistical analysis of the logs and be able to compare different sets of logs (e.g. test lab data vs production environment data.) Separate tooling will be available for this.
Here is an example analysis of the cycles per packet cost depending on which app is being called, how many packets are on its input links, and which NUMA node it is scheduled on. This data was generated from a directory containing 10 timeline files from Snabb processes with varied workloads and NUMA affinities (no special preparation needed.)
Graphs like this (and many others over time) will be easy to produce from a directory of timeline files.
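As a toy illustration of that kind of aggregation — synthetic records and hypothetical field names; the actual timeline file format and analysis tooling are separate, as noted above:

```python
from collections import defaultdict
from statistics import median

# Synthetic timeline-derived records: (app, input_packets, numa_node, cycles).
# The field names and values here are made up for illustration.
records = [
    ("basic_apps.Source", 10, 0, 1200), ("basic_apps.Source", 10, 0, 1180),
    ("basic_apps.Source", 100, 0, 9400), ("basic_apps.Tee", 10, 1, 800),
    ("basic_apps.Tee", 10, 1, 840), ("basic_apps.Tee", 100, 1, 7100),
]

# Group cycles-per-packet by (app, packets on input link, NUMA node).
groups = defaultdict(list)
for app, npackets, numa, cycles in records:
    groups[(app, npackets, numa)].append(cycles / npackets)

for key in sorted(groups):
    print(key, "median cycles/packet: %.1f" % median(groups[key]))
```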