Integrated collection of open PRs on master #424

Merged: 28 commits, Mar 25, 2015

@lukego lukego commented Mar 25, 2015

Merge with testing of open pull requests to master:

lukego and others added 28 commits March 5, 2015 07:47
Strip away these options:

    -DLUAJIT_USE_PERFTOOLS

This feature causes noise in /tmp (perf-nnnn.map files). There is a
better way to profile at trace granularity with the LuaJIT
profiler. Hopefully this will be upstream in LuaJIT but for now
available as a patch:
http://www.freelists.org/post/luajit/Profile-at-trace-granularity,1

    -DLUAJIT_USE_GDBJIT

The documentation says that "enabling it always has a non-negligible
overhead -- do not use in release mode" and I don't believe that we
are using this functionality (debugging JIT traces in gdb).

    -DLUAJIT_NUMMODE=3

This is less clear cut. Mike Pall has encouraged us to experiment with
settings for this variable for potential performance benefits. The
current value (3) was chosen during a micro-optimization spree many
moons ago. I am not confident that this setting helps or that it is
better than other settings. So I am disabling it in this commit ("less
voodoo") and we will have to check for performance regressions before
merging onto master.
Simplify the code by operating on unaligned data. CPUs supporting AVX2
handle unaligned access very efficiently (source: Agner Fog).

The previous version of this function used a non-SIMD checksum for any
non-aligned data at the start of the packet. (This is still done for
the non-aligned part at the end, since the AVX code still operates on
minimum 32 byte chunks.)
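The chunk-plus-tail split described above can be sketched in Python. This is only an illustration of the structure, not the AVX2 code from the commit: the function names are hypothetical, the 32-byte loop is a plain-Python stand-in for the vector loop, and the arithmetic is the standard 16-bit ones' complement Internet checksum.

```python
CHUNK = 32  # bytes per "vector" step, matching the 32-byte minimum mentioned above


def fold(total):
    """Fold carries back into 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def sum_words(data):
    """Sum big-endian 16-bit words; an odd trailing byte is padded with zero."""
    total = 0
    for i in range(0, len(data) - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        total += data[-1] << 8
    return total


def checksum_chunked(data, initial=0):
    """Process whole 32-byte chunks first, then the leftover tail."""
    body_len = len(data) - len(data) % CHUNK
    total = initial
    for off in range(0, body_len, CHUNK):   # stand-in for the SIMD loop
        total += sum_words(data[off:off + CHUNK])
    total += sum_words(data[body_len:])     # scalar tail, fewer than 32 bytes
    return (~fold(total)) & 0xFFFF


def checksum_reference(data, initial=0):
    """Single-pass version used to cross-check the chunked one."""
    return (~fold(initial + sum_words(data))) & 0xFFFF
```

Because each chunk has an even length, the 16-bit word boundaries of the tail line up with those of the whole buffer, so no special handling is needed at the start of the data.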
This will make Snabb Switch always show 100% CPU in default settings.

Previously, Snabb Switch by default attempted to bring in a batch of
new packets once every 100us. This is too conservative on the
straightline design because we now work on smaller batches of
packets (the max is now 256 and was 8192 before). So this change makes
busy-looping without sleeping the default, to maximize throughput.

Going forward, we should consider a dynamic mechanism that lets a
truly idle Snabb Switch sleep. Meanwhile the default is to busy-loop,
and a static rate can be configured with a setting like
'engine.Hz=10000'.
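The difference between the two modes can be sketched as a loop with an optional pacing rate. This is a hypothetical Python model, not the engine code: `run_engine`, `breathe` and the injectable `sleep` are names chosen for illustration. A rate of 10000 Hz corresponds to the old 100us interval.

```python
import time


def run_engine(breathe, iterations, hz=None, sleep=time.sleep):
    """Call breathe() repeatedly.

    With hz=None the loop never sleeps (busy-loop, 100% CPU).
    With a rate such as hz=10000 it pauses 1/hz seconds between batches.
    """
    for _ in range(iterations):
        breathe()
        if hz is not None:
            sleep(1.0 / hz)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting.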
The standard link report printout is now a little prettier: links are
alphabetically sorted (resolves snabbco#385), punctuation is improved, and
values are printed with commas and justified.

Before:

    link report
    597223954 sent on Virtio_A.tx -> NIC_A.rx (loss rate: 0%))
    137411842 sent on NIC_A.tx -> Virtio_A.rx (loss rate: 0%))
    597223786 sent on NIC_B.tx -> Virtio_B.rx (loss rate: 0%))
    137411851 sent on Virtio_B.tx -> NIC_B.rx (loss rate: 0%))

After:

    link report:
	      17,747,602 sent on NIC_A.tx -> Virtio_A.rx (loss rate: 0%)
	      70,955,552 sent on NIC_B.tx -> Virtio_B.rx (loss rate: 0%)
	      70,955,729 sent on Virtio_A.tx -> NIC_A.rx (loss rate: 0%)
	      17,747,621 sent on Virtio_B.tx -> NIC_B.rx (loss rate: 0%)
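The new layout can be approximated in a few lines of Python. This is a sketch under assumptions, not the code from the commit: the function name, the 20-character column width and the loss-rate formula (dropped packets as a percentage of sent plus dropped) are all guesses made for illustration.

```python
def link_report(links):
    """Format {link name: (sent, dropped)} as a sorted, justified report."""
    lines = ["link report:"]
    for name in sorted(links):                  # alphabetical order
        sent, dropped = links[name]
        total = sent + dropped
        loss = 100 * dropped // total if total else 0
        # Thousands separators, right-justified in an assumed 20-column field.
        lines.append(f"{sent:>20,} sent on {name} (loss rate: {loss}%)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(link_report({
        "Virtio_A.tx -> NIC_A.rx": (70955729, 0),
        "NIC_A.tx -> Virtio_A.rx": (17747602, 0),
    }))
```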
Snabb Switch previously called mlockall() to force all memory in the
process address space to be locked to its physical location. This made
it possible to use all memory for DMA, including memory mapped from
virtual machines.

This had several side effects:

* Prevents Snabb Switch from being swapped out. (Nice?)
* Prevents VMs that we serve from being swapped out. (Overkill?)
* Requires root permissions at startup. (Overkill?)

The new behavior is to individually mlock() the HugeTLB pages that we
allocate for DMA. The rest of the memory - Snabb Switch and VMs - is
left unlocked. This is feasible now that we only do DMA within this
memory. This also allows us to defer the root-privileged system call
until we need DMA memory, which may not happen for certain
applications.
Wait until the first packet is needed to allocate the memory.
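Deferring the allocation to first use can be sketched as below. This is a hypothetical Python model: `LazyPool` and its `allocate` callback are illustrative names, and the callback merely stands in for whatever allocates and locks the DMA memory (the step that needs root).

```python
class LazyPool:
    """Allocate backing memory only when it is first requested."""

    def __init__(self, allocate):
        self._allocate = allocate   # stand-in for the privileged allocate+lock step
        self._memory = None

    def get(self):
        # The privileged call happens here, on first use, not at startup.
        if self._memory is None:
            self._memory = self._allocate()
        return self._memory
```

A program that never asks the pool for memory never triggers the privileged call, which is what lets such programs run without root.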
Certain Snabb programs don't require root permissions. The ones that
do will need to check for them when system calls fail unexpectedly.
Use ljsyscall's geteuid() via a new library function:

    lib.root_check()
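The idea behind such a check can be sketched in Python. This is not the Lua implementation: the signature, the default message and the choice to raise an exception are assumptions made for illustration; only the use of the effective user ID comes from the commit message.

```python
import os


def root_check(message="error: must run as root", geteuid=os.geteuid):
    """Raise PermissionError unless the effective user ID is 0 (root)."""
    if geteuid() != 0:
        raise PermissionError(message)
```

A program that needs DMA memory would call this before its first privileged operation; programs that never need root simply skip the call.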
Added proper usage for neutron-sync-master and neutron-sync-agent.

Created separate usage commands for traffic: -h/--help prints command
line usage, -H/--long-help also prints the configuration file format.
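A short/long help split like this can be sketched with Python's `getopt`. The usage strings are placeholders and `select_help` is a hypothetical name; only the four flag spellings come from the commit message.

```python
import getopt

SHORT_USAGE = "<command line usage>"                               # placeholder text
LONG_USAGE = SHORT_USAGE + "\n\n<configuration file format>"       # placeholder text


def select_help(argv):
    """Return the help text requested by argv, or None if none was requested."""
    opts, _args = getopt.getopt(argv, "hH", ["help", "long-help"])
    for flag, _value in opts:
        if flag in ("-h", "--help"):
            return SHORT_USAGE
        if flag in ("-H", "--long-help"):
            return LONG_USAGE
    return None
```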
…le ethernet frame) and returns true if it successfully checks the header (if applicable) and data.
It's faster to return a single value from C to Lua, even if that means recalculating the other one.
The optional arguments (only -B) were treated as positional arguments.
Conflicts:
	src/program/snabbnfv/traffic/traffic.lua
The generic loop takes the initial value in network order and swaps at the end,
but the SIMD versions take it in host order.
Setting flags to C.VIO_NET_HDR_F_NEEDS_CSUM allowed even known-bad packets to pass.
It's better to set flags to 0 when we do have a failed checksum.
Tests that all checksum functions return the same value with the same
parameters, including non-zero initial values.
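A consistency check of this kind, together with the byte-order issue it is meant to catch, can be modelled in Python. This is a toy model and not the test from the commit: it assumes a little-endian host, both function names are hypothetical, and neither function is the real generic or SIMD code. One variant sums byte-swapped words and swaps once at the end, so its initial value must be byte-swapped too; the other works on plain numeric values throughout.

```python
def fold(total):
    """Fold carries back into 16 bits (ones' complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)


def sum_host_order(data, initial=0):
    """Model of a variant whose initial value and result are plain numbers."""
    total = initial
    for i in range(0, len(data) - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        total += data[-1] << 8
    return fold(total)


def sum_swapped(data, initial=0):
    """Model of a loop that sums byte-swapped words and swaps at the end."""
    total = initial                      # must already be byte-swapped
    for i in range(0, len(data) - 1, 2):
        total += data[i] | (data[i + 1] << 8)
    if len(data) % 2:
        total += data[-1]
    return swap16(fold(total))


def all_agree(data, initial):
    """True when both variants agree, given a correctly converted initial value."""
    return sum_host_order(data, initial) == sum_swapped(data, swap16(initial))
```

With a zero initial value the two variants agree without any conversion, which is why a test that only uses zero would miss the mismatch; a non-zero initial value passed unconverted to both exposes it.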
@lukego lukego changed the title from "Integrate" to "Integrated collection of open PRs on master" Mar 25, 2015
lukego added a commit that referenced this pull request Mar 25, 2015
Integration branch merge containing PRs to master.
@lukego lukego merged commit 94c4a27 into snabbco:master Mar 25, 2015
@lukego lukego deleted the integrate branch March 25, 2015 15:48
dpino added a commit to dpino/snabb that referenced this pull request Sep 8, 2016
Load two virtual NICs if using two different VLAN tags