Integrated collection of open PRs on master #424

lukego · 2015-03-25T14:19:38Z

Merge with testing of open pull requests to master:

Makefile: Streamline LuaJIT build options #389 Streamline LuaJIT build options
cksum_avx2: Switch to AVX unaligned loads #414 cksum_avx2: Use unaligned loads
Run in busy-loop (engine.Hz = false) by default #415 Run in busy-loop (Hz = false) by default
engine: Make link report prettier #416 engine: Make link status printouts prettier.
Relax requirement to run as root and mlock() less memory #417 Relax requirement to run as root and mlock() individual pages on demand
snabbnfv: Improve --help usage printouts #418 snabbnfv: Improve --help usage output
snabbnfv traffic: Fix command-line argument parsing #420 snabbnfv traffic: Fix command-line argument parsing
[csum-offload-simd] Incoming checksum verification #411 checksum offload: Perform checksum validation on packets sent to Virtio-net

Strip away these options:

-DLUAJIT_USE_PERFTOOLS
This feature causes noise in /tmp (perf-nnnn.map files). There is a better way to profile at trace granularity with the LuaJIT profiler. Hopefully this will be upstream in LuaJIT but for now available as a patch: http://www.freelists.org/post/luajit/Profile-at-trace-granularity,1

-DLUAJIT_USE_GDBJIT
The documentation says that "enabling it always has a non-negligible overhead -- do not use in release mode" and I don't believe that we are using this functionality (debugging JIT traces in gdb).

-DLUAJIT_NUMMODE=3
This is less clear cut. Mike Pall has encouraged us to experiment with settings for this variable for potential performance benefits. The current value (3) was chosen during a micro-optimization spree many moons ago. I am not confident whether this setting helps or is better than other settings. So I am disabling it in this commit ("less voodoo") and we will have to check for performance regressions before merging onto master.

Simplify the code by operating on unaligned data. CPUs supporting AVX2 handle unaligned access very efficiently (source: Agner Fog). The previous version of this function used a non-SIMD checksum for any non-aligned data at the start of the packet. (This is still done for the non-aligned part at the end, since the AVX code still operates on minimum 32 byte chunks.)
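The loop structure described above (no aligned prologue, a main loop over 32-byte chunks, and a scalar tail) can be sketched in portable C. This is only an illustration under stated assumptions: the name `cksum_chunked` is hypothetical, the byte-wise reads stand in for AVX2 unaligned loads, and none of this is the actual cksum_avx2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical name, not the Snabb function. Returns the Internet checksum
 * of data[0..len), with `initial` added to the sum first. */
uint16_t cksum_chunked(const uint8_t *data, size_t len, uint16_t initial) {
    assert(data != NULL || len == 0);
    uint64_t sum = initial;
    size_t i = 0;

    /* Main loop: whole 32-byte chunks, starting at data[0] whatever its
     * alignment. The byte-wise reads stand in for AVX2 unaligned loads. */
    for (; i + 32 <= len; i += 32)
        for (size_t k = 0; k < 32; k += 2)
            sum += ((uint32_t)data[i + k] << 8) | data[i + k + 1];

    /* Tail: fewer than 32 bytes remain, handled without SIMD. */
    for (; i + 2 <= len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;   /* odd final byte, zero-padded */

    return (uint16_t)~fold16(sum);
}
```

Calling `cksum_chunked(buf + 1, len, 0)` on a deliberately misaligned pointer takes the same path as an aligned one, which is the simplification the commit is after.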

This will make Snabb Switch always show 100% CPU in default settings. Previously, Snabb Switch by default attempted to bring in a batch of new packets once every 100us. This is too conservative on the straightline design because we now work on smaller batches of packets (max is now 256 and was 8192 before). So this change makes the default behavior to busy-loop without sleeping to maximize throughput. Going forward we should consider a dynamic mechanism that will make a truly idle Snabb Switch sleep. Meanwhile the default is to busy-loop and a static configuration is possible with a setting like 'engine.Hz=10000'.

The standard link report printout is now a little prettier: links are alphabetically sorted (resolves snabbco#385), punctuation is improved, and values are printed with commas and justified.

Before:

link report
597223954 sent on Virtio_A.tx -> NIC_A.rx (loss rate: 0%))
137411842 sent on NIC_A.tx -> Virtio_A.rx (loss rate: 0%))
597223786 sent on NIC_B.tx -> Virtio_B.rx (loss rate: 0%))
137411851 sent on Virtio_B.tx -> NIC_B.rx (loss rate: 0%))

After:

link report:
17,747,602 sent on NIC_A.tx -> Virtio_A.rx (loss rate: 0%)
70,955,552 sent on NIC_B.tx -> Virtio_B.rx (loss rate: 0%)
70,955,729 sent on Virtio_A.tx -> NIC_A.rx (loss rate: 0%)
17,747,621 sent on Virtio_B.tx -> NIC_B.rx (loss rate: 0%)
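The two formatting steps mentioned here, thousands separators and alphabetical ordering of link names, can be sketched in C as follows. The function names `format_with_commas` and `sort_link_names` are hypothetical and this is not the engine's Lua implementation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write n with thousands separators, e.g. 17747602 -> "17,747,602".
 * out must hold at least 27 bytes (20 digits, 6 commas, terminator). */
void format_with_commas(uint64_t n, char *out, size_t out_size) {
    assert(out != NULL && out_size >= 27);
    char digits[21];
    int len = snprintf(digits, sizeof digits, "%" PRIu64, n);
    size_t o = 0;
    for (int i = 0; i < len; i++) {
        /* Insert a comma whenever a multiple of three digits remains. */
        if (i > 0 && (len - i) % 3 == 0)
            out[o++] = ',';
        out[o++] = digits[i];
    }
    out[o] = '\0';
}

/* qsort comparator over an array of C strings. */
static int compare_names(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort link names alphabetically, in place. */
void sort_link_names(const char **names, size_t count) {
    qsort(names, count, sizeof names[0], compare_names);
}
```

Sorting the four link names from the example with `sort_link_names` gives the order shown in the "After" listing, with both NIC links ahead of both Virtio links.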

Snabb Switch previously called mlockall() to force all memory in the process address space to be locked to its physical location. This made it possible to use all memory for DMA, including memory mapped from virtual machines.

This had several side-effects:

* Prevents Snabb Switch from being swapped out. (Nice?)
* Prevents VMs that we serve from being swapped out. (Overkill?)
* Requires root permissions at startup. (Overkill?)

The new behavior is to individually mlock() the HugeTLB pages that we allocate for DMA. The rest of the memory - Snabb Switch and VMs - is left unlocked. This is feasible now that we only do DMA within this memory.

This also allows us to defer the root-privileged system call until we need DMA memory, which may not happen for certain applications.
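The difference between locking everything with mlockall() and locking a single allocation can be sketched in C. This is a minimal illustration under assumptions: the helper names are hypothetical, and mapping anonymous memory with an optional MAP_HUGETLB flag is just one way to obtain such pages, not necessarily how Snabb Switch allocates its DMA memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes and lock only that region,
 * leaving the rest of the process address space unlocked.
 * Returns NULL if the mapping or the lock fails. */
void *alloc_locked(size_t size, int use_hugetlb) {
    assert(size > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_HUGETLB
    if (use_hugetlb)
        flags |= MAP_HUGETLB;   /* Linux-specific; needs reserved huge pages */
#else
    (void)use_hugetlb;
#endif
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, size) != 0) {  /* may fail without privileges or limits */
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* Unlock and unmap a region returned by alloc_locked(). */
void free_locked(void *p, size_t size) {
    munlock(p, size);
    munmap(p, size);
}
```

Because the lock happens inside the allocator, a program that never asks for DMA memory never makes the privileged call.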

lukego and others added 28 commits March 5, 2015 07:47

core.packet: Don't allocate packets at module-load time

487873c

Wait until the first packet is needed to allocate the memory.
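Deferring allocation to first use can be sketched in C as below. The pool size, packet size, and all names here are hypothetical, and plain malloc() stands in for whatever memory core.packet actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { POOL_PACKETS = 4, PACKET_BYTES = 2048 };

static unsigned char *pool;      /* NULL until the first packet is requested */
static size_t next_free;
static int pool_allocations;

/* Number of times the backing memory has been allocated (0 or 1). */
int pool_allocation_count(void) { return pool_allocations; }

/* Return the next packet buffer, allocating the pool on first use.
 * Returns NULL when the pool is exhausted or allocation fails. */
unsigned char *packet_allocate(void) {
    if (pool == NULL) {
        pool = malloc((size_t)POOL_PACKETS * PACKET_BYTES);
        if (pool == NULL)
            return NULL;
        pool_allocations++;
    }
    if (next_free == POOL_PACKETS)
        return NULL;
    return pool + PACKET_BYTES * next_free++;
}
```

Loading the module costs nothing in this sketch; the memory appears only when `packet_allocate()` is first called.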

core.main: Don't insist on starting as root

25f97c6

Certain Snabb programs don't require root permissions. The ones that do will need to check for them when system calls fail unexpectedly.

memory/pci: Explicitly check for root access when needed

51519c5

Use ljsyscall's geteuid() via a new library function: lib.root_check()
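An explicit check of this kind can be sketched in C with the POSIX geteuid() call. The function below is a hypothetical stand-in for lib.root_check(); its return convention and message are assumptions, not the behavior of the Lua function.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for lib.root_check(): returns 0 when the effective
 * user is root, otherwise prints a message naming the operation and
 * returns -1. */
int root_check(const char *operation) {
    assert(operation != NULL);
    if (geteuid() == 0)
        return 0;
    fprintf(stderr, "error: must run as root to %s\n", operation);
    return -1;
}
```

A caller would invoke this just before the privileged operation, for example `root_check("allocate DMA memory")`, rather than at program startup.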

snabbnfv: Improve --help usage printouts

6759f01

Added proper usage for neutron-sync-master and neutron-sync-agent. Created separate usage commands for traffic: -h/--help prints command line usage, -H/--long-help also prints the configuration file format.

new function checksum.verify_packet() takes an IP packet (not the who…

122bfa7

…le ethernet frame) and returns true if it successfully checks the header (if applicable) and data.

if we can check the incoming packet, tell the VM not to bother checki…

c809291

…ng again.

optimization: move the ntohs() to the end of the sums

3dec599

return a single value, not a structure

482f89e

It's faster to return a single value from C to Lua, even if that means recalculating the other one.

Merge PR snabbco#389 to integration branch

48b1721

Merge PR snabbco#414 to integration branch

fb867d8

Merge PR snabbco#415 to integration branch

ae0ff06

Merge PR snabbco#417 to integration branch

629e905

Merge PR snabbco#416 to integration branch

fa2b1d1

Merge PR snabbco#418 to integration branch

5ab5336

snabbnfv traffic: Fix command-line argument parsing

bdc7be7

The optional arguments (only -B) were treated as positional arguments.

snabbnfv traffic: Fix benchmark-mode startup

c2c7732

snabbnfv traffic: Print profiler output after benchmark

7ceacca

Merge PR snabbco#420 into integration branch

72da719

Conflicts: src/program/snabbnfv/traffic/traffic.lua

ugly hack: return pseudoheader initial swapped or not according to size

54cdbcc

The generic loop takes the initial value in network order and swaps at the end, but the SIMD versions take it in host order.
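Why the byte order of the initial value matters can be sketched in C. The ones-complement sum is byte-order independent (RFC 1071): swapping every word before adding gives the same result as adding unswapped words and swapping once at the end, provided the initial value is in the same order as the words being added. The function names are hypothetical and this is not the Snabb checksum code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

static uint16_t swap16(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Swap every word before adding; `initial` is in the swapped order. */
uint16_t sum_swap_each(const uint16_t *words, size_t n, uint16_t initial) {
    assert(words != NULL || n == 0);
    uint64_t sum = initial;
    for (size_t i = 0; i < n; i++)
        sum += swap16(words[i]);
    return fold16(sum);
}

/* Add words unswapped and swap once at the end. The same `initial` must
 * first be converted to the order of the words, or the results differ. */
uint16_t sum_swap_last(const uint16_t *words, size_t n, uint16_t initial) {
    assert(words != NULL || n == 0);
    uint64_t sum = swap16(initial);
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return swap16(fold16(sum));
}
```

Both functions agree for any input. Dropping the `swap16(initial)` in the second one reproduces the kind of mismatch this commit works around.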

Explicit difference between 'bad' and 'unknown' packets

fc1a2b4

Setting flags to C.VIO_NET_HDR_F_NEEDS_CSUM allowed even known-bad packets to pass. It's better to set it to 0 when we do have a failed checksum.
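The three-way distinction can be sketched in C as a mapping from verification result to virtio-net header flags. Only the "bad" case (flags set to 0) comes from the commit message; the flags chosen for "good" and "unknown" are assumptions made for illustration, and the names `csum_result` and `virtio_flags_for` are hypothetical. The constant values are those of the virtio-net header definition.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values from the virtio-net header definition. */
enum {
    VIRTIO_NET_HDR_F_NEEDS_CSUM = 1,
    VIRTIO_NET_HDR_F_DATA_VALID = 2
};

/* Hypothetical result type for a checksum verification attempt. */
typedef enum { CSUM_UNKNOWN, CSUM_GOOD, CSUM_BAD } csum_result;

uint8_t virtio_flags_for(csum_result result) {
    switch (result) {
    case CSUM_BAD:
        return 0;                              /* from the commit message */
    case CSUM_GOOD:
        return VIRTIO_NET_HDR_F_DATA_VALID;    /* assumption */
    case CSUM_UNKNOWN:
    default:
        return VIRTIO_NET_HDR_F_NEEDS_CSUM;    /* assumption */
    }
}
```

The point of the sketch is that a failed check gets its own outcome instead of sharing flags with packets that could not be checked.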

Add random initial value to selftest

a0b6811

Tests that all checksum functions return the same value with the same parameters, including non-zero initial values.

Initial values are in network order

32ca44c

Merge PR snabbco#411 into integration branch.

6a41f25

lukego changed the title ~~Integrate~~ Integrated collection of open PRs on master Mar 25, 2015

lukego added a commit that referenced this pull request Mar 25, 2015

Merge pull request #424 from lukego/integrate

94c4a27

Integration branch merge containing PRs to master.

lukego merged commit 94c4a27 into snabbco:master Mar 25, 2015

lukego deleted the integrate branch March 25, 2015 15:48

dpino added a commit to dpino/snabb that referenced this pull request Sep 8, 2016

Merge pull request snabbco#424 from Igalia/different-vlan-tags

9161a1a

Load two virtual NICs if using two different VLAN tags
