Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major socket io performance regression in 0.4 and 0.5-dev #14467

Closed
ViralBShah opened this issue Dec 22, 2015 · 10 comments · Fixed by #14667
Closed

Major socket io performance regression in 0.4 and 0.5-dev #14467

ViralBShah opened this issue Dec 22, 2015 · 10 comments · Fixed by #14667
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks regression Regression in behavior compared to a previous version
Milestone

Comments

@ViralBShah
Copy link
Member

The following basic socket io code shows a major performance regression in 0.4 and 0.5-dev compared to 0.3.

using Compat
const n=16*10^6
function server()
    s=listen( 8000)
    @async while true
        a=accept(s)
        @async while true
            try
                b = readbytes(a, n)
                println(length(b))
                write(a, b)
            catch e
                println("client closed connection")
                break
            end
        end
    end
end

function client(c)
    arr=Array(UInt8, n);
    @time begin
        write(c, arr)
        b=readbytes(c, n)
        println(length(b))
    end
end

server()

const c=connect("localhost", 8000)
client(c)

On 0.3.12: 1 second
On 0.4.2: 10 seconds
On 0.5-dev: 12 seconds

@ViralBShah ViralBShah added performance Must go faster regression Regression in behavior compared to a previous version labels Dec 22, 2015
@ViralBShah ViralBShah modified the milestones: 0.5.0, 0.4.x Dec 22, 2015
@ViralBShah
Copy link
Member Author

This was discovered by @amitmurthy and @shashi while doing some parallel benchmarking and finding poor speedup.

@amitmurthy
Copy link
Contributor

A few days back I had run a simple benchmark comparing C vs 0MQ(C API) vs Julia TCPSocket vs Julia remotecall_fetch. The test was to basically echo back 1K of data in a loop.

The results were:

C code (10^6 request-responses of 1K each)
------
8.95035 seconds
9.11186 seconds

0MQ C code (10^6 request-responses of 1K each)
----------
36.91122 seconds
36.92812 seconds

Julia TCP sockets 10^4 requests-responses of 1K each
-----------------
7.927375 seconds (260.33 k allocations: 28.702 MB, 0.30% gc time)
7.583553 seconds (260.33 k allocations: 28.702 MB, 0.32% gc time)

Julia remotecall_fetch 10^5 requests-responses of 1K each
-----------------
elapsed time: 8.995498505 seconds
elapsed time: 9.335054655 seconds

code here - https://gist.github.com/amitmurthy/945843c9f0c61bc7d9e0

The loop counts are 10^6 for C and 0MQ, 10^5 for remotecall_fetch and 10^4 for TCPSocket.

There is obviously something wrong with the TCPSocket test code, it being slower than remotecall_fetch, but I couldn't locate any obvious mistake. C code is 10 times faster than remotecall_fetch, while 0MQ is thrice as fast.

@JeffBezanson
Copy link
Member

Maybe related to #14106

@jrevels jrevels added the potential benchmark Could make a good benchmark in BaseBenchmarks label Dec 23, 2015
@amitmurthy
Copy link
Contributor

Do we have all nightly builds still available somewhere? It may be faster to spin up a AWS instance that will bisect against the nightlies rather than git bisect.

@amitmurthy
Copy link
Contributor

git bisect throws up

5863b486435b86d9db25973a684ef4428f563a73 is the first bad commit
commit 5863b486435b86d9db25973a684ef4428f563a73
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Thu Jun 4 10:15:06 2015 -0400

    make start_reading/stop_reading automatic exactly when needed. fix #1925, fix #10655 (closes #11530)

    this also makes the read throttle more intelligent so that it doesn't
    get tripped up by wait_nb requests for more than READ_BUFFER_SZ bytes

:040000 040000 7bda1043b8ea05590ad2008e0a827dbfc83746ac a1070338b67f8f88ddad6ac16352852f8f1436f3 M  base
:040000 040000 a1d20ff70afcf6555a7ab1e24ca99b76639b657a 9d1da9cca99c713a949e02616cf805c0cbebedab M  test

commit - 5863b48

The test code was

function serve(sock)
  Base.disable_nagle(sock)
  while true
    try
        a=readbytes(sock, 1024)
        write(sock,a)
    catch e
        println("client closed connection")
        break
    end
  end
end

function srvr()
  const srv = listen(8502)
  while true
    sock = accept(srv)
    @async serve(sock)
  end
end

@schedule srvr()

const n = 10^4
function client()
  sock = connect("localhost", 8502) 
  Base.disable_nagle(sock)
  @time for i in 1:n
    write(sock,UInt8[0 for x in 1:1024])
    readbytes(sock, 1024)
  end
end

client()
client()

Prior to this commit 10^4 1K request-responses take approx 1.0 second. Post this commit, it is 7.0 seconds.

cc: @vtjnash

@amitmurthy
Copy link
Contributor

Bump @vtjnash

Running dtruss for the above test code. The number of kevent system calls jumped from 3000 to 12500. The number of read system calls jumped from 1100 to 3500 when we compare the problematic patch with its immediate prior.

@amitmurthy
Copy link
Contributor

Having spent quite a few hours on this, the root cause for the slowdown is still unidentified. This branch - /~https://github.com/JuliaLang/julia/tree/amitm/fixregr - reverts the change. It breaks tests in other ways but the performance regression is vastly improved down to 1.2x (from the current 7x).

Approximately 1.5 seconds in the slowdown is due to the try-catch-finally blocks introduced.

@vtjnash
Copy link
Member

vtjnash commented Jan 14, 2016

the benchmarks show very different improvements, but the one from the OP was definitely the most substantial:
before:
14.564401 seconds (1.36 k allocations: 30.555 MB, 0.05% gc time)
after:
0.066761 seconds (1.22 k allocations: 30.545 MB, 60.54% gc time)
(over 200x faster :o)

@amitmurthy
Copy link
Contributor

Awesome!

@tkelman
Copy link
Contributor

tkelman commented Jan 14, 2016

moving backport pending label to the PR #14667

tkelman pushed a commit that referenced this issue Mar 7, 2016
fixes #14467 performance regression (and some)

(cherry picked from commit 9dbc3ab)
ref #14667
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants