Resizable Concurrency Pool Fix #17
Conversation
cancelLastGiver <- struct{}{}
cancelLastGiver = nil
}
deferIter()
you can probably make this `go deferIter()`
I tried this out, as I agree it should be fine/safe, and it didn't change the results in any significant way. I think it's slightly simpler not to defer it, so I'll leave it as is.
The previous iteration relied on this part of the select statement being reliably hit whenever an item was on the `lastGiverCancel` channel. If the `lastGiverCancel` channel has an item but the goroutine is still giving out capacity, then we're exceeding our desired concurrency. I mistakenly believed that select cases were evaluated in order; this article does a good job showing the actual behavior: when multiple cases are ready, one is chosen at random.

Also, given that this isn't true, `taken` could have been modified concurrently (which was called out in this comment -- I was wrong, but thought it still wouldn't be concurrent because of the cancel channel).
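To make that concrete, here's a small standalone Go snippet (not code from this repo) demonstrating that when more than one select case is ready, the runtime picks among them pseudo-randomly rather than in source order:

```go
// Standalone illustration (not repo code): with both channels ready, the
// first case wins only about half the time, so a "cancel" case is not
// guaranteed to be hit just because its channel already has an item.
package main

import "fmt"

func main() {
	cancel := make(chan struct{}, 1)
	give := make(chan struct{}, 1)

	cancelWins := 0
	for i := 0; i < 1000; i++ {
		cancel <- struct{}{} // make both cases ready
		give <- struct{}{}
		select {
		case <-cancel:
			cancelWins++
			<-give // drain the other channel before the next iteration
		case <-give:
			<-cancel
		}
	}
	fmt.Printf("cancel case chosen %d / 1000 times\n", cancelWins)
}
```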
Each commit includes `test_results`, produced by `go test ./v1/common -timeout 30m -v -count 2 -run TestResizablePoolConcurrency > test_results`. This test attempts to evaluate how well the resizable pool works. Results vary a lot based on the timing of capacity changes relative to the worker times. The test probably isn't perfect, but it shows improvements across my changes. Each test prints some summaries, e.g. under "Constant Results", "Avg" is the average performance of all the tests that ran with a capacity that kept being reset to the same value (the NaN results are fine -- they come from running with 0 capacity).
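As a rough illustration of the kind of measurement behind those summaries (a hypothetical sketch, not the actual TestResizablePoolConcurrency code; the names below are made up), the idea is to periodically sample how many workers are running and compare that against the desired capacity:

```go
// Hypothetical sketch of the measurement: sample the number of running
// workers and count how often it exceeds the desired capacity. In the real
// test the workers would be acquiring capacity from the resizable pool;
// here they are unconstrained, purely to show the sampling mechanics.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const desiredCapacity = 4
	var running, samples, over int64

	done := make(chan struct{})
	go func() { // sampler goroutine
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				atomic.AddInt64(&samples, 1)
				if atomic.LoadInt64(&running) > desiredCapacity {
					atomic.AddInt64(&over, 1)
				}
			}
		}
	}()

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&running, 1)
			time.Sleep(10 * time.Millisecond) // stand-in for real work
			atomic.AddInt64(&running, -1)
		}()
	}
	wg.Wait()
	close(done)

	fmt.Printf("above desired capacity in %d of %d samples\n", over, samples)
}
```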
Looking at the constant-size tests first, originally we exceeded our desired capacity ~63% of the time (on average). Each change improves this number slightly, and the last change gets it down to ~3%.
With the rolling capacity test, originally we were at the correct capacity ~80% of the time (on average), but the worst test case was at capacity just 40% of the time. By the last commit, the average is up to 92% and the worst case is at capacity 63% of the time. For the worst case, the improvement is bigger than the jump from 40% to 63% suggests: originally it was above the desired capacity 57% of the time, and by the last commit that is down to 9%.