Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

fix concurrent map write panics #2926

Merged
merged 1 commit into from
Mar 25, 2020
Merged

fix concurrent map write panics #2926

merged 1 commit into from
Mar 25, 2020

Conversation

infa-bsurber
Copy link
Contributor

@infa-bsurber infa-bsurber commented Mar 18, 2020

Description

Solves issue #2917

  • Unit test to prove that concurrent executions of getAllowedAndExistingNamespaces cause panics, which is leading to random panics in the wild. This test can be cleaned up after fix is tested if desired.
  • Fix panics by using a concurrency-safe map or RWMutex

Unit Test Output Demonstrating Issue

go test -run TestConcurrentGetAllowedNamespaces

fatal error: concurrent map writes                                                                                                                 
                                                                                                                                               
goroutine 22418 [running]:                                                                                                                              
runtime.throw(0x2744a87, 0x15)                                                                            
        /usr/local/Cellar/go/1.12.3/libexec/src/runtime/panic.go:617 +0x72 fp=0xc03a1cdde0 sp=0xc03a1cddb0 pc=0x102fb22         
runtime.mapassign_faststr(0x24f1c80, 0xc0003ee960, 0x3af8ef4, 0x1, 0x0)                                      
        /usr/local/Cellar/go/1.12.3/libexec/src/runtime/map_faststr.go:291 +0x40f fp=0xc03a1cde48 sp=0xc03a1cdde0 pc=0x101536f
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*Cluster).getAllowedAndExistingNamespaces(0xc000162500, 0x2b301e0, 0xc0000dc000, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/bsurber/Go/src/github.com/infa-bsurber/flux/pkg/cluster/kubernetes/kubernetes.go:320 +0x669 fp=0xc03a1cdf78 sp=0xc03a1cde48 pc=0x235ab29
github.com/fluxcd/flux/pkg/cluster/kubernetes.TestConcurrentGetAllowedNamespaces.func1(0xc0000de300, 0xc000162500, 0xc0000de360)
        /Users/bsurber/Go/src/github.com/infa-bsurber/flux/pkg/cluster/kubernetes/kubernetes_test.go:74 +0x5b fp=0xc03a1cdfc8 sp=0xc03a1cdf78 pc=0x2379b2b
runtime.goexit()                                                       
        /usr/local/Cellar/go/1.12.3/libexec/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc03a1cdfd0 sp=0xc03a1cdfc8 pc=0x105e4f1
created by github.com/fluxcd/flux/pkg/cluster/kubernetes.TestConcurrentGetAllowedNamespaces
        /Users/bsurber/Go/src/github.com/infa-bsurber/flux/pkg/cluster/kubernetes/kubernetes_test.go:72 +0x5bd

...

Fix 1 performance (sync.Map)

go test -run TestConcurrentGetAllowedNamespaces 
PASS
ok      github.com/fluxcd/flux/pkg/cluster/kubernetes   19.955s

Fix 2 performance (map of RWLocks)

still panicked actually, I guess maps can't be locked by key, and instead you have to lock the whole map when writing

Fix 3 performance (single RWLock) <- my personal preferred option

go test -run TestConcurrentGetAllowedNamespaces
PASS
ok      github.com/fluxcd/flux/pkg/cluster/kubernetes   19.266s

@infa-bsurber infa-bsurber changed the title add test (temporarily) to prove panics fix concurrent map write panics Mar 18, 2020
Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, and for providing a test to illustrate the issue. 🌷

Besides the minor nitpick I commented, can you please smash your commits together and rephrase the commit comment into something descriptive? After that it should be good to go 🌟

loggedAllowedNS map[string]bool // to keep track of whether we've logged a problem with seeing an allowed namespace
allowedNamespaces map[string]struct{}
loggedAllowedNS map[string]bool // to keep track of whether we've logged a problem with seeing an allowed namespace
loggedAllowedNSLock *sync.RWMutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
loggedAllowedNSLock *sync.RWMutex
loggedAllowedNSLock sync.RWMutex

With this you can remove L131 :-)

@infa-bsurber
Copy link
Contributor Author

made the fix per your comment and cleaned up the concurrency unit test after my final round of testing (don't want to add 20 seconds to every test run), then squashed the commits

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks @infa-bsurber 🥇

@hiddeco
Copy link
Member

hiddeco commented Mar 25, 2020

Can you please rebase the branch on top of master?

@infa-bsurber
Copy link
Contributor Author

oh wait, you said rebase, just a sec

@infa-bsurber
Copy link
Contributor Author

there, rebased on upstream master

@hiddeco hiddeco merged commit f5305a6 into fluxcd:master Mar 25, 2020
@hiddeco hiddeco added this to the 1.19.0 milestone Mar 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants