x/tools/gopls: deepCompletionState.enqueue is an allocation hotspot #53798
Comments
The queue was added when the candidate search was changed from DFS to BFS. I presume the benchmark disables the completion budget to avoid flakiness. In practice the budget is set to ~100ms, so the search gives up sooner. Improving this hotspot would allow gopls to search more candidates within its 100ms budget, but might not end up materially improving the completion results or reducing overall CPU usage (assuming it couldn't search 5 million candidates in 100ms anyway).
Ah, thanks @muirdm. Right: the benchmark takes 10s because IIRC the regtest environment sets a completion budget of 10s. We should investigate whether reducing allocation allows us to search more candidates per second, as that might matter more for smaller repos.
Thanks @muirdm for the insight. The running time of both calls (the warmup and the first benchmark run) did seem curiously similar, not to mention close to 10s. So the benchmark is fundamentally flawed: it always measures the first 10s of a loop that never completes, and that window includes completion tasks as well as other tasks (parsing and type checking). Therefore making the production code faster at the task the benchmark claims to measure does not improve the benchmark numbers; it merely reduces the fraction of the constant 10s of wall time that is spent in completion tasks. Consequently memory allocation appears to go through the roof (because parsing and type checking are so allocation-heavy). I don't know the code well enough to say how, but we should change the benchmark to measure a task that corresponds to the latency experienced by the user in a completion request. [Update: Rob pointed out that it's really my choice of using the large k8s module as input that exposes the problem with the benchmark.]
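For illustration, the shape of benchmark this comment argues for is one where each `b.N` iteration performs a single, fixed-size unit of work (one completion request), so the reported ns/op tracks user-visible latency. This is only a hedged sketch with hypothetical helper names, not the actual gopls regtest harness:

```go
package bench

import "testing"

// completeOnce is a hypothetical stand-in for issuing a single completion
// request at a fixed position; in the real harness this would be a call
// such as env.Completion(file, pos).
func completeOnce() {}

func BenchmarkSingleCompletion(b *testing.B) {
	// One-time setup (opening the workspace, warming caches) would go here,
	// before the timer starts, so it is not attributed to the measured work.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Each iteration performs exactly one request, so ns/op approximates
		// the latency a user would see for a single completion.
		completeOnce()
	}
}
```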
Before, these methods of the Source interface used a cache (buildASTCache), which built a Pos-keyed map for a whole file, but the necessary algorithm is essentially a binary search, which is plenty fast enough to avoid the need for a cache. This change implements that algorithm and moves both methods out of the interface into a single function, source.FindDeclAndField.

I measured the duration of all calls to astCacheData (before) and FindDeclAndField (after) occurring within this command:

$ go test -bench=TestBenchmarkConfiguredCompletion -v ./gopls/internal/regtest/bench -completion_workdir=$HOME/w/kubernetes -completion_file=../kubernetes/pkg/generated/openapi/zz_generated.openapi.go -completion_regexp=Get

(The numbers reported by this benchmark are problematic, which is why I measured call times directly; see golang/go#53798.)

Results:
  before (n=4727)  max = 21ms,   90% = 4.4ms,  median = 19us
  after  (n=6282)  max = 2.3ms,  90% = 25us,   median = 14us

The increased number of calls to the function after the change is due to a longstanding bug in the benchmark: each iteration of the b.N loop doesn't do a fixed amount of work, it does as much as it can in 10s. Thus making the code faster simply causes the benchmark to spend the same amount of time on other parts of the program, such as the loop that calls FindDeclAndField.

See https://go-review.googlesource.com/c/tools/+/221021 for background on the previous implementation.

Change-Id: I745ecc4e65378fbe97f456228cafba84105b7e49
Reviewed-on: https://go-review.googlesource.com/c/tools/+/416880
Auto-Submit: Alan Donovan <adonovan@google.com>
Run-TryBot: Alan Donovan <adonovan@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
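As context for the binary-search idea the change describes, here is a minimal sketch (not the source.FindDeclAndField implementation; names and scope are simplified): a file's top-level declarations are ordered by position, so the declaration enclosing a token.Pos can be found with sort.Search instead of a precomputed Pos-keyed map.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// enclosingDecl returns the top-level declaration whose source range
// contains pos, or nil if there is none.
func enclosingDecl(file *ast.File, pos token.Pos) ast.Decl {
	decls := file.Decls
	// Find the first declaration that ends at or after pos.
	i := sort.Search(len(decls), func(i int) bool {
		return decls[i].End() >= pos
	})
	if i < len(decls) && decls[i].Pos() <= pos {
		return decls[i]
	}
	return nil
}

func main() {
	fset := token.NewFileSet()
	src := `package p

func f() {}

func g() { _ = 1 }
`
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		panic(err)
	}
	// Look up the declaration enclosing a position inside func g.
	pos := file.Decls[1].Pos() + 5
	if d, ok := enclosingDecl(file, pos).(*ast.FuncDecl); ok {
		fmt.Println(d.Name.Name) // g
	}
}
```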
Change https://go.dev/cl/422906 mentions this issue:
Completion is very performance sensitive, and building a string to check for *testing.F has a significant cost.

StructCompletion-8           20.7ms ±14%  16.8ms ± 1%  -18.59%  (p=0.000 n=10+10)
ImportCompletion-8           1.36ms ± 5%  1.05ms ±18%  -22.55%  (p=0.000 n=9+10)
SliceCompletion-8            23.5ms ± 2%  19.3ms ±18%  -17.85%  (p=0.000 n=7+10)
FuncDeepCompletion-8         17.6ms ± 2%  15.5ms ± 2%  -11.82%  (p=0.000 n=8+8)
CompletionFollowingEdit-8    81.2ms ± 8%  74.2ms ± 5%   -8.60%  (p=0.000 n=9+9)

For golang/go#53992
For golang/go#53798

Change-Id: Ia138cbadce142a424caabe8259bda05bcc536055
Reviewed-on: https://go-review.googlesource.com/c/tools/+/422906
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Run-TryBot: Robert Findley <rfindley@google.com>
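For illustration only, here is a hedged sketch of the kind of check the CL describes (not the actual gopls code): deciding whether a types.Type is *testing.F by inspecting the type structure directly, rather than allocating a string via typ.String() and comparing it to "*testing.F".

```go
package main

import (
	"fmt"
	"go/types"
)

// isTestingF reports whether typ is the type *testing.F, without building
// the type's string representation.
func isTestingF(typ types.Type) bool {
	ptr, ok := typ.(*types.Pointer)
	if !ok {
		return false
	}
	named, ok := ptr.Elem().(*types.Named)
	if !ok {
		return false
	}
	obj := named.Obj()
	return obj.Name() == "F" && obj.Pkg() != nil && obj.Pkg().Path() == "testing"
}

func main() {
	// Build a stand-in testing.F named type to exercise the check; in gopls
	// the type would come from the type checker.
	pkg := types.NewPackage("testing", "testing")
	f := types.NewTypeName(token.NoPos, pkg, "F", nil)
	named := types.NewNamed(f, types.NewStruct(nil, nil), nil)
	fmt.Println(isTestingF(types.NewPointer(named))) // true
	fmt.Println(isTestingF(types.Typ[types.Int]))    // false
}
```

(The sketch assumes the check only needs to match the named type *testing.F; `token.NoPos` requires importing "go/token" as well.)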
The typ.String() issue has been fixed, and the other problem is really an artifact of using a much larger input than the benchmark was designed to handle. The actual allocations on smaller repos look more reasonable. Closing.
I noticed while running `TestBenchmarkConfiguredCompletion` on k8s that `deepCompletionState.enqueue` allocates 10GB of memory because it adds 5 million completion candidates to the queue, each being an 18-word structure. This causes each call of `env.Completion` in the benchmark to take 10s. There are two calls, the warmup and the one in the b.N loop (which only reaches N=1). This was the command. The .go file is my favorite large file in k8s.

Also, 2GB (and 3s CPU) are spent on this call to `Type.String` in `completer.methodsAndFields`:

@muirdm
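To make the numbers above concrete, here is a rough, hedged illustration (not gopls code) of why enqueueing millions of large by-value candidates is expensive: appending 18-word structs to a growing slice allocates the final backing array plus every intermediate array the slice copied through as it grew.

```go
package main

import (
	"fmt"
	"runtime"
)

// candidate is a placeholder sized to roughly match an 18-word struct.
type candidate struct {
	_ [18]uintptr // stand-in for ~18 words of fields
}

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var queue []candidate
	for i := 0; i < 5_000_000; i++ {
		queue = append(queue, candidate{}) // each growth reallocates and copies every element
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("queued %d candidates, total allocated ~%d MB\n",
		len(queue), (after.TotalAlloc-before.TotalAlloc)/(1<<20))
}
```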