x/tools/gopls: deepCompletionState.enqueue is an allocation hotspot #53798
Comments
The queue was added when the candidate search was changed from DFS to BFS. I presume the benchmark disables the completion budget to avoid flakiness. In practice the budget is set to ~100ms, so the search gives up sooner. Improving this hotspot would allow gopls to search more candidates within its 100ms budget, but might not end up materially improving the completion results or reducing overall CPU usage (assuming it couldn't search 5 million candidates in 100ms anyway).
Ah, thanks @muirdm. Right: the benchmark takes 10s because IIRC the regtest environment sets a completion budget of 10s. We should investigate whether reducing allocation allows us to search more candidates per second, as that might matter more for smaller repos.
Thanks @muirdm for the insight. The running time of both calls (the warmup and the first benchmark run) did seem curiously similar, not to mention close to 10s. So the benchmark is fundamentally flawed: it always measures the first 10s of a loop that never completes, and that window includes completion tasks as well as other tasks (parsing and type checking). Therefore making the production code faster at the task the benchmark claims to measure does not improve the benchmark numbers; it merely reduces the fraction of the constant 10s of wall time that is spent in completion tasks. Consequently memory allocation appears to go through the roof (because parsing and type checking are so allocation-heavy). I don't know the code well enough to say how, but we should change the benchmark to measure a task that corresponds to the latency experienced by the user in a completion request. [Update: Rob pointed out that it's really my choice of using the large k8s module as input that exposes the problem with the benchmark.]
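For illustration, the shape of benchmark this comment argues for is one where each `b.N` iteration performs a single, fixed-size unit of work (one completion request), so the reported ns/op tracks user-visible latency. This is only a hedged sketch with hypothetical helper names, not the actual gopls regtest harness:

```go
package bench

import "testing"

// completeOnce is a hypothetical stand-in for issuing a single completion
// request at a fixed position; in the real harness this would be a call
// such as env.Completion(file, pos).
func completeOnce() {}

func BenchmarkSingleCompletion(b *testing.B) {
	// One-time setup (opening the workspace, warming caches) would go here,
	// before the timer starts, so it is not attributed to the measured work.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Each iteration performs exactly one request, so ns/op approximates
		// the latency a user would see for a single completion.
		completeOnce()
	}
}
```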
Before, these methods of the Source interface used a cache (buildASTCache), which built a Pos-keyed map for a whole file, but the necessary algorithm is essentially a binary search, which is plenty fast enough to avoid the need for a cache. This change implements that algorithm and moves both methods out of the interface into a single function, source.FindDeclAndField.

I measured the duration of all calls to astCacheData (before) and FindDeclAndField (after) occurring within this command:

$ go test -bench=TestBenchmarkConfiguredCompletion -v ./gopls/internal/regtest/bench -completion_workdir=$HOME/w/kubernetes -completion_file=../kubernetes/pkg/generated/openapi/zz_generated.openapi.go -completion_regexp=Get

(The numbers reported by this benchmark are problematic, which is why I measured call times directly; see golang/go#53798.)

Results:
  before (n=4727)  max = 21ms,   90% = 4.4ms,  median = 19us
  after  (n=6282)  max = 2.3ms,  90% = 25us,   median = 14us

The increased number of calls to the function after the change is due to a longstanding bug in the benchmark: each iteration of the b.N loop doesn't do a fixed amount of work, it does as much as it can in 10s. Thus making the code faster simply causes the benchmark to spend the same amount of time on other parts of the program, such as the loop that calls FindDeclAndField.

See https://go-review.googlesource.com/c/tools/+/221021 for background on the previous implementation.

Change-Id: I745ecc4e65378fbe97f456228cafba84105b7e49
Reviewed-on: https://go-review.googlesource.com/c/tools/+/416880
Auto-Submit: Alan Donovan <adonovan@google.com>
Run-TryBot: Alan Donovan <adonovan@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
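As context for the binary-search idea the change describes, here is a minimal sketch (not the source.FindDeclAndField implementation; names and scope are simplified): a file's top-level declarations are ordered by position, so the declaration enclosing a token.Pos can be found with sort.Search instead of a precomputed Pos-keyed map.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// enclosingDecl returns the top-level declaration whose source range
// contains pos, or nil if there is none.
func enclosingDecl(file *ast.File, pos token.Pos) ast.Decl {
	decls := file.Decls
	// Find the first declaration that ends at or after pos.
	i := sort.Search(len(decls), func(i int) bool {
		return decls[i].End() >= pos
	})
	if i < len(decls) && decls[i].Pos() <= pos {
		return decls[i]
	}
	return nil
}

func main() {
	fset := token.NewFileSet()
	src := `package p

func f() {}

func g() { _ = 1 }
`
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		panic(err)
	}
	// Look up the declaration enclosing a position inside func g.
	pos := file.Decls[1].Pos() + 5
	if d, ok := enclosingDecl(file, pos).(*ast.FuncDecl); ok {
		fmt.Println(d.Name.Name) // g
	}
}
```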
Change https://go.dev/cl/422906 mentions this issue:
Completion is very performance sensitive, and building a string to check for *testing.F has a significant cost.

StructCompletion-8           20.7ms ±14%  16.8ms ± 1%  -18.59%  (p=0.000 n=10+10)
ImportCompletion-8           1.36ms ± 5%  1.05ms ±18%  -22.55%  (p=0.000 n=9+10)
SliceCompletion-8            23.5ms ± 2%  19.3ms ±18%  -17.85%  (p=0.000 n=7+10)
FuncDeepCompletion-8         17.6ms ± 2%  15.5ms ± 2%  -11.82%  (p=0.000 n=8+8)
CompletionFollowingEdit-8    81.2ms ± 8%  74.2ms ± 5%   -8.60%  (p=0.000 n=9+9)

For golang/go#53992
For golang/go#53798

Change-Id: Ia138cbadce142a424caabe8259bda05bcc536055
Reviewed-on: https://go-review.googlesource.com/c/tools/+/422906
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Run-TryBot: Robert Findley <rfindley@google.com>
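For illustration only, here is a hedged sketch of the kind of check the CL describes (not the actual gopls code): deciding whether a types.Type is *testing.F by inspecting the type structure directly, rather than allocating a string via typ.String() and comparing it to "*testing.F".

```go
package main

import (
	"fmt"
	"go/types"
)

// isTestingF reports whether typ is the type *testing.F, without building
// the type's string representation.
func isTestingF(typ types.Type) bool {
	ptr, ok := typ.(*types.Pointer)
	if !ok {
		return false
	}
	named, ok := ptr.Elem().(*types.Named)
	if !ok {
		return false
	}
	obj := named.Obj()
	return obj.Name() == "F" && obj.Pkg() != nil && obj.Pkg().Path() == "testing"
}

func main() {
	// Build a stand-in testing.F named type to exercise the check; in gopls
	// the type would come from the type checker.
	pkg := types.NewPackage("testing", "testing")
	f := types.NewTypeName(token.NoPos, pkg, "F", nil)
	named := types.NewNamed(f, types.NewStruct(nil, nil), nil)
	fmt.Println(isTestingF(types.NewPointer(named))) // true
	fmt.Println(isTestingF(types.Typ[types.Int]))    // false
}
```

(The sketch assumes the check only needs to match the named type *testing.F; `token.NoPos` requires importing "go/token" as well.)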
The typ.String() issue has been fixed, and the other problem is really an artifact of using a much larger input than the benchmark was designed to handle. The actual allocations on smaller repos look more reasonable. Closing.
I noticed while running `TestBenchmarkConfiguredCompletion` on k8s that `deepCompletionState.enqueue` allocates 10GB of memory because it adds 5 million completion candidates to the queue, each being an 18-word structure. This causes each call of `env.Completion` in the benchmark to take 10s. There are two calls, the warmup and the one in the b.N loop (which only reaches N=1). This was the command. The .go file is my favorite large file in k8s.

Also, 2GB (and 3s CPU) are spent on this call to `Type.String` in `completer.methodsAndFields`:

@muirdm
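To make the numbers above concrete, here is a rough, hedged illustration (not gopls code) of why enqueueing millions of large by-value candidates is expensive: appending 18-word structs to a growing slice allocates the final backing array plus every intermediate array the slice copied through as it grew.

```go
package main

import (
	"fmt"
	"runtime"
)

// candidate is a placeholder sized to roughly match an 18-word struct.
type candidate struct {
	_ [18]uintptr // stand-in for ~18 words of fields
}

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var queue []candidate
	for i := 0; i < 5_000_000; i++ {
		queue = append(queue, candidate{}) // each growth reallocates and copies every element
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("queued %d candidates, total allocated ~%d MB\n",
		len(queue), (after.TotalAlloc-before.TotalAlloc)/(1<<20))
}
```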