-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dramatically improve postgresql performance #323
Conversation
Co-authored-by: Adrian Gonzalez-Martin <agonzalezma5@bloomberg.net> Co-authored-by: Adrian Gonzalez-Martin <adrian.gonz.mar@gmail.com> Co-authored-by: Sambhav Kothari <skothari44@bloomberg.net> Co-authored-by: Silvio Moioli <moio@suse.com> Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! One nit on this, and a question: Does this require all servers to be upgraded to a new release of Kine before the schema can be changed, or is this change safe to make without updating all the nodes?
Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
It is safe to make the change without updating all the nodes. The queries are b/w compatible since we didn't create any new columns. With the new schema, even if we do not change the queries, the existing queries get significantly faster due to the index usage. Note, that it may take some time to rebuild the indices after the |
Looks like cockroachdb does not support per column collations? |
Yeah, was just noticing that too. I'm not sure how to best handle that, as we currently expect cockroachdb to support the same SQL features as postgres. |
Actually on a closer look it looks like it supports column collation, it just does not support |
It looks like it's using |
I might end up doing something along the lines of select version(); and changing the column type based on whether that contains cockroach vs postgres. Is that reasonable? For unknown version strings, I will just assume postgres. |
Yeah, worth a try I guess? I don't know why they decided to parse that as a BCP47 language tag instead of a collation, that seems pretty broken. Not the first weird decision that project has made though. |
ffdad1b
to
0228965
Compare
Ok, hopefully my little hack works as expected. |
0228965
to
e6208eb
Compare
You might consider opening an issue with cockroachdb, maybe they just want to special-case the C collation? The end result should just be bytewise string comparison. |
Some comparisons from PR perf tests - List Before
List After
As mentioned in the PR, the gap gets wider and wider as we add more items, but we can still see that most of the queries now complete in under 0.05s (p99). Previously even the p90 was 0.1s. |
e6208eb
to
163f906
Compare
Hmm, not sure why this is still happening, let me check
|
Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
163f906
to
4002bb9
Compare
Ok, looks like we are all good 🎉 I will deal with the cockroachdb issue creation later. |
Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>
da3426b
to
08eaa2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking up where I left this and taking it over the finish line 🏁
Thanks @brandond for the merge. Can you also cut a release please? Should this be 0.13.0 given the schema migrations needed? |
I was waiting to merge another PR, but yes we can tag a 0.13. |
Contributors
Co-Authored by @adriangonz who paired with me on this change and also tested/benched all the changes 🎉
This builds on top of the great work done in #151 by @moio (included as a co-author on this PR since it builds on top of the ideas introduced in the said PR)
Benchmarks
For a paginanted API Server list query (which uses CountRevision and List with a filter query) we were able to see speed-ups from 70+ seconds to 2 seconds (35x speed-up) for a kine DB with 1M+ resources that match the list query. The speed-up might be larger for larger kine DBs.
Individually we were able to bring the count query down to 1.5s from ~40s and the list query down to 0.5s from ~30s on our test bench.
This is fairly significant as it makes kine usable/scalable at such high volume of objects.
Without the above changes we would usually see timeouts and context cancelled errors most of the times.
Changes
Datatype Change
In Postgres,
text
tends to give better performance and use less space. When usingvarchar
,text
values can be sparse).varchar
entries totext
anyway.The main benefit provided by
varchar
is making sure that all values are shorter than N characters. However, in the case ofkine
, that's already validated upstream bykube-apiserver
.See the Postgres docs for more info.
Collate Change
When using the
C.UTF-8
locale (which comes as default on many Postgres setups), indices working ontext
(orvarchar
) columns can't be used forLIKE
operations. The workaround in this case is to create the index with atext_pattern_ops
operator. However, the resulting index can't then be used for<
/>
queries. It's possible to work around this second issue by creating a second index that uses the default operator for thetext
column, however that then introduces unnecessary overhead.Instead, a simpler fix is to change the collation of the
name
column to use theC
locale. This means that the strings saved in this column are treated as ASCII values (i.e. each character is a byte), which in turn lets queries use indices for all operations. Since the values in thename
column will be a concatenation of DNS-like segments, we shouldn't get any UTF-8 values there.See the Postgres docs on text indices for more info. This SO answer a lot of useful context around this issue.
Query changes
Most of the original List/Count queries are derived from #151 and have been adapted to support the new CountCurrent and CountRevision. There were also some minor bugs in the original PR WRT to the order of filter queries which has been fixed.
Even without the query changes, the PR achieves a significant speed up with the existing queries as all of them start using the indices for
name
in theLIKE
and>
queries.