Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stably sort controllerrevisions #66882

Conversation

ryanmcnamara
Copy link

@ryanmcnamara ryanmcnamara commented Aug 2, 2018

Fixes #61998

There are times when multiple "equal" controllerrevisions are created with
the same revision number. When this happens and this is the case for the
largest revision number, the statefulset controller will periodically
select one of the maximal controllerrevisions to be the target of the
underlying statefulset. The selection happens here: /~https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L212.
Prior to this change this selection was random as the sort was not
stable, which caused the pods of a stable set to continually roll.

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Stably sort controllerrevisions. This can prevent pods of statefulsets from continually rolling.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 2, 2018
@ryanmcnamara
Copy link
Author

/assign @smarterclayton

@ryanmcnamara
Copy link
Author

This is somewhat related to this bug: #63282
Regardless of how the discussion finishes there, this will still cause a problem as there can be multiple equal controller revisions at the end of the sorted array that can be selected as mentioned above, and without the stable sort they are selected randomly

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 2, 2018
@tanshanshan
Copy link

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 13, 2018
@BenTheElder
Copy link
Member

/retest

@BenTheElder
Copy link
Member

/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Sep 7, 2018
@BenTheElder
Copy link
Member

Verify is failing on a gofmt check, can you please run hack/update-gofmt.sh?

@BenTheElder
Copy link
Member

one more format issue:

diff -u ./pkg/controller/history/controller_history_test.go.orig ./pkg/controller/history/controller_history_test.go
--- ./pkg/controller/history/controller_history_test.go.orig	2018-09-07 09:43:31.562334296 +0000
+++ ./pkg/controller/history/controller_history_test.go	2018-09-07 09:43:31.563334384 +0000
@@ -1545,7 +1545,7 @@
 		want      []string
 	}
 	testFn := func(test *testcase, t *testing.T) {
-		t.Run(test.name,func(t *testing.T) {
+		t.Run(test.name, func(t *testing.T) {
 			SortControllerRevisions(test.revisions)
 			for i := range test.revisions {
 				if test.revisions[i].Name != test.want[i] {

Run ./hack/update-gofmt.sh

@ryanmcnamara
Copy link
Author

/retest

@ryanmcnamara
Copy link
Author

pull-kubernetes-verify — Job succeeded.
so I think format's good, not sure if the other failures were a flake or what

@BenTheElder
Copy link
Member

Those do look like flakes, I've also mentioned this PR in #sig-apps on slack seeing if I can get someone appropriate to take a look. 👍

sort.Sort(byRevision(revisions))
// Stably sort first by name to break revision number ties by name
sort.Stable(byName(revisions))
sort.Stable(byRevision(revisions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to sort it twice. In the Less function of byRevision, use a tie-breaker there when two revisions are equal. I'd prefer using something like CreationTimestamp as a tie-breaker.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Name as another tie-breaker if both Revision and CreationTimestamp are equal.

@janetkuo janetkuo self-assigned this Sep 11, 2018
@BenTheElder
Copy link
Member

gentle poke @ryanmcnamara :-) #66882 (comment)

@ryanmcnamara
Copy link
Author

/assign @janetkuo

@BenTheElder
Copy link
Member

Thanks for keeping after this! Poke @janetkuo :-)

@ryanmcnamara
Copy link
Author

@janetkuo any update?

@ryanmcnamara
Copy link
Author

Ping @BenTheElder @janetkuo

Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@ryanmcnamara
Copy link
Author

@janetkuo done!
Out of curiosity does the squash and merge github UI button not work? Or is there some additional process in merging prs for kubernetes?

@ryanmcnamara
Copy link
Author

/retest

@BenTheElder
Copy link
Member

Out of curiosity does the squash and merge github UI button not work? Or is there some additional process in merging prs for kubernetes?

We have a merge robot that ~ merges approved + lgtm'd PRs once they are passing tests against the target branch.

Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
sorry for the huge delay!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2018
@BenTheElder
Copy link
Member

Thanks for working on this! Merge details (tide) here. This PR still needs approval from @janetkuo

@ryanmcnamara
Copy link
Author

ping @janetkuo @BenTheElder

@BenTheElder
Copy link
Member

/test pull-kubernetes-godeps
Will need Janet's approval still, or someone else in the OWNERS /~https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/history/OWNERS

@janetkuo
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, ryanmcnamara

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2018
@janetkuo
Copy link
Member

/test pull-kubernetes-godeps

@BenTheElder
Copy link
Member

godep: error downloading dep (bitbucket.org/ww/goautoneg): exit status 255

ah, we've had issues with hg + bitbucket, there's a fix being worked on. I don't think we have a tracking issue yet.

@ryanmcnamara
Copy link
Author

/retest

@ryanmcnamara
Copy link
Author

/test pull-kubernetes-godeps

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

6 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@BenTheElder
Copy link
Member

/test pull-kubernetes-godeps

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit ec4105a into kubernetes:master Dec 19, 2018
@BenTheElder
Copy link
Member

Sorry that took so long @ryanmcnamara, thanks again for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Statefulset pods sometimes continually rolling
7 participants