Added repeats for github status updates #14530
Conversation
@lebeg Could you please provide a description for the PR?
@mxnet-label-bot add [pr-work-in-progress]
Force-pushed from bbb8cf9 to 917d178
ci/Jenkinsfile_utils.groovy
Outdated
@@ -186,6 +186,42 @@ def update_github_commit_status(state, message) {
    context = get_github_context()
    echo "context=${context}"

    echo "sending the github status 1..."
Not familiar with Groovy, but wouldn't a loop work here? Something like:
echo "sending the github status 1..." | |
for (int i = 0; i <3; i++) { | |
echo String.format("sending the github status %d...", i+1) | |
step([ | |
$class: 'GitHubCommitStatusSetter', | |
reposSource: [$class: "ManuallyEnteredRepositorySource", url: repoUrl], | |
contextSource: [$class: "ManuallyEnteredCommitContextSource", context: context], | |
commitShaSource: [$class: "ManuallyEnteredShaSource", sha: commitSha], | |
statusBackrefSource: [$class: "ManuallyEnteredBackrefSource", backref: "${env.RUN_DISPLAY_URL}"], | |
errorHandlers: [[$class: 'ShallowAnyErrorHandler']], | |
statusResultSource: [ | |
$class: 'ConditionalStatusResultSource', | |
results: [[$class: "AnyBuildResult", message: message, state: state]] | |
] | |
]) | |
if(i < 2){ | |
sleep 1 | |
} | |
} |
Lgtm
Looks even better to me :)
LGTM
ci/Jenkinsfile_utils.groovy
Outdated
    ]
])
// a few attempts need to be made: /~https://github.com/apache/incubator-mxnet/issues/11654
for (int attempt = 1; attempt < 4; attempt++) {
Nit: use <= 3; it's easier to mentally parse at a glance. Same with <= 2. (Philosophical aside: 4 isn't the significant piece of info, 3 is. When you have < len(a) it makes sense to avoid the - 1 because that clutters the code, but here we're interested in 3, not 4, and 2, not 3 :)).
Ok, changed as you suggested
for (int attempt = 1; attempt < 4; attempt++) {
    echo "Sending GitHub status attempt ${attempt}..."

    step([
I don't understand a few points here:
- (Why) will this step be repeated if it's successful? Is there a way to see whether the step succeeded and not iterate?
- Is the operation idempotent (in the event you cannot determine whether the step succeeded and are forced to do the step several times)?
- What kind of outages is the backoff strategy aiming to mitigate, and what are the duration and cause of those outages? Is the backoff of 1 second x 3 empirically effective? Have you considered different strategies? Note this isn't a high-throughput situation, so I wouldn't recommend anything like randomized exponential backoff, but I was curious about exponential backoff.
- Is there a more robust way to handle this planned?
Thanks!
@vishaalkapoor I mentioned the answers in the description, but here they are again explicitly:

> (Why) will this step be repeated if it's successful? Is there a way to see whether the step succeeded and not iterate?

There is no way known to me to determine whether the step was a success. In the case of a real failure the step simply does nothing: the list of repositories it needs to update is empty.

> Is the operation idempotent (in the event you cannot determine whether the step succeeded and are forced to do the step several times)?

I had to look up on Wikipedia what idempotence actually is, and found out that yes, the operation is idempotent: setting the same status on the same commit several times leaves it in the same state as setting it once.

> What kind of outages is the backoff strategy aiming to mitigate, and what are the duration and cause of those outages?

For some reason GitHubCommitStatusSetter sometimes cannot fetch the repository object from its cache. This happens once in a while and became a significant problem after we split the pipelines.

> Is the backoff of 1 second x 3 empirically effective? Have you considered different strategies?

We are aiming to test that on all the PRs. For the test job it didn't fail at all. If this simple strategy turns out not to be enough, we might look into others (a sketch of one alternative is below).

> Is there a more robust way to handle this planned?

Potentially another plugin could be used once it is mature enough. Currently it doesn't work, since it cannot be configured properly.
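To illustrate the exponential-backoff alternative raised above, here is a minimal sketch in Jenkins Pipeline Groovy. It is not part of the PR: `send_github_status` is a hypothetical helper wrapping the `step([...])` call from the diff, and the 1s/2s delays are an assumed schedule, not measured values.

```groovy
// Sketch only: exponential backoff instead of a fixed 1-second delay.
// send_github_status is a hypothetical helper wrapping the
// GitHubCommitStatusSetter step([...]) call shown in the diff above.
def update_github_commit_status_with_backoff(state, message) {
    for (int attempt = 1; attempt <= 3; attempt++) {
        echo "Sending GitHub status attempt ${attempt}..."
        send_github_status(state, message)
        if (attempt < 3) {
            // 1s after the first attempt, 2s after the second;
            // the Jenkins Pipeline sleep step takes seconds by default.
            sleep(1 << (attempt - 1))
        }
    }
}
```

Since the step reports no outcome, even this variant still sends unconditionally; it only changes the spacing between attempts.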
Apologies for missing the first point, which was addressed in the comments, but you didn't address backoff or idempotence :) If the backoff strategy isn't right, you may see the issue reappear with some low frequency.
I'm good with committing, because it will at least temporarily fix the issue (LGTM).
A suggestion, if it's not in the comments already :) Can you repro with a minimal scenario and figure out some probabilities (to justify 3 backoffs and a 1-second delay)?
> Can you repro with a minimal scenario and figure out some probabilities

There is a certain problem with that. The test job doesn't catch the failures, but I can see a metric in CloudWatch showing that this is happening 1-2 times per hour. Once this PR gets merged it should have an immediate effect on the metric. If not, the strategy would need to be reconsidered.
Thanks for working on this @lebeg - some comments added :)
Thanks for your contribution
    ]
])
// a few attempts need to be made: /~https://github.com/apache/incubator-mxnet/issues/11654
for (int attempt = 1; attempt <= 3; attempt++) {
This doesn't seem to be an attempt: it will run three times no matter whether the first time succeeded or not. Please correct me if I understand wrongly.
@lanking520 yes, you understand correctly. Do you suggest renaming the variable?
@lebeg Please check whether the attempt was successful by checking the result (status: 200).
@lanking520 the return code of what do you suggest checking?
@lanking520 the problem is that this call doesn't return a value to check against. The strategy here is really to just hit the GitHub endpoint three times (instead of once) as a mitigation for the GitHub status problem. We realise it's not elegant and that it doesn't solve the underlying problem, but this is just a stop-gap measure to improve the situation for the developers until we can finally fix the issue.
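For illustration only: this sketch shows what the status check suggested above would require if explicit credentials were available, which, per the plugin constraints listed below, they are not in this setup. The helper name is hypothetical, and the `readJSON` step assumes the Pipeline Utility Steps plugin.

```groovy
// Hypothetical sketch: query the GitHub statuses API for the commit and
// check whether our context is present. Requires an explicit token,
// which is exactly what the plugin setup here avoids.
def status_is_set(String repo, String sha, String context, String token) {
    def out = sh(
        returnStdout: true,
        script: "curl -s -H 'Authorization: token ${token}' " +
                "https://api.github.com/repos/${repo}/commits/${sha}/statuses"
    ).trim()
    def statuses = readJSON(text: out)  // Pipeline Utility Steps plugin
    return statuses.any { it.context == context }
}
```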
Maybe I'll mention again here the reasons given in the description of this PR, for clarity:
- Using a different Jenkins plugin, the GitHub Integration Plugin, has been considered, but unfortunately after installation there was no way to configure it: the options were missing from the configuration screen
- All other plugins required passing explicit credentials as arguments
- No way of querying the commit status after the update has been found, so the result cannot be verified
- No way of checking the list of repositories that need to be updated (or whether it is empty) was found; everything happens internally in the plugin
* Added retries to GitHub status update
* Update Jenkinsfile_utils.groovy
* Update Jenkinsfile_utils.groovy
Description
Currently the build status is not properly propagated to GitHub: #11654
This commit adds 3 repeats of the update with a 1-second interval.
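For reference, a sketch of the resulting retry loop, reconstructed from the diff fragments quoted in the conversation above (variable names as they appear there; treat this as a reconstruction, not the verbatim final file):

```groovy
// Send the status three times with a 1-second pause between attempts,
// since the GitHubCommitStatusSetter step gives no outcome to check.
for (int attempt = 1; attempt <= 3; attempt++) {
    echo "Sending GitHub status attempt ${attempt}..."
    step([
        $class: 'GitHubCommitStatusSetter',
        reposSource: [$class: "ManuallyEnteredRepositorySource", url: repoUrl],
        contextSource: [$class: "ManuallyEnteredCommitContextSource", context: context],
        commitShaSource: [$class: "ManuallyEnteredShaSource", sha: commitSha],
        statusBackrefSource: [$class: "ManuallyEnteredBackrefSource", backref: "${env.RUN_DISPLAY_URL}"],
        errorHandlers: [[$class: 'ShallowAnyErrorHandler']],
        statusResultSource: [
            $class: 'ConditionalStatusResultSource',
            results: [[$class: "AnyBuildResult", message: message, state: state]]
        ]
    ])
    if (attempt < 3) {
        sleep 1  // skipped after the final attempt
    }
}
```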
Comments
All of this made the proposed solution the only option.