-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GH-244]: Fixed issue "Status sync job exits if an error happens for a single user" by added additional logging #245
Conversation
… logging (#2) * [MI-2664]:Fixed issue mattermost#244 on ms calender to add additional logging * [MI-2664]:Added comment for skipping user * [MI-2664]:Fixed unit test cases * [MI-2664]:Fix lint errors * [MI-2664]:Added conditional logging * [MI-2664]:Added constants for limit * [MI-2664]:Fixed review comments * [MI-2664]:Fixed review comments
Hello @Kshitij-Katiyar, Thanks for your pull request! A Core Committer will review your pull request soon. For code contributions, you can learn more about the review process here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kshitij-Katiyar Thanks for the quick turnaround on this 👍 Mostly looks good, just a few requests
… of mscalender plugin (#3) * [MI-2687]:Fixed review comments given by mattermost team on issue mattermost#244 of mscalender plugin * [MI-2687]:Fixed unit test cases * [MI-2687]:Fixed review fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks @Kshitij-Katiyar!
@Kshitij-Katiyar one suggestion to consider here would be - Once the sync job completes, can it write to the logs how many users where sync and how many errored out? This would provide visibility and confirm that the the job is now attempting to sync the entire user base. |
sure @DHaussermann @mickmister We can add a summary after the job ends. Can you please also give suggestions of what are the things I should include in that? Some examples could be
Please give your views on this. |
@Kshitij-Katiyar @DHaussermann I think we want to be cognizant of logging too much here. At the moment, we have a debug log before the job runs, and one after the job runs. These logs simply say that the job began and ended. There are some different levels of granularity we can add here:
The individual errors have the user ids in them, so no need to include the user ids in the aggregate log message at the end of the job. I think the first bullet point is appropriate here. If the second one is not a big hassle, then I think we should also include the "number of statuses changed". |
/update-branch |
We don't have permissions to update this PR, please contact the submitter to apply the update. |
* [MI-2705]:Added summary for status sync job * [MI-2705]:Fixed review comments * Fixed lint errors (mattermost#243) * [MI-2705]:Fixed review comments * [MI-2705]:Fixed review comments --------- Co-authored-by: Manoj Malik <manoj@brightscout.com>
@mickmister @DHaussermann Added the functionality of showing the summary for the status sync job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of unrelated changes in the PR. Could you please remove them?
if numberOfLogs < logTruncateLimit { | ||
m.Logger.Warnf("Not able to load user %s from user index. err=%v", u.MattermostUserID, err) | ||
} else if numberOfLogs == logTruncateLimit { | ||
m.Logger.Warnf(logTruncateMsg) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of posing a truncated message instead of the "real" one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanzei Please read this comment #244 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mickmister I don't understand your point in #244 (comment). If the goal is not to flood the logs, why are messages logged that only contain the information that the logs got trunkated? Isn't the number of log messages still the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanzei @mickmister I think the number of logs will not be the same. We will log the first 4 logs coming from a particular function/code snippet, then for the 5th one we will log the Truncate and after that, we will not be logging anything. So the max number of logs will be 5 in any case. If it is more than that we are truncating means not logging anything for a particular function.
…lendar into MM-244
This PR has been automatically labelled "stale" because it hasn't had recent activity. |
This PR has been automatically labelled "stale" because it hasn't had recent activity. |
@mickmister @hanzei @DHaussermann what's the state of this related to a) code review, b) QA review? I see it is approved for (a), so is it a matter of finalising the QA review on that? |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #245 +/- ##
==========================================
+ Coverage 23.31% 23.68% +0.37%
==========================================
Files 62 62
Lines 3049 3090 +41
==========================================
+ Hits 711 732 +21
- Misses 2259 2277 +18
- Partials 79 81 +2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
@m1lt0n @DHaussermann Indeed it seems this is just awaiting QA review |
@Kshitij-Katiyar I can't test this. When I compile and deploy this to a cloud server (my local Darwin ARM seems to not be supported) The plugin starts up and then starts crashing until health-check eventually takes it down.
After this error it seems to crash Steps:
Can you please let me know if you can reproduce this? |
@DHaussermann @mickmister I have fixed this. Actually in the code at one point I was accessing the attributes of a nil object, resulting in panic by mistake. This could only happen in an instance where there was no previous ms-calendar plugin installed. I have added a check now and also tested it on a new instance everything worked fine. Please test this again and let me know if you find any issues. |
Hi @Kshitij-Katiyar
The piece I'm struggling to test here is to see and error occur on 1 of the records and the job will still complete. from your summary "`err=not found during user status sync job..." How can I reproduce such an error? I have this running locally with 2 users and have tried manually updating several values in KV store to invalidate them to try and cause such an error so I can see the sync complete. So far this has not worked. Updating Are you able to offer any suggestions here? |
Hi @DHaussermann , I have done the changes for the part that even if we get an error from 1 user the job will still continue for other users. If you want to get or reproduce the above error user not found I can tell you what I have tried -
|
Thanks @Kshitij-Katiyar taking another look and using your suggestion of deleting the records from the store - I can see the user not found errors
The issue yesterday was that Also @mickmister does this last commit require code review? |
@DHaussermann Ok yes I have fixed it. It was a very small change. |
err := m.Filter(withSuperuserClient) | ||
if err != nil { | ||
return "", err | ||
return "", &StatusSyncJobSummary{}, errors.Wrap(err, "not able to filter the super user client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick: In general, functions that return a non-nil error should have the other fields set to nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanzei Actually here I am not returning nil because, in its parent functions, I am using an attribute of this object, and if I return nil from here that could give panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and passed
- When the sync fails for one or more users the job still completes
warn
records are now shown is the server log for failed users- A count of processed and failed users is shown in the server logs after the job runs
- Retested after last commit for any functional changes
LGTM!
@mickmister or @hanzei please merge when you have a chance. |
/update-branch |
We don't have permissions to update this PR, please contact the submitter to apply the update. |
…lendar into MM-244
@mickmister @hanzei Synced with master |
We don't have permissions to update this PR, please contact the submitter to apply the update. |
Summary
Users were getting the error in the server logs is an Error during user status sync job. err=not found during user status sync job. The status sync job was exiting if an error happened for a single user. So, I changed the logic to continue the process and log the error in server logs.
There will be at max 5 logs for a particular error/function.
Issue #244