Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log nodes that don't reach consensus #452

Merged
merged 2 commits into from
Dec 19, 2019
Merged

Log nodes that don't reach consensus #452

merged 2 commits into from
Dec 19, 2019

Conversation

steven-sheehy
Copy link
Member

@steven-sheehy steven-sheehy commented Dec 17, 2019

Detailed description:

  • Log nodes that don't pass signature verification
  • Encapsulate the stream of signatures into a FileStreamSignature class that allows different methods to enrich it with data like hash, signature and validity
  • Utilize Multimap to avoid NPE and avoid sorting unnecessarily
  • Fix an issue with continuing to next file when signatures not verified (which would then download and fail hash chain)
  • Update troubleshooting doc with logging changes

Which issue(s) this PR fixes:
Fixes #444

Special notes for your reviewer:
Example output:

 2019-12-19 12:56:22,082 INFO  [pool-5-thread-4] c.h.m.i.d.r.RecordFileDownloader Downloaded 2 signatures for node 0.0.3 in 656.2 ms
 2019-12-19 12:56:22,082 INFO  [pool-5-thread-2] c.h.m.i.d.r.RecordFileDownloader Downloaded 2 signatures for node 0.0.4 in 656.5 ms
 2019-12-19 12:56:22,082 INFO  [main ] c.h.m.i.d.r.RecordFileDownloader Downloaded 4 signatures in 658.6 ms (6/s)
 2019-12-19 12:56:22,104 WARN  [main ] c.h.m.i.d.r.RecordFileDownloader Signature verification failed for 2019-08-30T18_10_00.419072Z.rcd_sig: {MISSING=[0.0.5, 0.0.6], VERIFIED=[0.0.3, 0.0.4]}
 2019-12-19 12:56:22,379 INFO  [main ] c.h.m.i.d.r.RecordFileDownloader Downloaded 8 signatures in 98.44 ms (81/s)
 2019-12-19 12:56:22,380 ERROR [main ] c.h.m.i.u.Utility Unknown file delimiter 99 in signature file /var/folders/dy/3hk8nqg93nbctw8lchz_415r0000gn/T/junit17453419502341935375/recordstreams/record0.0.3/2019-08-30T18_10_00.419072Z.rcd_sig
 2019-12-19 12:56:22,380 ERROR [main ] c.h.m.i.u.Utility Unknown file delimiter 99 in signature file /var/folders/dy/3hk8nqg93nbctw8lchz_415r0000gn/T/junit17453419502341935375/recordstreams/record0.0.4/2019-08-30T18_10_00.419072Z.rcd_sig
 2019-12-19 12:56:22,381 ERROR [main ] c.h.m.i.u.Utility Unknown file delimiter 99 in signature file /var/folders/dy/3hk8nqg93nbctw8lchz_415r0000gn/T/junit17453419502341935375/recordstreams/record0.0.5/2019-08-30T18_10_00.419072Z.rcd_sig
 2019-12-19 12:56:22,381 ERROR [main ] c.h.m.i.u.Utility Unknown file delimiter 99 in signature file /var/folders/dy/3hk8nqg93nbctw8lchz_415r0000gn/T/junit17453419502341935375/recordstreams/record0.0.6/2019-08-30T18_10_00.419072Z.rcd_sig
 2019-12-19 12:56:22,381 WARN  [main ] c.h.m.i.d.r.RecordFileDownloader Signature verification failed for 2019-08-30T18_10_00.419072Z.rcd_sig: {DOWNLOADED=[0.0.3, 0.0.4, 0.0.5, 0.0.6], MISSING=[]}
 2019-12-19 12:56:23,066 INFO  [pool-34-thread-3] c.h.m.i.d.r.RecordFileDownloader Downloaded 2 signatures for node 0.0.5 in 29.32 ms
 2019-12-19 12:56:23,070 INFO  [pool-34-thread-2] c.h.m.i.d.r.RecordFileDownloader Downloaded 2 signatures for node 0.0.4 in 33.76 ms
 2019-12-19 12:56:23,072 INFO  [pool-34-thread-4] c.h.m.i.d.r.RecordFileDownloader Downloaded 2 signatures for node 0.0.3 in 35.78 ms
 2019-12-19 12:56:23,072 INFO  [main ] c.h.m.i.d.r.RecordFileDownloader Downloaded 6 signatures in 36.70 ms (163/s)
 2019-12-19 12:56:23,075 WARN  [main ] c.h.m.i.d.NodeSignatureVerifier Verified signature file 2019-08-30T18_10_00.419072Z.rcd_sig reached consensus but with some errors: {MISSING=[0.0.6], CONSENSUS_REACHED=[0.0.3, 0.0.4, 0.0.5]}
 2019-12-19 12:56:23,086 WARN  [main ] c.h.m.i.d.NodeSignatureVerifier Verified signature file 2019-08-30T18_10_05.249678Z.rcd_sig reached consensus but with some errors: {MISSING=[0.0.6], CONSENSUS_REACHED=[0.0.3, 0.0.4, 0.0.5]}

Checklist

  • Documentation added
  • Tests updated

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
@steven-sheehy steven-sheehy added enhancement Type: New feature P1 ops Tasks relating to network operations labels Dec 17, 2019
@steven-sheehy steven-sheehy added this to the Mirror 0.5.0 milestone Dec 17, 2019
@steven-sheehy steven-sheehy self-assigned this Dec 17, 2019
@codecov
Copy link

codecov bot commented Dec 17, 2019

Codecov Report

Merging #452 into master will increase coverage by 0.52%.
The diff coverage is 81.39%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #452      +/-   ##
==========================================
+ Coverage   57.92%   58.45%   +0.52%     
==========================================
  Files          56       59       +3     
  Lines        2491     2520      +29     
  Branches      325      329       +4     
==========================================
+ Hits         1443     1473      +30     
+ Misses        903      902       -1     
  Partials      145      145
Impacted Files Coverage Δ
...or/importer/exception/InvalidDatasetException.java 0% <0%> (ø) ⬆️
...rter/exception/SignatureVerificationException.java 100% <100%> (ø)
.../java/com/hedera/mirror/importer/util/Utility.java 56.81% <100%> (ø) ⬆️
...a/mirror/importer/exception/ImporterException.java 33.33% <33.33%> (ø)
.../hedera/mirror/importer/downloader/Downloader.java 80.98% <85%> (-0.95%) ⬇️
...ror/importer/downloader/NodeSignatureVerifier.java 83.33% <87.8%> (+15.24%) ⬆️
...ra/mirror/importer/domain/FileStreamSignature.java 92.3% <92.3%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42982fa...b40e259. Read the comment docs.

Nana-EC
Nana-EC previously approved these changes Dec 17, 2019
Copy link
Collaborator

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (!greaterThanSuperMajorityNum(signatures.size(), nodeIDPubKeyMap.size())) {
Set<String> seenNodes = signatures.stream().map(SignatureStream::getNode).collect(Collectors.toSet());
Set<String> missingNodes = new TreeSet<>(Sets.difference(nodeIDPubKeyMap.keySet(), seenNodes));
String message = String.format("Signature verification failed for %s with %s of %s nodes missing: %s",
Copy link
Contributor

@apeksharma apeksharma Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong, but i got a feeling that this may give many false +ves. Sig files from different nodes will arrive over a window of time. If that window is >500ms (polling rate), we will hit this at least once every 5 sec and log unwanted nodes as 'possible defaulters'. Given Leemon's ask is to find nodes consistently defaulting on publishing good streams, am afraid this won't address that in any meaningful way.
Please try it running locally against mainnet bucket in steady state, and see what is logged. Hopefully the case mentioned above doesn't happen too often.

Either ways, we should move this logging to line 95, where it'll be much more useful since the candidates for 'defaulting nodes' would be much less.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current code, there is a similar log.warn("Signature file count does not exceed 2/3 of nodes"); in Downloader that is shown with the exact same logic. All I did was push this into the NodeSignatureVerifier class and add some additional detail to the log. There shouldn't be any change in log frequency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I moved it to line 95, then it wouldn't be logically correct as that is the success scenario and logging at just the success scenario would mean the unhappy path doesn't get logged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference from before is expectation.
Earlier, that line didn't matter, unless mirror had some issue and someone looked at the logs. Even then, that line being spammy was okay.
Now, we need a log to help identify issues in mainnet node stream publications.
We want to it to be more useful by just logging node ids that don't contribute to consensus. We don't want to log many node ids because only 2 nodes had uploaded sigs when polling happened. So doing it right after super-majority is best i could think of. Just info level would be okay too.

Doing it as part of super spammy log.warn("Singature file..... just isn't right.
https://console.cloud.google.com/logs/viewer?project=mirrornode&authuser=0&organizationId=130634591014&resource=gce_instance&minLogLevel=0&expandAll=false&filters=text:%22%20ERROR%20%22&timestamp=2019-12-19T16:28:26.000000000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2019-12-18T16:34:29.353Z&dateRangeEnd=2019-12-19T16:34:29.353Z&interval=P1D&advancedFilter=resource.type%3D%22gce_instance%22%0A%22%20Signature%20file%20count%20does%20%22%0Alabels.%22compute.googleapis.com%2Fresource_name%22%3D%22mirrornode-mainnet-1%22&scrollTimestamp=2019-12-19T16:32:23.000000000Z

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the rest of the change (well not the exception part, but i don't care about it much :)). However, this piece of logic here is what's actually addressing the issue we want to fix, rest is just cleanup. If we don't do this right, it beats the whole purpose.

@mike-burrage-hedera mike-burrage-hedera removed their request for review December 19, 2019 00:39
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
@steven-sheehy steven-sheehy merged commit 4d76c3d into master Dec 19, 2019
@steven-sheehy steven-sheehy deleted the log-consensus branch December 19, 2019 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Type: New feature ops Tasks relating to network operations P1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Log nodes that don't reach consensus
3 participants