Fix #518: Add mloginfo --connstats for connection duration metrics #530
Conversation
Aside from general suggestions from review, please add test coverage in mtools/test/test_mloginfo.py
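A rough sketch of what such a test could look like, assuming the existing pattern in test_mloginfo.py of running MLogInfoTool against a bundled log file and capturing stdout; the fixture filename and the asserted label are illustrative, not final:

```python
import os
import sys
from StringIO import StringIO  # the codebase is Python 2 at this point

import mtools
from mtools.mloginfo.mloginfo import MLogInfoTool


def test_connstats():
    # hypothetical fixture; the actual log file name is still being discussed below
    logfile_path = os.path.join(os.path.dirname(mtools.__file__),
                                'test/logfiles/', 'mongod_3_4_connstats.log')
    tool = MLogInfoTool()

    # capture the section output written to stdout
    captured = StringIO()
    sys.stdout = captured
    try:
        tool.run('%s --connstats' % logfile_path)
    finally:
        sys.stdout = sys.__stdout__

    output = captured.getvalue()
    assert 'dur-avg(s):' in output
```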
@@ -19,6 +20,7 @@ def __init__(self, mloginfo):

# add --restarts flag to argparser
self.mloginfo.argparser_sectiongroup.add_argument('--connections', action='store_true', help='outputs information about opened and closed connections')
self.mloginfo.argparser_sectiongroup.add_argument('--stats', action='store_true', help='outputs helpful statistics for connection duration (min/max/avg)')
I think you can make this more intuitive to use:
- --stats is too generic given this option is specific to connections.
- The --stats option doesn't do anything unless you also specify --connections.

Suggest you make a top-level connection-stats section so the command line usage would be mloginfo --connection-stats ...
Done. Now mloginfo --connections --connstats logfile and mloginfo --connstats logfile have the same meaning.
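For reference, a minimal standalone sketch of how --connstats can imply the connections section so that both invocations behave the same; the argument-group wiring here is simplified and illustrative, not the exact mloginfo code:

```python
import argparse

parser = argparse.ArgumentParser(prog='mloginfo')
sectiongroup = parser.add_argument_group('sections')
sectiongroup.add_argument('--connections', action='store_true',
                          help='outputs information about opened and closed connections')
sectiongroup.add_argument('--connstats', action='store_true',
                          help='outputs helpful statistics for connection duration (min/max/avg)')

args = vars(parser.parse_args())

# --connstats implies --connections, so "mloginfo --connstats logfile"
# and "mloginfo --connections --connstats logfile" mean the same thing
run_connections_section = args['connections'] or args['connstats']
```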
@@ -19,6 +20,7 @@ def __init__(self, mloginfo):

# add --restarts flag to argparser
Not directly related to this change, but the comment is incorrect, so you could fix it while you are adding extra args.
done
@@ -0,0 +1,27 @@
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] MongoDB starting : pid=10686 port=27017 dbpath=/Users/nishant/testcase/data 64-bit host=Nishants-MacBook-Pro.local
Since this log is intended for testing the connection feature, the naming should reflect that. Maybe mongod_3_4-2_connection-stats.log?
For completeness, can you generate and include test logs for 3.0 and 3.2? I would also suggest the latest 3.4.x rather than 3.4.2.
@@ -1,5 +1,6 @@
from base_section import BaseSection
from collections import defaultdict
from datetime import datetime, date, time
This import doesn't appear to be needed. Not calling any date functions in the current PR?
done
socket_exceptions = 0

genstats = self.mloginfo.args['stats']
if genstats:
    genstats = True
Not needed (genstats is already True per the previous line).
I was earlier using self.mloginfo.args['stats'] directly in the comparison; now genstats = True is redundant. Removed.
ipwise_min_connection_duration[ip]
ipwise_max_connection_duration[ip]

connid = tokens[4].strip('#')
Should perhaps move this into an else clause for the date comparison to avoid doing unnecessary work
I would also consider moving the parsing of connection ID to logevent since this could be useful elsewhere.
Agreed, parsing of the connection ID should be in logevent. The other code is now removed as redundant with the new changes.
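A minimal sketch of what connection-ID parsing could look like if it moves into logevent, assuming typical 3.x log lines; the helper name conn_id is illustrative, not the final logevent API:

```python
import re

# "... connection accepted from 127.0.0.1:55429 #12 (1 connection now open)"
ACCEPTED_RE = re.compile(r'connection accepted from \S+ #(\d+)')
# "... [conn12] end connection 127.0.0.1:55429 (0 connections now open)"
END_RE = re.compile(r'\[conn(\d+)\] end connection')


def conn_id(line):
    """Return the connection ID as an int, or None if the line has none."""
    match = ACCEPTED_RE.search(line) or END_RE.search(line)
    return int(match.group(1)) if match else None
```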
connid = tokens[4].strip('#')

if dt != None:
    connections_start[connid] = dt
Should this be checking if there is already a start datetime for the given connection ID?
If there are multiple start datetimes found for the same connection ID should that raise an exception?
Done, it will raise an exception now.
It will also raise an exception if there are duplicate end connection IDs.
I'm trying to understand why we need to raise an exception for this? I just encountered this exception on some 3.4.23 logs. Couldn't this just be a datetime precision issue?
Ah, can this be raised when the same connection ID is reused during the same log period? The logs I have are for 24 hours but have a very high connection churn, so it's not impossible that this will happen on perfectly OK log files.
Btw, filtering to a 1 hour period worked around the exception. Maybe the exception message should state the connection number that repeats?
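A small sketch of a start-time recorder that names the repeated connection ID in the exception message, as suggested above (the dict and function names are illustrative):

```python
connections_start = {}


def record_start(connid, dt):
    """Record the start datetime for a connection ID."""
    if connid in connections_start:
        # connection IDs can repeat in long, high-churn logs, so name the
        # offending ID instead of failing anonymously
        raise NotImplementedError("multiple start datetimes found for "
                                  "connection ID %s" % connid)
    connections_start[connid] = dt
```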
if genstats and fullconn_counts > 0:
    print "Average connection duration across all IPs %d seconds, Minimum duration %d , Maximum duration %d" % (sum_durations/fullconn_counts, min_connection_duration, max_connection_duration)
To match the output style of other mloginfo sections, I would lowercase Minimum and Maximum. I think the order of output here should also match the info above (avg/min/max) and be in a similar format for consistency.
done
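A sketch of the reworked summary line, reusing the variables from the diff above and assuming the avg/min/max ordering and lowercase labels discussed; the exact wording is illustrative:

```python
if genstats and fullconn_counts > 0:
    print("connection duration across all IPs: avg %d s, min %d s, max %d s"
          % (sum_durations / fullconn_counts,
             min_connection_duration,
             max_connection_duration))
```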
tokens_conn = line[:pos].split(' ')
end_connid = tokens_conn[3].strip('[|conn|]')

#Check if we have the starting of this connid
Think this needs some check of the validity of end_connid before trying to use it as a dict key.
done
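A sketch of the validity check on end_connid before it is used as a dictionary key, reusing the names from the diff above (the parsing tweak itself is illustrative):

```python
end_connid = tokens_conn[3].strip('[]').replace('conn', '')

# only proceed if this looks like a numeric connection ID whose start we saw
if end_connid.isdigit() and end_connid in connections_start:
    started = connections_start[end_connid]
```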
#Check if we have the starting of this connid
if connections_start[end_connid]:
    if dt != None:
This could be included in the if statement above.
done
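A small sketch of the two nested checks above collapsed into one condition (names from the diff; the duration line is illustrative):

```python
# check we saw the start of this connid and have a valid end datetime
if end_connid in connections_start and dt is not None:
    duration = (dt - connections_start[end_connid]).total_seconds()
```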
Thanks for working on this issue to improve the connections section. Most of the changes here just need some tweaks. Good work.
ipwise_min_connection_duration = defaultdict(lambda:9999999999)
ipwise_max_connection_duration = defaultdict(lambda:-1)

min_connection_duration = 9999999999
You could declare min_connection_duration and max_connection_duration before the ipwise_min and ipwise_max declarations to avoid repeating the same values, i.e. lambda: min_connection_duration.
[Optional] Replace defaultdict with a normal dict to be explicit in your dictionary key declaration.
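A sketch of the suggested ordering, declaring the scalar bounds first and reusing them in the defaultdict factories:

```python
from collections import defaultdict

min_connection_duration = 9999999999
max_connection_duration = -1

# reuse the sentinels instead of repeating the literals in each factory
ipwise_min_connection_duration = defaultdict(lambda: min_connection_duration)
ipwise_max_connection_duration = defaultdict(lambda: max_connection_duration)
```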
@@ -51,6 +71,18 @@ def run(self):
ip, _ = tokens[3].split(':')
[Optional] Not directly related to this change, but we need to consider the case where there is no element entry in the array. Attempting to access an out-of-bounds array index will produce an IndexError exception, which should be handled.
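A sketch of the defensive access the comment describes; whether to skip or record the malformed line is a design choice left open here:

```python
# guard against truncated lines that lack the expected "ip:port" token
try:
    ip, _ = tokens[3].split(':', 1)
except (IndexError, ValueError):
    ip = None  # malformed line: skip it instead of crashing
```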
covered_count = ipwise_count[ip] if ip in ipwise_count else 1
connection_duration_ip = ipwise_sum_durations[ip] if ip in ipwise_sum_durations else 0

print "%-15s opened: %-8i closed: %-8i dur-avg(s): %-8i dur-min(s): %-8i dur-max(s): %-8i" % (ip, opened, closed, connection_duration_ip/covered_count, ipwise_min_connection_duration[ip], ipwise_max_connection_duration[ip])
Would you be able to print them out one stat per line, to be consistent with the other outputs of the module? See line 141.
This is as per the existing format; the stats are printed against their respective IP.
I got your point; changed the overall stats to a stat-per-line format.
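A sketch of the overall stats printed one per line, reusing the variables from the diff above and mirroring the per-IP labels (labels are illustrative):

```python
if genstats and fullconn_counts > 0:
    print("overall connection durations:")
    print("     dur-avg(s): %i" % (sum_durations / fullconn_counts))
    print("     dur-min(s): %i" % min_connection_duration)
    print("     dur-max(s): %i" % max_connection_duration)
```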
…ing improvements:
- Replace magic numbers with symbolic constants
- Fixed a bug where magic numbers were returned if the related ip does not cover all the ips
- Raise NotImplementedError if multiple start/end datetimes were found for the same connection ID
- Check that the connection id is a digit
- Add a more performant/robust re pattern search on connid
- Add nosetests for --connstats
- Added folder connstats to mtools/test/logfiles with more logfiles specific to connstats testing
TODO:
- Write nosetests for connstats using logfiles in the mtools/test/logfiles/connstats folder
Looks like initial feedback has been addressed aside from adding nosetests.
Added nosetests and minor fixes.
Description of changes
Added option --stats to the --connections option. It outputs helpful statistics for connection duration (min/max/avg).
Testing
Usual nosetests, tested against different log files.
O/S testing: