-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #518: Add mloginfo --connstats for connection duration metrics #530
Changes from 3 commits
d3bf6b0
c89df5a
e67043e
8861223
3e6e14c
6410659
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
from base_section import BaseSection | ||
from collections import defaultdict | ||
from datetime import datetime, date, time | ||
import re | ||
|
||
try: | ||
|
@@ -19,6 +20,7 @@ def __init__(self, mloginfo): | |
|
||
# add --restarts flag to argparser | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not directly related to this change, but the comment is incorrect so you could fix if you are adding extra args There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
self.mloginfo.argparser_sectiongroup.add_argument('--connections', action='store_true', help='outputs information about opened and closed connections') | ||
self.mloginfo.argparser_sectiongroup.add_argument('--stats', action='store_true', help='outputs helpful statistics for connection duration (min/max/avg)') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you can make this more intuitive to use:
Suggest you make a top level connection-stats section so the command line usage would be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done now |
||
|
||
|
||
@property | ||
|
@@ -37,10 +39,28 @@ def run(self): | |
|
||
ip_opened = defaultdict(lambda: 0) | ||
ip_closed = defaultdict(lambda: 0) | ||
|
||
socket_exceptions = 0 | ||
|
||
genstats = self.mloginfo.args['stats'] | ||
if genstats: | ||
genstats = True | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not needed ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was earlier using |
||
connections_start = defaultdict(lambda: 0) | ||
ipwise_sum_durations = defaultdict(lambda:0) | ||
ipwise_count = defaultdict(lambda:0) | ||
ipwise_min_connection_duration = defaultdict(lambda:9999999999) | ||
ipwise_max_connection_duration = defaultdict(lambda:-1) | ||
|
||
min_connection_duration = 9999999999 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could declare [Optional] replace defaultdict with normal dict to be explicit in your dictionary key declaration. |
||
max_connection_duration = -1 | ||
|
||
sum_durations = 0 | ||
fullconn_counts = 0 | ||
|
||
for logevent in self.mloginfo.logfile: | ||
line = logevent.line_str | ||
|
||
|
||
pos = line.find('connection accepted') | ||
if pos != -1: | ||
# connection was opened, increase counter | ||
|
@@ -51,6 +71,18 @@ def run(self): | |
ip, _ = tokens[3].split(':') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [Optional] Not directly related to this change, but we need to consider a case where there is no element entry in the array. Attempting to access an index of array out of bound will produce |
||
ip_opened[ip] += 1 | ||
|
||
if genstats: | ||
dt = logevent.datetime | ||
|
||
#initialize using default constructor | ||
ipwise_min_connection_duration[ip] | ||
ipwise_max_connection_duration[ip] | ||
|
||
connid = tokens[4].strip('#') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should perhaps move this into an else clause for the date comparison to avoid doing unnecessary work There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also consider moving the parsing of connection ID to logevent since this could be useful elsewhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed parsing of connection ID should be in logevent. The other code now removed as redundant with new changes. |
||
|
||
if dt != None: | ||
connections_start[connid] = dt | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be checking if there is already a start datetime for the given connection ID? If there are multiple start datetimes found for the same connection ID should that raise an exception? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done will raise and exception now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will also raise exception if duplicate end connection ids There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm trying to understand why we need to raise an exception for this? I just encountered this exception on some 3.4.23 logs. Couldn't this just be a datetime precision issue? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, can this be raised when the same connection ID is reused during the same log period? The logs I have are for 24 hours but has a very high connection churn, so it's not impossible that this will happen on perfectly OK log files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Btw, filtering to a 1 hour period worked around the exception. Maybe the exception message should state the connection number that repeats? |
||
|
||
pos = line.find('end connection') | ||
if pos != -1: | ||
# connection was closed, increase counter | ||
|
@@ -61,6 +93,38 @@ def run(self): | |
ip, _ = tokens[2].split(':') | ||
ip_closed[ip] += 1 | ||
|
||
if genstats: | ||
dt = logevent.datetime | ||
|
||
#The connection id value is stored just before end connection -> [conn385] end connection | ||
tokens_conn = line[:pos].split(' ') | ||
end_connid = tokens_conn[3].strip('[|conn|]') | ||
|
||
#Check if we have the starting of this connid | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Think this needs some check of the validity of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
if connections_start[end_connid]: | ||
if dt != None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could be included in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
dur = dt - connections_start[end_connid] | ||
dur_in_sec = dur.seconds | ||
|
||
if dur_in_sec < min_connection_duration: | ||
min_connection_duration = dur_in_sec | ||
|
||
if dur_in_sec > max_connection_duration: | ||
max_connection_duration = dur_in_sec | ||
|
||
if dur_in_sec < ipwise_min_connection_duration[ip]: | ||
ipwise_min_connection_duration[ip] = dur_in_sec | ||
|
||
if dur_in_sec > ipwise_max_connection_duration[ip]: | ||
ipwise_max_connection_duration[ip] = dur_in_sec | ||
|
||
sum_durations += dur.seconds | ||
fullconn_counts += 1 | ||
|
||
ipwise_sum_durations[ip] += dur_in_sec | ||
ipwise_count[ip] += 1 | ||
|
||
|
||
if "SocketException" in line: | ||
socket_exceptions += 1 | ||
|
||
|
@@ -84,5 +148,16 @@ def run(self): | |
opened = ip_opened[ip] if ip in ip_opened else 0 | ||
closed = ip_closed[ip] if ip in ip_closed else 0 | ||
|
||
print "%-15s opened: %-8i closed: %-8i" % (ip, opened, closed) | ||
if genstats: | ||
covered_count = ipwise_count[ip] if ip in ipwise_count else 1 | ||
connection_duration_ip = ipwise_sum_durations[ip] if ip in ipwise_sum_durations else 0 | ||
|
||
print "%-15s opened: %-8i closed: %-8i dur-avg(s): %-8i dur-min(s): %-8i dur-max(s): %-8i" % (ip, opened, closed, connection_duration_ip/covered_count, ipwise_min_connection_duration[ip], ipwise_max_connection_duration[ip]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would you be able to print them out a stat per line to be consistent with the other outputs of the module. See line 141. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is as per existing format, the stats are printed against their respective ip There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I got your point added overall stats to stat per line format |
||
else: | ||
print "%-15s opened: %-8i closed: %-8i" % (ip, opened, closed) | ||
|
||
|
||
if genstats and fullconn_counts > 0: | ||
print "Average connection duration across all IPs %d seconds, Minimum duration %d , Maximum duration %d" % (sum_durations/fullconn_counts, min_connection_duration, max_connection_duration) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To match the output style of other I think the order of output here should also match the info above (avg/min/max) and be in a similar format for consistency. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] MongoDB starting : pid=10686 port=27017 dbpath=/Users/nishant/testcase/data 64-bit host=Nishants-MacBook-Pro.local | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this log is intended for testing the connection feature, the naming should reflect that. Maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For completeness can you generate and include test logs for 3.0 and 3.2? I would also suggest latest 3.4.x rather than 3.4.2. |
||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] db version v3.4.2 | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] OpenSSL version: OpenSSL 0.9.8zh 14 Jan 2016 | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] allocator: system | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] modules: enterprise | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] build environment: | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] distarch: x86_64 | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] target_arch: x86_64 | ||
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] options: { processManagement: { fork: true }, storage: { dbPath: "data" }, systemLog: { destination: "file", path: "data/log.file" } } | ||
2017-10-09T12:44:35.031+0530 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7680M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0), | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database. | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted. | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] | ||
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000 | ||
2017-10-09T12:44:35.796+0530 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/Users/nishant/testcase/data/diagnostic.data' | ||
2017-10-09T12:44:35.982+0530 I INDEX [initandlisten] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" } | ||
2017-10-09T12:44:35.982+0530 I INDEX [initandlisten] building index using bulk method; build may temporarily use up to 500 megabytes of RAM | ||
2017-10-09T12:44:36.005+0530 I INDEX [initandlisten] build index done. scanned 0 total records. 0 secs | ||
2017-10-09T12:44:36.005+0530 I COMMAND [initandlisten] setting featureCompatibilityVersion to 3.4 | ||
2017-10-09T12:44:36.005+0530 I NETWORK [thread1] waiting for connections on port 27017 | ||
2017-10-09T12:44:40.143+0530 I NETWORK [thread1] connection accepted from 127.0.0.1:54104 #1 (1 connection now open) | ||
2017-10-09T12:44:43.673+0530 I - [conn1] end connection 127.0.0.1:54104 (1 connection now open) | ||
2017-10-09T14:05:39.829+0530 I NETWORK [thread1] connection accepted from 127.0.0.1:55065 #2 (1 connection now open) | ||
2017-10-09T14:05:55.918+0530 I - [conn2] end connection 127.0.0.1:55065 (1 connection now open) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import doesn't appear to be needed. Not calling any date functions in the current PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done