Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #518: Add mloginfo --connstats for connection duration metrics #530

Merged
merged 6 commits into from
Oct 11, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 76 additions & 1 deletion mtools/mloginfo/sections/connection_section.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
from base_section import BaseSection
from collections import defaultdict
from datetime import datetime, date, time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import doesn't appear to be needed. Not calling any date functions in the current PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import re

try:
Expand All @@ -19,6 +20,7 @@ def __init__(self, mloginfo):

# add --restarts flag to argparser
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related to this change, but the comment is incorrect so you could fix if you are adding extra args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.mloginfo.argparser_sectiongroup.add_argument('--connections', action='store_true', help='outputs information about opened and closed connections')
self.mloginfo.argparser_sectiongroup.add_argument('--stats', action='store_true', help='outputs helpful statistics for connection duration (min/max/avg)')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this more intuitive to use:
---stats is too generic given this option is specific to connections.

  • The --stats option doesn't do anything unless you also specify --connections.

Suggest you make a top level connection-stats section so the command line usage would be mloginfo --connection-stats ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done now mloginfo --connection --connstats logfile and mloginfo --connstats logfile have same meaning



@property
Expand All @@ -37,10 +39,28 @@ def run(self):

ip_opened = defaultdict(lambda: 0)
ip_closed = defaultdict(lambda: 0)

socket_exceptions = 0

genstats = self.mloginfo.args['stats']
if genstats:
genstats = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed (genstats is already True per the previous line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was earlier using self.mloginfo.args['stats'] directly in comparison, now genstats = True is redundant. Removed.

connections_start = defaultdict(lambda: 0)
ipwise_sum_durations = defaultdict(lambda:0)
ipwise_count = defaultdict(lambda:0)
ipwise_min_connection_duration = defaultdict(lambda:9999999999)
ipwise_max_connection_duration = defaultdict(lambda:-1)

min_connection_duration = 9999999999
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could declare min_connection_duration and max_connection_duration before ipwise_min and ipwise_max declaration to avoid repeating the same values. i.e. lambda: min_connection_duration

[Optional] replace defaultdict with normal dict to be explicit in your dictionary key declaration.

max_connection_duration = -1

sum_durations = 0
fullconn_counts = 0

for logevent in self.mloginfo.logfile:
line = logevent.line_str


pos = line.find('connection accepted')
if pos != -1:
# connection was opened, increase counter
Expand All @@ -51,6 +71,18 @@ def run(self):
ip, _ = tokens[3].split(':')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] Not directly related to this change, but we need to consider a case where there is no element entry in the array. Attempting to access an index of array out of bound will produce IndexError exception, which should be handled

ip_opened[ip] += 1

if genstats:
dt = logevent.datetime

#initialize using default constructor
ipwise_min_connection_duration[ip]
ipwise_max_connection_duration[ip]

connid = tokens[4].strip('#')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should perhaps move this into an else clause for the date comparison to avoid doing unnecessary work

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also consider moving the parsing of connection ID to logevent since this could be useful elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed parsing of connection ID should be in logevent. The other code now removed as redundant with new changes.


if dt != None:
connections_start[connid] = dt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be checking if there is already a start datetime for the given connection ID?

If there are multiple start datetimes found for the same connection ID should that raise an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done will raise and exception now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will also raise exception if duplicate end connection ids

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why we need to raise an exception for this? I just encountered this exception on some 3.4.23 logs. Couldn't this just be a datetime precision issue?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, can this be raised when the same connection ID is reused during the same log period? The logs I have are for 24 hours but has a very high connection churn, so it's not impossible that this will happen on perfectly OK log files.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, filtering to a 1 hour period worked around the exception. Maybe the exception message should state the connection number that repeats?


pos = line.find('end connection')
if pos != -1:
# connection was closed, increase counter
Expand All @@ -61,6 +93,38 @@ def run(self):
ip, _ = tokens[2].split(':')
ip_closed[ip] += 1

if genstats:
dt = logevent.datetime

#The connection id value is stored just before end connection -> [conn385] end connection
tokens_conn = line[:pos].split(' ')
end_connid = tokens_conn[3].strip('[|conn|]')

#Check if we have the starting of this connid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this needs some check of the validity of end_connid before trying to use as a dict key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if connections_start[end_connid]:
if dt != None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be included in the if statement above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dur = dt - connections_start[end_connid]
dur_in_sec = dur.seconds

if dur_in_sec < min_connection_duration:
min_connection_duration = dur_in_sec

if dur_in_sec > max_connection_duration:
max_connection_duration = dur_in_sec

if dur_in_sec < ipwise_min_connection_duration[ip]:
ipwise_min_connection_duration[ip] = dur_in_sec

if dur_in_sec > ipwise_max_connection_duration[ip]:
ipwise_max_connection_duration[ip] = dur_in_sec

sum_durations += dur.seconds
fullconn_counts += 1

ipwise_sum_durations[ip] += dur_in_sec
ipwise_count[ip] += 1


if "SocketException" in line:
socket_exceptions += 1

Expand All @@ -84,5 +148,16 @@ def run(self):
opened = ip_opened[ip] if ip in ip_opened else 0
closed = ip_closed[ip] if ip in ip_closed else 0

print "%-15s opened: %-8i closed: %-8i" % (ip, opened, closed)
if genstats:
covered_count = ipwise_count[ip] if ip in ipwise_count else 1
connection_duration_ip = ipwise_sum_durations[ip] if ip in ipwise_sum_durations else 0

print "%-15s opened: %-8i closed: %-8i dur-avg(s): %-8i dur-min(s): %-8i dur-max(s): %-8i" % (ip, opened, closed, connection_duration_ip/covered_count, ipwise_min_connection_duration[ip], ipwise_max_connection_duration[ip])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to print them out a stat per line to be consistent with the other outputs of the module. See line 141.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is as per existing format, the stats are printed against their respective ip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your point added overall stats to stat per line format

else:
print "%-15s opened: %-8i closed: %-8i" % (ip, opened, closed)

print

if genstats and fullconn_counts > 0:
print "Average connection duration across all IPs %d seconds, Minimum duration %d , Maximum duration %d" % (sum_durations/fullconn_counts, min_connection_duration, max_connection_duration)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the output style of other mloginfo sections, I would lowercase Minimum and Maximum.

I think the order of output here should also match the info above (avg/min/max) and be in a similar format for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


27 changes: 27 additions & 0 deletions mtools/test/logfiles/mongod_3_4_2.log
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] MongoDB starting : pid=10686 port=27017 dbpath=/Users/nishant/testcase/data 64-bit host=Nishants-MacBook-Pro.local
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this log is intended for testing the connection feature, the naming should reflect that. Maybe mongod_3_4-2_connection-stats.log?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness can you generate and include test logs for 3.0 and 3.2? I would also suggest latest 3.4.x rather than 3.4.2.

2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] db version v3.4.2
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] OpenSSL version: OpenSSL 0.9.8zh 14 Jan 2016
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] allocator: system
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] modules: enterprise
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] build environment:
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] distarch: x86_64
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] target_arch: x86_64
2017-10-09T12:44:35.031+0530 I CONTROL [initandlisten] options: { processManagement: { fork: true }, storage: { dbPath: "data" }, systemLog: { destination: "file", path: "data/log.file" } }
2017-10-09T12:44:35.031+0530 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7680M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten]
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten]
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten]
2017-10-09T12:44:35.621+0530 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
2017-10-09T12:44:35.796+0530 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/Users/nishant/testcase/data/diagnostic.data'
2017-10-09T12:44:35.982+0530 I INDEX [initandlisten] build index on: admin.system.version properties: { v: 2, key: { version: 1 }, name: "incompatible_with_version_32", ns: "admin.system.version" }
2017-10-09T12:44:35.982+0530 I INDEX [initandlisten] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2017-10-09T12:44:36.005+0530 I INDEX [initandlisten] build index done. scanned 0 total records. 0 secs
2017-10-09T12:44:36.005+0530 I COMMAND [initandlisten] setting featureCompatibilityVersion to 3.4
2017-10-09T12:44:36.005+0530 I NETWORK [thread1] waiting for connections on port 27017
2017-10-09T12:44:40.143+0530 I NETWORK [thread1] connection accepted from 127.0.0.1:54104 #1 (1 connection now open)
2017-10-09T12:44:43.673+0530 I - [conn1] end connection 127.0.0.1:54104 (1 connection now open)
2017-10-09T14:05:39.829+0530 I NETWORK [thread1] connection accepted from 127.0.0.1:55065 #2 (1 connection now open)
2017-10-09T14:05:55.918+0530 I - [conn2] end connection 127.0.0.1:55065 (1 connection now open)