-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for a reducer step that aggregates per-user metrics #76
Add support for a reducer step that aggregates per-user metrics #76
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @csgregorian , this is a great start! Besides the inline comments, I have two general comments:
- We should add a Combiner to reduce the data transfer volume between Mappers and Reducers.
- I'm wondering if it makes sense for the
WorkloadReducer
to be user-configurable. I'm thinking it may be better for it to be specified by the mapper, like how aWorkloadMapper
specifies what input format it expects. Since the output keys/values of the mapper must be exactly matched to the input keys/values of the reducer, I don't think it makes sense to configure them independently. Let me know what you think.
...ometer-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/WorkloadDriver.java
Outdated
Show resolved
Hide resolved
...ometer-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/WorkloadDriver.java
Outdated
Show resolved
Hide resolved
...meter-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/WorkloadReducer.java
Outdated
Show resolved
Hide resolved
...kload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayReducer.java
Show resolved
Hide resolved
...rkload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayThread.java
Outdated
Show resolved
Hide resolved
...rkload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayThread.java
Outdated
Show resolved
Hide resolved
...ad/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditTextOutputFormat.java
Outdated
Show resolved
Hide resolved
...ad/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditTextOutputFormat.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
757d342
to
67873d7
Compare
9b61ead
to
dc3e754
Compare
dc3e754
to
75de589
Compare
Simplified a ton of stuff here.
Ready for more reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I love all of the changes. I have mostly minor comments throughout, and my only general comment would be that we should update TestDynamometerInfra
in some way. We're already asserting on the output within TestWorkloadGenerator
, but maybe in the infra test we can at least assert that the -workload_output_path
arg works as expected and puts the file in the right place?
...eter-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/CreateFileMapper.java
Outdated
Show resolved
Hide resolved
...kload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayReducer.java
Show resolved
Hide resolved
...rkload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayThread.java
Outdated
Show resolved
Hide resolved
...-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/UserCommandKey.java
Show resolved
Hide resolved
...kload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/AuditReplayReducer.java
Show resolved
Hide resolved
...-workload/src/main/java/com/linkedin/dynamometer/workloadgenerator/audit/UserCommandKey.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
...workload/src/test/java/com/linkedin/dynamometer/workloadgenerator/TestWorkloadGenerator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor style nits and then it's all good to go!
The goal of this PR is to let Dynamometer collect and emit per-user metrics, as opposed to the whole-workload measurements that it currently collects using Counters. This is done by first changing the
AuditReplayMapper
to emit key-value pairs in the form of(username_type, latency)
wheretype
is eitherREAD
orWRITE
. Then, a reducer stepAuditReplayReducer
is added to sum latencies per key. More generally, this change allows Dynamometer to support arbitrary reducers for stats aggregation along with the existing mappers.I modified
TestWorkloadGenerator
to run with and without a reducer, and confirm that the file was created. This still needs a test to ensure that the actual contents of the output file are correct. As well, it's possible that I changed some things that didn't need to be changed in the process of getting the tests to pass/output to appear.Mostly trying to see if CI will pass, but ready for some preliminary reviews.
EDIT: a couple other things that I'm not sure are the right approach and need to be looked at a bit closer:
WorkloadMapper
to specifyKEYOUT
andVALUEOUT
as parameters is the right moveAuditReplayMapper
writes to the context without a reducer being specified? Shouldjob.setMapOutputKeyClass
be set toLongWritable.class
instead ofNullWritable.class
regardless of whether it's actually outputting anything? It doesn't seem like this caused any errors, but if not, why?WorkloadReducer
just to what it needs to run, but it should probably havegetDescription
andgetConfigDescriptions
as well.