Improve performance of meter interaction #4857
Conversation
Thank you very much for this pull request. It's a great idea for a performance gain for everyone with common usage scenarios, and hopefully it will remove the need for the bespoke caching that particularly performance-sensitive users have built around Micrometer.
That said, I'm a bit worried about the side effects of this as it currently is. The biggest thing we want to avoid is causing metrics to break in an unexpected way that is not caught until deployed to production.
Before this pull request, the issue with meters registered before a MeterFilter is configured is that they will not have that filter applied to them. If getOrCreate is called again, then the filter will be applied and a new meter will be created (assuming the filter alters the ID), and if a reference is not maintained to the first meter created (before the MeterFilter was configured), then the first meter will be essentially orphaned. Calls to removeByPreFilterId will not remove the orphaned meters. Calls to remove would still work if a reference to the meter or the id of the "orphaned" meter was used.
I worry some users may be inadvertently relying on this behavior; it may not be an issue for them if they're not maintaining a reference to a meter registered before a MeterFilter is configured.
Giving a concrete, albeit contrived, example below:
@Test
void meterRegistrationBeforeAndAfterMeterFilterRegistration() {
Counter c1 = registry.counter("counter");
registry.config().meterFilter(MeterFilter.commonTags(Tags.of("common", "tag")));
Counter c2 = registry.counter("counter");
Timer t1 = registry.timer("timer");
registry.remove(c1);
registry.remove(c2);
}
Before this pull request, c1 and c2 will be different meters. c2 and t1 will have the common tags; c1 will not. Each call to remove will remove a different meter.

After this pull request (as it currently is), c1 and c2 will be the same meter, without the common tags. This is what I worry will be surprising, breaking behavior that won't necessarily get caught until it is too late. The first remove call will remove the only counter registered.
I have some ideas I'd like to try to see if we can reach a compromise where the performance gains are there most of the time but we still allow late MeterFilter configuration while warning about it; I'll do that in a separate pull request. Timeline-wise, we should definitely get the warning log into the RC to start gathering user feedback on affected scenarios as soon as possible, so we can hopefully become more confident about restricting when a MeterFilter can be configured. I would also like to get this performance enhancement in as well, but we'll see where we end up on trying to retain more of the MeterFilter behavior. I suspect MeterFilters shouldn't be configured so late or so often, even if it is after some meters have been registered.
System.arraycopy(filters, 0, newFilters, 0, filters.length);
newFilters[filters.length] = filter;
filters = newFilters;
if (meterMap.isEmpty()) {
I think making it a no-op is too aggressive a change. Users have little way of knowing this is wrong behavior now (other than noticing that meters registered before don't have the filter applied). I think we should instead log a warning and gather feedback on the scenarios in which users run into this, so they can be corrected before we switch to a no-op. It may not be the user's fault; it may be a framework or library doing things they have no direct control over. I'll open a separate pull request to add the logging, because I think that makes sense separately from the goal of this pull request.
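A minimal sketch of the warning idea, mirroring the meterFilter shape shown in the diffs in this thread (the logger field and the exact message wording are assumptions, not the final Micrometer code):

public synchronized Config meterFilter(MeterFilter filter) {
    if (!meterMap.isEmpty()) {
        // Warn instead of no-op: the new filter will not be applied to
        // meters that were already registered.
        logger.warn("A MeterFilter is being configured after a Meter has been registered "
                + "to this registry. All MeterFilters should be configured before any "
                + "Meters are registered.");
    }
    MeterFilter[] newFilters = new MeterFilter[filters.length + 1];
    System.arraycopy(filters, 0, newFilters, 0, filters.length);
    newFilters[filters.length] = filter;
    filters = newFilters;
    return this;
}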
Here's the promised PR: #4917
Here's an implementation of an idea to maintain existing behavior while allowing this optimization in cases where a MeterFilter hasn't been configured since the meter was last registered. See the two tests added checking the existing behavior and the new behavior. I'll look more into concurrency and benchmarking tomorrow.
I was away for a while and hence didn't get back to this one quickly. I will have a look at the implementations today. Meanwhile, one additional thing we might want to consider is how solving this would affect a composite meter registry and its initial design choices. Left a comment here.
I think we're ready to add the commits over here to this pull request. I've added concurrency tests and benchmarks. I'll share more info when I'm back at the computer later, but feel free to leave any feedback in the meantime. I'm hopeful we can get this into the 1.13.0 RC scheduled for Monday.
Looking for some more potential edge cases, but this looks good except for one potential failing case.

At some point it would also be good to understand what this means in terms of memory usage. It should be fairly low, since most of these seem to be references, but you never know!
@@ -780,6 +815,7 @@ public Config commonTags(String... tags) {
public synchronized Config meterFilter(MeterFilter filter) {
    if (!meterMap.isEmpty()) {
        logWarningAboutLateFilter();
        stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
Should we synchronize this on meterMapLock? Given that MeterFilter updates are very rare, this could avoid any meter-creation activity while filters are being updated. WDYT?
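A rough sketch of that suggestion, assuming the fields from the surrounding diff (this is not the merged code):

public Config meterFilter(MeterFilter filter) {
    // Filter configuration is rare, so holding the meter-map lock here is cheap
    // and excludes concurrent meter creation while the filter array is swapped.
    synchronized (meterMapLock) {
        if (!meterMap.isEmpty()) {
            logWarningAboutLateFilter();
            stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
        }
        MeterFilter[] newFilters = new MeterFilter[filters.length + 1];
        System.arraycopy(filters, 0, newFilters, 0, filters.length);
        newFilters[filters.length] = filter;
        filters = newFilters;
    }
    return this;
}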
Thank you for the additional feedback. I think we will have to consider the remaining feedback post-RC1. It's very late for me and we need to get this release out. Feel free to share any other feedback you have; I just wanted to let you know why we're going forward before handling it.
MeterFilters were always being applied to the given Id because the meterMap we keep only has the mapped Id as a key. To avoid the cost of applying MeterFilters to a given Id, this adds a map called preFilterIdToMeterMap, which makes returning existing meters faster than before.

This makes a compromise between the optimization of avoiding applying filters and maintaining existing behavior where it would make a difference. The tests demonstrate the behaviors we want.

Adds tests to check concurrent behavior and measure the performance of registering meters. Updates the implementation of dealing with stale IDs to make it less prone to stale reads.

Adds a test to ensure we are not leaking state when removing a meter marked stale, and fixes the corresponding bug.
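To make the mechanism concrete, here is an illustrative sketch of the fast path these commits describe (the method shape and the applyFilters helper are hypothetical; only preFilterIdToMeterMap, stalePreFilterIds, meterMap, and meterMapLock come from the PR):

private Meter getOrCreateMeter(Meter.Id preFilterId, Function<Meter.Id, Meter> create) {
    Meter existing = preFilterIdToMeterMap.get(preFilterId);
    if (existing != null && !stalePreFilterIds.contains(preFilterId)) {
        // Fast path: no MeterFilter has been configured since this meter was
        // last registered, so re-applying the filters cannot change the result.
        return existing;
    }
    synchronized (meterMapLock) {
        // Slow path: apply every MeterFilter to map the id, then register.
        Meter.Id mappedId = applyFilters(preFilterId); // hypothetical helper
        Meter meter = meterMap.computeIfAbsent(mappedId, create);
        preFilterIdToMeterMap.put(preFilterId, meter);
        stalePreFilterIds.remove(preFilterId); // unmark so the fast path applies again
        return meter;
    }
}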
private final Map<Id, Meter> meterMap = new ConcurrentHashMap<>();

// writes guarded by meterMapLock, reads can be stale
// VisibleForTesting
final Map<Id, Meter> preFilterIdToMeterMap = new HashMap<>();
Even if writes are guarded by meterMapLock, we can't assume it's safe to read from the map. I'm not talking about a potential stale read (which is fine), but the fact that the underlying state of the map could be corrupted and prevent reading a value.
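To illustrate this concern with a generic (non-Micrometer) example: when only writes are guarded, a reader can observe a plain HashMap mid-resize, and the Java memory model gives such unsynchronized reads no guarantee beyond staleness.

import java.util.HashMap;
import java.util.Map;

public class GuardedWritesUnguardedReads {

    static final Map<Integer, Integer> map = new HashMap<>();
    static final Object lock = new Object();

    public static void main(String[] args) {
        // Writer thread: every write is guarded by the lock.
        new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (lock) {
                    map.put(i, i);
                }
            }
        }).start();

        // Reader thread: unguarded reads race with table resizes. A stale value
        // is the benign outcome; because HashMap is not thread-safe, corrupted
        // traversal (exceptions, even infinite loops on older JDKs) is possible.
        new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                map.get(i);
            }
        }).start();
    }
}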
I am not sure why that would happen. Do you know of a scenario where this could happen?
Thanks for the review @tsegismont. I added jcstress tests with the intent of catching such kinds of issues. The current tests are probably lacking scenarios that may run into problems like you mentioned. Could you share something we could add to there to verify the issue and make sure it stays fixed in the future?
Sharing benchmark results from my M1 MacBook Pro.

This PR:

With Micrometer 1.13.0-M2:

Of note, registering existing meters appears almost 6x faster and generates about half as much garbage.
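For reference, a minimal JMH harness in the spirit of the benchmark described (the meter name and setup here are assumptions, not the exact benchmark added in this PR):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class RegisterExistingMeterBenchmark {

    MeterRegistry registry;

    @Setup
    public void setup() {
        registry = new SimpleMeterRegistry();
        // Pre-register so the benchmark measures the existing-meter path.
        registry.counter("existing.counter", "key", "value");
    }

    @Benchmark
    public void registerExisting() {
        // After this PR, this lookup can hit preFilterIdToMeterMap and skip
        // re-applying MeterFilters.
        registry.counter("existing.counter", "key", "value").increment();
    }
}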
Retrospectively, thinking about what sort of memory implications this will have, consider the scenario below:

new MeterFilter() {

    private final Set<String> observedTagValues = ConcurrentHashMap.newKeySet();

    @Override
    public Meter.Id map(Meter.Id id) {
        String value = id.getTag("uri");
        if (value != null) {
            if (!observedTagValues.contains(value)) {
                // Past the limit, collapse every new URI into a single masked tag.
                if (observedTagValues.size() >= 100) {
                    Tag maskedTag = Tag.of("uri", "MASKED");
                    return id.withTag(maskedTag);
                }
                observedTagValues.add(value);
            }
        }
        return id;
    }
}

The above MeterFilter substitutes a dummy uri tag once the number of unique URIs exceeds a limit (in this case, 100). Once 100 unique URIs have been observed, the filter maps every new Id to a single meter. But after this fix, while the mapping stays the same, we will keep adding the pre-filter Ids to the map, which will grow indefinitely. If filters return a NOOP meter we are fine, since we don't store them anyway. But in the above scenario we are altering only the uri tag, so we don't lose the count of requests while still keeping tags to a reasonable count.
@lenin-jaganathan that does sound like a problem for unbounded cardinality tags if users are doing something like you showed. I'm not really sure what we can do about it. Would you open a separate issue to track it? Ideas welcome on solutions, too.
Registering a MeterFilter will mark all existing meters as stale, which causes filters to be applied again the next time the meter is registered. But the meter may not be changed by the newly registered filter; in that case, it was not being unmarked as stale, which prevented the performance shortcut. See #4857 (comment)
Fixes #4856.
This PR introduces the following changes:
TODO: