
Improve performance of meter interaction #4857

Merged: 4 commits merged into micrometer-metrics:main from the meter_cache branch on Apr 8, 2024

Conversation

@lenin-jaganathan (Contributor) commented Mar 15, 2024

Fixes #4856.

This PR introduces the following changes:

  • Keeps a map that looks up a meter by its Id before meter filters are applied. When create is called for a meter, we first check whether a mapping exists for that pre-filter Id and, if it does, immediately return the mapped meter from the map that holds the actual meters.
  • When meter filters are added to the registry, the meters registered up to that point are tracked in a set; the next time a create operation happens for one of those meters, the mapping is re-computed, keeping the existing behavior (see the sketch after this list).
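
Roughly, the lookup flow described above could look like the following sketch. This is illustrative only: the class, the method names (getOrCreate, applyMeterFilters, onMeterFilterAdded), and the use of concurrent collections throughout are assumptions for brevity, not the actual MeterRegistry changes (the PR keeps preFilterIdToMeterMap as a plain HashMap with writes guarded by meterMapLock).

import io.micrometer.core.instrument.Meter;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch: field names echo the PR, but this is not the MeterRegistry code.
class PreFilterIdLookupSketch {

    // mapped Id (after MeterFilters) -> meter, as the registry already keeps
    private final Map<Meter.Id, Meter> meterMap = new ConcurrentHashMap<>();

    // pre-filter Id -> meter: the fast-path lookup this PR introduces
    private final Map<Meter.Id, Meter> preFilterIdToMeterMap = new ConcurrentHashMap<>();

    // pre-filter Ids whose mapping must be re-computed after a late MeterFilter registration
    private final Set<Meter.Id> stalePreFilterIds = ConcurrentHashMap.newKeySet();

    Meter getOrCreate(Meter.Id preFilterId, Function<Meter.Id, Meter> create) {
        Meter cached = preFilterIdToMeterMap.get(preFilterId);
        if (cached != null && !stalePreFilterIds.contains(preFilterId)) {
            return cached; // fast path: skip re-applying MeterFilters
        }
        Meter.Id mappedId = applyMeterFilters(preFilterId); // stand-in for the registry's filter chain
        Meter meter = meterMap.computeIfAbsent(mappedId, create);
        preFilterIdToMeterMap.put(preFilterId, meter);
        stalePreFilterIds.remove(preFilterId);
        return meter;
    }

    private Meter.Id applyMeterFilters(Meter.Id id) {
        return id; // placeholder: the real registry applies each configured MeterFilter.map(id)
    }

    // a late MeterFilter registration marks existing pre-filter Ids stale (mirrors the PR's approach)
    void onMeterFilterAdded() {
        stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
    }
}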

TODO:

  • Add additional tests to MeterRegistryTest

@shakuzen (Member) left a comment

Thank you very much for this pull request. It's a great idea for a performance gain for everyone with common usage scenarios, and hopefully it will remove the need for particularly performance-sensitive users to do bespoke caching around Micrometer.

That said, I'm a bit worried about the side effects of this as it currently is. The biggest thing we want to avoid is causing metrics to break in an unexpected way that is not caught until deployed to production.

Before this pull request, the issue with meters registered before a MeterFilter is configured is that they will not have that filter applied to them. If getOrCreate is called again, then the filter will be applied and a new meter will be created (assuming the filter alters the ID), and if a reference is not maintained to the first meter created (before the MeterFilter was configured), then the first meter will be essentially orphaned. Calls to removeByPreFilterId will not remove the orphaned meters. Calls to remove would still work if a reference to the meter or id of the "orphaned" meter was used.

I worry some users may inadvertently be relying on this behavior and it may not be an issue for them if they're not maintaining a reference to a meter registered before a MeterFilter is configured.

Giving a concrete, albeit contrived, example below:

@Test
void meterRegistrationBeforeAndAfterMeterFilterRegistration() {
    Counter c1 = registry.counter("counter");
    registry.config().meterFilter(MeterFilter.commonTags(Tags.of("common", "tag")));
    Counter c2 = registry.counter("counter");
    
    Timer t1 = registry.timer("timer");
    
    registry.remove(c1);
    registry.remove(c2);
}

Before this pull request, c1 and c2 will be different meters. c2 and t1 will have the common tags; c1 will not. Each call to remove will remove a different meter.

After this pull request (as it currently is), c1 and c2 will be the same meter, without the common tags. This is what I worry will be surprising and breaking behavior that won't necessarily get caught until it is too late. The first remove call will remove the only counter registered.

I have some ideas I'd like to try to see if we can get to some compromise where the performance gains are there most of the time but we still allow for late MeterFilter configuration while warning about it, which I'll do in a separate pull request. Timeline-wise, we should definitely get the warning log into the RC to start getting user feedback on affected scenarios as soon as possible, so we can hopefully get more confident about restricting when a MeterFilter can be configured. I would also like to get this performance enhancement in as well, but we'll see where we end up on trying to retain more of the MeterFilter behavior. I suspect MeterFilters shouldn't be configured so late or so often, even if it is after some meters have been registered.

System.arraycopy(filters, 0, newFilters, 0, filters.length);
newFilters[filters.length] = filter;
filters = newFilters;
if (meterMap.isEmpty()) {
@shakuzen (Member) commented on this diff:

I think making it a no-op is too aggressive a change. Users have little way of knowing this is wrong behavior now (other than noticing that meters registered earlier don't have the filter applied). I think we should rather log a warning and gather feedback on the scenarios in which users are running into this, so they can be corrected before we switch to a no-op. It may not be the user's fault; it may be a framework or library doing things they have no direct control over. I'll open a separate pull request to add the logging because I think that makes sense separately from the goal of this pull request.

@shakuzen (Member) followed up:

Here's the promised PR: #4917

@shakuzen (Member) commented Apr 3, 2024

Here's an implementation of an idea to maintain existing behavior while allowing this optimization in cases when a MeterFilter hasn't been configured since the meter was last registered. See the two tests added checking the existing behavior and the new behavior. I'll look more into concurrency and benchmarking tomorrow.
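
As a rough illustration of the behavior being preserved, here is a hedged sketch of what such tests might assert. The registry type, test names, and assertions are assumptions, not the actual tests added in the PR.

import static org.assertj.core.api.Assertions.assertThat;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;

class LateMeterFilterBehaviorSketch {

    private final SimpleMeterRegistry registry = new SimpleMeterRegistry();

    @Test
    void lateFilterStillProducesDistinctMeters() {
        Counter before = registry.counter("counter");
        registry.config().meterFilter(MeterFilter.commonTags(Tags.of("common", "tag")));
        Counter after = registry.counter("counter");

        // existing behavior kept: the meter registered before the filter is a different meter
        assertThat(after).isNotSameAs(before);
        assertThat(after.getId().getTags()).contains(Tag.of("common", "tag"));
        assertThat(before.getId().getTags()).doesNotContain(Tag.of("common", "tag"));
    }

    @Test
    void fastPathWhenNoFilterConfiguredSinceLastRegistration() {
        Counter first = registry.counter("other.counter");
        // no MeterFilter registered in between, so the optimized lookup can return the same instance
        Counter second = registry.counter("other.counter");
        assertThat(second).isSameAs(first);
    }
}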

@lenin-jaganathan (Contributor, Author) commented:

I was away for a while and hence didn't get back to this one quickly. I will have a look at the implementations today. Meanwhile, one additional thing we might want to consider is how solving this would affect a composite meter registry and its initial design choices. I left a comment here.

@shakuzen (Member) commented Apr 5, 2024

I think we're ready to add the commits over here to this pull request. I've added concurrency tests and benchmarks. I'll share more info when I'm back at the computer later, but feel free to leave any feedback in the meantime. I'm hopeful we can get this into the 1.13.0 RC scheduled for Monday.
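
For context, a concurrency check could be sketched along the following lines. This is an illustrative assumption, not the tests actually added in the PR.

import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;

class ConcurrentRegistrationSketch {

    @Test
    void concurrentRegistrationYieldsOneMeter() throws Exception {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Callable<Counter>> tasks = IntStream.range(0, 8)
                .mapToObj(i -> (Callable<Counter>) () -> registry.counter("shared.counter"))
                .collect(Collectors.toList());
            List<Future<Counter>> results = pool.invokeAll(tasks);

            Counter first = results.get(0).get();
            for (Future<Counter> result : results) {
                // every thread should get the same meter instance back
                assertThat(result.get()).isSameAs(first);
            }
            assertThat(registry.getMeters()).hasSize(1);
        }
        finally {
            pool.shutdownNow();
        }
    }
}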

@lenin-jaganathan (Contributor, Author) left a comment

I'm looking for some more potential edge cases, but this looks good except for one potentially failing case.

At some point it would also be good to understand what this means in terms of memory usage. It should be fairly low, since most of the things stored are references, but you never know!

@@ -780,6 +815,7 @@ public Config commonTags(String... tags) {
public synchronized Config meterFilter(MeterFilter filter) {
if (!meterMap.isEmpty()) {
logWarningAboutLateFilter();
stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
@lenin-jaganathan (Contributor, Author) commented on this diff:

Should we synchronize this on meterMapLock? Given that MeterFilter updates are very rare, this could prevent any meter-creation-related activity from happening while filters are being updated. WDYT?

@shakuzen (Member) commented Apr 8, 2024

Thank you for the additional feedback. I think we will have to consider the remaining feedback post-RC1; it's very late for me and we need to get this release out. Feel free to share any other feedback you have. I just wanted to let you know why we're going forward before handling it.

lenin-jaganathan and others added 4 commits on April 9, 2024 at 01:03:

  • MeterFilters were always being applied to the given Id because the meterMap we keep has only the mappedId as a key. To avoid the cost of applying MeterFilters to a given Id, this adds a map called preFilterIdToMeterMap, which makes returning existing meters faster than before.
  • This makes a compromise between optimizing away filter application and maintaining existing behavior where it would make a difference. The tests demonstrate the behaviors we want.
  • Adds tests to check concurrent behavior and measure the performance of registering meters. Updates the implementation of dealing with stale IDs to make it less prone to stale reads.
  • Adds a test to ensure we are not leaking state when removing a meter marked stale, and fixes the corresponding bug.

private final Map<Id, Meter> meterMap = new ConcurrentHashMap<>();

// writes guarded by meterMapLock, reads can be stale
// VisibleForTesting
final Map<Id, Meter> preFilterIdToMeterMap = new HashMap<>();

@tsegismont commented on this diff:

Even if writes are guarded by meterMapLock, we can't assume it's safe to read from the map. I'm not talking about a potential stale read (which is fine), but about the fact that the underlying state of the map could be corrupted and prevent a value from being read.

@lenin-jaganathan (Contributor, Author) replied:

I am not sure why that would happen. Do you know of a scenario where this could happen?

@shakuzen (Member) replied:

Thanks for the review @tsegismont. I added jcstress tests with the intent of catching these kinds of issues. The current tests are probably lacking scenarios that may run into problems like you mentioned. Could you share something we could add there to verify the issue and make sure it stays fixed in the future?

@shakuzen merged commit fd497c2 into micrometer-metrics:main on Apr 8, 2024 (6 checks passed)
@shakuzen (Member) commented Apr 8, 2024

Sharing benchmark results from my M1 MacBook Pro:

This PR:

Benchmark                                                                   Mode  Cnt      Score    Error   Units
MeterRegistrationBenchmark.registerExisting                                 avgt    5      0.021 ±  0.001   us/op
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate                   avgt    5  12807.403 ± 53.471  MB/sec
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate.norm              avgt    5    144.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExisting:gc.count                        avgt    5   1206.000           counts
MeterRegistrationBenchmark.registerExisting:gc.time                         avgt    5    625.000               ms
MeterRegistrationBenchmark.registerExistingWithProvider                     avgt    5      0.024 ±  0.001   us/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate       avgt    5  11486.191 ± 48.498  MB/sec
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate.norm  avgt    5    144.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.count            avgt    5   1081.000           counts
MeterRegistrationBenchmark.registerExistingWithProvider:gc.time             avgt    5    555.000               ms
MeterRegistrationBenchmark.registerNew                                        ss  400      7.839 ±  0.688   us/op
MeterRegistrationBenchmark.registerNew:gc.alloc.rate                          ss  400     90.960 ±  5.493  MB/sec
MeterRegistrationBenchmark.registerNew:gc.alloc.rate.norm                     ss  400   4059.800 ± 61.780    B/op
MeterRegistrationBenchmark.registerStale                                      ss  400      6.026 ±  0.908   us/op
MeterRegistrationBenchmark.registerStale:gc.alloc.rate                        ss  400     67.791 ±  4.128  MB/sec
MeterRegistrationBenchmark.registerStale:gc.alloc.rate.norm                   ss  400   4023.130 ± 61.860    B/op

With Micrometer 1.13.0-M2:

Benchmark                                                                   Mode  Cnt     Score    Error   Units
MeterRegistrationBenchmark.registerExisting                                 avgt    5     0.123 ±  0.001   us/op
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate                   avgt    5  4211.074 ± 47.241  MB/sec
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate.norm              avgt    5   272.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExisting:gc.count                        avgt    5   704.000           counts
MeterRegistrationBenchmark.registerExisting:gc.time                         avgt    5   338.000               ms
MeterRegistrationBenchmark.registerExistingWithProvider                     avgt    5     0.101 ±  0.001   us/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate       avgt    5  4083.215 ± 36.611  MB/sec
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate.norm  avgt    5   216.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.count            avgt    5   682.000           counts
MeterRegistrationBenchmark.registerExistingWithProvider:gc.time             avgt    5   322.000               ms
MeterRegistrationBenchmark.registerNew                                        ss  400    10.195 ±  1.950   us/op
MeterRegistrationBenchmark.registerNew:gc.alloc.rate                          ss  400    92.470 ±  6.339  MB/sec
MeterRegistrationBenchmark.registerNew:gc.alloc.rate.norm                     ss  400  4256.750 ± 61.338    B/op
MeterRegistrationBenchmark.registerStale                                      ss  400     6.338 ±  0.849   us/op
MeterRegistrationBenchmark.registerStale:gc.alloc.rate                        ss  400    90.097 ±  5.519  MB/sec
MeterRegistrationBenchmark.registerStale:gc.alloc.rate.norm                   ss  400  4017.450 ± 61.371    B/op

Of note, registering existing meters appears to be almost 6x faster and generates about half as much garbage per operation.
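
For context on how numbers like these are produced, here is a minimal JMH-style sketch of a registerExisting benchmark. It is illustrative only: the class name, meter names, and setup are assumptions, and the actual MeterRegistrationBenchmark in the repository differs; the gc.* rows in the output above are what JMH's GC profiler (-prof gc) reports.

import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class RegisterExistingBenchmarkSketch {

    private SimpleMeterRegistry registry;

    @Setup
    public void setup() {
        registry = new SimpleMeterRegistry();
        // pre-register so subsequent calls hit the existing-meter lookup path
        registry.counter("existing.counter", "key", "value");
    }

    @Benchmark
    public Counter registerExisting() {
        // re-registering an existing meter; with this PR the pre-filter Id cache avoids re-applying filters
        return registry.counter("existing.counter", "key", "value");
    }
}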

@lenin-jaganathan (Contributor, Author) commented:

Thinking retrospectively about what sort of memory implications this will have in scenarios like the one below:

  1. A meter with dynamic tag values (e.g. a uri tag containing a userId) where a MeterFilter imposes a cardinality limit. For example:
new MeterFilter() {
    private final Set<String> observedTagValues = ConcurrentHashMap.newKeySet();

    @Override
    public Meter.Id map(Meter.Id id) {
        String value = id.getTag("uri");
        if (value != null) {
            if (!observedTagValues.contains(value)) {
                if (observedTagValues.size() >= 100) {
                    // limit reached: collapse any new value into a single masked tag value
                    Tag maskedTag = Tag.of("uri", "MASKED");
                    return id.withTag(maskedTag);
                }
                observedTagValues.add(value);
            }
        }
        return id;
    }
}

The above MeterFilter substitutes a dummy URI once the number of distinct URIs exceeds a limit (in this case 100). After 100 unique URIs have been observed, the filter maps all new IDs to a single meter. After this fix the mapping stays the same, but we will keep adding pre-filter IDs to the map, which will grow indefinitely.

If filters return a NOOP meter we are fine, since we don't store those anyway. But in the above scenario we alter only the uri tag, so that we don't lose the count of requests while still keeping the tag cardinality at a reasonable level.

@shakuzen (Member) commented:

@lenin-jaganathan that does sound like a problem for unbounded cardinality tags if users are doing something like you showed. I'm not really sure what we can do about it. Would you open a separate issue to track it? Ideas welcome on solutions, too.

shakuzen added a commit that referenced this pull request May 2, 2024
Registering a MeterFilter will mark all existing meters as stale, which causes filters to be applied again the next time the meter is registered. However, the meter may not be changed by the newly registered filter; in that case, it was not being unmarked as stale, which prevented the performance shortcut from being taken.

See #4857 (comment)
@lenin-jaganathan deleted the meter_cache branch on July 1, 2024 08:50

izeye added a commit to izeye/micrometer that referenced this pull request Aug 5, 2024
@izeye mentioned this pull request Aug 5, 2024

Successfully merging this pull request may close this issue:

Do not re-apply MeterFilters to IDs when registering if unnecessary
4 participants