
Improve performance of meter interaction #4857

Merged: 4 commits merged into micrometer-metrics:main from the meter_cache branch on Apr 8, 2024

Conversation

@lenin-jaganathan (Contributor) commented Mar 15, 2024

Fixes #4856.

This PR introduces the following changes:

  • Keeps a map that looks up a meter by its Id before meter filters are applied. When create is called for a meter, we first check whether a mapping exists for that pre-filter Id and, if it does, immediately return the mapped meter from the map that holds the actual meters.
  • When meter filters are added to the registry, the meters registered up to that point are tracked in a set; the next time a create operation happens for one of those meters, the mapping is re-computed, keeping the existing behavior (see the sketch after this list).
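
Roughly, the lookup flow described above could look like the following sketch. This is illustrative only: the class, the method names (getOrCreate, applyMeterFilters, onMeterFilterAdded), and the use of concurrent collections throughout are assumptions for brevity, not the actual MeterRegistry changes (the PR keeps preFilterIdToMeterMap as a plain HashMap with writes guarded by meterMapLock).

import io.micrometer.core.instrument.Meter;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch: field names echo the PR, but this is not the MeterRegistry code.
class PreFilterIdLookupSketch {

    // mapped Id (after MeterFilters) -> meter, as the registry already keeps
    private final Map<Meter.Id, Meter> meterMap = new ConcurrentHashMap<>();

    // pre-filter Id -> meter: the fast-path lookup this PR introduces
    private final Map<Meter.Id, Meter> preFilterIdToMeterMap = new ConcurrentHashMap<>();

    // pre-filter Ids whose mapping must be re-computed after a late MeterFilter registration
    private final Set<Meter.Id> stalePreFilterIds = ConcurrentHashMap.newKeySet();

    Meter getOrCreate(Meter.Id preFilterId, Function<Meter.Id, Meter> create) {
        Meter cached = preFilterIdToMeterMap.get(preFilterId);
        if (cached != null && !stalePreFilterIds.contains(preFilterId)) {
            return cached; // fast path: skip re-applying MeterFilters
        }
        Meter.Id mappedId = applyMeterFilters(preFilterId); // stand-in for the registry's filter chain
        Meter meter = meterMap.computeIfAbsent(mappedId, create);
        preFilterIdToMeterMap.put(preFilterId, meter);
        stalePreFilterIds.remove(preFilterId);
        return meter;
    }

    private Meter.Id applyMeterFilters(Meter.Id id) {
        return id; // placeholder: the real registry applies each configured MeterFilter.map(id)
    }

    // a late MeterFilter registration marks existing pre-filter Ids stale (mirrors the PR's approach)
    void onMeterFilterAdded() {
        stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
    }
}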

TODO:

  • Add additional tests to MeterRegistryTest

@shakuzen (Member) left a comment

Thank you very much for this pull request. It's a great idea for a performance gain for everyone with common usage scenarios, and hopefully it will remove the need for particularly performance-sensitive users to do bespoke caching around Micrometer.

That said, I'm a bit worried about the side effects of this as it currently is. The biggest thing we want to avoid is causing metrics to break in an unexpected way that is not caught until deployed to production.

Before this pull request, the issue with meters registered before a MeterFilter is configured is that they will not have that filter applied to them. If getOrCreate is called again, then the filter will be applied and a new meter will be created (assuming the filter alters the ID), and if a reference is not maintained to the first meter created (before the MeterFilter was configured), then the first meter will be essentially orphaned. Calls to removeByPreFilterId will not remove the orphaned meters. Calls to remove would still work if a reference to the meter or id of the "orphaned" meter was used.

I worry some users may inadvertently be relying on this behavior and it may not be an issue for them if they're not maintaining a reference to a meter registered before a MeterFilter is configured.

Giving a concrete, albeit contrived, example below:

@Test
void meterRegistrationBeforeAndAfterMeterFilterRegistration() {
    Counter c1 = registry.counter("counter");
    registry.config().meterFilter(MeterFilter.commonTags(Tags.of("common", "tag")));
    Counter c2 = registry.counter("counter");
    
    Timer t1 = registry.timer("timer");
    
    registry.remove(c1);
    registry.remove(c2);
}

Before this pull request, c1 and c2 will be different meters. c2 and t1 will have the common tags; c1 will not. Each call to remove will remove a different meter.

After this pull request (as it currently is), c1 and c2 will be the same meter, without the common tags. This is what I worry will be surprising and breaking behavior that won't necessarily get caught until it is too late. The first remove call will remove the only counter registered.

I have some ideas I'd like to try to see if we can get to some compromise where the performance gains are there most of the time but we still allow for late MeterFilter configuration while warning about it, which I'll do in a separate pull request. Timeline-wise, we should definitely get the warning log into the RC to start getting user feedback on affected scenarios as soon as possible, so we can hopefully get more confident about restricting when a MeterFilter can be configured. I would also like to get this performance enhancement in as well, but we'll see where we end up on trying to retain more of the MeterFilter behavior. I suspect MeterFilters shouldn't be configured so late or so often, even if it is after some meters have been registered.

System.arraycopy(filters, 0, newFilters, 0, filters.length);
newFilters[filters.length] = filter;
filters = newFilters;
if (meterMap.isEmpty()) {
@shakuzen (Member) commented on this diff:

I think making it a no-op is too aggressive a change. Users have little way of knowing this is wrong behavior now (other than noticing that meters registered earlier don't have the filter applied). I think we should rather log a warning and gather feedback on the scenarios in which users are running into this, so they can be corrected before we switch to a no-op. It may not be the user's fault; it may be a framework or library doing things they have no direct control over. I'll open a separate pull request to add the logging because I think that makes sense separately from the goal of this pull request.

@shakuzen (Member) followed up:

Here's the promised PR: #4917

@shakuzen (Member) commented Apr 3, 2024

Here's an implementation of an idea to maintain existing behavior while allowing this optimization in cases when a MeterFilter hasn't been configured since the meter was last registered. See the two tests added checking the existing behavior and the new behavior. I'll look more into concurrency and benchmarking tomorrow.
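
As a rough illustration of the behavior being preserved, here is a hedged sketch of what such tests might assert. The registry type, test names, and assertions are assumptions, not the actual tests added in the PR.

import static org.assertj.core.api.Assertions.assertThat;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;

class LateMeterFilterBehaviorSketch {

    private final SimpleMeterRegistry registry = new SimpleMeterRegistry();

    @Test
    void lateFilterStillProducesDistinctMeters() {
        Counter before = registry.counter("counter");
        registry.config().meterFilter(MeterFilter.commonTags(Tags.of("common", "tag")));
        Counter after = registry.counter("counter");

        // existing behavior kept: the meter registered before the filter is a different meter
        assertThat(after).isNotSameAs(before);
        assertThat(after.getId().getTags()).contains(Tag.of("common", "tag"));
        assertThat(before.getId().getTags()).doesNotContain(Tag.of("common", "tag"));
    }

    @Test
    void fastPathWhenNoFilterConfiguredSinceLastRegistration() {
        Counter first = registry.counter("other.counter");
        // no MeterFilter registered in between, so the optimized lookup can return the same instance
        Counter second = registry.counter("other.counter");
        assertThat(second).isSameAs(first);
    }
}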

@lenin-jaganathan (Contributor, Author) commented:

I was away for a while and hence didn't get back to this one quickly. I will have a look at the implementations today. Meanwhile, one additional thing we might want to consider is how solving this would affect a composite meter registry and its initial design choices. I left a comment here.

@shakuzen (Member) commented Apr 5, 2024

I think we're ready to add the commits over here to this pull request. I've added concurrency tests and benchmarks. I'll share more info when I'm back at the computer later, but feel free to leave any feedback in the meantime. I'm hopeful we can get this into the 1.13.0 RC scheduled for Monday.
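
For context, a concurrency check could be sketched along the following lines. This is an illustrative assumption, not the tests actually added in the PR.

import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;

class ConcurrentRegistrationSketch {

    @Test
    void concurrentRegistrationYieldsOneMeter() throws Exception {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Callable<Counter>> tasks = IntStream.range(0, 8)
                .mapToObj(i -> (Callable<Counter>) () -> registry.counter("shared.counter"))
                .collect(Collectors.toList());
            List<Future<Counter>> results = pool.invokeAll(tasks);

            Counter first = results.get(0).get();
            for (Future<Counter> result : results) {
                // every thread should get the same meter instance back
                assertThat(result.get()).isSameAs(first);
            }
            assertThat(registry.getMeters()).hasSize(1);
        }
        finally {
            pool.shutdownNow();
        }
    }
}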

@lenin-jaganathan (Contributor, Author) left a comment

I'm looking for some more potential edge cases, but this looks good except for one potentially failing case.

At some point it would also be good to understand what this means in terms of memory usage. It should be fairly low, since most of the things stored are references, but you never know!

@@ -780,6 +815,7 @@ public Config commonTags(String... tags) {
public synchronized Config meterFilter(MeterFilter filter) {
if (!meterMap.isEmpty()) {
logWarningAboutLateFilter();
stalePreFilterIds.addAll(preFilterIdToMeterMap.keySet());
@lenin-jaganathan (Contributor, Author) commented on this diff:

Should we synchronize this on meterMapLock? Given that MeterFilter updates are very rare, this could prevent any meter-creation-related activity from happening while filters are being updated. WDYT?

@shakuzen (Member) commented Apr 8, 2024

Thank you for the additional feedback. I think we will have to consider the remaining feedback post-RC1; it's very late for me and we need to get this release out. Feel free to share any other feedback you have. I just wanted to let you know why we're going forward before handling it.

lenin-jaganathan and others added 4 commits on April 9, 2024 at 01:03:

  • MeterFilters were always being applied to the given Id because the meterMap we keep has only the mappedId as a key. To avoid the cost of applying MeterFilters to a given Id, this adds a map called preFilterIdToMeterMap, which makes returning existing meters faster than before.
  • This makes a compromise between optimizing away filter application and maintaining existing behavior where it would make a difference. The tests demonstrate the behaviors we want.
  • Adds tests to check concurrent behavior and measure the performance of registering meters. Updates the implementation of dealing with stale IDs to make it less prone to stale reads.
  • Adds a test to ensure we are not leaking state when removing a meter marked stale, and fixes the corresponding bug.

private final Map<Id, Meter> meterMap = new ConcurrentHashMap<>();

// writes guarded by meterMapLock, reads can be stale
// VisibleForTesting
final Map<Id, Meter> preFilterIdToMeterMap = new HashMap<>();

@tsegismont commented on this diff:

Even if writes are guarded by meterMapLock, we can't assume it's safe to read from the map. I'm not talking about a potential stale read (which is fine), but about the fact that the underlying state of the map could be corrupted and prevent a value from being read.

@lenin-jaganathan (Contributor, Author) replied:

I am not sure why that would happen. Do you know of a scenario where this could happen?

@shakuzen (Member) replied:

Thanks for the review @tsegismont. I added jcstress tests with the intent of catching these kinds of issues. The current tests are probably lacking scenarios that may run into problems like you mentioned. Could you share something we could add there to verify the issue and make sure it stays fixed in the future?

@shakuzen merged commit fd497c2 into micrometer-metrics:main on Apr 8, 2024 (6 checks passed)
@shakuzen (Member) commented Apr 8, 2024

Sharing benchmark results from my M1 MacBook Pro:

This PR:

Benchmark                                                                   Mode  Cnt      Score    Error   Units
MeterRegistrationBenchmark.registerExisting                                 avgt    5      0.021 ±  0.001   us/op
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate                   avgt    5  12807.403 ± 53.471  MB/sec
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate.norm              avgt    5    144.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExisting:gc.count                        avgt    5   1206.000           counts
MeterRegistrationBenchmark.registerExisting:gc.time                         avgt    5    625.000               ms
MeterRegistrationBenchmark.registerExistingWithProvider                     avgt    5      0.024 ±  0.001   us/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate       avgt    5  11486.191 ± 48.498  MB/sec
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate.norm  avgt    5    144.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.count            avgt    5   1081.000           counts
MeterRegistrationBenchmark.registerExistingWithProvider:gc.time             avgt    5    555.000               ms
MeterRegistrationBenchmark.registerNew                                        ss  400      7.839 ±  0.688   us/op
MeterRegistrationBenchmark.registerNew:gc.alloc.rate                          ss  400     90.960 ±  5.493  MB/sec
MeterRegistrationBenchmark.registerNew:gc.alloc.rate.norm                     ss  400   4059.800 ± 61.780    B/op
MeterRegistrationBenchmark.registerStale                                      ss  400      6.026 ±  0.908   us/op
MeterRegistrationBenchmark.registerStale:gc.alloc.rate                        ss  400     67.791 ±  4.128  MB/sec
MeterRegistrationBenchmark.registerStale:gc.alloc.rate.norm                   ss  400   4023.130 ± 61.860    B/op

With Micrometer 1.13.0-M2:

Benchmark                                                                   Mode  Cnt     Score    Error   Units
MeterRegistrationBenchmark.registerExisting                                 avgt    5     0.123 ±  0.001   us/op
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate                   avgt    5  4211.074 ± 47.241  MB/sec
MeterRegistrationBenchmark.registerExisting:gc.alloc.rate.norm              avgt    5   272.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExisting:gc.count                        avgt    5   704.000           counts
MeterRegistrationBenchmark.registerExisting:gc.time                         avgt    5   338.000               ms
MeterRegistrationBenchmark.registerExistingWithProvider                     avgt    5     0.101 ±  0.001   us/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate       avgt    5  4083.215 ± 36.611  MB/sec
MeterRegistrationBenchmark.registerExistingWithProvider:gc.alloc.rate.norm  avgt    5   216.000 ±  0.001    B/op
MeterRegistrationBenchmark.registerExistingWithProvider:gc.count            avgt    5   682.000           counts
MeterRegistrationBenchmark.registerExistingWithProvider:gc.time             avgt    5   322.000               ms
MeterRegistrationBenchmark.registerNew                                        ss  400    10.195 ±  1.950   us/op
MeterRegistrationBenchmark.registerNew:gc.alloc.rate                          ss  400    92.470 ±  6.339  MB/sec
MeterRegistrationBenchmark.registerNew:gc.alloc.rate.norm                     ss  400  4256.750 ± 61.338    B/op
MeterRegistrationBenchmark.registerStale                                      ss  400     6.338 ±  0.849   us/op
MeterRegistrationBenchmark.registerStale:gc.alloc.rate                        ss  400    90.097 ±  5.519  MB/sec
MeterRegistrationBenchmark.registerStale:gc.alloc.rate.norm                   ss  400  4017.450 ± 61.371    B/op

Of note, registering existing meters appears to be almost 6x faster and generates about half as much garbage per operation.
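
For context on how numbers like these are produced, here is a minimal JMH-style sketch of a registerExisting benchmark. It is illustrative only: the class name, meter names, and setup are assumptions, and the actual MeterRegistrationBenchmark in the repository differs; the gc.* rows in the output above are what JMH's GC profiler (-prof gc) reports.

import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class RegisterExistingBenchmarkSketch {

    private SimpleMeterRegistry registry;

    @Setup
    public void setup() {
        registry = new SimpleMeterRegistry();
        // pre-register so subsequent calls hit the existing-meter lookup path
        registry.counter("existing.counter", "key", "value");
    }

    @Benchmark
    public Counter registerExisting() {
        // re-registering an existing meter; with this PR the pre-filter Id cache avoids re-applying filters
        return registry.counter("existing.counter", "key", "value");
    }
}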

@lenin-jaganathan (Contributor, Author) commented:

Thinking retrospectively about what sort of memory implications this will have in scenarios like the one below:

  1. A meter with dynamic tag values (e.g. a uri tag containing a userId) where a MeterFilter imposes a cardinality limit. For example:
new MeterFilter() {
    private final Set<String> observedTagValues = ConcurrentHashMap.newKeySet();

    @Override
    public Meter.Id map(Meter.Id id) {
        String value = id.getTag("uri");
        if (value != null) {
            if (!observedTagValues.contains(value)) {
                if (observedTagValues.size() >= 100) {
                    // limit reached: collapse any new value into a single masked tag value
                    Tag maskedTag = Tag.of("uri", "MASKED");
                    return id.withTag(maskedTag);
                }
                observedTagValues.add(value);
            }
        }
        return id;
    }
}

The above MeterFilter substitutes a dummy URI once the number of distinct URIs exceeds a limit (in this case 100). After 100 unique URIs have been observed, the filter maps all new IDs to a single meter. After this fix the mapping stays the same, but we will keep adding pre-filter IDs to the map, which will grow indefinitely.

If filters return a NOOP meter we are fine, since we don't store those anyway. But in the above scenario we alter only the uri tag, so that we don't lose the count of requests while still keeping the tag cardinality at a reasonable level.

@shakuzen (Member) commented:

@lenin-jaganathan that does sound like a problem for unbounded cardinality tags if users are doing something like you showed. I'm not really sure what we can do about it. Would you open a separate issue to track it? Ideas welcome on solutions, too.

shakuzen added a commit that referenced this pull request May 2, 2024
Registering a MeterFilter will mark all existing meters as stale, which causes filters to be applied again the next time the meter is registered. However, the meter may not be changed by the newly registered filter; in that case, it was not being unmarked as stale, which prevented the performance shortcut from being taken.

See #4857 (comment)
@lenin-jaganathan deleted the meter_cache branch on July 1, 2024 08:50

izeye added a commit to izeye/micrometer that referenced this pull request Aug 5, 2024
@izeye mentioned this pull request Aug 5, 2024

Successfully merging this pull request may close this issue:

Do not re-apply MeterFilters to IDs when registering if unnecessary
4 participants