Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add basic exponential histogram #1736

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

xuan-cao-swi
Copy link
Contributor

@xuan-cao-swi xuan-cao-swi commented Sep 27, 2024

Description

This PR introduces basic functionality for exponential histograms, similar to Python and JavaScript. Most of the code is adapted/copied from Python. The only missing feature compared to other languages is merging.

Exponential histogram merging aims to make bucket sizes consistent. With the current otel-ruby metrics collection mechanism, raw data is sent unmodified to the collector (e.g., from @data_point).

I think it's best to implement the merge feature on the collector side to reduce duplicate work and computation on the client side. Downscaling is implemented on the language side to adjust bucket sizes, avoiding out-of-range indices for very large or small numbers.

Test

Most test cases are copied from Python/JavaScript to ensure the correctness of mapping and downscaling.

I didn't add all test case from Python as I think some of the test cases are redundant (e.g. 0 increment, etc.) but please correct me if I am wrong.

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

Copy link
Contributor

@kaylareopelle kaylareopelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @xuan-cao-swi!


# The default boundaries is calculated based on default max_size and max_scale value
def initialize(
aggregation_temporality: ENV.fetch('OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE', :delta), # TODO: aggregation_temporality should be renamed to collect_aggregation_temporality for clear definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why this should be renamed. Can you tell me more? Is there a spec reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a mistake; removed the comment.

record_min_max: true,
zero_threshold: 0
)
@data_points = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @data_points was refactored out of the other aggregations when views were merged in. Why save them as an instance variable in this aggregation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might start the pr before metrics view got merged. Updated.

end

def validate_scale(scale)
return scale unless scale > MAX_SCALE || scale < MIN_SCALE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how performant it is, but another way to implement this could be:

(MIN_SCALE..MAX_SCALE).cover?(scale)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think use operator will be faster since it's direct comparisons and short-circuiting behavior; but range comparison looks nice. Let me know if I need to change it.

       user     system      total        real
Direct Comparison:  0.047998   0.000000   0.047998 (  0.047786)
Range Cover:  0.172396   0.000963   0.173359 (  0.173389)


def downscale(change, positive, negative)
return if change == 0
raise 'Invalid change of scale' if change.negative?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this get caught by the error handler? I'm concerned if we raise, we might crash a user's app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return if change <= 0

module Aggregation
module ExponentialHistogram
# Buckets is the fundamental building block of exponential histogram that store bucket/boundary value
class Buckets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this class have test coverage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no; it's intensively test through test case for exponential_bucket_histogram. If bucket is not correct, the exponential_bucket_histogram will fail.

I will come up with some test case for the class inside /exponential_histogram/

Comment on lines 24 to 25
let(:start_time) { (Time.now.to_r * 1_000_000_000).to_i }
let(:end_time) { ((Time.now + 60).to_r * 1_000_000_000).to_i }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 69 to 71
# The corresponding Go test is TestAlternatingGrowth1 where:
# agg := NewFloat64(NewConfig(WithMaxSize(4)))
# agg is an instance of github.com/lightstep/otel-launcher-go/lightstep/sdk/metric/aggregator/histogram/structure.Histogram[float64]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I can't get this link to load. Is there a permalink option?

Copy link
Contributor Author

@xuan-cao-swi xuan-cao-swi Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct_boundary = right_boundary(scale, correct_min_index)

# truffleruby will fail because truffleruby `must_equal` check the exact precision of float point
_(correct_boundary).must_equal(boundary) if RUBY_ENGINE != 'truffleruby'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truffleruby will compare the exact float point when use must_equal.

For example, for scale =-10, boundary is 5.562684646268003e-309; correct_boundary is 1/179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216,

Since 5.562684646268003e-309 will assume all number after 3 will be 0, so it's different compare to correct_boundary. Truffleruby will treat them as not equal but other ruby variation will accept it.

Let me know if I miss something.

xuan-cao-swi and others added 5 commits January 10, 2025 11:04
…ial_bucket_histogram.rb

Co-authored-by: Kayla Reopelle <87386821+kaylareopelle@users.noreply.github.com>
…ial_bucket_histogram.rb

Co-authored-by: Kayla Reopelle <87386821+kaylareopelle@users.noreply.github.com>
…tial_bucket_histogram_test.rb

Co-authored-by: Kayla Reopelle <87386821+kaylareopelle@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To do
Development

Successfully merging this pull request may close these issues.

Implement ExponentialBucketHistogram aggregation
2 participants