Add process.cpu.count metric #2392

Closed
wants to merge 5 commits into from

trask
Member

@trask trask commented Mar 1, 2022

The motivation for this PR is to find the right place to capture a metric for Java's Runtime.availableProcessors() in a way that is language-neutral, rather than throwing it under process.runtime.jvm.*.

See the initial attempt at #2384, but as @bogdandrutu pointed out and I confirmed via testing, Runtime.availableProcessors() can be less than the system CPU count, e.g. when the process is launched via taskset.
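
To illustrate the gap between the system CPU count and the count available to a process, here is a minimal sketch in Python rather than Java. It assumes a platform where `os.sched_getaffinity` exists (Linux); elsewhere it falls back to the system count. The helper name `cpu_counts` is hypothetical and not part of this PR.

```python
import os


def cpu_counts():
    """Return (system_count, available_count) of logical CPUs."""
    system_count = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask reflects restrictions such as `taskset`.
        available_count = len(os.sched_getaffinity(0))
    else:
        # Assumption: without an affinity API, fall back to the system count.
        available_count = system_count
    return system_count, available_count


system_count, available_count = cpu_counts()
print(f"system CPUs: {system_count}, available to this process: {available_count}")
```

Running the script under something like `taskset -c 0 python script.py` on Linux should report one available CPU while the system count stays unchanged.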

Changes

Adds a new metric process.cpu.count to capture the number of CPUs available to the process.

Two open questions:

  • type: Gauge vs Async UpDownCounter
  • name: process.cpu.count vs process.cpu.available

My initial thoughts on these questions:

  • type: Async UpDownCounter since it's counting # of CPUs.
  • name: process.cpu.count seems simpler, and available sounds like availability, but I don't have strong feelings about this.

@trask trask force-pushed the add-process-cpu-count branch from 9060c4b to 4b4d148 Compare March 1, 2022 23:36
@trask trask marked this pull request as ready for review March 1, 2022 23:36
@trask trask requested review from a team March 1, 2022 23:36
@arminru arminru added area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory labels Mar 2, 2022
@trask
Member Author

trask commented Mar 2, 2022

@tigrannajaryan @bogdandrutu related to #2384, is it OK for the Java process itself to report process.cpu.time and process.cpu.count? Or should we define new metrics under process.runtime.jvm.* if we want to report those from inside the Java process?

@bogdandrutu
Member

@trask would be good to see if we have more cases like this. For example, in a k8s environment, do we consider the cpu_limit to be equivalent to this? Are there any other languages with a similar capability?

@trask
Member Author

trask commented Mar 2, 2022

> @trask would be good to see if we have more cases like this

would it be better for us to define process.runtime.jvm.cpu.time and process.runtime.jvm.cpu.count for now then? we can always migrate to process.cpu.time and process.cpu.count in the future via schema mapping

@bogdandrutu
Member

If we cannot find any other use cases, the JVM versions are the answer, but I was looking to see if other maintainers have some input here.

@trask
Member Author

trask commented Mar 3, 2022

> looking to see if other maintainers have some input here

@open-telemetry/dotnet-approvers @open-telemetry/go-approvers do the .NET or Go runtimes have the equivalent of process.cpu.time (and maybe process.cpu.count)?

e.g. similar to the JVM's

@pellared
Member

pellared commented Mar 3, 2022

> > looking to see if other maintainers have some input here
>
> @open-telemetry/dotnet-approvers @open-telemetry/go-approvers do the .NET or Go runtimes have the equivalent of process.cpu.time (and maybe process.cpu.count)?
>
> e.g. similar to the JVM's

.NET:

  • process.cpu.time - could be done e.g. via DateTime.UtcNow - Process.GetCurrentProcess().StartTime.ToUniversalTime()
  • process.cpu.count - Environment.ProcessorCount

Go:

  • process.cpu.time - AFAIK nothing OOTB. But something like cpu.Times from https://github.com/shirou/gopsutil could be implemented/used. Or a time.Now value initialized in a global variable, calculating the interval using time.Since, if the precision is not very important
  • process.cpu.count - runtime.NumCPU

@carlosalberto
Contributor

From what @pellared posted, DotNet says that

The value returned by this API is fixed at .NET runtime startup for the process lifetime. It does not reflect changes in the environment settings while the process is running.

And for Go:

The set of available CPUs is checked by querying the operating system at process startup. Changes to operating system CPU allocation after process startup are not reflected.

Which means that these specific values, at least, cannot be used for metrics purposes, as they don't change.

@Aneurysm9
Member

Go's host instrumentation does have process.cpu.time, which is obtained from the gopsutil/process package @pellared mentioned.

@trask
Member Author

trask commented Mar 4, 2022

@bogdandrutu do you need more maintainer feedback?

It looks like Go is already capturing process.cpu.time, and .NET is discussing that they would like to as well: open-telemetry/opentelemetry-dotnet-contrib#207 (comment)

I'd like to propose that we move forward with this PR, and with allowing language-based instrumentation to emit process.* metrics for their own process.

@carlosalberto
Contributor

Ping @bogdandrutu

@carlosalberto
Contributor

Hey @trask -

Bogdan is on holidays this week. Shall we come back to the discussion next week?

@trask
Member Author

trask commented Mar 10, 2022

no problem

@carlosalberto
Contributor

Ping @bogdandrutu

@reyang
Member

reyang commented Mar 18, 2022

With mobile devices moving towards asymmetric multi-core processors, should we consider a dimension that tells whether a core is a Performance or an Efficiency core? Probably not an interesting topic for service developers, but definitely interesting for device/mobile.

Member

@bogdandrutu bogdandrutu left a comment

I think it is fine to report a metric like this, but I was curious whether we need count, or "utilization" to be consistent with system.

@@ -30,6 +30,7 @@ Below is a table of Process metric instruments.
| Name | Instrument | Units | Description | Labels |
|------|------------|-------|-------------|--------|
| `process.cpu.time` | Asynchronous Counter | s | Total CPU seconds broken down by different states. | `state`, if specified, SHOULD be one of: `system`, `user`, `wait`. A process SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels. |
| `process.cpu.count` | Asynchronous UpDownCounter | 1 | The number of logical CPUs available to the process. | |

Member Author

I agree that utilization is what users generally want to see. But utilization collected client-side is just a gauge and can't be aggregated over time, so I think it's nice to capture process.cpu.time and process.cpu.count where possible and display utilization based on those two metrics.

Member

@bogdandrutu bogdandrutu Mar 23, 2022

We had the same debate for system.cpu.utilization, and the result of the discussion was that system.cpu.utilization is simpler for some backends than calculating from two metrics.

> But utilization collected client-side is just a gauge and can't be aggregated over time

Not sure this is true: if you calculate a "delta utilization" and process.cpu.count does not change, you can correctly merge them by doing a sum:

Timestamp 0 -> cpu.time0/count
Timestamp 1 -> cpu.time1/count -> report (cpu.time1 - cpu.time0) / count / (Timestamp 1 - Timestamp 0)
Timestamp 2 -> cpu.time2/count -> report (cpu.time2 - cpu.time1) / count / (Timestamp 2 - Timestamp 1) -> percent of total cpu / second.

The idea was that if we report at roughly the same interval (we also have the start and current time), you can average the two reported values to get the utilization over (Timestamp 2 - Timestamp 0).
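
The merging argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values are made up, `cpu_count` is assumed constant across samples, and the function names `interval_utilization` and `merged_utilization` are hypothetical. The merge weights each interval by its length, which reduces to a plain average when the intervals are equal.

```python
def interval_utilization(cpu_time_start, cpu_time_end, t_start, t_end, cpu_count):
    """Fraction of available CPU capacity used between two samples."""
    return (cpu_time_end - cpu_time_start) / cpu_count / (t_end - t_start)


def merged_utilization(samples, cpu_count):
    """Utilization over the full span, weighting each interval by its length.

    samples: list of (timestamp_seconds, cumulative_cpu_seconds), in time order.
    """
    intervals = list(zip(samples, samples[1:]))
    weighted = sum(
        interval_utilization(c0, c1, t0, t1, cpu_count) * (t1 - t0)
        for (t0, c0), (t1, c1) in intervals
    )
    total = samples[-1][0] - samples[0][0]
    return weighted / total


# Hypothetical samples: (timestamp in s, cumulative process.cpu.time in s).
samples = [(0.0, 0.0), (10.0, 4.0), (20.0, 12.0)]
cpu_count = 4  # assumed not to change between samples

u1 = interval_utilization(0.0, 4.0, 0.0, 10.0, cpu_count)
u2 = interval_utilization(4.0, 12.0, 10.0, 20.0, cpu_count)
overall = merged_utilization(samples, cpu_count)
print(u1, u2, overall)
```

With equal 10-second intervals, the merged value matches both the plain average of the two reports and the utilization computed directly from the first and last samples.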

If the argument is wrong for the system.cpu.utilization we should change that as well, I want consistency :)

Member Author

Interesting. I don't have any objection to that approach. From a JVM metrics perspective, there was some interest in capturing the available CPU count separately from utilization because it gives clues about GC and common thread pool sizing, but we haven't mapped out GC or thread pool metrics yet, so we will revisit that if/when we have a more specific need. I'll send a new PR to propose adding process.cpu.utilization, defined as process.cpu.time divided by elapsed time divided by "available processor count".
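
As a rough sketch of that definition (CPU time divided by elapsed time divided by available processor count), measured from inside a process, the following Python uses `time.process_time` and `time.monotonic`. The names `measure_utilization` and `busy_loop` are hypothetical, and the available count falls back to `os.cpu_count()` where `os.sched_getaffinity` is missing. It is not the instrumentation proposed in the follow-up PR.

```python
import os
import time


def available_cpu_count():
    """Logical CPUs available to this process (affinity-aware on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Assumption: fall back to the system count on other platforms.
    return os.cpu_count() or 1


def measure_utilization(work, cpu_count):
    """Run work() and return CPU time / elapsed time / cpu_count."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    elapsed = time.monotonic() - wall_start
    return cpu_used / elapsed / cpu_count


def busy_loop():
    """CPU-bound placeholder workload."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


cpu_count = available_cpu_count()
utilization = measure_utilization(busy_loop, cpu_count)
print(f"utilization: {utilization:.3f}")
```

For a single-threaded workload like this one, the result should come out near 1 divided by the available CPU count.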

@trask
Member Author

trask commented Mar 24, 2022

Based on discussion with @bogdandrutu above, I have created #2436 instead.

Closing this, will revisit if/when we have a specific need for it.

@trask trask closed this Mar 24, 2022
@trask trask deleted the add-process-cpu-count branch March 24, 2022 01:28
8 participants