[Do Not Merge] Add description doc for DotNet Runtime metrics #404

# Runtime metrics description

Metric names are prefixed with the `process.runtime.dotnet.` namespace, following
the general guidance for runtime metrics in the
[specs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#runtime-environment-specific-metrics---processruntimeenvironment).
Instrument units [should follow](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units)
the Unified Code for Units of Measure (UCUM).

> **Review comment (lines 3 to 7):** Re-worded this a bit: Metric names are prefixed with the

## GC related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsGcEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-----------------------------------------------|--------------------------|-----------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**gc.totalmemorysize** | Total allocated bytes | `By` | ObservableGauge | `Int64` | | |
| process.runtime.dotnet.**gc.count** | Garbage Collection count | `{times}` | ObservableCounter | `Int64` | gen | gen0, gen1, gen2 |

> **Review comment (`gc.totalmemorysize`):** I think many of the
>
> **Review comment:** We'll probably need to get more precise on this one. Some different possible interpretations:
>
> **Review comment:** Some raw notes from our discussion:
>
> - `process.runtime.dotnet.gc.heapsize`: live objects (not necessarily held by user objects/threads; they could be held by the GC itself) + fragmentation.
> - All the generations added up should be <= committed (since the GC commits at the granularity of a page, and it normally asks for multiple pages instead of just one for caching purposes).
> - Neither of the above includes the GC's own memory usage.
> - https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md
> - `GC.GetGenerationSize(n)` does not provide transactional behavior since we have to call it multiple times for different generations; that is okay as long as we mention it in the description/doc.
> - Remove `gc.totalmemorysize` since it can be covered by adding up the different generations.

> **Review comment (`gc.count`):** This is suggested by @noahfalk - for folks who have been using the EventCounters, we might consider something like "legacy time in GC" and make it an explicit opt-in (with a clear API name indicating that it has an unclear definition, and we don't recommend it).
>
> **Review comment:** fwiw we should be able to give a precise definition for the legacy counter too, it just isn't always useful or intuitive. I'm pretty sure the existing counter is computed like this:
>
> - GC_n = most recent GC that has ended prior to computing the value
> - T_end_n = time when GC_n ended
> - inter_gc_time = T_end_n - T_end_n-1
>
> As an example if GC_n-1 ran from 2.1 sec to 2.2 sec and GC_n ran from 2.5 sec to 3.0 sec then
>
> People usually assume that the value measures the fraction of GC time during a fixed sampling interval (say every 5 minutes) but really it is measuring the fraction of GC time during a variable interval between the most recent GC and the one before that. If the most recent two GCs happen to be close together the counter may appear unexpectedly high and this often causes confusion.

Question for .NET team: is `GC.GetTotalMemory(false)` always equal to the sum of
`GC.GetGCMemoryInfo().GenerationInfo[i].SizeAfterBytes` (i from 0 to 4)?
I need to decide whether it makes sense to include both of them in the memory/GC
size metrics.

- [GC.GetTotalMemory](https://docs.microsoft.com/dotnet/api/system.gc.gettotalmemory):
The number of bytes currently thought to be allocated.
When called with `forceFullCollection: false`, it does not wait for garbage
collection to occur before returning.

- [GC.CollectionCount](https://docs.microsoft.com/dotnet/api/system.gc.collectioncount):
The number of times garbage collection has occurred for the specified generation
of objects.

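The two GC metrics above could be read roughly as follows. This is a minimal sketch, not the instrumentation package's implementation: it assumes the `System.Diagnostics.Metrics` API (built into .NET 6, or from the `System.Diagnostics.DiagnosticSource` package on older targets), and the class name `GcMetricsSketch` is hypothetical. `gc.count` emits one cumulative measurement per generation from 0 to `GC.MaxGeneration`, tagged with the `gen` attribute.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper illustrating the GC table rows; not the package's actual code.
public static class GcMetricsSketch
{
    // process.runtime.dotnet.gc.totalmemorysize: passing false avoids forcing a collection.
    public static long TotalMemorySize() => GC.GetTotalMemory(forceFullCollection: false);

    // process.runtime.dotnet.gc.count: one cumulative value per generation, tagged "gen".
    public static IEnumerable<Measurement<long>> CollectionCounts()
    {
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            yield return new Measurement<long>(
                GC.CollectionCount(gen),
                new KeyValuePair<string, object?>("gen", "gen" + gen));
        }
    }

    // Wires the readers above to observable instruments on a caller-supplied Meter.
    public static void Register(Meter meter)
    {
        meter.CreateObservableGauge<long>(
            "process.runtime.dotnet.gc.totalmemorysize", TotalMemorySize, "By", "Total allocated bytes");
        meter.CreateObservableCounter<long>(
            "process.runtime.dotnet.gc.count", CollectionCounts, "{times}", "Garbage Collection count");
    }
}
```

For example, `using var meter = new Meter("MyCompany.RuntimeMetrics");` followed by `GcMetricsSketch.Register(meter);` publishes both instruments; the meter name is a placeholder.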
### Additional GC metrics only available for NETCOREAPP3_1_OR_GREATER

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|---------------------------------------------------|--------------------------------------------------|-------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**gc.allocated.bytes** | Bytes allocated over the lifetime of the process | `By` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**gc.fragmentation.ratio** | GC fragmentation ratio | `1` | ObservableGauge | `Double` | | |

- [GC.GetTotalAllocatedBytes](https://docs.microsoft.com/dotnet/api/system.gc.gettotalallocatedbytes):
Gets a count of the bytes allocated over the lifetime of the process. The returned
value does not include any native allocations. When called with `precise: false`,
the value is an approximate count.

- GC fragmentation ratio is calculated as follows, where
`var gcMemoryInfo = GC.GetGCMemoryInfo()`: if `gcMemoryInfo.HeapSizeBytes != 0`,
the value is `gcMemoryInfo.FragmentedBytes * 1.0d / gcMemoryInfo.HeapSizeBytes`;
otherwise the value is `0`.

- [GCMemoryInfo.FragmentedBytes](https://docs.microsoft.com/dotnet/api/system.gcmemoryinfo.fragmentedbytes?view=netcore-3.1):
Gets the total fragmentation when the last garbage collection occurred.
- [GCMemoryInfo.HeapSizeBytes](https://docs.microsoft.com/dotnet/api/system.gcmemoryinfo.heapsizebytes?view=netcore-3.1#system-gcmemoryinfo-heapsizebytes):
Gets the total heap size when the last garbage collection occurred.

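The fragmentation-ratio rule above, including its divide-by-zero guard, translates directly into a small pure function. The sketch below assumes .NET Core 3.1 or later and pairs that function with readers for the current values; `GcFragmentationSketch` is a hypothetical name and this is not the package's implementation.

```csharp
using System;

// Hypothetical helper for the NETCOREAPP3_1_OR_GREATER GC metrics.
public static class GcFragmentationSketch
{
    // Fragmented bytes divided by heap size, or 0 when the heap size is 0.
    public static double FragmentationRatio(long fragmentedBytes, long heapSizeBytes) =>
        heapSizeBytes != 0 ? fragmentedBytes * 1.0d / heapSizeBytes : 0;

    // process.runtime.dotnet.gc.fragmentation.ratio, based on the last GC.
    public static double CurrentFragmentationRatio()
    {
        GCMemoryInfo gcMemoryInfo = GC.GetGCMemoryInfo();
        return FragmentationRatio(gcMemoryInfo.FragmentedBytes, gcMemoryInfo.HeapSizeBytes);
    }

    // process.runtime.dotnet.gc.allocated.bytes: approximate, excludes native allocations.
    public static long AllocatedBytes() => GC.GetTotalAllocatedBytes(precise: false);
}
```

For example, `FragmentationRatio(25, 100)` returns `0.25`, and `FragmentationRatio(10, 0)` returns `0` instead of dividing by zero.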
### Additional GC metrics only available for NET6_0_OR_GREATER

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-----------------------------------------|-----------------------------|-------|-----------------|------------|------------------|----------------------------|
| process.runtime.dotnet.**gc.committed** | GC Committed Bytes | `By` | ObservableGauge | `Int64` | | |
| process.runtime.dotnet.**gc.heapsize** | Heap size per GC generation | `By` | ObservableGauge | `Int64` | gen | gen0, gen1, gen2, loh, poh |

- [GCMemoryInfo.TotalCommittedBytes](https://docs.microsoft.com/dotnet/api/system.gcmemoryinfo.totalcommittedbytes?view=net-6.0#system-gcmemoryinfo-totalcommittedbytes):
Gets the total committed bytes of the managed heap.

- [GC.GetGCMemoryInfo().GenerationInfo[i].SizeAfterBytes](https://docs.microsoft.com/dotnet/api/system.gcgenerationinfo):
Represents the size in bytes of a generation on exit of the GC reported in GCMemoryInfo.
(Indexes `0` through [GC.MaxGeneration](https://docs.microsoft.com/dotnet/api/system.gc.maxgeneration)
are the regular generations; the LOH and POH follow, matching the `gen` attribute
values in the table above.)

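A sketch of the .NET 6 readers, assuming `GenerationInfo` is ordered gen0, gen1, gen2, LOH, POH to match the `gen` attribute values in the table (an assumption worth confirming). Reading one `GCMemoryInfo` snapshot for all generations, rather than querying each generation separately, keeps every per-generation value from the same GC, which bears on the transactional concern in the review notes. `GcHeapSketch` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper for the NET6_0_OR_GREATER GC metrics; not the package's actual code.
public static class GcHeapSketch
{
    // Assumed mapping of GenerationInfo indexes to the "gen" attribute values.
    private static readonly string[] GenNames = { "gen0", "gen1", "gen2", "loh", "poh" };

    // process.runtime.dotnet.gc.committed
    public static long CommittedBytes() => GC.GetGCMemoryInfo().TotalCommittedBytes;

    // process.runtime.dotnet.gc.heapsize: one measurement per generation, all taken
    // from a single GCMemoryInfo snapshot. Returns a List because a ReadOnlySpan
    // cannot be held across `yield return` in an iterator.
    public static List<Measurement<long>> HeapSizes()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        ReadOnlySpan<GCGenerationInfo> generations = info.GenerationInfo;
        int count = Math.Min(generations.Length, GenNames.Length);

        var result = new List<Measurement<long>>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new Measurement<long>(
                generations[i].SizeAfterBytes,
                new KeyValuePair<string, object?>("gen", GenNames[i])));
        }
        return result;
    }
}
```

An `ObservableGauge<long>` callback could return `GcHeapSketch.HeapSizes()` directly, since `List<Measurement<long>>` is an `IEnumerable<Measurement<long>>`.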
## JIT Compiler related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsJitEnabled` switch.

These metrics are only available for NET6_0_OR_GREATER.

> **Review comment:** Using the preprocessor directives in this doc like
>
> **Review comment:** I think this is opened to just solicit feedback only.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------------|--------------------------|-------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**il.bytes.jitted** | IL Bytes Jitted | `By` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**methods.jitted.count** | Number of Methods Jitted | `{methods}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**time.in.jit** | Time spent in JIT | `ns` | ObservableCounter | `Int64` | | |

[JitInfo.GetCompiledILBytes](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompiledilbytes?view=net-6.0#system-runtime-jitinfo-getcompiledilbytes(system-boolean)):
Gets the number of bytes of intermediate language that have been compiled.
The scope of this value is global (`currentThread: false`), not limited to the
calling thread.

[JitInfo.GetCompiledMethodCount](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompiledmethodcount?view=net-6.0#system-runtime-jitinfo-getcompiledmethodcount(system-boolean)):
Gets the number of methods that have been compiled.
The scope of this value is global.

> **Review comment:** Do we need to state

[JitInfo.GetCompilationTime](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompilationtime?view=net-6.0#system-runtime-jitinfo-getcompilationtime(system-boolean)):
Gets the amount of time the JIT compiler has spent compiling methods.
The scope of this value is global.

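A minimal sketch of the three JIT readers, assuming .NET 6 or later; `JitMetricsSketch` is a hypothetical name. Passing `currentThread: false` (also the default) is what makes the values global. Because the table's unit for `time.in.jit` is `ns`, the `TimeSpan` returned by `GetCompilationTime` needs converting; one `TimeSpan` tick is 100 ns.

```csharp
using System;
using System.Runtime;

// Hypothetical helper for the NET6_0_OR_GREATER JIT metrics; not the package's actual code.
public static class JitMetricsSketch
{
    // process.runtime.dotnet.il.bytes.jitted, across all threads.
    public static long IlBytesJitted() => JitInfo.GetCompiledILBytes(currentThread: false);

    // process.runtime.dotnet.methods.jitted.count, across all threads.
    public static long MethodsJitted() => JitInfo.GetCompiledMethodCount(currentThread: false);

    // process.runtime.dotnet.time.in.jit, converted to nanoseconds.
    public static long TimeInJitNanoseconds() =>
        ToNanoseconds(JitInfo.GetCompilationTime(currentThread: false));

    // One TimeSpan tick is 100 nanoseconds.
    public static long ToNanoseconds(TimeSpan duration) => duration.Ticks * 100;
}
```

For example, `ToNanoseconds(TimeSpan.FromMilliseconds(1))` returns `1000000`.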
## Threading related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsThreadingEnabled` switch.

These metrics are only available for NETCOREAPP3_1_OR_GREATER.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------------------------|--------------------------------------|-------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**monitor.lock.contention.count** | Monitor Lock Contention Count | `{times}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**threadpool.thread.count** | ThreadPool Thread Count | `{threads}` | ObservableGauge | `Int32` | | |
| process.runtime.dotnet.**threadpool.completed.items.count** | ThreadPool Completed Work Item Count | `{items}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**threadpool.queue.length** | ThreadPool Queue Length | `{items}` | ObservableGauge | `Int64` | | |
| process.runtime.dotnet.**active.timer.count** | Number of Active Timers | `{timers}` | ObservableGauge | `Int64` | | |

- [Monitor.LockContentionCount](https://docs.microsoft.com/dotnet/api/system.threading.monitor.lockcontentioncount?view=netcore-3.1):
Gets the number of times there was contention when trying to take the monitor's
lock.
- [ThreadPool.ThreadCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.threadcount?view=netcore-3.1):
Gets the number of thread pool threads that currently exist.
- [ThreadPool.CompletedWorkItemCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.completedworkitemcount?view=netcore-3.1):
Gets the number of work items that have been processed so far.
- [ThreadPool.PendingWorkItemCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.pendingworkitemcount?view=netcore-3.1):
Gets the number of work items that are currently queued to be processed.
- [Timer.ActiveCount](https://docs.microsoft.com/dotnet/api/system.threading.timer.activecount?view=netcore-3.1):
Gets the number of timers that are currently active. An active timer is registered
to tick at some point in the future, and has not yet been canceled.

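The table's instrument types can be mapped onto these APIs as follows, assuming .NET Core 3.1 or later and the `System.Diagnostics.Metrics` API. Cumulative values become observable counters and point-in-time values become observable gauges. This is a sketch, not the package's code; `ThreadingMetricsSketch` is a hypothetical name.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical registration of the threading metrics; not the package's actual code.
public static class ThreadingMetricsSketch
{
    public static void Register(Meter meter)
    {
        // Cumulative, monotonically increasing values: observable counters.
        meter.CreateObservableCounter<long>("process.runtime.dotnet.monitor.lock.contention.count",
            () => Monitor.LockContentionCount, "{times}", "Monitor Lock Contention Count");
        meter.CreateObservableCounter<long>("process.runtime.dotnet.threadpool.completed.items.count",
            () => ThreadPool.CompletedWorkItemCount, "{items}", "ThreadPool Completed Work Item Count");

        // Point-in-time values that can go up or down: observable gauges.
        meter.CreateObservableGauge<int>("process.runtime.dotnet.threadpool.thread.count",
            () => ThreadPool.ThreadCount, "{threads}", "ThreadPool Thread Count");
        meter.CreateObservableGauge<long>("process.runtime.dotnet.threadpool.queue.length",
            () => ThreadPool.PendingWorkItemCount, "{items}", "ThreadPool Queue Length");
        meter.CreateObservableGauge<long>("process.runtime.dotnet.active.timer.count",
            () => Timer.ActiveCount, "{timers}", "Number of Active Timers");
    }
}
```

`threadpool.thread.count` uses `int` because `ThreadPool.ThreadCount` is an `Int32`, matching the table.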
## Process related metrics

> **Review comment:** I have a general question - do we want these to be in the
>
> **Review comment:** I'd vote for a separate package, with OpenTelemetry.Instrumentation.Process as a potential name?

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsProcessEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------|----------------------------------------|----------------|-------------------|------------|------------------|------------------|
| process.cpu.utilization | CPU utilization of this process | `1` | ObservableGauge | `Double` | | |
| process.cpu.time | Processor time of this process | `s` | ObservableCounter | `Int64` | state | user, system |
| process.cpu.count | The number of available logical CPUs | `{processors}` | ObservableGauge | `Int64` | | |
| process.memory.usage | The amount of physical memory in use | `By` | ObservableGauge | `Int64` | | |
| process.memory.virtual | The amount of committed virtual memory | `By` | ObservableGauge | `Int64` | | |

- CPU utilization:
  - [Process.TotalProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.totalprocessortime)
    divided by ([Environment.ProcessorCount](https://docs.microsoft.com/dotnet/api/system.environment.processorcount)
    \* ([DateTime.Now](https://docs.microsoft.com/dotnet/api/system.datetime.now) -
    [Process.StartTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.starttime)))

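The CPU utilization formula above can be sketched as follows. The guard for a zero processor count or zero elapsed time is an addition of this sketch, not part of the formula. As written, the formula yields the average utilization over the whole process lifetime rather than over the most recent collection interval. `CpuUtilizationSketch` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for process.cpu.utilization; not the package's actual code.
public static class CpuUtilizationSketch
{
    // TotalProcessorTime / (ProcessorCount * (Now - StartTime)), as a 0..1 ratio.
    public static double Compute(TimeSpan totalProcessorTime, int processorCount, TimeSpan elapsed)
    {
        if (processorCount <= 0 || elapsed <= TimeSpan.Zero)
        {
            return 0; // Avoid dividing by zero, e.g. immediately after process start.
        }
        return totalProcessorTime.TotalSeconds / (processorCount * elapsed.TotalSeconds);
    }

    public static double Current()
    {
        using Process process = Process.GetCurrentProcess();
        // StartTime is local time, so it is compared with DateTime.Now (not UtcNow).
        TimeSpan elapsed = DateTime.Now - process.StartTime;
        return Compute(process.TotalProcessorTime, Environment.ProcessorCount, elapsed);
    }
}
```

For example, 2 s of processor time on 4 logical CPUs over 1 s of wall-clock time gives `Compute(TimeSpan.FromSeconds(2), 4, TimeSpan.FromSeconds(1))`, which returns `0.5`.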
- CPU time:
  - [Process.UserProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.userprocessortime)
    (`state`: `user`): Gets the user processor time for this process.
  - [Process.PrivilegedProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.privilegedprocessortime)
    (`state`: `system`): Gets the privileged processor time for this process.

- [Environment.ProcessorCount](https://docs.microsoft.com/dotnet/api/system.environment.processorcount):
Gets the number of processors available to the current process.
- Memory usage: [Process.GetCurrentProcess().WorkingSet64](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.workingset64):
Gets the amount of physical memory, in bytes, allocated for the currently
active process.
- Memory virtual: [Process.GetCurrentProcess().VirtualMemorySize64](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.virtualmemorysize64):
Gets the amount of virtual memory, in bytes, allocated for the currently
active process.

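A sketch of how the remaining process metrics could be published, assuming the `System.Diagnostics.Metrics` API; `ProcessMetricsSketch` and its `ReadProcess` helper are hypothetical. A fresh `Process` object is fetched in each callback because properties such as `WorkingSet64` are cached on a `Process` instance until `Refresh()` is called. One choice differs from the table: `process.cpu.time` is reported here as a `double` number of seconds, since an `Int64` of seconds would truncate to whole seconds.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical registration of the process metrics; not the package's actual code.
public static class ProcessMetricsSketch
{
    // process.cpu.time: one cumulative measurement per "state" attribute value.
    public static IEnumerable<Measurement<double>> CpuTime()
    {
        using Process process = Process.GetCurrentProcess();
        return new[]
        {
            new Measurement<double>(process.UserProcessorTime.TotalSeconds,
                new KeyValuePair<string, object?>("state", "user")),
            new Measurement<double>(process.PrivilegedProcessorTime.TotalSeconds,
                new KeyValuePair<string, object?>("state", "system")),
        };
    }

    // Reads one value from a freshly obtained Process object, then disposes it.
    private static long ReadProcess(Func<Process, long> read)
    {
        using Process process = Process.GetCurrentProcess();
        return read(process);
    }

    public static void Register(Meter meter)
    {
        meter.CreateObservableCounter<double>("process.cpu.time", CpuTime, "s",
            "Processor time of this process");
        meter.CreateObservableGauge<long>("process.cpu.count", () => (long)Environment.ProcessorCount,
            "{processors}", "The number of available logical CPUs");
        meter.CreateObservableGauge<long>("process.memory.usage", () => ReadProcess(p => p.WorkingSet64),
            "By", "The amount of physical memory in use");
        meter.CreateObservableGauge<long>("process.memory.virtual", () => ReadProcess(p => p.VirtualMemorySize64),
            "By", "The amount of committed virtual memory");
    }
}
```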
Question: the EventCounter implementation exposes a metric named `working-set` with
`Environment.WorkingSet`. Is it equal to the `Process.GetCurrentProcess().WorkingSet64`
property? I need to decide which one is more suitable for showing users the memory
usage of the process, or whether to include both.

- [Environment.WorkingSet](https://docs.microsoft.com/dotnet/api/system.environment.workingset?view=net-6.0):
A 64-bit signed integer containing the number of bytes of physical memory mapped
to the process context.

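To gather data for the question above, both values can be read side by side. This is a minimal sketch with a hypothetical class name, `WorkingSetComparison`; because the two reads happen at slightly different moments, a small difference on its own would not settle the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical diagnostic helper comparing the two working-set sources.
public static class WorkingSetComparison
{
    // Returns both readings, taken one right after the other.
    public static (long EnvironmentWorkingSet, long ProcessWorkingSet64) Read()
    {
        long fromEnvironment = Environment.WorkingSet;
        using Process process = Process.GetCurrentProcess();
        long fromProcess = process.WorkingSet64;
        return (fromEnvironment, fromProcess);
    }
}
```

For example, `var (env, proc) = WorkingSetComparison.Read();` followed by `Console.WriteLine($"{env} {proc} {env - proc}");` prints both values and their difference on the current platform.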
## Assemblies related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsAssembliesEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------|-----------------------------|----------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**assembly.count** | Number of Assemblies Loaded | `{assemblies}` | ObservableCounter | `Int64` | | |

> **Review comment:** .NET runtime has the ability to unload assemblies, so
>
> **Review comment:** Your comment is for .NET Framework and not for .NET Core, as .NET Core doesn't allow more than one appdomain in the process.
> I am not sure if this is accurate either. You may unload the whole appdomain and not an individual assembly. So, technically the
> In .NET Core, uses AssemblyLoadContext for unloading all assemblies in that context by unloading the context. CC @noahfalk
>
> **Review comment:** I haven't tested it, but I would assume that assemblies in any AssemblyLoadContext, including an unloadable one, appear in the AppDomain.GetAssemblies() list. In that case AppDomain.GetAssemblies().Length could decrease because a load context unloaded.
>
> **Review comment:** Will such a counter really have a useful meaning in .NET Core apps? And how can users handle such a counter in .NET Framework apps? Or don't we care much about .NET Framework cases?
>
> **Review comment:** My answer is YES. I can think of a valid use case - certain online service / background job runs out of memory after hours and the engineers can tell from the "assembly loaded" metrics that the process kept loading plugins without unloading them. This happened in my previous team, where a backend processing job detected new XML schemas, compiled them to assemblies (for the sake of perf) and loaded them dynamically; the backend job ended up loading tens of thousands of assemblies.
>
> **Review comment:** Sounds good then!
>
> **Review comment:** Thanks, Tarek and Noah!
> I'm getting the assembly count for the "current domain". I found one remark about a special case when the assembly is in the default application domain:
> If Noah's right about assemblies in AssemblyLoadContext would appear in
>
> **Review comment:** If we are reporting the number from the current appdomain only, I'll question @reyang's scenario. To clarify, .NET Framework can have multiple appdomains. I guess it will be more useful to report the total count of assemblies in all appdomains and not necessarily the current appdomain. In .NET Core, there is only one appdomain, so reporting the number from the current appdomain would work, I guess. We need to confirm that loading and unloading an AssemblyLoadContext is going to be reflected in the number we report. When I get some time, I can try to do that.
>
> **Review comment:** I have confirmed in .NET Core apps when unloading the

- [AppDomain.GetAssemblies](https://docs.microsoft.com/dotnet/api/system.appdomain.getassemblies):
Gets the assemblies that have been loaded into the execution context of this
application domain; the metric reports how many assemblies are returned.

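Reading the value is a one-liner. A sketch with a hypothetical class name, `AssemblyCountSketch`: on .NET Core there is a single AppDomain, so `AppDomain.CurrentDomain` covers the whole process, while on .NET Framework it covers only the current AppDomain, which is the limitation raised in the review discussion.

```csharp
using System;

// Hypothetical helper for process.runtime.dotnet.assembly.count; not the package's actual code.
public static class AssemblyCountSketch
{
    // Assemblies currently loaded into the current AppDomain.
    // Per the review discussion, this may decrease if an unloadable AssemblyLoadContext
    // is unloaded; this sketch does not verify that.
    public static long Count() => AppDomain.CurrentDomain.GetAssemblies().Length;
}
```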
## Exception counter metric

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsExceptionCounterEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|--------------------------------------------|---------------------------------------------|----------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**exception.count** | Number of exceptions thrown in managed code | `{exceptions}` | ObservableCounter | `Int64` | | |

- [AppDomain.FirstChanceException](https://docs.microsoft.com/dotnet/api/system.appdomain.firstchanceexception):
Occurs when an exception is thrown in managed code, before the runtime searches
the call stack for an exception handler in the application domain.

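A minimal sketch of counting first-chance exceptions with the event above, assuming a process-wide running total is all the metric needs; `ExceptionCounterSketch` is a hypothetical name. `Interlocked` keeps the count correct when exceptions are thrown on several threads at once, and an `ObservableCounter<long>` callback could simply return `ExceptionCounterSketch.Count`. First-chance exceptions include ones that are later caught.

```csharp
using System;
using System.Threading;

// Hypothetical exception counter; counts first-chance exceptions raised after Start().
public static class ExceptionCounterSketch
{
    private static long exceptionCount;
    private static int subscribed;

    public static long Count => Interlocked.Read(ref exceptionCount);

    public static void Start()
    {
        // Subscribe only once, even if Start() is called repeatedly or concurrently.
        if (Interlocked.Exchange(ref subscribed, 1) == 0)
        {
            AppDomain.CurrentDomain.FirstChanceException +=
                (sender, e) => Interlocked.Increment(ref exceptionCount);
        }
    }
}
```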
## Currently out of scope

Regarding process.runtime.dotnet.**time-in-gc** (its DisplayName in the
[EventCounter implementation](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/RuntimeEventSource.cs#L96)
is "% Time in GC since last GC"): a new metric should replace it by calling the new
API `GC.GetTotalPauseDuration()`.
The new API has been added in code but is not available yet.
It is targeted for the 7.0.0 milestone in the .NET Runtime repo.
See [dotnet/runtime#65989](https://github.com/dotnet/runtime/issues/65989).

> **Review comment:** Maybe "Runtime metrics overview" or "Runtime metric details" here instead of "description"? Also we should link to this from README.