Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/cgroupruntime] Be aware of ECS task and CPU limits #36814

Closed
r0mdau opened this issue Dec 12, 2024 · 10 comments · Fixed by #36920
Closed

[extension/cgroupruntime] Be aware of ECS task and CPU limits #36814

r0mdau opened this issue Dec 12, 2024 · 10 comments · Fixed by #36920
Labels

Comments

@r0mdau
Copy link
Contributor

r0mdau commented Dec 12, 2024

Component(s)

extension/cgroupruntime

Is your feature request related to a problem? Please describe.

The automaxprocs library from Uber is not setting GOMAXPROCS in a ECS task environment.

See this issue: uber-go/automaxprocs#66

Describe the solution you'd like

Try to integrate the rdforte/gomaxecs library that solves the issue.

Describe alternatives you've considered

Creating a custom collector, disabling gomaxprocs and only using auto memory limit for this extension + adding rdforte/gomaxecs library

Additional context

No response

@r0mdau r0mdau added enhancement New feature or request needs triage New item requiring triage labels Dec 12, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@rogercoll
Copy link
Contributor

Thanks for submitting this feature request @r0mdau, I support on adding this feature as well.

Do you know if there is a reliable way to know if the binary is running within ECS? If ECS_CONTAINER_METADATA_URI_V4 environment variable is always available within an ECS container, we could check its existence to use the /~https://github.com/rdforte/gomaxecs/ instead.

@r0mdau
Copy link
Contributor Author

r0mdau commented Dec 13, 2024

I checked on multiple ECS clusters we manage and I confirm this environment variable is pushed and available in the container (ECS Task).

For reference, you need a certain version

The environment variable is injected by default into the containers of Amazon ECS tasks launched on Amazon EC2 Linux instances that are running at least version 1.39.0 of the Amazon ECS container agent. For Amazon EC2 Windows instances that use awsvpc network mode, the Amazon ECS container agent must be at least version 1.54.0. For more information, see [Amazon ECS Linux container instance management](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/manage-linux.html).

v1.39.0 is April 2020 old : /~https://github.com/aws/amazon-ecs-agent/tree/v1.39.0

All the releases: /~https://github.com/aws/amazon-ecs-agent/releases

@r0mdau
Copy link
Contributor Author

r0mdau commented Dec 14, 2024

Implementation proposition 1, add the logic in the createExtension / newCgroupRuntime function, something that could work from now:

import maxprocsecs "github.com/rdforte/gomaxecs/maxprocs"
...
const awsEcsContainerMetadataUri = "ECS_CONTAINER_METADATA_URI_V4"
...
func() (undoFunc, error) {
	var err error
	var undo func()
	if os.Getenv(awsEcsContainerMetadataUri) != "" {
		prev := runtime.GOMAXPROCS(0)
		undo = func() {
			set.Logger.Debug(fmt.Sprintf("maxprocs: Resetting GOMAXPROCS to %v", prev))
			runtime.GOMAXPROCS(prev)
		}
		maxprocsecs.Set(log.Default())
	} else {
		undo, err = maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
			set.Logger.Debug(fmt.Sprintf(str, params))
		}))
	}
	return undoFunc(undo), err
},

Proposition 2 is to submit a PR to the rdforte/gomaxecs repo to include the same undo func returned and allow to pass the Logger in order to have an implementation looks like uber maxprocs.

@rdforte
Copy link

rdforte commented Dec 14, 2024

@r0mdau If you were leaning towards Proposition 2 I would be onboard to make the change to rdforte/gomaxecs.

@r0mdau
Copy link
Contributor Author

r0mdau commented Dec 15, 2024

@rdforte To me, the uber-go/automaxprocs implementation for the Set() function looks nice, I lean towards proposition 2 of course and I bet @rogercoll will too.

@rdforte
Copy link

rdforte commented Dec 15, 2024

@r0mdau I have added a draft PR which implements the changes as mentioned for Proposition 2. I also added a new method isECS so you shouldn't need to reference the ECS Metadata URI and we can keep those implementation details confined to within the bounds of the package.

rdforte/gomaxecs#8

@rogercoll
Copy link
Contributor

@r0mdau If you were leaning towards Proposition 2 I would be onboard to make the change to rdforte/gomaxecs.

That would be awesome, thanks! Having similar APIs for the packages we use will facilitate the integration and maintainability of the extension.

@rogercoll
Copy link
Contributor

@mx-psi Do you support adding this feature as well? Using /~https://github.com/rdforte/gomaxecs/ looks like a good fit for this use case.

Shall we wait for the integrations tests before implementing this?

@mx-psi mx-psi added priority:p2 Medium and removed needs triage New item requiring triage labels Dec 17, 2024
@mx-psi
Copy link
Member

mx-psi commented Dec 17, 2024

I think we should wait but I support adding the feature after this

@mx-psi mx-psi closed this as completed in cc1232e Jan 8, 2025
AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this issue Jan 13, 2025
…elemetry#36920)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Allow the cgroupruntime extension to set GOMAXPROCS based on AWS ECS
metadata. See related issue for detailed informations.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#36814 

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Added integration test with `httptest` for the ECS metadata endpoint.

Something to clarify:
open-telemetry#36617 (comment)

<!--Describe the documentation added.-->
#### Documentation

Added extension name and link in the README.

<!--Please delete paragraphs that you did not use before submitting.-->

---------

Signed-off-by: Romain Dauby <romain.dauby@gmail.com>
Co-authored-by: Roger Coll <roger.coll@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants