Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: 219 Add a new, extendable L3 ECS Service class #221

Closed
wants to merge 6 commits into from

Conversation

nathanpeck
Copy link
Member

@nathanpeck nathanpeck commented Aug 7, 2020


By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license

@mergify
Copy link
Contributor

mergify bot commented Aug 7, 2020

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

@nathanpeck nathanpeck changed the title Adding an RFC for ECS L3 Service class RFC 219 Adding an RFC for ECS L3 Service class Aug 7, 2020
@nathanpeck nathanpeck changed the title RFC 219 Adding an RFC for ECS L3 Service class RFC: 219 Adding an RFC for ECS L3 Service class Aug 7, 2020
@nathanpeck nathanpeck changed the title RFC: 219 Adding an RFC for ECS L3 Service class RFC: 219 Adding a new, extendable L3 ECS Service class Aug 7, 2020
@nathanpeck nathanpeck changed the title RFC: 219 Adding a new, extendable L3 ECS Service class RFC: 219 Add a new, extendable L3 ECS Service class Aug 7, 2020
@nathanpeck nathanpeck mentioned this pull request Aug 7, 2020
7 tasks
Copy link
Contributor

@eladb eladb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments

to the load balancer. An App Mesh extension creates its own App Mesh virtual service and
virtual node as well as configuring the ECS service and ECS task definition to have the
right Envoy sidecar and proxy settings.
- __Extensions are aware of each other__: There are no hidden "gotchas" when enabling a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds a bit hand wavy... How would an extension know about a random 3rd party extension that tweaked and mutated the service?

Maybe this is more "it is possible for extensions to reflect on the service"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is just the tenant, so it is deliberately a bit brief and handwavy. I forgot to elaborate more in the body of the RFC on this. You can see an example in the PR here though: /~https://github.com/aws/aws-cdk/blob/e60b7fdff95bae6d3e1980a64c998dc07d3efbb8/packages/%40aws-cdk/aws-ecs-patterns/lib/service/addons/appmesh.ts#L49-L54

Basically each extension can query for and see other extensions, and change their behavior and settings based on the other extensions. If the other extensions do not match expectations (missing another extension that is required, or has another extension which conflicts) that extension can throw an exception

Comment on lines +125 to +127
- `Environment` - A simple construct which wraps up a VPC, cluster, and other top level
resources needed for ECS services to run on an AWS account
- `Service` - A single application that runs inside an `Environment`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The term "Environment" in the CDK is currently used to represent an AWS account+region. Can we come up with a term that does not overload?
  • What's the relation between Service and AWS::ECS::Service? Are we overloading the term or there a 1:1 mapping between this L3 Service and the underlying AWS::ECS::Service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this may be a bit confusing, however I would argue that "environment" is a sufficiently generic term which is used in many places, by many AWS services, so there should already be a user expectation of "which environment are we talking about". In this case this Environment and Service is deliberately designed to map to the "environment" and "service" in the AWS Copilot project. One of the goals is for this construct to feel familiar, approachable, and consistent to consumers of the new higher level tooling we are building for ECS (not just in CDK, but our CLI, console revamp and other upcoming projects)

There is a 1:1 mapping between this Service construct an an underlying AWS::ECS::Service however this higher level L3 Service also wraps up and abstracts away the AWS::ECS::TaskDefinition and other required resources for a service to run and deploy

Comment on lines +131 to +133
- `ServiceDescription` - A class which is utilized to build out a description of the
application and the features that should be attached to the application. This description
is consumed by the `Service` class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CDK, everything is a "description". What the value in decoupling "Service" and "ServiceDescription"?

Copy link
Member Author

@nathanpeck nathanpeck Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually in the PR have an alternate implementation which abuses the Construct.prepare() method to allow lazy construction of the service's L2 constructs at synthesis time, you can see how that works here: /~https://github.com/aws/aws-cdk/blob/e60b7fdff95bae6d3e1980a64c998dc07d3efbb8/packages/%40aws-cdk/aws-ecs-patterns/test/service-addons/integ.all-service-addons.ts

While the API is simpler and does not require a ServiceDescription it leads to unexpected behavior as the underlying L2 constructs do not exist until synthesis time, so they can not be referenced in your stacks, you can't get attributes off them, etc. In discussions with the team we agreed that a ServiceDescription that is passed to the Service better fit the traditional model of L3 constructs, where all the underlying L2 constructs were being created in the Service constructor and are immediately available and immutable from that point forward

requires touching all of these different constructs and resulting resources it is not
possible to fully implement an extension system at the L2 level. For this reason the
existing `TaskDefinition.addExtension()` is not a viable alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a design perspective, have you considered the following alternative:

  1. Define extensions as simple constructs instead of introducing an additional composition model. Constructs should be sufficient.
  2. Expose read/write APIs in Service so that extensions can interact with it.

The result would look something like this, which includes less moving parts and is more idiomatic to CDK:

const svc = new Service(this, 'MyService', { environment });

new CloudWatchAgentExtension(this, 'CloudWatchAgent', { service: svc, ... });
new AppMeshExtension(this, 'AppMesh', { service: svc, ... });
new FireLensExtension(...);

Another alternative that you could consider is a subclass-based API. In this type of API, the extensions will "discover" the service by traversing up the construct tree (similar to how the Stack is discovered). I am not sure that's a superior model.

class MyAwesomeService extends Service {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    new CloudWatchAgent(this, 'Foo', { ... });
}

Copy link
Member Author

@nathanpeck nathanpeck Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the limitation is the underlying L2 constructs, and the approach to immutability.

So far from what I've seen in CDK the vast majority of the properties of a CDK construct are always readonly: they get set once at constructor time and are immutable from that point forward. This L3 API works within that system. The extensions in the ServiceDescrpition define the properties for the underlying L2 construct, and then when you create that L3 construct it actually creates the underlying L2 constructs, which create the underlying Cfn resources, baking everything into an immutable state from that point forward.

So the general idea is that this will stick with customer expectations that once they call new Construct(stack, 'name', props) the underlying construct is immutable from that point forward. Constructs can't mutate other constructs that are already created. (I don't think I've seen anything similar to that anywhere else in CDK, I'd be happy to learn from an existing example of this if there is one that I'm unaware of).

The idea of a subclass is interesting. However it is a lot more boilerplate to define a service, and its not as intuitive of an interface as just saying Service.add(Feature)


## Class: `ServiceExtension`

This class defines an actual extension itself. The extension is collection of hooks which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you chose an "inversion of control" pattern (hooks) over simple mutation API for Service? IOC is considered harder to reason about and by definition very "leaky" and prone to mistakes. Extensions can easily run over each other without knowledge.

I can see why this might be appealing at first glance but our experience shows that it's worth to actually expose appropriate degrees of freedom (mutation API) in order to allow the object being mutated to maintain it's invariants and perform validations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few reasons:

  • Not everything in the underlying L2 constructs is mutable. For example once a TaskDefinition is created it is no longer possible to modify the proxyConfiguration. That can only be set at the time of construction. This is an example of where immutability of certain L2 properties requires the Service class to have preknowledge of all settings prior to creating the underlying L2 resource. We can't just create the underlying L2 resources first and then let extensions change them after it is created, in part because it isn't possible for all properties.
  • The hook approach enforces and encourages best practices. For example the resolveContainerDependencies hook encourages all extension owners to think about how their extension's container relates to other extension containers and what other containers it might need to wait on during task start phase. With an unstructured approach of mutability there would be no gentle encouragement of an unimplemented hook waiting for the extension owner to fill out with their own logic. It would be too easy to implement an extension that is not fully featured and does not account for container dependencies. This applies to other hooks as well.
  • Inversion of control gives extensions a way to reason about what other extensions are added. If extensions are unaware of each other and just changing an underlying construct directly they only see each other's changes after the fact, and aren't fully aware that the other extension exists. This inversion of control enforces that all extensions have been added to the service first, and are able to know of each other prior to the extension's mutation logic running. Without inversion of control an extension can start freely mutating underlying resources prematurely, making decisions that do not account for the future addition of another extension that would otherwise have changed the behavior of the first extension. For that reason direct control would require users to reason about the order that they add extensions, vs inversion of control which allows users to add all their extensions in any order, and at service construction time every extension is able to access the full picture to make independent decisions at the time that each hook is run

To summarize I felt that inversion of control would provide an opinionated, structured approach that makes it easier to implement an extension that behaves correctly, and it would allow us to maintain immutability of the underlying construct resources. I'd love to chat more on this subject and see if I'm missing an easy way to solve these problems without inversion of control.

@davidbarsky
Copy link

Since @nathanpeck tweeted about this RFC asking for feedback, I'd like to voice my support for a more extensible L3 ECS modules. I'm a fan of this RFC and Elad's proposed feedback because this API introduces a far more gradual curve in complexity and expressiveness. I feel—hence this might be incorrect!—that there's a learning cliff with the current L3 ECS APIs, where things are great if I stay within the bounds of what is supported by API, but the moment I stray outside of those bounds, I need to drop down to the L2 APIs.

Don't get me wrong: designing an extensible and gradual API is extremely hard. It's something I've been struggling with over the past year in the libraries that I support!

App Mesh virtual service and virtual node to the service. Because the full feature
requires touching all of these different constructs and resulting resources it is not
possible to fully implement an extension system at the L2 level. For this reason the
existing `TaskDefinition.addExtension()` is not a viable alternative.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly true. Yes, the interface of the IExtension itself should be narrow, but there's no need for the implementation to live in the L2 library that would restrict its dependencies.

The specific extensions themselves could totally be in a different library and reach out to AppMesh and whatnot.

resulting `Service` and `TaskDefinition`. Each `ServiceExtension` can implement the
following hooks:

- `addHooks()` - This hook is called after all the extensions are added to a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to keep this API as small as possible. IF (big if) the constructs were mutable, then the useXxx() APIs would mostly be all that's required, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately many of the construct properties are not mutable. Questionable whether we want to make the L2's fully mutable or use a multi step process of generating the props first then creating?

version of each L3 construct. Ideally there would be a way for customers to create a
service which is agnostic to whether it is deployed on EC2 or Fargate.

## Tenants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏚

(tenets?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops😅

- __Service class is agnostic to EC2 vs Fargate__: Rather than choosing the capacity strategy
in the class name customers just create a service, and add it to an environment. The
service automatically modifies its settings as needed to adjust to the capacity available
in the environment.
Copy link
Contributor

@SoManyHs SoManyHs Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there certain extensions that would need to take into account restrictions in capacity/orchestration providers? Curious if that would be handled by one of the hooks, since IIRC the resolve hook only checks the other extensions and not anything else in the infrastructure setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is likely in the future that an extension could choose to deploy itself as either a sidecar or as a cluster daemon depending on whether the extension is being added to a Fargate or EC2 environment


public mutateServiceProps(props: ServiceBuild) {
return {
...props,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is convenient in TypeScript/JavaScript but will not translate to other languages well.

Something to keep in mind...

@mrgrain
Copy link
Contributor

mrgrain commented Oct 27, 2023

This ended up being separate package /~https://github.com/cdklabs/cdk-ecs-service-extensions

@mrgrain mrgrain closed this Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants