
Best options for resolving the address of an activated/activating Grain? #9353

Open

d-jagoda opened this issue Feb 16, 2025 · 5 comments

@d-jagoda
Contributor
Hi,

I have a messaging bridge that ingests message requests for grains, and the grains can do heavy work in response. Too many concurrent grain calls of this kind can ultimately starve the system of resources (CPU). I have tested a feedback controller that adjusts concurrency based on input rate, throughput, and CPU utilization during message ingestion, and it works quite well as long as the controller is silo-aware. There is one controller per target silo, and I was relying on pre-emptive grain placement (sending a message to the grain via a grain extension) to track where the current activation might be. Unfortunately this is problematic, because activating a grain may itself induce a resource bottleneck due to code in OnActivateAsync (for example, initializing data from a REST API that doesn't scale well).

Given the following constraints:

  1. Weak consistency is fine as long as the lag is small and large disruptive events that can invalidate grain addresses can be detected (IClusterMembershipService for cluster changes - are there any other such disruptive events?)
  2. I don't want to add an external grain directory or write a custom one

What are the best options for:

  1. Resolving the grain address of an active grain (apart from IManagementGrain.GetActivationAddress)? A sketch of the shape I have in mind follows this list.
  2. Resolving the potential address of an inactive grain (would it be bad to resolve and use the placement directors directly)?
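
For context, here is roughly the shape I'm experimenting with for (1): a weakly consistent cache over IManagementGrain.GetActivationAddress, flushed on membership changes. SiloAddressCache and MembershipWatcher are hypothetical names; I'm assuming GetActivationAddress returns null when there is no current activation and that IClusterMembershipService.MembershipUpdates is the right stream to watch:

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

// Hypothetical weakly consistent cache: grain -> silo, flushed on cluster changes.
public sealed class SiloAddressCache
{
    private readonly IManagementGrain _management;
    private readonly ConcurrentDictionary<GrainId, SiloAddress> _cache = new();

    public SiloAddressCache(IGrainFactory grainFactory)
        => _management = grainFactory.GetGrain<IManagementGrain>(0);

    public async ValueTask<SiloAddress?> TryResolveAsync(IAddressable grain)
    {
        var id = grain.GetGrainId();
        if (_cache.TryGetValue(id, out var silo)) return silo;

        // Assumption: GetActivationAddress returns null when the grain is not activated.
        var resolved = await _management.GetActivationAddress(grain);
        if (resolved is not null) _cache[id] = resolved;
        return resolved;
    }

    // Crude invalidation: drop everything whenever cluster membership changes.
    public void OnClusterChanged() => _cache.Clear();
}

// Background loop driving invalidation from membership updates.
public static class MembershipWatcher
{
    public static async Task WatchAsync(
        IClusterMembershipService membership, SiloAddressCache cache, CancellationToken ct)
    {
        await foreach (var _ in membership.MembershipUpdates.WithCancellation(ct))
        {
            cache.OnClusterChanged();
        }
    }
}
```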

Many thanks,
DJ.

@scalalang2
Contributor

I'm just asking out of curiosity, but why don't you use a placement strategy based on the power-of-two choices?

@d-jagoda
Contributor Author

> I'm just asking out of curiosity, but why don't you use a placement strategy based on the power-of-two choices?

Placement isn't the problem. The problem is concurrent execution of grain calls. Let's say a grain performs a CPU-intensive calculation that takes 200ms: if I continuously send many such requests without managing flow control, the application will be overwhelmed and grain calls will time out. It's the same with slow IO work, where a grain calls a REST API or a database that doesn't scale well; the resource that cannot scale will eventually be overwhelmed and grain calls will time out. If the bottleneck is CPU, the application's responsiveness will drop and it will eventually be killed by a monitoring application - a Kubernetes liveness probe, for example.

@scalalang2
Contributor

I don't have a full understanding of the problem you're solving, so what I'm suggesting may not be fully applicable to you.

Here are a few thoughts:

  • You can monitor incoming calls to grains by using an IncomingGrainCallFilter. If you implement logic there to throttle subsequent requests when traffic overflows, it seems the desired goal can be achieved (see the sketch after this list).

  • Use the AlwaysInterleave attribute.
    Requests sent to a single grain are queued by default, which means the grain handles your requests sequentially.
    If the grain's work is stateless, you can scale operations by attaching the AlwaysInterleave attribute (also shown below).

  • If the grain itself is very hot data, Orleans might not have been the best choice to begin with. Orleans is suited to scenarios where you need to perform relatively small computations across an extremely large number of actors.
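
To make the first two points concrete, here is a rough sketch. The filter name, the fixed limit of 64, and the example grain interface are all placeholders, not a tuned implementation:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

// Sketch: cap how many grain calls execute concurrently on this silo.
// The fixed limit of 64 is a placeholder; an adaptive controller could change it.
public sealed class ThrottlingCallFilter : IIncomingGrainCallFilter
{
    private static readonly SemaphoreSlim Gate = new(initialCount: 64, maxCount: 64);

    public async Task Invoke(IIncomingGrainCallContext context)
    {
        await Gate.WaitAsync();
        try
        {
            await context.Invoke(); // proceed with the actual grain call
        }
        finally
        {
            Gate.Release();
        }
    }
}

// Registered on the silo builder, e.g.:
// siloBuilder.AddIncomingGrainCallFilter<ThrottlingCallFilter>();

// AlwaysInterleave lets calls to a stateless method interleave instead of
// queuing behind one another on the activation.
public interface IWorkerGrain : IGrainWithStringKey
{
    [AlwaysInterleave]
    Task DoStatelessWorkAsync();
}
```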

@ledjon-behluli
Contributor

@d-jagoda
Contributor Author

@ledjon-behluli thanks for the suggestions. It's the right type of idea; however, in my case rate limiting will be based on the cluster's ability to process requests without degrading its performance. The limits would be self-tuning, much like the thread pool, and would respond to the cluster's elasticity (when autoscaling helps). In the simpler scenarios, having access to the grain directory in order to resolve the silo address of a grain would help. In more complex scenarios, where one grain calls another grain, I would have to profile the request path to assess the full impact.
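
To sketch the self-tuning part, this is roughly the per-silo gate the controller would drive. All names here are hypothetical, and the AIMD adjustment (additive increase, multiplicative decrease, as in TCP) is just one possible policy:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using Orleans.Runtime;

// Hypothetical per-silo gate: each target silo has its own concurrency limit,
// adjusted by the feedback controller from throughput/CPU signals.
public sealed class PerSiloConcurrencyGate
{
    private sealed class Gate
    {
        public int Limit = 16; // starting limit; tuned by the controller
        public int InFlight;   // calls currently outstanding to this silo
    }

    private readonly ConcurrentDictionary<SiloAddress, Gate> _gates = new();

    // Admit a call to the given silo; the caller queues or sheds load on false.
    public bool TryEnter(SiloAddress silo)
    {
        var gate = _gates.GetOrAdd(silo, _ => new Gate());
        while (true)
        {
            var inFlight = Volatile.Read(ref gate.InFlight);
            if (inFlight >= Volatile.Read(ref gate.Limit)) return false;
            if (Interlocked.CompareExchange(ref gate.InFlight, inFlight + 1, inFlight) == inFlight)
                return true;
        }
    }

    public void Exit(SiloAddress silo)
    {
        if (_gates.TryGetValue(silo, out var gate))
            Interlocked.Decrement(ref gate.InFlight);
    }

    // AIMD: additive increase while healthy, multiplicative decrease on pressure.
    public void Adjust(SiloAddress silo, bool healthy)
    {
        if (!_gates.TryGetValue(silo, out var gate)) return;
        var limit = Volatile.Read(ref gate.Limit);
        Volatile.Write(ref gate.Limit, healthy ? limit + 1 : Math.Max(1, limit / 2));
    }
}
```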
