Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add simple actor telemetry #6294

Merged
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions src/core/Akka.API.Tests/CoreAPISpec.ApproveCore.verified.txt
Original file line number Diff line number Diff line change
Expand Up @@ -314,6 +314,13 @@ namespace Akka.Actor
public static readonly Akka.Actor.IActorRef NoSender;
public static readonly Akka.Actor.Nobody Nobody;
}
public struct ActorRestarted : Akka.Actor.IActorTelemetryEvent, Akka.Actor.INoSerializationVerificationNeeded, Akka.Actor.INotInfluenceReceiveTimeout
{
public ActorRestarted(Akka.Actor.IActorRef subject, System.Type actorType, Akka.Actor.Reason reason) { }
public System.Type ActorType { get; }
public Akka.Actor.Reason Reason { get; }
public Akka.Actor.IActorRef Subject { get; }
}
public class ActorSelection : Akka.Actor.ICanTell
{
public ActorSelection() { }
Expand All @@ -340,13 +347,25 @@ namespace Akka.Actor
public Akka.Actor.ActorSelectionMessage Copy(object message = null, Akka.Actor.SelectionPathElement[] elements = null, System.Nullable<bool> wildCardFanOut = null) { }
public override string ToString() { }
}
public struct ActorStarted : Akka.Actor.IActorTelemetryEvent, Akka.Actor.INoSerializationVerificationNeeded, Akka.Actor.INotInfluenceReceiveTimeout
{
public ActorStarted(Akka.Actor.IActorRef subject, System.Type actorType) { }
public System.Type ActorType { get; }
public Akka.Actor.IActorRef Subject { get; }
}
public class ActorStashPlugin : Akka.Actor.ActorProducerPluginBase
{
public ActorStashPlugin() { }
public override void AfterIncarnated(Akka.Actor.ActorBase actor, Akka.Actor.IActorContext context) { }
public override void BeforeIncarnated(Akka.Actor.ActorBase actor, Akka.Actor.IActorContext context) { }
public override bool CanBeAppliedTo(System.Type actorType) { }
}
public struct ActorStopped : Akka.Actor.IActorTelemetryEvent, Akka.Actor.INoSerializationVerificationNeeded, Akka.Actor.INotInfluenceReceiveTimeout
{
public ActorStopped(Akka.Actor.IActorRef subject, System.Type actorType) { }
public System.Type ActorType { get; }
public Akka.Actor.IActorRef Subject { get; }
}
public abstract class ActorSystem : Akka.Actor.IActorRefFactory, System.IDisposable
{
protected ActorSystem() { }
Expand Down Expand Up @@ -988,6 +1007,11 @@ namespace Akka.Actor
{
Akka.Actor.IStash Stash { get; set; }
}
public interface IActorTelemetryEvent : Akka.Actor.INoSerializationVerificationNeeded, Akka.Actor.INotInfluenceReceiveTimeout
{
System.Type ActorType { get; }
Akka.Actor.IActorRef Subject { get; }
}
public interface IAdvancedScheduler : Akka.Actor.IActionScheduler, Akka.Actor.IRunnableScheduler { }
public interface IAutoReceivedMessage { }
public interface ICanTell
Expand Down Expand Up @@ -1486,6 +1510,16 @@ namespace Akka.Actor
public static readonly Akka.Actor.ProviderSelection.Remote Instance;
}
}
public sealed class Reason
{
public Reason(string type, string message = null) { }
public string Message { get; }
public string Type { get; }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public guid Error property is missing.
It would make it possible to tag and serialize/log exceptions into elasticsearch, seq, opentelemetry, Redis or simply into a S3 bucket and only transmit this Reason to remote nodes.
If needed the receiving remote system could try restore/deserialize the exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of Guid Error a property string Url would be better.
It will allow to have many/mix of error service types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a log entry for this? These are just meant to be simple metric updates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should just include the full Exception payload in here instead and get rid of the Reason. We don't / can't use Reason for stops (no way to tell why we're terminating) and I mentioned in the spec here that 100% of these events are meant to be consumed in-process via the EventStream. You can always transform / project them into something else later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Reason is need to transfer unknown Exception types between nodes transparently.
Has not much to to with the IActorTelemetryEvent them self.

Actor.Remote creates the assumption that all remoting is transparent.
But if a node transmit a unknown exception type to a node it fails horrible.

The general idea is to put the Èxception into memory cache or distributed storage/cache
then can then be loaded when really needed,
to get from Reason => Exception again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design, these events are not designed to be transmitted over the wire - local aggregation only. We don't want this data coupled to remoting, serialization, or persistence. "Simple metrics" is the goal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am asking only to make the Reason type multi purpose that it can be in general used instead of an exception.
The remote transparency has this intrinsic conflict with the Exception dotnet type.
The MS guys for system.runtime.remoting in framework 1.1 times already discovered it.

public class static ReasonExtensions
{
public static Akka.Actor.Reason ToReason(this System.Exception ex) { }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extensions method should be on the ActorContext or ActorSystem,
to make it possible to retrieve a new AkkaErrorService from Extensions list.
The Default implementation can simply convert the Exception to a Reason with Error = Guid.Empty

Copy link
Member Author

@Aaronontheweb Aaronontheweb Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These metrics are only produced within the ActorCell and it's performed automatically by Akka.NET - meaning there shouldn't be a reason to produce these from the outside.

In fact, that makes me wonder if I should be marking the constructors as internal....

public delegate bool Receive(object message);
public abstract class ReceiveActor : Akka.Actor.UntypedActor, Akka.Actor.Internal.IInitializableActor
{
Expand Down Expand Up @@ -1682,6 +1716,7 @@ namespace Akka.Actor
public bool DebugRouterMisconfiguration { get; }
public bool DebugUnhandledMessage { get; }
public int DefaultVirtualNodesFactor { get; }
public bool EmitActorTelemetry { get; }
public bool FsmDebugEvent { get; }
public bool HasCluster { get; }
public string Home { get; }
Expand Down
150 changes: 150 additions & 0 deletions src/core/Akka.Remote.Tests/RemoteActorTelemetrySpecs.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
//-----------------------------------------------------------------------
// <copyright file="RemoteActorTelemetrySpecs.cs" company="Akka.NET Project">
// Copyright (C) 2009-2022 Lightbend Inc. <http://www.lightbend.com>
// Copyright (C) 2013-2022 .NET Foundation </~https://github.com/akkadotnet/akka.net>
// </copyright>
//-----------------------------------------------------------------------

using System;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.TestKit;
using Akka.TestKit.TestActors;
using Akka.Util.Internal;
using Xunit;
using Xunit.Abstractions;

namespace Akka.Remote.Tests
{
public class RemoteActorTelemetrySpecs : AkkaSpec
{
// create HOCON configuraiton that enables telemetry and Akka.Remote
private static readonly string Config = @"
akka {
actor {
provider = ""Akka.Remote.RemoteActorRefProvider, Akka.Remote""
telemetry.enabled = true
}
remote {
log-remote-lifecycle-events = on
dot-netty.tcp {
port = 0
hostname = localhost
}
}
}";

public RemoteActorTelemetrySpecs(ITestOutputHelper outputHelper) : base(Config, outputHelper)
{

}

private class TelemetrySubscriber : ReceiveActor
{
// keep track of integer counters for each event type
private int _actorCreated;
private int _actorStopped;
private int _actorRestarted;

// create a message type that will send the current values of all counters
public sealed class GetTelemetry
{
public int ActorCreated { get; }
public int ActorStopped { get; }
public int ActorRestarted { get; }

public GetTelemetry(int actorCreated, int actorStopped, int actorRestarted)
{
ActorCreated = actorCreated;
ActorStopped = actorStopped;
ActorRestarted = actorRestarted;
}
}

public class GetTelemetryRequest
{
// make singleton
public static readonly GetTelemetryRequest Instance = new GetTelemetryRequest();

private GetTelemetryRequest()
{
}
}

public TelemetrySubscriber()
{
// Receive each type of IActorTelemetryEvent
Receive<ActorStarted>(e => { _actorCreated++; });
Receive<ActorStopped>(e => { _actorStopped++; });
Receive<ActorRestarted>(e => { _actorRestarted++; });
// receive a request for current counter values and return a GetTelemetry result
Receive<GetTelemetryRequest>(e =>
Sender.Tell(new GetTelemetry(_actorCreated, _actorStopped, _actorRestarted)));
}

protected override void PreStart()
{
Context.System.EventStream.Subscribe(Self, typeof(IActorTelemetryEvent));
}
}

// create a unit test where a second ActorSystem connects to Sys and receives an IActorRef from Sys and subscribes to Telemetry events
[Fact]
public async Task RemoteActorRefs_should_not_produce_telemetry()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any IActorRef without an ActorCell will not produce metrics (by design) - so FutureActorRefs from Ask<T> and RemoteActorRefs don't influence these totals.

{
// create a second ActorSystem that connects to Sys
var system2 = ActorSystem.Create(Sys.Name, Sys.Settings.Config);
try
{
// create a subscriber to receive telemetry events
var subscriber = system2.ActorOf(Props.Create<TelemetrySubscriber>());

// send a request for the current telemetry counters
var telemetry = await subscriber
.Ask<TelemetrySubscriber.GetTelemetry>(TelemetrySubscriber.GetTelemetryRequest.Instance);

// verify that the counters are all correct
Assert.Equal(0, telemetry.ActorCreated);
Assert.Equal(0, telemetry.ActorStopped);
Assert.Equal(0, telemetry.ActorRestarted);

// create an actor in Sys
var actor1 = Sys.ActorOf(BlackHoleActor.Props, "actor1");

// resolve the currently bound Akka.Remote address for Sys
var address = Sys.AsInstanceOf<ExtendedActorSystem>().Provider.DefaultAddress;

// create a RootActorPath for actor1 that uses the previous address value
var actor1Path = new RootActorPath(address) / "user" / "actor1";

// have system2 send a request to actor1 via Akka.Remote
var actor2 = await system2.ActorSelection(actor1Path).ResolveOne(RemainingOrDefault);

// send a request for the current telemetry counters
telemetry = await subscriber
.Ask<TelemetrySubscriber.GetTelemetry>(TelemetrySubscriber.GetTelemetryRequest.Instance);

// verify that created actors is greater than 1
var previouslyCreated = telemetry.ActorCreated;
Assert.True(previouslyCreated > 1); // should have had some /system actors started as well
Assert.Equal(0, telemetry.ActorStopped);
Assert.Equal(0, telemetry.ActorRestarted);

// stop the actor in Sys
Sys.Stop(actor1);

// send a request for the current telemetry counters
telemetry = await subscriber
.Ask<TelemetrySubscriber.GetTelemetry>(TelemetrySubscriber.GetTelemetryRequest.Instance);
// verify that the counters are all zero
Assert.Equal(previouslyCreated, telemetry.ActorCreated); // should not have changed
Assert.Equal(0, telemetry.ActorStopped);
Assert.Equal(0, telemetry.ActorRestarted);
}
finally
{
Shutdown(system2);
}
}
}
}
Loading