Skip to content
This repository has been archived by the owner on Mar 16, 2022. It is now read-only.

Formal spec #119

Closed
wants to merge 19 commits into from
Closed

Formal spec #119

wants to merge 19 commits into from

Conversation

viktorklang
Copy link
Contributor

As discussed in this week's contributor's call, we need to start iterating on the specification.

The proposed methodology for making progress is to make comments and suggested edits on this draft PR, then when it is in an acceptable state, to modify the TCK to start to "enforce" the spec.

@viktorklang viktorklang added user platform Issues related to the different user target platforms backend platform Issues related to the backend platform labels Oct 2, 2019
build.sbt Outdated
`operator`,
`tck`,
`docs`
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include protocols,?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ignasi35 Not yet, that's: #118

@@ -12,23 +12,42 @@
// See the License for the specific language governing permissions and
// limitations under the License.

// gRPC interface for Event Sourced Entity user functions.

// gRPC definitions for Entity Discovery
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove leading space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True

repeated string supported_entity_types = 5;
}

// Entity discovery service.
//The *EntityDiscovery* Service is the entrypoint for CloudState Proxies
Copy link
Contributor

@marcellanz marcellanz Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punctuation at the end of the sentence.
EntityDiscovery is bold here, whereas messages above are not. Is that intentional?

(I like the consistent naming used here of the message names/services that follow the spec-comment; also punctuation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a different idea for Services vs Messages, I'm not going to die on that hill though :)

// - *MUST* yield a failure if the provided `ProxyInfo` is not deemed compatible with the User Function
// - `reportError`
// - *MUST* only be invoked to inform a User Function about violations of the CloudState Specification
// - *MUST* only receive failures not originating from invoking `reportError`, since that would cause a crash-loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use punctuation consitently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should. Let's make sure that the text is correct first tho :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure :-)

// - `id` TODO
// - `name` TODO
// - `payload` TODO
// - `streamed` TODO
message Command {

// The ID of the entity.
string entity_id = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does Command carry the entity_id where an EventSourcedEvent does not (or vice-versa)? The entity_id is established through the EventSourcedInit message and seems to be stable for a EventSourcedStreamIn session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz I believe this is due to the fact that Commands get redistributed amongst the CloudState Proxies based on the entity_key. @jroper should be able to sanity-check my response :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Marcel that the entity_id should be stable to the all EventSourcedStreamIn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @jroper for historical context :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document somewhere if we don't support client-side streaming RPCs?
I've re-read the CRDT spec, and I think we don't, yes?

//
// - An `Failure` is considered to be valid iff:
// - `command_id` TODO
// - `description` TODO
message Failure {

// The id of the command being replied to. Must match the input command.
Copy link
Contributor

@marcellanz marcellanz Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might mention that the command_id is optional and has not be to be set (leaving it with its default value of 0) if a Failure is sent as a reply not according to an input command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 166 states that "Failure represents a Command processing failure reply." In this sense, this Failure Message can't be used for anything not command related. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Good point, worth clarifying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure is used in two places.

The first place is as part of ClientAction. When sent as a client action, it indicates an error that should be sent to the client - this is an intentional, explicit error message that the user function wants to communicate with the client. In that case, command id is actually irrelevant, since it's already included in a reply message that contains the command id. Typically, this will be used for errors made by the client, for example, commands that failed validation, and could be thought of as an HTTP 4xx error.

The second place that Failure is used is by itself, as a top level message in the oneof for a stream out for the event sourcing, crdt, etc protocols. In this case, it's used for server errors, logically equivalent to HTTP 5xx errors, whose message should not be communicated back to the client (since the client hasn't done anything wrong and can't do anything to rectify, so the error message will be meaningless to the client, plus leaking error messages from arbitrary exceptions can be a security issue), rather a server error status code with a generic error message is sent, but the error will be logged by the proxy for diagnostic purposes. These may or may not have a command id. For example, if the users code throws an unexpected exception, or does something illegal according to the support library, this will be used, and if that occurred during the processing of a command, it will have a command id. However, if the error occurred outside of command handling, it won't have a command id. An example of where we currently send it without a command id is if the proxy sends two init messages (that's not allowed) the user function support library responds with this failure message and no command id. When this is sent, the user function and the proxy are both expected to immediately close the stream after sending/receiving it, since it's not safe to make any assumptions about the state of the stream, the state of the other end of the stream, and any local state related to the stream, when such an error occurs.

Does it make sense to use the same message for both purposes? Maybe, maybe not.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great explanations, if this is well documented I don't see why segregate into more messages

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jroper thanks for the explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is sent, the user function and the proxy are both expected to immediately close the stream after sending/receiving it, since it's not safe to make any assumptions about the state of the stream, the state of the other end of the stream, and any local state related to the stream, when such an error occurs.

This would be similar to a ClientAction.Failure? After such an intentional error, we can't make any assumptions about the local state, and probably too would close the stream and let the proxy re-establish the local state if he likes to.

int64 id = 2;
}

// A failure reply. If this is returned, it will be translated into a gRPC unknown
// error with the corresponding description if supplied.
//Failure represents a Command processing failure reply.
Copy link
Contributor

@marcellanz marcellanz Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the state of the User Function after such a Failure was sent? Is this a fatal situation (not recoverable)? Should the user function be "passivated"...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Yes, that's a great question. We should outline that on the service, rather than the message, IMO. Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang sure, that sounds very reasonable.

// message. The entity should reply in order, and any events that the entity requests to be
// persisted the entity should handle itself, applying them to its own state, as if they had
// arrived as events when the event stream was being replayed on load.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entity is expected to reply to each command with exactly one reply message.

What is the state of the event sourced entity after a failure was sent as aEventSourcedStreamOut.Failure?
Is this a fatal situation (not recoverable)? Should the entity instance be "passivated"...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Good question—what does the proxy do in this case right now? (Haven't got access to my codebase right now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang yeah, could also check. The question is rather old and its here to mark a point that it could be described here for protocol semantics.

Copy link
Contributor

@marcellanz marcellanz Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang may I describe here, in the spec, what the state of the entity is, according to the current proxy implementation? Otherwise its as I wrote a question to be potentially answered by yours or James.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Yeah, that's a good idea, to provide a foundation for formalization. Thank you! :)

Reply reply = 1;

// Forward to another entity
// Forward to another entity.
Forward forward = 2;

// Send a failure to the client
Failure failure = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How differs a Failure here and one returned as a EventSourcedStreamOut.Failure? What is the state of the event sourced entity instance after such a failure. (See also a related question on service EventSourced)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz My gut feeling is that Failure at this level is about failure to process a command, whereas failure at the EventSourcedStreamOut.Failure would be a failure not necessarily linked to a specific command. @jroper, wdyt?

rpc discover(ProxyInfo) returns (EntitySpec) {}

// Lets a CloudState Proxy inform a User Function about specification violations.
rpc reportError(UserFunctionError) returns (google.protobuf.Empty) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what state would a user function be after such an error? If the implementation would not be compliant with violations of the CloudState Specification how should it act after this.

Also, such an error would not have any context at all. So does "inform" mean, that the proxy would passivate this entity? See also the discussion in go-support /~https://github.com/cloudstateio/go-support/pull/5/files#r337442535

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point, @marcellanz. We should spec the intended sequence of actions after reportError has been called. It would seem like it is a fatal error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang I have the impression, the proxy should passivate the user function, and re-initialise the entity. The user function might behave the same the second time and the proxy might re-try a few times. I don't know what's the proxies strategy here. Does the proxy force-restart user functions completely?

reportError has a symmetry with a protocol.Failure here, where the user function can decide how to proceed. It can report the failure and then close the stream and go on. It could even panic/os.exit(70) itself and get be re-started.

It seems on both occasions, on reportError and protocol.Failure, the other side only can log what it got reported, but the error detecting sending party knows best how to proceed having the error and its context at hand.

For the Go and its panic semantics I noted and got back to this part of the spec.

// there are two ways to do it
// a) report and close the stream and keep others running
// b) report and panic and therefore crash the program
// how can we decide that the panic keeps the user function in
// a consistent state. this one occasion could be perfectly ok
// to crash, but thousands of others kept running. why get them all down?
// so then, there is the presumption that a panic is truly exceptional
// and we can't be sure about anything to be safe after a panic.
// The proxy is well prepared for this, it is able to re-establish 
// and also isolate the erroneous entity type from others after failures.

It would be interesting to know what is the intended flow of events in such a scenario. The simpler the better perhaps. As the user functions gets its state from the proxy, the proxy also should know best what to do.

(note: reportError is not called reportFailure)

@@ -164,27 +272,69 @@ message Entity {
string persistence_id = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/~https://github.com/cloudstateio/cloudstate/blame/master/docs/src/main/paradox/user/lang/java/eventsourced.md#L9 states that the persistence_id is optional:

It defaults to the simple name for the entity class (in this case, ShoppingCartEntity), 
it’s good practice to select one explicitly, this means your database isn’t depend 
on classnames in your code.

go-support has it implemented as optional but there are open questions if that is appropriate, see: /~https://github.com/cloudstateio/go-support/pull/5/files#r335443726

message CrdtStreamIn {

oneof message {

// Always sent first, and only once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the support library supposed to do in this situation? Send back a Failure (Init has no command id, so that could not correlated), close the stream, send back an error via gRPC service?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All protocol errors should be reported back to the proxy by sending a Failure followed by closing the stream.

The command id should be left unset, the failure is not in relation to a command, it's in relation to the stream, and since it's sent down the same stream that the protocol error occurred on, it's implicitly correlated.

Also, note that the purpose of correlating errors to commands is so that errors can be sent back to the client that initiated the command. But in the case of a protocol error, the thing that made the error is the proxy, not a client of the proxy. There's nothing to correlate, the proxy just needs to be told you've done the wrong thing, so it can log it, so that a human can manually come, see the log message, debug, and fix the bug in the proxy (or fix the bug in the user function support library that incorrectly identified a protocol error).

@@ -51,20 +56,41 @@ option go_package = "cloudstate/protocol";
// Special care must be taken when working with maps and sets. The keys/values are google.protobuf.Any, which encodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and may subsequently be terminated if that entity becomes idle, or if the entity is deleted.

How does an entity become idle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be kept alive and used to handle multiple commands, and may subsequently be terminated if that entity becomes idle, or if the entity is deleted.

The entity being "terminated" and "a terminated handle stream" do not mean the same, yes?

Shutdown is typically done for efficiency reasons, unless the entity is explicitly deleted, a terminated handle stream does not mean the proxy has stopped tracking the state of the entity in its memory.

If the handle stream is terminated, what state does the entity have when it's not explicitly deleted? Can a proxy re-establish the handle stream, having an entity still not be deleted? and therefore send a non CrdtInit message over the freshly initiated handle stream (what would run against the obligation to have a CrdtInit always be the first message on a fresh handle stream).

In the Java/Scala implementation I see a context can go inactive, is that an entity being idle?
Perhaps we could clarify the terms "idle", "terminated entity", "terminated stream", "shutdown" and "deleted" and their implications to the state of an entity to be managed by a user support library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm writing some functional tests and I come back to these points, @jroper @pvlugter wdyt here?

See also below the question regarding CrdtDelete, if we don't close a stream without a failure or reason, what state would the stream be if the proxy instructed the user function to delete the CRDT?

A proxy may decide to terminate the stream after sending this.

If the proxy does not decide to terminate the stream, what messages is the user function allowed to receive after a CRDT was deleted? A State message or none?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go Support (v0.2.x) supports the stream to be re-used, clears its state completely and would allow a new init message.

@@ -92,8 +118,20 @@ message CrdtStreamIn {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the CrdtDelta message named changed and not delta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Good question. Arguably it should then be change rather than changed in any case. :) I have no real preference here, especially as names shouldn't matter in proto :D

Copy link
Contributor

@marcellanz marcellanz Feb 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang There could be a good reason in the domain of CRDTs which I don't know well. I have no preference too, but would expect to have them named consistent, as the other ones are.

CrdtInit init = 1;
CrdtState state = 2;
CrdtDelta changed = 3;
CrdtDelete deleted = 4;
Command command = 5;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picky is good :)

// - `delete` TODO
// - `write_consistency` TODO
//
// FIXME set the ordinals to start at 1?
message CrdtStateAction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through the CRDT implementation, I found we could perhaps describe more clearly how the proxy gets the initial state of a CRDT within the user library. Its written implicitly I think, put trying to implement the protocol by only the spec, I had to guess (lightly).

If I understand it correctly, the very first instance, the value of it, of a CRDT from a registered entity gets to know to the proxy by a CrdtStateAction being part of a CrdtReply in response to a CrdtStreamIn.Command as the proxy never can know of a CRDT before and with the first CrdtInit message.

@@ -92,8 +118,20 @@ message CrdtStreamIn {
}
Copy link
Contributor

@marcellanz marcellanz Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Delete the entity. May be sent at any time. The user function should clear its state when it receives this."
What does "clear its state" mean? deleted? Is the proxy allowed to send ever again a message for this entity? and if not, should the support library detect that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding a deleted CRDT entity and the spec saying:

Delete the entity. May be sent at any time. The user function should clear its state when it receives this.
A proxy may decide to terminate the stream after sending this.

the Go Support clears any state for an entity and in theory, would allow a new entity to be initialised on that cleared stream. The stream, therefore, will not be terminated for a deleted entity.

message CrdtStreamOut {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not entirely clear what should be replied for a non command incoming message, solely by reading the spec.

CrdtReply is commented with "A reply to an incoming command." Any of the CrdtStreamIn not beeing the Command message are not command messages, right?
In the Crdt service it's stated that: "The user function must respond with one reply per command in." so this applies to Command messages only, yes?

// as well as deltas as they arrive, and the entire state if either the entity is created, or the proxy wishes the
// user function to replace its entire state.
// - TODO: The user function must respond with one reply per command in. They do not necessarily have to be sent in the same
// order that the commands were sent, the command ID is used to correlate commands to replies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the order of replies can go out of order? This would imply that command replies can stream back concurrently and therefore possibly out of order to their arrivals sequence, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jroper wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or this is meant to be the "command ID is used to correlate commands to replies and messages", the ones generated by subscription streaming messages, no? A Reply is here then both a CrdtReply or a CrdtStreamedMessage. And the order is relevant because these, and implementation specific here solely, depends how subscription callbacks where registered.

The only other reason to mention that could be, that commands theoretically get handled concurrently, or, they're allowed to.

@viktorklang @jroper … would you like to give me a hint here? I find nothing in the implementation that would support the latter paragraph. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz Hmmm, good question. I would presume that @jroper would be more in tune with this than I am at the moment. What does the Proxy currently do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viktorklang thanks, good hint. so far I intentionally did not read what the proxy does but guided the implementation on the spec, documentation and scala/node support for it, and therefore forgot I could :).
I'm still interested in @jroper's intent on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol supports commands being handled concurrently, the implementation of the proxy only issues one command at a time to crdts and event sourced entities, though i think may issue concurrent commands to stateless "entities".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yes, streamed messages get correlated to the original command by the command id.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that stateless entities can, and should, support parallel calls. Unless the developers assumes sequential processing and mistakenly considers this to be a premise in their function logic. Therefore, good documentation can guide developers to keep parallelism in mind.

Copy link
Contributor

@marcellanz marcellanz Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jroper thanks.

The protocol supports commands being handled concurrently

I found state changes can get handled concurrently and their resulting streamed messages therefore be sent to the proxy out of order. This might be useful if an entity has many streamed commands running, where their user functions take enough time to handle the change related to the time it takes to send their resulting streamed messages back to the proxy.

I'm not sure how much concurrent handling of state changes can help since sending actions and effects have to be serialised when sent down the grpc stream and I'm not sure if there are use cases worth it.

Spec: They do not necessarily have to be sent in the same order that the commands were sent

I do realise re-reading this sentence not even has to imply concurrent processing of state changes, but the fact that the order of handling of state changes (EntityRunner#notifySubscribers in the Scala implementation) related to the command sequence does not have to be in the same order.

import "google/protobuf/any.proto";
import "cloudstate/entity.proto";

option java_package = "io.cloudstate.protocol";
option go_package = "cloudstate/protocol";

// CRDT Protocol
//The *Crdt* Service is the entrypoint for CRDT-based User Function entities.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will * Crdt* be the final name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlikely, but we haven't settled on a good enough replacement yet.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hepin1989 Follow the discussion here #144

@cloudstateio cloudstateio deleted a comment from viktorklang Apr 4, 2020
LOCAL = 0;
// TODO
MAJORITY = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAJORITY with +N level?

//`Specification`:
//
// - A `PNCounterState` is considered to be valid iff:
// - `value` TODO
message PNCounterState {

// The current value of the counter.
int64 value = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signed integers, if the value is to be expected negative, can be more efficient encoded regarding its size. A small negative int64 is encoded always 10 bytes long. We could use an sint64 instead of an int64 here I think, the same as we do for the delta. And we could be more explicit about the intent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcellanz sint64 seems like a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context from another discussion: #353 (comment) where its not advised to use unsigned in (public) API's.

//`Specification`:
//
// - A `ServiceInfo` is considered to be valid iff:
// - `service_name` TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should perhaps stated how this service_name differs or is similar or the same from any registered cloudstate.Entity.service_name where it is "The name of the service to load from the protobuf file".

If a service info can hold multiple entities, how do their service_names relate to the service infos service_name. As it is stated that all ServiceInfo is optional, it might not matter at all but it might help to clarify that.

// message. The entity should reply in order, and any events that the entity requests to be
// persisted the entity should handle itself, applying them to its own state, as if they had
// arrived as events when the event stream was being replayed on load.
// TODO
rpc handle(stream EventSourcedStreamIn) returns (stream EventSourcedStreamOut) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beside sending a protocol.Failure and then closing the stream, is a user function allowed to close a stream without a failure sent before and the user function indicating a non-failure reason for the stream closed by a gRPC status code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it SHOULD NOT do this, as opposed to MUST NOT (rfc2119 notation). There's always going to be situations where the user function may close the stream without sending a failure - eg, if the process crashes. But the problem with allowing such a scenario is if the proxy has a command to send to it, just before it receives the stream closed signal, that command is now going to fail when it tries to send. It's the proxy that decides when entities are active or not.

If we wanted, we could introduce a way for user functions to initiate the passivation of entities, that's pretty straight forward, however, closing the stream won't cut it, first the user function has to send a message to request the proxy to passivate, then the proxy can start its passivation sequence, which is to first drain its queue of existing commands, then start buffering any new commands that come in, and then passivate the entity, and finally if any new commands did come in while the entity was being passivated, restarting it to handle those buffered commands. Simply closing the stream is not enough.

oneof action {
// TODO
CrdtState create = 5;
Copy link
Contributor

@marcellanz marcellanz Sep 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be defined why the proxy does not expect a CrdtState Create message when a CRDT was initialised as a "new" CRDT and has no Delta after a command was handled.

message VoteState {

// The number of votes for
// The number of votes for.
uint32 votes_for = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is votes_for and total_voters of type uint32?

// - *MUST* be invoked with a *valid* `ProxyInfo`
// - *MUST* be *idempotent*
// - *MAY* be called any number of times by its CloudState Proxy
// - *MUST* yield a failure if the provided `ProxyInfo` is not deemed compatible with the User Function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we start trying out versioning we have to define here how to "yield a failure". I think we can't by protocol?

// - *MAY* be called any number of times by its CloudState Proxy
// - *MUST* yield a failure if the provided `ProxyInfo` is not deemed compatible with the User Function
// - `reportError`
// - *MUST* only be invoked to inform a User Function about violations of the CloudState Specification

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the reportError return to report a version mismatch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing some parts of the code I noticed that the only way to return an incompatibility error today would be to throw an exception. Forget what I said about reportError as it obviously wouldn't work.
We may have to add more information in ServiceInfo

Copy link

@sleipnir sleipnir Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or transfer the responsibility to the Proxy that he can use the reportError, reversing the validation flow. The user function should only report the supported proxy versions and it would be up to the proxy to decide whether or not to continue with the communication flow. Someday somewhere I mentioned that the operator should call Discover because that way he could inject a proxy compatible with the version declared by the user function. Of course, in either case (Proxy or Operator) EntitySpec should have an attribute that declares the versions supported by the user's function.

@sleipnir
Copy link

Hi @ALL in cloudstateio/python-support#42 (comment) discussion we need to make sure that stateless functions have no associated persistence id or command id. I believe that neither of them is necessary, or even desired (bigger doubt about the command id), but it would be interesting to have this formalized here.

@marcellanz
Copy link
Contributor

Hi @ALL in cloudstateio/python-support#42 (comment) discussion we need to make sure that stateless functions have no associated persistence id or command id. I believe that neither of them is necessary, or even desired (bigger doubt about the command id), but it would be interesting to have this formalized here.

@sleipnir I propose to place the comment too in the entity.proto file PRed here at the persistence_id field so it doesn't get lost. Thi PR is a container for all things getting the Formal Spec and when worked on should be clarified, fixed, evolved, discussed or abandoned. Mentioned here again that the spec, not the implementation should clarify: cloudstateio/python-support#42 (comment)

@sleipnir
Copy link

@marcellanz I agree. Even many questions raised here I think have been lost. It would be good to bring this PR into the minutes of meetings

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backend platform Issues related to the backend platform user platform Issues related to the different user target platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants