-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement retry for user deletion #50
Implement retry for user deletion #50
Conversation
639f955
to
e1a664b
Compare
I was passing by out of curiosity and have the following question/observation. (not a critique, but curious to get your opinion). Short version: Long version: |
Hello @tsologub, thanks for giving out your opinion. For me, this need to be implemented at the SDK level for couple reasons:
I think you pointed out the real problem we both encountered. On most cases, operations are not spread on Azure read replicas quick enough or at least the API is not making sure that they have been properly replicated before responding. What is really awkward to me is that if you execute a More generally, I think we need to have the choice to implement a client with or without a retry mechanism so that it will cover most use cases. I would like to here @manicminer 's take on this since this SDK is developed by him and maybe he does have a better idea. |
Unfortunately or luckily, depends on how you look, the Azure AD team went with the eventual consistency model instead of strong consistency.
They will be spread eventually.
This would call strong consistency. And every interaction would work much slower. i.e., AD would have to ensure that every read replica gets an update before responding to the client.
For that node, where your request landed, the write was successful; that is why AD responds as 200.
I believe it is not the API level problem; it is Microsoft's deliberate decision. To achieve this, Microsoft must completely redesign the Active Directory. I am getting your point. It would be nice to have such retries on the SDK level, especially in the GET scenarios. i.e., if the first response is 404, try a couple of times more. But I can't imagine how this could be implemented for the GET but with lists. An example. I have a group, I delete this group, and now I want to list all deleted groups. Will my deleted group be on that list? - It depends on which node my Maybe the retries that you are talking about don't have to be implemented for every GET, only for some. I don't know, simply thinking out loud. |
Where it's feasible, I would like to handle these kinds of consistency issues in the SDK. We might not catch everything but if we can absorb the most common of these I think it will still improve the experience for app developers. I was hoping the I've been doing some experimenting with implicit request retrying and have had some success. It seems safe to retry on 404, and for 400 responses where the odata error is understood with confidence, e.g. when creating a service principal, if the related application is very fresh and not yet found by the node receiving the SP creation, a 400 with a distinctive odata error can be observed. I'm hoping this can be built in to the base client and pending some more testing, I may have an implementation with safe defaults (i.e. no retrying unless opted in) and also allowing top level clients to decide what's retryable and what isn't. |
@romainDavaze I'm going to go ahead and close this one now that #57 is merged. Many thanks for the idea and inspiration 🙇 |
Implementing retry for user deletion due to Graph API inconsistency when deleting a user right after creating it.
The API will return a 200 status code on the GET request indicating that the user has been created. However if you call the DELETE endpoint right after, it will return a 404 because it did not have time to spread user creation on API nodes.
After discussion with @manicminer, implementing retry here seems to be the cleaner way to handle this.
Maybe we should try to implement a proper retry mechanism in the future in the performRequest function as it only handles retries on rate-limiting related status codes for now.