Cloud storage v2 API, add support for batch operations #2122

Open
martin-traverse opened this issue Jul 17, 2023 · 2 comments
Labels
api: storage Issues related to the googleapis/java-storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@martin-traverse

Is your feature request related to a problem? Please describe.

The old storage API (the Storage class in the Java SDK) supports batch operations over HTTP. Our solution is built on the new StorageClient for gRPC operations, because we really wanted client streaming for uploads. The one thing we can't do with the new StorageClient is batch operations, which we use for recursive deletion of objects under a prefix. We can list the objects using StorageClient, but the batch delete has to be sent using the old Storage API and only works over HTTP. Since the restriction also applies to the old API when using the grpc() version, I assume this is because of a limitation in the gRPC API itself?

Describe the solution you'd like

Ideally it would be good to have batch operations on the new StorageClient API, similar to what exists in the old Storage API. For our solution we only care about batch deletes at present, although it makes sense more generally that batch capabilities that were needed before will still be needed.

What you want to happen

I'd like to see batch operations supported on StorageClient. I'm assuming this depends on adding them to the underlying gRPC APIs, though that is just an assumption! Since I can do 95% of what I need with the new API, it is a shame to still have to create both clients. If I could use just the new client, I would only need to manage one set of resources, handle one set of errors, etc.

Describe alternatives you've considered

For now I have had to create a legacy Storage object as well as the new StorageClient, and I use the old API just for batch delete operations. This does work, but it's not great as a long-term solution, and it means it's not possible to provide a full solution on the new API.
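
A minimal sketch of the shape of this dual-client workaround, in plain Java with no SDK dependency. `ObjectLister` and `BatchDeleter` are hypothetical interfaces invented for illustration: the first stands in for the list call on the new StorageClient, the second for a batch delete sent through the legacy Storage object. The cap of 100 deletes per batch is an assumption taken from the general guidance for JSON API batch requests, not something stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: ObjectLister and BatchDeleter are hypothetical stand-ins, not SDK types.
final class PrefixDelete {

    // Assumed cap on calls per batch request (see the lead-in above).
    static final int MAX_BATCH_SIZE = 100;

    /** Hypothetical wrapper around the list call on the new gRPC StorageClient. */
    interface ObjectLister {
        List<String> listNames(String bucket, String prefix);
    }

    /** Hypothetical wrapper around a batch delete on the legacy HTTP Storage client. */
    interface BatchDeleter {
        void deleteAll(String bucket, List<String> names);
    }

    /** Splits names into consecutive chunks of at most batchSize elements. */
    static List<List<String>> chunk(List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            int end = Math.min(i + batchSize, names.size());
            chunks.add(new ArrayList<>(names.subList(i, end)));
        }
        return chunks;
    }

    /** Lists with one client, deletes in batches with the other; returns the number of batches sent. */
    static int deletePrefix(ObjectLister lister, BatchDeleter deleter, String bucket, String prefix) {
        List<List<String>> chunks = chunk(lister.listNames(bucket, prefix), MAX_BATCH_SIZE);
        for (List<String> names : chunks) {
            deleter.deleteAll(bucket, names);
        }
        return chunks.size();
    }
}
```

Keeping both clients behind narrow interfaces like these would, in principle, let the `BatchDeleter` implementation be swapped for a gRPC-based one if batch support is ever added to the v2 API, without touching the calling code.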

Additional context

Our product is an open source data and analytics platform: https://github.com/finos/tracdap

The core platform is built on gRPC and Apache Arrow, using Netty as the transport. Our storage plugin for GCP sits on top of the same resources (event loops, allocators etc). We use client / server streaming to transfer data in pipelines where the format and size of data is not known in advance. Using the old Storage API would involve buffering and worker thread pools which we've managed to avoid elsewhere. The new StorageClient is great for us, because we've already built streaming pipelines on gRPC so we can just follow the same pattern.

I appreciate these APIs are very new; we started using them as soon as they became available! Still, the results have been good for us so far. If we can get rid of the need to use the old API at all, that would be ideal.

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/java-storage API. label Jul 17, 2023
@BenWhitehead BenWhitehead added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jul 17, 2023
@BenWhitehead
Collaborator

The storage v2 API does not currently implement batch operations, and its position on the roadmap does not have a public date associated with it.

@martin-traverse
Author

Hi Ben - thanks again for the quick response. We can carry on using the dual approach for now. Hopefully batch calls will be released in the v2 API at some point in the future, and we can simplify our code then.
