Cloud storage v2 API, add support for batch operations #2122
Labels
api: storage
Issues related to the googleapis/java-storage API.
priority: p2
Moderately-important priority. Fix may not be included in next release.
type: feature request
‘Nice-to-have’ improvement, new feature or different behavior or design.
Is your feature request related to a problem? Please describe.
The old storage API (the Storage class in the Java SDK) supports batch operations over HTTP. Our solution is built using the new StorageClient for gRPC operations, because we really wanted client streaming for uploads. The one thing we can't do using the new StorageClient is batch operations, which we use for recursive delete of objects under a prefix. We can list the objects using StorageClient, but the batch delete has to be sent using the old Storage API and only works over HTTP. Since the restriction also applies to the old API when using the grpc() version, I assume this is because of a limitation in the gRPC API itself?
Describe the solution you'd like
Ideally it would be good to have batch operations on the new StorageClient API, similar to what exists in the old Storage API. For our solution we only care about batch deletes at present, although it makes sense more generally that batch capabilities that were needed before will still be needed.
What you want to happen
I'd like to see batch operations supported on StorageClient. I'm assuming there is a dependency on adding them in the underlying gRPC APIs, but that is just an assumption! Since I can do 95% of what I need with the new API, it is a shame to still have to create both clients. If I can use just the new client, then I only need to manage one set of resources, handle one set of errors, etc.
Describe alternatives you've considered
For now I had to create a legacy Storage object as well as the new StorageClient, and I use the old API just for doing batch delete operations. This does work, but it's not great as a long-term solution, and it means it's not possible to provide a full solution on the new API.
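To make the workaround a bit more concrete, here is a minimal sketch of the client-side part only: splitting the object names listed under a prefix into fixed-size chunks and handing each chunk to a delete callback. The class `PrefixBatchDelete`, its methods and the `MAX_BATCH_SIZE` value are hypothetical names chosen for illustration, not part of the SDK, and the cap of 100 calls per batch is an assumption that should be checked against the current documentation. In our case the callback would wrap the legacy client (roughly `Storage.batch()`, then `StorageBatch.delete(...)` per object, then `submit()`); that part is left out so the sketch has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of google-cloud-storage.
final class PrefixBatchDelete {

    // Assumed cap on calls per batch request; verify against the current docs.
    static final int MAX_BATCH_SIZE = 100;

    private PrefixBatchDelete() {}

    /** Splits names into consecutive chunks of at most batchSize entries. */
    static List<List<String>> partition(List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < names.size(); start += batchSize) {
            int end = Math.min(start + batchSize, names.size());
            // Copy the view so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(names.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Passes each chunk to deleteBatch in order and returns the number of
     * batches sent. deleteBatch stands in for the call to the legacy client.
     */
    static int deleteAll(List<String> names, Consumer<List<String>> deleteBatch) {
        List<List<String>> chunks = partition(names, MAX_BATCH_SIZE);
        chunks.forEach(deleteBatch);
        return chunks.size();
    }
}
```

If batch operations were available on StorageClient, only the callback would need to change, and the second client could be dropped.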
Additional context
Our product is an open source data and analytics platform: https://github.com/finos/tracdap
The core platform is built on gRPC and Apache Arrow, using Netty as the transport. Our storage plugin for GCP sits on top of the same resources (event loops, allocators etc). We use client / server streaming to transfer data in pipelines where the format and size of data is not known in advance. Using the old Storage API would involve buffering and worker thread pools which we've managed to avoid elsewhere. The new StorageClient is great for us, because we've already built streaming pipelines on gRPC so we can just follow the same pattern.
I appreciate these APIs are very new; we started using them as soon as they became available! Still, the results have been good for us so far. If we can get rid of the need to use the old API at all, that would be ideal.