-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: reduce memory footprint of deleteFiles by utilizing getFilesStre… #2147
fix: reduce memory footprint of deleteFiles by utilizing getFilesStre… #2147
Conversation
…am and using smaller queue of promises
const filesStream = this.getFilesStream(query); | ||
|
||
for await (const curFile of filesStream) { | ||
if (promises.length >= MAX_QUEUE_SIZE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is both MAX_QUEUE_SIZE
and MAX_PARALLEL_LIMIT
. Do we need both as they are trying to accomplish the same thing? Limiting the number of files being deleted at once.
Also 10 seems pretty low for deleting files, but I could be missing something, 1000 seems like a more reasonable number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full disclosure I'm not really sure why the original authors decided to utilize pLimit
for this one particular function. That said, MAX_PARALLEL_LIMIT
represents how many concurrent promises will be allowed to execute and I'm not sure why it was set to 10. We are having some discussion around if we want to expose this to be user setable. In unrelated testing we have found that setting concurrent promises really high / uncapped led to other problems, i.e. network saturation. It is probably less of a concern here as delete
doesn't return much in the way of data.
I implemented MAX_QUEUE_SIZE
as a means of further reducing memory utilization. The issue you were seeing was because 1. we fetched every file into memory during getFiles
and 2. we then created a giant array of promises 1 for each file delete. We are also discussing allowing this to be user setable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, when the authors leave it can be hard to figure out why they wrote something, and potentially dangerous to remove without sufficient testing. It seems after more testing a better number between 10 and 1000 could be set, or potentially as you said making it user-configurable.
|
||
for await (const curFile of filesStream) { | ||
if (promises.length >= MAX_QUEUE_SIZE) { | ||
await Promise.all(promises); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we hit the queue limit, all execution gets stopped until all the pending promises have finished. This technically works, but doesn't maximize the benefit of streams and results in lower performance.
When I was trying to find a workaround for this issue, I wrote this example transform stream for limiting execution of a stream: https://gist.github.com/timscottbell/364e2683354683c05cdfd93c708e9072. This is not production-ready code or fully tested, but more to give an idea of the direction I thought this solution would go in.
Our current workaround is this:
async deleteBucket(name: string) {
const bucket = StorageManagerGoogle._bucket(name, config.CloudProjectId);
const bucketExists = await StorageManagerGoogle._exists(bucket);
if (!bucketExists) {
logger.warn(`Bucket [${name}] does not exist, cannot delete it`);
return;
}
while (true) {
const [files] = await bucket.getFiles({
autoPaginate: false,
maxResults: 1000,
});
if (files.length === 0) {
break;
}
await Promise.all(
files.map(async (file: File) => {
try {
await file.delete({ignoreNotFound: true});
} catch (error) {
const {response} = error;
if (
!response ||
response.statusCode !== 409 ||
response.statusMessage !== 'Conflict'
) {
throw error;
}
}
})
);
}
await bucket.delete();
}
which looks very surprisingly similar to the approach here. (We opted for the less performant solution over the stream solution in the gist because some team members have less familiarity with streams, and it was a workaround we only wanted to commit temporarily).
I couldn't find any standard libraries that do this (I didn't look super hard). Surprised... there really should be one that handles this common generic use case.
Maybe you could use pmap
since the stream is an async iterable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some other improvements to our backlog around deleteFiles
performance in addition to some other stream related work we have planned for the next major version.
You are correct, we essentially back pressure the stream once the queue is full. I did this intentionally to not hold thousands / tens of thousands of objects in the queue at the same time. Hopefully this is enough of a reduction in memory for the time being. I don't think this will degrade performance as even the previous version only allowed 10 concurrent promises executing even if the queue contained 100,000 items.
If this doesn't help to resolve the issue you were seeing please feel free to update the open ticket and we will of course be happy to take a look.
…am and using smaller queue of promises
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕