-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Reference engine from chunk via weak pointer #14591
Reference engine from chunk via weak pointer #14591
Conversation
🎆 |
Thanks for your contribution @KellenSunderland . @mxnet-label-bot Add [Numpy, pr-awaiting-review] |
Thanks @piyushghai. Not immediately clear to me why the numpy label is attached. Does this PR have a Numpy related aspect to it? |
Aah. My bad. I was adding ndarray, and I ended up typing numpy instead :/ @mxnet-label-bot Update[ndarray, pr-awaiting-review] |
@piyushghai Gotcha! Thanks for the quick responses on the PR. |
LGTM |
@anirudhacharya @eric-haibin-lin Could one of you guys do a quick review (it's a pretty small change). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the change looks good. Thanks for the fix! Can we add a cpp test for this.
c2a6287
to
dd55710
Compare
@anirudh2290 Good point. Took me a while to think of how we can test this. Reproducing the issue is only possible when we have static de-allocating. A bit tricky to test without modifying the code to make non-static scoped. I created a test that at a minimum crashes on my machine without the fix. It'll mark the test as passed but it will crash the test process (which will prevent regressions in CI I hope). Stepping through I can see it's taking the branches you'd expect w and w/o the fix. I.e. with the fix it skips the engine delete. Output w/o fix:
|
dd55710
to
041f17b
Compare
engine->Stop(); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KellenSunderland Thanks for the nice work on the tests. I understand this may be difficult to test, and I am okay with the crash for verification. Would it be easy to pinpoint where it crashed when run with other tests ? Otherwise better to move it to a different source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a little difficult in fact. Let's move it to it's own test suite to hopefully make it a little more obvious for future contributors.
b760cae
to
b3c6278
Compare
b3c6278
to
69ce1a3
Compare
@anirudh2290 Rebased and moved the test into its own fixture, should be ready for another look. |
Reference engine from chunk via weak pointer (apache#14591)
Description
This PR fixes an issue we're seeing in TRT integration which causes a crash on shutdown. I believe the issue should be present under other usages as well. This PR tracks a weak reference on the engine to ensure we don't try to schedule work on the engine after it has been shutdown (for example cleaning up memory from NDChunks that are still in scope).
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.