
[oneDNN] Second fix to #33021 #33471

Merged

jczaja merged 2 commits into PaddlePaddle:develop on Jun 11, 2021
Conversation

jczaja
Contributor

@jczaja jczaja commented Jun 9, 2021

PR types

Bug fixes

PR changes

Others

Describe

This is the second part of the fixes to #33021. The problem addressed here is that the Predictor, after finishing the execution of Run(), restores the default cache settings. The default is an unrestricted cache size (cache_capacity = 0), but after Run() completes there are sometimes Tensor::CopyToCpu() calls which use oneDNN objects and could make the cache grow.
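
For context, a minimal sketch of the sequence described above (not the test from this PR; the calls are from the public paddle_infer C++ API, and the model path and capacity are illustrative):

```cpp
#include <functional>
#include <memory>
#include <numeric>
#include <string>
#include <vector>

#include "paddle_inference_api.h"

void RunOnce(const std::string& model_dir) {
  paddle_infer::Config config;
  config.SetModel(model_dir);         // illustrative model location
  config.EnableMKLDNN();
  config.SetMkldnnCacheCapacity(10);  // restricted (cache-clearing) mode

  auto predictor = paddle_infer::CreatePredictor(config);
  // ... feed the inputs ...
  predictor->Run();

  auto output_t = predictor->GetOutputHandle(predictor->GetOutputNames()[0]);
  auto shape = output_t->shape();
  int out_num = std::accumulate(shape.begin(), shape.end(), 1,
                                std::multiplies<int>());
  std::vector<float> out_data(out_num);
  // CopyToCpu() still exercises oneDNN after Run() has returned; with the
  // capacity reverted to the default 0 (unlimited), the objects it creates
  // could keep growing the cache.
  output_t->CopyToCpu(out_data.data());
}
```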

@paddle-bot-old

paddle-bot-old bot commented Jun 9, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@jczaja jczaja added the Intel label Jun 9, 2021
@@ -353,6 +351,9 @@ void AnalysisPredictor::MkldnnPreSet(
    VLOG(2) << "Set input shape=" << ss.str();
    platform::MKLDNNDeviceContext::tls().set_cur_input_shape_str(ss.str());
  }
  platform::MKLDNNDeviceContext::tls().set_cur_input_shape_cache_capacity(
      config_.mkldnn_cache_capacity_);

Contributor

Is there a difference in moving the position of this line of code? What is influenced?

Contributor Author

Previously, cache_capacity was only set when it was non-zero (cache-clearing mode). As zero is the default value, it was not set in that case. But if you have two predictors, one working with a restricted cache (clearing mode) and the other without this limitation, then if you run the executor with a restricted cache size and then move on to executing the second executor, which runs on default settings, you would actually use restricted caching again. We used to revert to the defaults in MKLDNNPostReset(), but we cannot do that anymore, so I had to move this resetting to when execution begins.
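
To make that concrete, a hedged sketch of the two-predictor interleaving (the paddle_infer calls are the public API; the model paths and capacity are illustrative):

```cpp
paddle_infer::Config cfg_restricted;
cfg_restricted.SetModel("model_a_dir");     // hypothetical model paths
cfg_restricted.EnableMKLDNN();
cfg_restricted.SetMkldnnCacheCapacity(10);  // cache-clearing mode

paddle_infer::Config cfg_default;
cfg_default.SetModel("model_b_dir");
cfg_default.EnableMKLDNN();                 // capacity left at the default 0

auto pred_a = paddle_infer::CreatePredictor(cfg_restricted);
auto pred_b = paddle_infer::CreatePredictor(cfg_default);

pred_a->Run();  // sets the thread-local cache_capacity to 10
// Before this change a zero capacity was never written, so the thread-local
// value stayed at 10 and pred_b silently ran with restricted caching.
// Setting the capacity unconditionally in MkldnnPreSet() fixes that.
pred_b->Run();
```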

Contributor

Oh, OK, it is now in the PreSet function. Sorry, I did not notice. Thanks for your explanation.

out_data.resize(out_num);
output_t->CopyToCpu(out_data.data());

// Release predictor (relevant cache should be emptied)
Contributor

"predictor.reset(nullptr);" I feel we don't call often call this in API tests. Usually we just do predictor.Run(), that's all and seems assuming that cache will be release during deconstructor of some object?
What will be the result if we remove this predictor.reset(nullptr) ?

Contributor Author

predictor.reset(nullptr) enforces that the predictor is released, and its cached objects along with it. Normally you do not need to do this, as the predictor is released when it goes out of scope, but I want to check whether the cache is released, so I could not wait until the end of the scope.
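
In other words, something like the following sketch, assuming the legacy CreatePaddlePredictor API, which hands the predictor out as a std::unique_ptr:

```cpp
#include <memory>

#include "paddle_inference_api.h"

void CheckCacheReleased(const paddle::AnalysisConfig& config) {
  std::unique_ptr<paddle::PaddlePredictor> predictor =
      paddle::CreatePaddlePredictor(config);
  predictor->ZeroCopyRun();
  // ... CopyToCpu() the outputs, as in the snippet above ...

  // Destroy the predictor immediately instead of waiting for the end of
  // scope, so the test can assert right here that its cached oneDNN
  // objects were released.
  predictor.reset(nullptr);
}
```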

Contributor

@jakpiase jakpiase left a comment

LGTM

@lidanqing-intel
Contributor

LGTM.

Contributor

@lidanqing-intel lidanqing-intel left a comment

LGTM

@jczaja
Contributor Author

jczaja commented Jun 11, 2021

As agreed with @juncaipeng: after passing internal review (two reviewers), we are allowed to merge PRs specific to oneDNN.

@jczaja jczaja merged commit 3c49f08 into PaddlePaddle:develop Jun 11, 2021
@jczaja jczaja deleted the prv-33021-2-pr branch June 11, 2021 07:48
lidanqing-intel pushed a commit to lidanqing-intel/Paddle that referenced this pull request Jun 15, 2021
Superjomn pushed a commit that referenced this pull request Jun 16, 2021
…y consumption (#33571)

* [oneDNN] First fix to #33021  (#33174)

* - First fix to #33021

* [oneDNN] Second fix to #33021 (#33471)

* use older download_data function

Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>