-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new load and save ops for storing model params in a single file #7780
Conversation
public: | ||
LoadCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) | ||
: OpProtoAndCheckerMaker(proto, op_checker) { | ||
AddOutput("Out", "(Tensor) The tensor need to be load_combineed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the implementation of load_op
? The load_combined
operator is a little confusing.
Maybe we can change the output Out
of load_op
to a Duplicable
:
AddOutput("Out", "(Tensor) The tensor need to be loaded").AsDuplicable();
Then Out
will be a vector of Variable
s, and we can load all the Variable
s from a file through one load_op
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can use offset
in a file instead of the position_counter
? And can we avoid to read the content of the file again and again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can combine it with load_op
. But in my opinion, it will look more like:
//book keeping code
if( combine_param ) {
// do stuff from combine_op
} else
// do stuff from usual op
}
Regarding your Duplicable
suggestion: I am not sure I follow completely, but still I think we will need a deserialization logic over this merged together tensor (which maybe the same amount of computation).
Regarding your suggestion about offset
: I think it has 2 aspects:
- If we use the offset, we will have to save it in some variable, (either make a class and make a member variable assign the offset, or something like that)
- I don't see how an offset helps much in computation saving. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location. And we still do it in this code by hopping through different chunks (and of course, in the current implementation, there is some extra overheard of
new
and copy toistringstream
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can combine it with load_op
I agree with it. Maybe can do it in next PRs.
Regarding your suggestion about offset. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location.
The current LoadCombineOp only process one file each time. Can it process many files at the same time? That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.
@luotao1 : Can you please explain a bit? I am not sure if I understand the main idea. (Did you mean : can it process many tensors (instead of files) at the same time?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To just explain my point of view a bit more, I thought that the op will be used in the manner as done currently in save_vars()
in fluid/io.py
(ref. So basically, i assumed that we will have some of usage which looks like this:
/~https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/io.py#L89-L97
(with save
replaced with save_combine
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can combine it with load_op
I agree.
That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. @luotao1 : Can you please explain a bit?
I think what @luotao1 is saying is that you can construct a save/load program desc for loading/saving all parameters data at once. This program desc has only one load/save op that takes a list of var names as input/output and do the load/save tasks for once. This list can be obtained from the main program desc by traversing variables. By doing this, we can save a lot of time reading/write files. Thus I prefer this option.
@@ -0,0 +1,125 @@ | |||
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2016->2018 in Copyright
public: | ||
LoadCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) | ||
: OpProtoAndCheckerMaker(proto, op_checker) { | ||
AddOutput("Out", "(Tensor) The tensor need to be load_combineed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can combine it with load_op
I agree with it. Maybe can do it in next PRs.
Regarding your suggestion about offset. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location.
The current LoadCombineOp only process one file each time. Can it process many files at the same time? That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.
50, 20, lod4, "test_var4", 3, "tensor.save", scope, place, expect_lod4); | ||
|
||
paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4; | ||
int* actual1 = create_and_run_load_combine_op("out_var1", 0, "tensor.save", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When LoadCombineOp loads all parameters at the same time, line 112 maybe like:
create_and_run_load_combine_op("out_var_list", "tensor.save", scope, place)
?
@sidgoyal78 since #7909 is merged, can you close this PR? |
This PR addresses issue #7722 by implementing the approach described in the issue.
TODO: As pointed out by @reyoung in the implementation of
save_op
, we need to move few file-related helper functions to a different namespace. I will do it maybe in this PR or maybe another. For the time being the helper functions are replicated and are static.