Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new load and save ops for storing model params in a single file #7780

Closed

Conversation

sidgoyal78
Copy link
Contributor

This PR addresses issue #7722 by implementing the approach described in the issue.

TODO: As pointed out by @reyoung in the implementation of save_op, we need to move few file-related helper functions to a different namespace. I will do it maybe in this PR or maybe another. For the time being the helper functions are replicated and are static.

@sidgoyal78 sidgoyal78 requested review from Xreki and kexinzhao January 23, 2018 07:19
@Xreki Xreki added the 预测 原名Inference,包含Capi预测问题等 label Jan 23, 2018
public:
LoadCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) The tensor need to be load_combineed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the implementation of load_op? The load_combined operator is a little confusing.

Maybe we can change the output Out of load_op to a Duplicable:

AddOutput("Out", "(Tensor) The tensor need to be loaded").AsDuplicable();

Then Out will be a vector of Variables, and we can load all the Variables from a file through one load_op.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can use offset in a file instead of the position_counter? And can we avoid to read the content of the file again and again?

Copy link
Contributor Author

@sidgoyal78 sidgoyal78 Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine it with load_op. But in my opinion, it will look more like:

//book keeping code
if( combine_param ) {
   // do stuff from combine_op
} else 
  // do stuff from usual op 
}

Regarding your Duplicable suggestion: I am not sure I follow completely, but still I think we will need a deserialization logic over this merged together tensor (which maybe the same amount of computation).

Regarding your suggestion about offset: I think it has 2 aspects:

  • If we use the offset, we will have to save it in some variable, (either make a class and make a member variable assign the offset, or something like that)
  • I don't see how an offset helps much in computation saving. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location. And we still do it in this code by hopping through different chunks (and of course, in the current implementation, there is some extra overheard of new and copy to istringstream).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine it with load_op

I agree with it. Maybe can do it in next PRs.

Regarding your suggestion about offset. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location.

The current LoadCombineOp only process one file each time. Can it process many files at the same time? That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.

Copy link
Contributor Author

@sidgoyal78 sidgoyal78 Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.

@luotao1 : Can you please explain a bit? I am not sure if I understand the main idea. (Did you mean : can it process many tensors (instead of files) at the same time?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To just explain my point of view a bit more, I thought that the op will be used in the manner as done currently in save_vars() in fluid/io.py (ref. So basically, i assumed that we will have some of usage which looks like this:

/~https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/io.py#L89-L97

(with save replaced with save_combine)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine it with load_op

I agree.

That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. @luotao1 : Can you please explain a bit?

I think what @luotao1 is saying is that you can construct a save/load program desc for loading/saving all parameters data at once. This program desc has only one load/save op that takes a list of var names as input/output and do the load/save tasks for once. This list can be obtained from the main program desc by traversing variables. By doing this, we can save a lot of time reading/write files. Thus I prefer this option.

@Xreki Xreki requested review from qingqing01 and luotao1 January 23, 2018 12:03
@@ -0,0 +1,125 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2016->2018 in Copyright

public:
LoadCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) The tensor need to be load_combineed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine it with load_op

I agree with it. Maybe can do it in next PRs.

Regarding your suggestion about offset. I think even if we have an offset, everytime we open the file to load, we still have to move the file pointer from starting to the actual location.

The current LoadCombineOp only process one file each time. Can it process many files at the same time? That is to say, this op loads all parameters at the same time and saves them into one file, and it doesn't need offset or point_counter. Thus, it can avoid reading the content of the file again and again.

50, 20, lod4, "test_var4", 3, "tensor.save", scope, place, expect_lod4);

paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4;
int* actual1 = create_and_run_load_combine_op("out_var1", 0, "tensor.save",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When LoadCombineOp loads all parameters at the same time, line 112 maybe like:
create_and_run_load_combine_op("out_var_list", "tensor.save", scope, place) ?

@Xreki
Copy link
Contributor

Xreki commented Jan 31, 2018

@sidgoyal78 since #7909 is merged, can you close this PR?

@sidgoyal78 sidgoyal78 closed this Jan 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
预测 原名Inference,包含Capi预测问题等
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants