Optimizer C_API #2168
Note that the Regularizer part still has to be separated out. The current code tries to combine the optimizer and the regularizer, and that part appears to be unused. The desired end state: a unified library takes over the optimization computation for both the trainer and the ParameterServer, wrapping the applySGD functions in the math library.
👍 Should these declarations be defined in a proto file?
This interface is very simple.
A heads-up: the referenced code is GPU code. The pserver does not need to support GPU for now; it would be too expensive in GPU resources, and the CPU is not necessarily much slower. I suggest we write our own Optimizer C++ code first. Tensor is also being refactored, so we can switch to the shared Tensor library later if needed.
请看/~https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md 其中的 int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto); 是需要一个proto文件定义pserver的设置。但是建议Optimizer不用去理解proto,由pserver理解完了proto再根据proto要求创建Optimizer。 |
"一个接口确实已经可以完全表达所有优化函数接口。":应该是可以了,可能需要稍作修改: typedef enum {
SGD = 0,
ASGD = 1,
ADAM = 2,
// ...
} optimizer_identifier;
typedef struct optimizer optimizer; // forward declaration
optimizer* paddle_create_optimizer(optimizer_identifier identifier);
void paddle_release_optimizer(optimizer* o);
int paddle_update_parameter(optimizer* o, void *buffer, paddle_element_type datatype, const void* gradient, int num_bytes, double learning_rate);
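For concreteness, here is a small hypothetical sketch of how a caller such as the pserver might drive this interface. The declarations are copied from the proposal above; paddle_element_type and the driver function are assumptions for illustration only.

/* Hypothetical usage sketch of the proposed C API above. */
typedef enum { PADDLE_ELEMENT_TYPE_FLOAT32 = 0 } paddle_element_type; /* assumed */

typedef enum { SGD = 0, ASGD = 1, ADAM = 2 } optimizer_identifier;
typedef struct optimizer optimizer; /* opaque handle owned by the library */

optimizer* paddle_create_optimizer(optimizer_identifier identifier);
void paddle_release_optimizer(optimizer* o);
int paddle_update_parameter(optimizer* o, void* buffer, paddle_element_type datatype,
                            const void* gradient, int num_bytes, double learning_rate);

/* One shared optimizer; the caller owns the parameter buffer. */
void sketch_apply_one_step(float* param, const float* grad, int len) {
  optimizer* o = paddle_create_optimizer(SGD);
  paddle_update_parameter(o, param, PADDLE_ELEMENT_TYPE_FLOAT32,
                          grad, len * (int)sizeof(float), 0.01 /* learning_rate */);
  paddle_release_optimizer(o);
}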
Agreed with this comment; I will start rewriting the optimizer.
@helinwang
@dzhwinter AVX is supported by almost all modern CPUs. However, maybe let's do the non-AVX CPU version first (development is faster as well, since there is no need to learn the AVX intrinsics) and add AVX if optimization turns out to be necessary.
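As a rough illustration of the "plain CPU first" suggestion, a scalar (non-AVX) SGD step could look like the sketch below; this is an assumption for discussion, not the actual kernel in Paddle's math library.

#include <stddef.h>

/* Scalar (non-AVX) SGD step over a dense float32 buffer; illustration only. */
static void sgd_update_f32(float* param, const float* grad,
                           size_t len, float learning_rate) {
  for (size_t i = 0; i < len; ++i) {
    param[i] -= learning_rate * grad[i];
  }
}

If profiling later shows the scalar loop is a bottleneck, the same loop can be rewritten with AVX intrinsics without changing the surrounding C interface.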
We already implemented the SGD-family algorithms in v1; since we are in a hurry, no extra effort should be spent there. I found that parsing the configuration is inevitable.
How about passing a serialized proto to describe the configuration? We need this config when we create the optimizer; otherwise we need a standalone create function in pure C for every optimizer method.
@dzhwinter Before, we wanted the optimizer to be super simple, with as little state as possible, and the Go part to handle the messy parts like managing momentum memory and parsing the config; so Go stores many states (like momentum memory, since it needs to parse the config to know whether the optimizer is momentum based). If we let the optimizer parse the config and create the specific optimizer, Go had better not parse it again. Then all the config-related state would need to be managed by C++, which would complicate the interfaces (e.g., C++ would need to manage momentum memory for each parameter, so it would need the parameter name exposed through the C++-Go API). I am a little worried this complicates things more than "needing a standalone create function in pure C for every optimizer method". What do you think?
I have no strong preference on whether the Go part should be responsible for the state and config parsing. I just thought that if we go the Go-side way, we need a bunch of different create functions and update functions.
Here is some pseudo code showing what happens if we put it in the Go part. // for create optimizer function signature
create_SGDOptimizer(learning_rate)
create_Momentum(learning_rate, mu)
create_Adam(learning_rate, ...)
// for update function signature
updateSGD(parameter, gradient, mu, learning_rate);
updateMomentum(parameter, momentum, gradient, mu, learning_rate);
updateAdam(parameter, momentum, gradient, mu, rho, learning_rate);
I have a new idea, up for discussion. Sorry that my opinion has changed on this one; I had not thought the problem through clearly. Before, in my mind the optimizer was used like this (maybe the same in your mind as well): one optimizer instance for all parameters.
The problem with a single optimizer instance is that it cannot save per-parameter state like momentum. We could work around that by letting Go save the momentum as well.
This workaround has a problem: what if there is a new piece of per-parameter state, should Go maintain that as well? Another solution would be: not one optimizer instance for all parameters, but one optimizer instance per parameter.
If we use this approach, the config parsing could be in C++, since Go does not need to maintain the momentum or the parameter. The interface could be: optimizer* paddle_create_optimizer(const unsigned char* config_proto, int config_proto_len, const unsigned char* buffer, int num_bytes);
void paddle_release_optimizer(optimizer* o);
int paddle_update_parameter(optimizer* o, paddle_element_type datatype, const unsigned char* gradient, int num_bytes);
const unsigned char* paddle_optimizer_param(optimizer* o, int* num_bytes);
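A hypothetical caller-side sketch of this "one optimizer instance per parameter" idea, assuming the declarations above plus an assumed paddle_element_type enum:

/* Hypothetical sketch; declarations mirror the proposed interface above. */
typedef enum { PADDLE_ELEMENT_TYPE_FLOAT32 = 0 } paddle_element_type; /* assumed */
typedef struct optimizer optimizer;

optimizer* paddle_create_optimizer(const unsigned char* config_proto, int config_proto_len,
                                   const unsigned char* buffer, int num_bytes);
void paddle_release_optimizer(optimizer* o);
int paddle_update_parameter(optimizer* o, paddle_element_type datatype,
                            const unsigned char* gradient, int num_bytes);
const unsigned char* paddle_optimizer_param(optimizer* o, int* num_bytes);

/* One optimizer per parameter: the optimizer owns the parameter memory, so the
 * caller only keeps the handle and reads the buffer back when it needs it. */
void sketch_per_parameter(const unsigned char* config, int config_len,
                          const unsigned char* init_value, int value_len,
                          const unsigned char* grad, int grad_len) {
  optimizer* o = paddle_create_optimizer(config, config_len, init_value, value_len);
  paddle_update_parameter(o, PADDLE_ELEMENT_TYPE_FLOAT32, grad, grad_len);

  int out_len = 0;
  const unsigned char* latest = paddle_optimizer_param(o, &out_len);
  (void)latest; /* e.g. sent back to the trainer by the pserver */

  paddle_release_optimizer(o);
}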
Well, solution 1 seems to be the right direction for a stateless library. After diving into all the optimizer algorithms in Paddle and TensorFlow, I found that having Go maintain all the state is achievable.
The problem can be solved because there are only a limited number of kinds of optimizer parameters. I thought we could write it this way: struct parameter_map {
void *parameter;
void *gradient;
void *momentum;
void *l1;
void *l2;
// ...
double learning_rate;
};
int paddle_update_parameter(optimizer* o, paddle_element_type datatype, parameter_map* params, int num_bytes); We can clearly figure out how many kinds of parameters there would be by listing the algorithms in PaddlePaddle:
- pserver process
  - optimizer instance
  - parameter map
    - "param_a": memory_buffer, momentum_buffer
    - "param_b": memory_buffer, momentum_buffer
    - ...
However, I found that each pserver process assumes all the parameters on that node use the same optimizer algorithm, because they share the optimizer instance. But different parameters often need different optimizers, e.g. the wide and deep model https://arxiv.org/abs/1606.07792. It is Sunday in your time zone, so I will rewrite the optimizer following solution 2 and focus on other cloud development work; I hope it is the right choice.
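For comparison only, here is a hypothetical sketch of how the parameter_map proposal above could be driven from C (with the struct members written with C semicolons). The element type and the calling code are assumptions, and as the comment notes, the author ends up going with solution 2 instead.

/* Illustration only: driving the parameter_map proposal above. */
typedef enum { PADDLE_ELEMENT_TYPE_FLOAT32 = 0 } paddle_element_type; /* assumed */
typedef struct optimizer optimizer;

typedef struct parameter_map {
  void* parameter;
  void* gradient;
  void* momentum;     /* NULL for optimizers that do not use it */
  void* l1;
  void* l2;
  double learning_rate;
} parameter_map;

int paddle_update_parameter(optimizer* o, paddle_element_type datatype,
                            parameter_map* params, int num_bytes);

void sketch_momentum_update(optimizer* o, float* param, float* grad,
                            float* momentum, int num_bytes) {
  parameter_map m = {0};   /* unused state buffers stay NULL */
  m.parameter = param;
  m.gradient = grad;
  m.momentum = momentum;
  m.learning_rate = 0.01;
  paddle_update_parameter(o, PADDLE_ELEMENT_TYPE_FLOAT32, &m, num_bytes);
}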
By the way, if there is any solution that can keep the parameter state in the Go part, I prefer that way, since we are designing a library. Rewriting this part would not take too much time; if there is an improved approach, I can just rewrite it.
@dzhwinter Writing in solution 2 sounds great! optimizer* paddle_create_optimizer(const unsigned char* config_proto, int config_proto_len, const unsigned char* buffer, int num_bytes);
void paddle_release_optimizer(optimizer* o);
int paddle_update_parameter(optimizer* o, paddle_element_type datatype, const unsigned char* gradient, int num_bytes);
const unsigned char* paddle_optimizer_param(optimizer* o);
Thanks for the reply!
> optimizer* paddle_create_optimizer(const unsigned char* config_proto, int config_proto_len, const unsigned char* buffer, int num_bytes);
> int paddle_update_parameter(optimizer* o, paddle_element_type datatype, const unsigned char* gradient, int num_bytes);
The argument list should contain learning_rate, right? int paddle_update_parameter(optimizer* o, paddle_element_type datatype, const unsigned char* gradient, int num_bytes, double learning_rate);
@dzhwinter Here is what is in my mind, could be wrong, just for discussion: the learning rate is initialized by Paddle/proto/TrainerConfig.proto, line 31 at 8b6f374.
@helinwang So, just to double-check: does this function return the list of parameter names from the optimizer? const unsigned char* paddle_optimizer_param(optimizer* o, int* num_bytes); In addition, there is a small problem with the learning_rate policy if the optimizer library takes it over: virtual real calcLearningRate(int64_t numSamplesProcessed, int64_t pass) = 0;
e.g.
class PolyLRS : public BaseLRS {
public:
explicit PolyLRS(const OptimizationConfig& config) : BaseLRS(config) {}
virtual real calcLearningRate(int64_t numSamplesProcessed, int64_t pass) {
return learningRate_ * pow(1.0 + a_ * numSamplesProcessed, -b_);
}
}; numSamplesProcessed would be another piece of state, which goes the opposite way from our original idea for the optimizer library.
Yes, the optimizer owns the parameter memory. This memory needs to be initialized, which is why that initialization argument is there. Because the optimizer owns the parameter memory, the Go side needs an interface to read this memory back out.
The numSamplesProcessed variable is indeed used, but the optimizer can store it as its own state. My understanding of solution 2 is that the Optimizer stores all the state, one Optimizer is created per parameter, and the Go part only keeps the mapping from parameter name to optimizer.
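A hypothetical sketch of that point: the optimizer instance itself keeps numSamplesProcessed and reproduces the poly decay from the C++ code above; all names and field layouts here are assumptions.

#include <math.h>
#include <stdint.h>

/* Hypothetical internal state of one optimizer instance: it tracks
 * numSamplesProcessed itself, so the learning-rate policy needs no
 * extra state from the Go side. */
typedef struct optimizer_state {
  double base_learning_rate;    /* learningRate_ in the C++ snippet above */
  double poly_a;                /* a_ */
  double poly_b;                /* b_ */
  int64_t num_samples_processed;
} optimizer_state;

static double poly_learning_rate(const optimizer_state* o) {
  return o->base_learning_rate *
         pow(1.0 + o->poly_a * (double)o->num_samples_processed, -o->poly_b);
}

/* Advanced internally on every update call, instead of being passed in. */
static void record_batch(optimizer_state* o, int64_t batch_size) {
  o->num_samples_processed += batch_size;
}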
As mentioned in the last discussion:
Model Optimization Using Gradients
We plan to support both of these parameter-update methods. The current v1 only supports method 1 (the On Client approach). Since both methods need the Optimizer's update strategy, we chose to wrap the Optimizer as a library.
The ParameterServer is implemented in Go, so it needs a C interface for the Optimizer, defined as follows.
1. Can both sparseUpdate and denseUpdate go through this one interface?
SparseUpdate is stored as a SparseRowMatrix, so it can reuse this interface.
2. Can the Regularizer be wrapped into this library as well?
The On Client parameter-update path is already coupled with communication, especially for SparseUpdate: because the update is lazy and iterates locally several times, the Regularizer has to record how many rounds have been computed and trigger an update when the value is next read. I have not found a good way to split out the communication state.
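For discussion, a rough sketch of the "record the rounds and catch up on read" idea mentioned here, assuming an L2-style decay; everything below is hypothetical and ignores the coupling with communication described above.

#include <math.h>

/* Lazy ("catch-up") regularization sketch: remember the last round at which
 * each row was regularized, and when the row is next read or updated, apply
 * the decay for all skipped rounds at once. All names are hypothetical. */
typedef struct lazy_l2_state {
  int* last_round;    /* per-row round at which L2 was last applied */
  int current_round;  /* global round counter, advanced once per pass */
  double decay;       /* per-round L2 decay, e.g. learning_rate * lambda */
} lazy_l2_state;

static void catch_up_row(lazy_l2_state* s, float* row, int width, int row_id) {
  int skipped = s->current_round - s->last_round[row_id];
  if (skipped <= 0) return;
  /* apply (1 - decay)^skipped in one shot instead of once per round */
  float scale = (float)pow(1.0 - s->decay, (double)skipped);
  for (int i = 0; i < width; ++i) row[i] *= scale;
  s->last_round[row_id] = s->current_round;
}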
The Optimizer plans to wrap the low-level operations in the math library such as applySGD; see the code here:
/~https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/math/TrainingAlgorithmOp.cu#L25
This part of the code can be migrated when Majel is integrated later.