OCR Attention model backward debug #1
Comment:
In the v1 mixed layer (FullMatrixProjection) implementation, the parameter is updated at every rnn step; in the fluid implementation, the parameter is not updated immediately after each rnn step (to be confirmed).
v1 step 7 log:
fluid step 7 log:
v1 step 6 log:
fluid step 6 log:
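To make the hypothesized difference concrete, here is a minimal numpy sketch (not Paddle code; all names and sizes are purely illustrative) contrasting "update the shared weight after every rnn step" with "accumulate gradients over all steps and apply one update":

```python
import numpy as np

np.random.seed(0)
W0 = np.random.randn(4, 3)                       # shared projection weight
xs = [np.random.randn(2, 4) for _ in range(7)]   # inputs of the 7 rnn steps
ts = [np.random.randn(2, 3) for _ in range(7)]   # per-step targets
lr = 0.1

def step_grad(W, x, t):
    # simple L2 loss on y = x @ W, so dL/dW = x^T (y - t)
    return x.T.dot(x.dot(W) - t)

# Hypothesis A: the parameter is updated immediately after every rnn step,
# so later steps already see an updated weight.
W_a = W0.copy()
for x, t in zip(xs, ts):
    W_a -= lr * step_grad(W_a, x, t)

# Hypothesis B: gradients of all steps are computed against the same weight
# and a single update is applied after the whole pass.
grad_sum = sum(step_grad(W0, x, t) for x, t in zip(xs, ts))
W_b = W0 - lr * grad_sum

print(np.abs(W_a - W_b).max())  # nonzero: the two update schedules diverge
```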
In the decode rnn of ocr_attention (fluid, v1), the output of every step goes through an fc layer with a softmax activation; this fc layer maps the memory to a prediction over num_classes words.
fluid uses an fc layer: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L191
v1 uses a mixed layer: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L151
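As a rough sketch of this per-step projection, assuming the fluid 1.x layers API and illustrative names (`h`, `num_classes`, and the size value are placeholders, not copied from the linked files):

```python
import paddle.fluid as fluid

num_classes = 95  # illustrative value; the real size comes from the dataset config

def decoder_step_projection(h):
    # fluid side: an fc layer with a softmax activation maps the decoder
    # memory `h` to a distribution over num_classes words
    # (cf. fluid/attention_model.py#L191).
    return fluid.layers.fc(input=h, size=num_classes, act='softmax')

# The v1 network expresses the same projection as a mixed layer whose single
# input is a full_matrix_projection of the gru step output
# (cf. v1/trainer_conf.py#L151); both compute out = softmax(h * W + b).
```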
While aligning the backward pass, the fluid fc layer and the v1 mixed layer above were found to behave inconsistently.
Comparison experiment: remove the softmax activation from both the fluid fc layer and the v1 mixed layer, load the same model, and run one pass with a single image.
In fluid, fluid.layers.Print(input=h, print_phase="backward") is added before and after the fc layer, as shown at: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L190-L195
In v1, gradient_printer_evaluator(input=gru_step) is added before and after the mixed layer, as shown at: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L150-L157
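A minimal sketch of that instrumentation on the fluid side (the names `h`, `num_classes`, and the `message` strings are illustrative, not copied from the linked file):

```python
import paddle.fluid as fluid

def instrumented_projection(h, num_classes):
    # During backward, this prints the gradient w.r.t. h, i.e. the gradient
    # the fc layer propagates back to its input.
    fluid.layers.Print(input=h, message="fc_in_grad", print_phase="backward")
    # The softmax activation is dropped (act=None) for the comparison experiment.
    out = fluid.layers.fc(input=h, size=num_classes, act=None)
    # During backward, this prints the gradient w.r.t. out, i.e. the gradient
    # arriving at the fc layer's output.
    fluid.layers.Print(input=out, message="fc_out_grad", print_phase="backward")
    return out
```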
The output logs (7 rnn steps in total) show:
Manual calculation verification
Load the v1 mixed layer weights from the model and manually compute input_grad from the out_grad printed in the logs.
The manually computed result matches the result given by the fluid fc layer.
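A sketch of that manual check (the file names are hypothetical placeholders for however the weight and the printed out_grad were dumped):

```python
import numpy as np

# Hypothetical dumps: the v1 mixed-layer weight loaded from the saved model,
# and the out_grad printed by the gradient evaluator for one rnn step.
W = np.load("v1_mixed_layer_w.npy")        # shape: (in_dim, num_classes)
out_grad = np.load("step7_out_grad.npy")   # shape: (batch, num_classes)

# With softmax removed the layer is linear, out = in.dot(W) + b,
# so the gradient w.r.t. the input is out_grad.dot(W.T).
input_grad = out_grad.dot(W.T)
print(input_grad)
```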