OCR Attention model backward debug #1
Comment:
In the v1 mixed layer (FullMatrixProjection) implementation, the parameter is updated at every rnn step; in the fluid implementation, the parameter is not updated immediately after each rnn step (to be confirmed).
v1 step 7 log:
fluid step 7 log:
v1 step 6 log:
fluid step 6 log:
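To make the hypothesized difference concrete, here is a minimal numpy sketch (not Paddle code; all names and sizes are purely illustrative) contrasting "update the shared weight after every rnn step" with "accumulate gradients over all steps and apply one update":

```python
import numpy as np

np.random.seed(0)
W0 = np.random.randn(4, 3)                       # shared projection weight
xs = [np.random.randn(2, 4) for _ in range(7)]   # inputs of the 7 rnn steps
ts = [np.random.randn(2, 3) for _ in range(7)]   # per-step targets
lr = 0.1

def step_grad(W, x, t):
    # simple L2 loss on y = x @ W, so dL/dW = x^T (y - t)
    return x.T.dot(x.dot(W) - t)

# Hypothesis A: the parameter is updated immediately after every rnn step,
# so later steps already see an updated weight.
W_a = W0.copy()
for x, t in zip(xs, ts):
    W_a -= lr * step_grad(W_a, x, t)

# Hypothesis B: gradients of all steps are computed against the same weight
# and a single update is applied after the whole pass.
grad_sum = sum(step_grad(W0, x, t) for x, t in zip(xs, ts))
W_b = W0 - lr * grad_sum

print(np.abs(W_a - W_b).max())  # nonzero: the two update schedules diverge
```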
In the decode rnn of ocr_attention (fluid, v1), the output of every step goes through an fc layer with a softmax activation; this fc layer maps the memory to a prediction over num_classes words.
fluid uses an fc layer: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L191
v1 uses a mixed layer: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L151
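As a rough sketch of this per-step projection, assuming the fluid 1.x layers API and illustrative names (`h`, `num_classes`, and the size value are placeholders, not copied from the linked files):

```python
import paddle.fluid as fluid

num_classes = 95  # illustrative value; the real size comes from the dataset config

def decoder_step_projection(h):
    # fluid side: an fc layer with a softmax activation maps the decoder
    # memory `h` to a distribution over num_classes words
    # (cf. fluid/attention_model.py#L191).
    return fluid.layers.fc(input=h, size=num_classes, act='softmax')

# The v1 network expresses the same projection as a mixed layer whose single
# input is a full_matrix_projection of the gru step output
# (cf. v1/trainer_conf.py#L151); both compute out = softmax(h * W + b).
```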
While aligning the backward pass, the fluid fc layer and the v1 mixed layer above were found to behave inconsistently.
Comparison experiment: remove the softmax activation from both the fluid fc layer and the v1 mixed layer, load the same model, and run one pass with a single image.
In fluid, fluid.layers.Print(input=h, print_phase="backward") is added before and after the fc layer, as shown at: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L190-L195
In v1, gradient_printer_evaluator(input=gru_step) is added before and after the mixed layer, as shown at: /~https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L150-L157
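A minimal sketch of that instrumentation on the fluid side (the names `h`, `num_classes`, and the `message` strings are illustrative, not copied from the linked file):

```python
import paddle.fluid as fluid

def instrumented_projection(h, num_classes):
    # During backward, this prints the gradient w.r.t. h, i.e. the gradient
    # the fc layer propagates back to its input.
    fluid.layers.Print(input=h, message="fc_in_grad", print_phase="backward")
    # The softmax activation is dropped (act=None) for the comparison experiment.
    out = fluid.layers.fc(input=h, size=num_classes, act=None)
    # During backward, this prints the gradient w.r.t. out, i.e. the gradient
    # arriving at the fc layer's output.
    fluid.layers.Print(input=out, message="fc_out_grad", print_phase="backward")
    return out
```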
The output logs (7 rnn steps in total) show:
Manual calculation verification
Load the v1 mixed layer weights from the model and manually compute input_grad from the out_grad printed in the logs.
The manually computed result matches the result given by the fluid fc layer.
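A sketch of that manual check (the file names are hypothetical placeholders for however the weight and the printed out_grad were dumped):

```python
import numpy as np

# Hypothetical dumps: the v1 mixed-layer weight loaded from the saved model,
# and the out_grad printed by the gradient evaluator for one rnn step.
W = np.load("v1_mixed_layer_w.npy")        # shape: (in_dim, num_classes)
out_grad = np.load("step7_out_grad.npy")   # shape: (batch, num_classes)

# With softmax removed the layer is linear, out = in.dot(W) + b,
# so the gradient w.r.t. the input is out_grad.dot(W.T).
input_grad = out_grad.dot(W.T)
print(input_grad)
```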