OCR Attention model backward debug #1

Open
wanghaoshuang opened this issue May 21, 2018 · 3 comments
@wanghaoshuang
Owner

wanghaoshuang commented May 21, 2018

In the decoder RNN of ocr_attention (both fluid and v1), the output of every step passes through an fc layer with a softmax activation. This fc layer converts the memory into a prediction over num_classes words.
fluid uses an fc layer: https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L191
v1 uses a mixed layer: https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L151

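While aligning the backward pass, I found that the fluid fc layer and the v1 mixed layer described above behave differently.

Comparison experiment

Remove the softmax activation from both the fluid fc layer and the v1 mixed layer, load the same model, and run one pass on a single image: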

In fluid, fluid.layers.Print(input=h, print_phase="backward") was added before and after the fc layer, as shown at the following link:
https://github.com/wanghaoshuang/ocr_attention/blob/master/fluid/attention_model.py#L190-L195

In v1, gradient_printer_evaluator(input=gru_step) was added before and after the mixed layer, as shown at the following link:
https://github.com/wanghaoshuang/ocr_attention/blob/master/v1/trainer_conf.py#L150-L157

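The output logs show the following (7 rnn steps in total):

  • The out_grad of the fluid fc layer and the v1 mixed layer match at every step.
  • At the last step, the input_grad computed by the fluid fc layer and by the v1 mixed layer match.
  • At steps 1-6, the input_grad computed by the fluid fc layer and by the v1 mixed layer do not match.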
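The comparison of printed gradients can be sketched as below. This is only an illustration of the check, not the script used for the experiment: `parse_values` and `grads_match` are hypothetical helper names, the tolerances are arbitrary, and the sample strings are the first three values copied from the step 7 and step 6 logs in this thread.

```python
import math
import re

def parse_values(text):
    """Extract floats from a whitespace- or comma-separated run of numbers."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]

def grads_match(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """True when both gradients have the same length and agree elementwise."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )

# First three input_grad values at step 7 (v1 is space-separated, fluid is comma-separated).
step7_v1 = parse_values("0.00259665 -0.000160558 -0.000443643")
step7_fluid = parse_values("0.00259665,-0.000160558,-0.000443643,")

# First three input_grad values at step 6.
step6_v1 = parse_values("0.00195129 0.00696817 0.00337138")
step6_fluid = parse_values("0.00257621,0.0055683,0.00116893,")

print("step 7 match:", grads_match(step7_v1, step7_fluid))
print("step 6 match:", grads_match(step6_v1, step6_fluid))
```

Comparing with a tolerance rather than exact equality is a design choice here, since the two frameworks print a limited number of significant digits.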

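Manual verification

Load the weights of the v1 mixed layer from the model and, using the out_grad printed in the log, compute input_grad by hand.
The hand-computed result matches the result given by the fluid fc layer.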
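A minimal sketch of that hand calculation follows. It assumes the layer is a plain linear map with no activation, y = x · W, so that input_grad = out_grad · Wᵀ. The weight layout `[in_dim][out_dim]` is an assumption and `fc_input_grad` is a hypothetical name. The logs in this thread show a 128-wide input_grad and a 97-wide out_grad; the toy weights below stand in for the real ones, which would have to be loaded from the saved model.

```python
def fc_input_grad(out_grad, weight):
    """Return dL/dx for y = x @ weight, with weight laid out as [in_dim][out_dim].

    input_grad[i] = sum_j out_grad[j] * weight[i][j]
    """
    return [sum(g * w for g, w in zip(out_grad, row)) for row in weight]

# Toy stand-in: 3 inputs, 4 classes (the real layer is assumed to be 128 x 97).
weight = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Like the logged out_grad, only one entry is nonzero.
out_grad = [0.0, -0.5, 0.0, 0.0]

input_grad = fc_input_grad(out_grad, weight)
print(input_grad)
```

When out_grad has a single nonzero entry, as in the step 6 and step 7 logs, input_grad reduces to that entry times the matching column of the weight matrix, which makes the check easy to do by hand.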

@wanghaoshuang
Owner Author

In the implementation of the v1 mixed layer (FullMatrixProjection), the parameter is updated at every rnn step.

In the fluid implementation, the parameter is not updated immediately after each rnn step (to be confirmed).
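The toy sketch below illustrates why update timing could matter in principle. It is not Paddle code and does not establish that this is the cause of the mismatch. The plain SGD rule, the learning rate, the dense toy values, and the name `backward_through_time` are all assumptions made for illustration. Backward runs from the last step to the first, so the last step always sees the original weights, while earlier steps see whatever the weights have become.

```python
def matvec_t(out_grad, weight):
    """input_grad[i] = sum_j out_grad[j] * weight[i][j]."""
    return [sum(g * w for g, w in zip(out_grad, row)) for row in weight]

def backward_through_time(weight, inputs, out_grads, lr, update_each_step):
    """Walk the steps in reverse; return every step's input_grad, last step first."""
    w = [row[:] for row in weight]  # work on a copy of the weights
    input_grads = []
    for x, g in zip(reversed(inputs), reversed(out_grads)):
        input_grads.append(matvec_t(g, w))
        if update_each_step:
            # Hypothetical immediate SGD update: w -= lr * outer(x, g)
            for i, xi in enumerate(x):
                for j, gj in enumerate(g):
                    w[i][j] -= lr * xi * gj
    return input_grads

weight = [[0.1, 0.2], [0.3, 0.4]]
inputs = [[1.0, 2.0], [0.5, -1.0], [2.0, 1.0]]       # x_1 .. x_3 (arbitrary)
out_grads = [[0.5, 1.0], [1.0, -0.5], [0.25, -1.0]]  # out_grad_1 .. out_grad_3 (arbitrary)

deferred = backward_through_time(weight, inputs, out_grads, lr=0.1, update_each_step=False)
immediate = backward_through_time(weight, inputs, out_grads, lr=0.1, update_each_step=True)

for k, (d, m) in enumerate(zip(deferred, immediate)):
    step = len(inputs) - k
    print("step", step, "same" if d == m else "different")
```

In this toy run only the last step agrees between the two variants. Whether the same mechanism explains the logged difference at steps 1-6 still needs to be confirmed against the actual v1 and fluid implementations.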

@wanghaoshuang
Owner Author

v1 step7 log:

I0518 21:05:36.121757 27471 Evaluator.cpp:959] layer=gru_decoder@decoder_group new grad matrix:
I0518 21:05:36.121775 27471 Evaluator.cpp:976] 0.00259665 -0.000160558 -0.000443643 -0.00150834 0.00335695 -0.00298472 0.00156726 0.0031843 -0.000498105 -0.000574104 0.000175909 0.000810787 0.00148678 -0.000382574 0.000707332 0.00277693 -0.000727298 -0.00050394 0.00157571 -0.000472443 0.00260408 0.00821105 -0.00119629 0.00124277 0.00182442 -0.00214355 -0.000432484 -0.00210629 -0.00288149 0.00328714 -0.000639475 -0.00246654 0.000158796 -0.00122223 -0.00205544 0.00271084 -0.00253633 -0.000656171 0.00104133 0.00340126 0.000254594 -0.000696978 -0.00167838 -0.00197562 -2.07276e-05 0.00119227 -0.000355771 0.000592401 -0.000270102 0.000868395 0.00201253 -0.00403969 -0.00239023 -6.89677e-06 0.00185185 -5.60677e-05 0.00242219 0.00144621 -0.000359975 -0.00402414 0.00795226 0.00200194 -0.00176959 -0.00478488 0.000922825 -0.000664621 0.00127238 -0.00191353 -0.000559233 -0.0033156 -0.00206255 -0.00174128 0.00116406 0.00493749 0.00233107 0.00174144 -0.00516108 -0.00127721 0.00146097 0.00278689 0.0028184 -0.000559354 0.00258661 0.00158846 0.00142273 0.00221341 0.000767139 0.00584951 0.00218939 0.000912885 0.000406815 0.00871274 -0.00173787 0.000600034 0.00270166 0.00095701 0.0017513 0.000125336 -0.00189119 0.00259775 -0.00131946 0.000553104 -0.00279177 -0.00302586 -0.00183147 -0.000971036 0.00297192 -0.00209951 0.00157097 0.000790601 0.00187841 0.00226327 -0.00599897 0.00163954 0.0019998 0.00227321 0.00090493 0.00188178 0.00347843 0.00384294 0.00259144 -0.00377346 0.00120053 -0.000421648 0.00236531 -0.00249254 -0.00141073 -0.00110959
I0518 21:05:36.121857 27471 Evaluator.cpp:959] layer=__mixed_3__@decoder_group new grad matrix:
I0518 21:05:36.121872 27471 Evaluator.cpp:976] 0 -0.00682783 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

fluid step7 log:

1526874794      Tensor[print_1.tmp_0@GRAD]
     dtype: f
     data: 0,-0.00682783,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1526874794      Tensor[print_0.tmp_0@GRAD]
     shape: [1,128,]
     dtype: f
     data: 0.00259665,-0.000160558,-0.000443643,-0.00150834,0.00335695,-0.00298472,0.00156726,0.0031843,-0.000498105,-0.000574104,0.000175909,0.000810787,0.00148678,-0.000382574,0.000707332,0.00277693,-0.000727298,-0.00050394,0.00157571,-0.000472443,0.00260408,0.00821105,-0.00119629,0.00124277,0.00182442,-0.00214355,-0.000432484,-0.00210629,-0.00288149,0.00328714,-0.000639475,-0.00246654,0.000158796,-0.00122223,-0.00205544,0.00271084,-0.00253633,-0.000656171,0.00104133,0.00340126,0.000254594,-0.000696978,-0.00167838,-0.00197562,-2.07276e-05,0.00119227,-0.000355771,0.000592401,-0.000270102,0.000868395,0.00201253,-0.00403969,-0.00239023,-6.89677e-06,0.00185185,-5.60677e-05,0.00242219,0.00144621,-0.000359975,-0.00402414,0.00795226,0.00200194,-0.00176959,-0.00478488,0.000922825,-0.000664621,0.00127238,-0.00191353,-0.000559233,-0.0033156,-0.00206255,-0.00174128,0.00116406,0.00493749,0.00233107,0.00174144,-0.00516108,-0.00127721,0.00146097,0.00278689,0.0028184,-0.000559354,0.00258661,0.00158846,0.00142273,0.00221341,0.000767139,0.00584951,0.00218939,0.000912885,0.000406815,0.00871274,-0.00173787,0.000600034,0.00270166,0.00095701,0.0017513,0.000125336,-0.00189119,0.00259775,-0.00131946,0.000553104,-0.00279177,-0.00302586,-0.00183147,-0.000971036,0.00297192,-0.00209951,0.00157097,0.000790601,0.00187841,0.00226327,-0.00599897,0.00163954,0.0019998,0.00227321,0.00090493,0.00188178,0.00347843,0.00384294,0.00259144,-0.00377346,0.00120053,-0.000421648,0.00236531,-0.00249254,-0.00141073,-0.00110959,

@wanghaoshuang
Owner Author

v1 step6 log:

I0518 21:05:36.121501 27471 Evaluator.cpp:959] layer=gru_decoder@decoder_group new grad matrix:
I0518 21:05:36.121520 27471 Evaluator.cpp:976] 0.00195129 0.00696817 0.00337138 -0.00331084 -0.0012873 -0.00101725 0.0015318 0.00281486 0.00110389 -0.00783806 0.00279485 0.000483486 -0.00587682 -0.000406022 -0.00444318 0.00311902 -0.000286885 -0.00182699 0.00609785 6.39376e-06 0.0026729 -0.00152868 0.000638021 -0.000681568 0.00419263 -0.00447789 -0.000450203 -0.0049679 -0.00493234 0.00435118 -0.00112588 -0.00271887 0.00149361 0.000293961 0.000474197 -0.0041792 0.00216285 0.00365145 0.0030447 0.00148862 0.0062595 0.00222197 -0.00032478 -0.00331019 -0.00430819 0.00111168 -0.000237889 0.00209829 -0.00210756 -0.00214455 0.00344501 -0.00129543 -0.00363674 -0.000767621 0.00202503 -0.00225403 0.000690179 0.00261361 -0.00308932 -0.000143414 -0.000243178 0.00279573 -0.00567506 0.00125122 -0.000486985 -0.000190908 0.00349188 0.000370814 -0.00065552 0.00430407 0.00402884 0.00321153 -0.00181442 0.00540607 0.00332404 -0.00352391 -0.000731768 -0.00529314 -0.00176608 -0.00229928 0.0014307 0.00295825 0.0051803 0.00414077 -0.00173698 0.000565638 -0.00538867 -0.00602313 0.00371452 0.00350338 0.00799726 -0.00230276 0.00218312 -0.00182899 0.00492448 -0.00184285 -0.00220591 0.00131281 -0.00195217 -0.00471082 -0.00661464 -0.00567748 0.00385582 -0.00435521 0.000868063 0.0053495 0.00479739 -0.0034153 -0.00678458 -0.00197338 0.00404837 0.00234378 0.00895149 0.00476145 0.00100003 0.00672587 -0.000117887 0.00184391 0.00531259 0.00466084 -0.00448095 -0.00478376 -0.00386377 0.00342916 0.000417518 0.00178956 -0.00056568 -0.000898012
I0518 21:05:36.121601 27471 Evaluator.cpp:959] layer=__mixed_3__@decoder_group new grad matrix:
I0518 21:05:36.121616 27471 Evaluator.cpp:976] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.00903066 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

fluid step6 log:

1526874794      Tensor[print_1.tmp_0@GRAD]
     shape: [1,97,]
     dtype: f
     data: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.00903066,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1526874794      Tensor[print_0.tmp_0@GRAD]
     shape: [1,128,]
     dtype: f
     data: 0.00257621,0.0055683,0.00116893,-0.00242795,-0.00184488,-0.00207971,-0.000187879,0.00110172,-0.000350247,-0.00676898,0.00282837,0.00148561,-0.0055419,-0.000398527,-0.00245015,0.00125287,-0.000825821,-0.00229468,0.0017099,0.00142786,-0.000572159,0.00104176,-0.000174144,0.00163064,0.00222587,-0.00478678,-0.00139681,-0.00468937,-0.00249963,0.00177533,-0.00130032,-0.000603338,0.00156789,-0.000134418,-0.00145653,-0.00353903,0.0022065,0.00271195,0.00105443,0.00327108,0.00498615,0.00257492,-0.00200571,-0.00289722,-0.00362656,3.73827e-05,-0.000733516,0.000725259,-0.00319921,-0.00164274,0.000413312,-0.0011129,-0.00188553,-0.000727051,0.00260391,-0.0025843,0.00053729,0.00115065,-0.00300947,0.00108684,-0.00060007,0.00218412,-0.0030831,-9.03671e-06,0.000198779,0.000190235,0.00354167,0.00144787,1.29675e-05,0.00397718,0.00228078,0.00197726,-0.00206132,0.00653767,-0.00105931,-0.00281189,-0.00197616,-0.000384897,8.89603e-05,-0.00353494,5.68105e-05,0.00229945,0.00161412,0.00261329,-0.00127653,0.000744648,-0.00341167,-0.00380866,0.0021809,0.00174046,0.00599654,-0.00224944,0.00319137,0.00230823,0.000906493,-0.000455288,-0.0028061,-0.000861754,-0.000980657,-0.00424987,-0.0039778,-0.00627297,4.23144e-05,-0.00254332,-0.00119582,0.00432329,0.00240232,-0.000826505,-0.0039817,-0.00259758,0.00164273,0.00231245,0.00631008,0.0040108,0.00153362,0.00370465,0.000815095,-0.00140137,0.00194219,0.00465141,-0.00422437,-0.00477802,-0.00180537,0.00198789,0.000870398,0.00186976,-0.00160577,0.00229918,
