Fix the profiler's bug in multi-gpu mode #8596

Merged: 2 commits, Feb 28, 2018

@kuke kuke commented Feb 27, 2018

CLAassistant commented Feb 27, 2018

CLA assistant check
All committers have signed the CLA.

@kuke kuke requested a review from qingqing01 February 27, 2018 02:19
qingqing01 previously approved these changes Feb 27, 2018
@qingqing01 qingqing01 left a comment

LGTM.

kuke commented Feb 28, 2018

Example profiling output from a run on two GPUs:

-----------  Configuration Arguments -----------
batch_size: 32
device: GPU
feature_lst: data/feature.lst
first_batches_to_skip: 1
hidden_dim: 1024
label_lst: data/label.lst
learning_rate: 0.002
max_batch_num: 10
mean_var: data/global_mean_var_search26kHr
minimum_batch_size: 1
parallel: True
print_train_acc: False
proj_dim: 512
sorted_key: total
stacked_num: 5
------------------------------------------------
..........
Time consumed: 34.362017 s, performance: 2157.294773 frames/s.

------------------------->     Profiling Report     <-------------------------

Place: CUDA
Time unit: ms
Sorted by total time in descending order in the same thread

Event                             Calls       Total       Min.        Max.        Ave.
thread24::lstmp_grad              5           2156.81     430.872     432.05      431.362
thread24::elementwise_add_grad    7           105.169     4.76192     18.4784     15.0241
thread24::mul_grad                6           88.4601     6.09978     25.2523     14.7433
thread24::sequence_conv_grad      1           35.3374     35.3374     35.3374     35.3374
thread24::batch_norm_grad         6           34.1612     4.73926     9.38403     5.69354
thread24::softmax_grad            1           1.36        1.36        1.36        1.36
thread24::sigmoid_grad            6           1.24931     0.177888    0.351488    0.208219
thread24::cross_entropy_grad      1           0.200256    0.200256    0.200256    0.200256
thread24::fill_zeros_like         12          0.137472    0.009088    0.015584    0.011456
thread24::mean_grad               1           0.038112    0.038112    0.038112    0.038112
thread23::lstmp_grad              5           2490.33     496.883     498.61      498.066
thread23::elementwise_add_grad    7           108.446     4.86621     19.0566     15.4923
thread23::mul_grad                6           95.1479     6.37309     28.1987     15.858
thread23::sequence_conv_grad      1           38.4246     38.4246     38.4246     38.4246
thread23::batch_norm_grad         6           38.1619     5.34458     10.522      6.36031
thread23::softmax_grad            1           1.46688     1.46688     1.46688     1.46688
thread23::sigmoid_grad            6           1.3305      0.188288    0.376096    0.221749
thread23::cross_entropy_grad      1           0.208832    0.208832    0.208832    0.208832
thread23::fill_zeros_like         12          0.129952    0.00736     0.015552    0.0108293
thread23::mean_grad               1           0.041536    0.041536    0.041536    0.041536
thread22::lstmp_grad              5           2242.32     447.854     449.22      448.464
thread22::elementwise_add_grad    7           99.0354     4.41686     17.4535     14.1479
thread22::mul_grad                6           77.749      5.16925     23.0996     12.9582
thread22::sequence_conv_grad      1           31.5833     31.5833     31.5833     31.5833
thread22::batch_norm_grad         6           27.5149     3.84422     7.67088     4.58582
thread22::softmax_grad            1           1.20726     1.20726     1.20726     1.20726
thread22::sigmoid_grad            6           1.12208     0.159328    0.317216    0.187013
thread22::cross_entropy_grad      1           0.18112     0.18112     0.18112     0.18112
thread22::fill_zeros_like         12          0.130912    0.009248    0.014528    0.0109093
thread22::mean_grad               1           0.03888     0.03888     0.03888     0.03888
thread21::lstmp_grad              5           2149.55     428.957     430.708     429.91
thread21::elementwise_add_grad    7           103.995     4.63203     18.3186     14.8564
thread21::mul_grad                6           85.2573     5.84669     25.8327     14.2095
thread21::sequence_conv_grad      1           34.8651     34.8651     34.8651     34.8651
thread21::batch_norm_grad         6           32.3545     4.6224      8.86797     5.39242
thread21::softmax_grad            1           1.34835     1.34835     1.34835     1.34835
thread21::sigmoid_grad            6           1.22627     0.175488    0.34544     0.204379
thread21::cross_entropy_grad      1           0.196448    0.196448    0.196448    0.196448
thread21::fill_zeros_like         12          0.127424    0.008608    0.014112    0.0106187
thread21::mean_grad               1           0.03744     0.03744     0.03744     0.03744
thread20::lstmp_grad              10          4711.42     468.746     473.353     471.142
thread20::elementwise_add_grad    14          215.055     4.6328      19.6817     15.3611
thread20::mul_grad                12          184.552     5.79645     28.1579     15.3794
thread20::sequence_conv_grad      2           74.3613     34.9339     39.4274     37.1806
thread20::batch_norm_grad         12          72.4048     4.3704      11.3734     6.03374
thread20::softmax_grad            2           2.83718     1.30749     1.5297      1.41859
thread20::sigmoid_grad            12          2.59046     0.168672    0.394752    0.215872
thread20::cross_entropy_grad      2           0.419424    0.191168    0.228256    0.209712
thread20::fill_zeros_like         24          0.264736    0.00864     0.017024    0.0110307
thread20::mean_grad               2           0.081728    0.040256    0.041472    0.040864
thread19::lstmp_grad              5           1782.14     355.451     357.407     356.427
thread19::lstmp                   5           799.677     159.759     160.25      159.935
thread19::elementwise_add_grad    7           100.182     4.48403     17.6444     14.3117
thread19::mul_grad                6           80.0079     5.35302     23.605      13.3347
thread19::mul                     6           40.4019     3.00035     11.9796     6.73365
thread19::batch_norm              6           33.798      4.82144     9.27997     5.633
thread19::sequence_conv_grad      1           32.815      32.815      32.815      32.815
thread19::batch_norm_grad         6           27.9609     4.01056     7.63677     4.66015
thread19::sequence_conv           1           16.4751     16.4751     16.4751     16.4751
thread19::elementwise_add         7           5.58589     0.246496    0.98928     0.797984
thread19::softmax                 1           3.4921      3.4921      3.4921      3.4921
thread19::softmax_grad            1           1.22944     1.22944     1.22944     1.22944
thread19::sigmoid_grad            6           1.1264      0.159936    0.319808    0.187733
thread19::sigmoid                 6           0.884832    0.124416    0.240736    0.147472
thread19::top_k                   1           0.542624    0.542624    0.542624    0.542624
thread19::mean                    1           0.376256    0.376256    0.376256    0.376256
thread19::cross_entropy_grad      1           0.179072    0.179072    0.179072    0.179072
thread19::fill_zeros_like         12          0.133984    0.008704    0.015712    0.0111653
thread19::accuracy                1           0.03344     0.03344     0.03344     0.03344
thread19::mean_grad               1           0.029376    0.029376    0.029376    0.029376
thread19::cross_entropy           1           0.0136      0.0136      0.0136      0.0136
thread18::lstmp                   5           610         121.842     122.194     122
thread18::mul                     6           37.6209     2.73587     11.11       6.27015
thread18::batch_norm              6           30.2807     4.3201      8.46918     5.04678
thread18::sequence_conv           1           16.1363     16.1363     16.1363     16.1363
thread18::elementwise_add         7           6.4896      0.228736    1.18717     0.927086
thread18::softmax                 1           3.37264     3.37264     3.37264     3.37264
thread18::sigmoid                 6           0.804448    0.113536    0.221952    0.134075
thread18::top_k                   1           0.48624     0.48624     0.48624     0.48624
thread18::mean                    1           0.34736     0.34736     0.34736     0.34736
thread18::accuracy                1           0.033056    0.033056    0.033056    0.033056
thread18::cross_entropy           1           0.013152    0.013152    0.013152    0.013152
thread17::lstmp                   10          1616.74     161.314     162.349     161.674
thread17::mul                     12          87.1046     2.96032     13.778      7.25872
thread17::batch_norm              12          72.1755     4.9207      10.6914     6.01463
thread17::sequence_conv           2           34.4298     15.8944     18.5354     17.2149
thread17::elementwise_add         14          12.0404     0.281632    1.14659     0.860032
thread17::softmax                 2           7.30666     3.49574     3.81091     3.65333
thread17::sigmoid                 12          1.82912     0.121056    0.272096    0.152427
thread17::top_k                   2           1.14064     0.520416    0.620224    0.57032
thread17::mean                    2           0.807104    0.374208    0.432896    0.403552
thread17::accuracy                2           0.070144    0.035008    0.035136    0.035072
thread17::cross_entropy           2           0.02832     0.01392     0.0144      0.01416
thread16::lstmp_grad              10          5043.94     477.253     539.744     504.394
thread16::elementwise_add_grad    14          226.146     4.82192     20.9548     16.1533
thread16::mul_grad                12          199.032     6.26861     32.1024     16.586
thread16::batch_norm_grad         12          83.1491     5.1433      12.9305     6.92909
thread16::sequence_conv_grad      2           81.2186     37.163      44.0556     40.6093
thread16::softmax_grad            2           3.09757     1.42029     1.67728     1.54878
thread16::sigmoid_grad            12          2.81686     0.181568    0.431904    0.234739
thread16::cross_entropy_grad      2           0.445216    0.202912    0.242304    0.222608
thread16::fill_zeros_like         24          0.25744     0.007552    0.015488    0.0107267
thread16::mean_grad               2           0.093664    0.027968    0.065696    0.046832
thread15::lstmp_grad              5           1910.71     381.847     382.934     382.143
thread15::lstmp                   5           812.457     162.118     163.073     162.491
thread15::elementwise_add_grad    7           103.263     4.64061     18.1885     14.7519
thread15::mul_grad                6           83.2882     5.83606     25.5569     13.8814
thread15::mul                     6           42.9396     3.12915     12.7004     7.1566
thread15::batch_norm              6           36.9734     5.27818     10.2232     6.16223
thread15::sequence_conv_grad      1           34.7863     34.7863     34.7863     34.7863
thread15::batch_norm_grad         6           31.3185     4.48259     8.64573     5.21975
thread15::sequence_conv           1           16.2845     16.2845     16.2845     16.2845
thread15::elementwise_add         7           6.02077     0.268544    1.07331     0.86011
thread15::softmax                 1           3.64963     3.64963     3.64963     3.64963
thread15::softmax_grad            1           1.32163     1.32163     1.32163     1.32163
thread15::sigmoid_grad            6           1.20733     0.170816    0.338816    0.201221
thread15::sigmoid                 6           0.92432     0.128032    0.253856    0.154053
thread15::top_k                   1           0.556928    0.556928    0.556928    0.556928
thread15::mean                    1           0.400512    0.400512    0.400512    0.400512
thread15::cross_entropy_grad      1           0.1968      0.1968      0.1968      0.1968
thread15::fill_zeros_like         12          0.138784    0.008928    0.015552    0.0115653
thread15::mean_grad               1           0.039328    0.039328    0.039328    0.039328
thread15::accuracy                1           0.034272    0.034272    0.034272    0.034272
thread15::cross_entropy           1           0.014176    0.014176    0.014176    0.014176
thread14::lstmp                   10          1644.54     148.293     185.806     164.454
thread14::mul                     12          92.1109     3.00848     14.7966     7.67591
thread14::batch_norm              12          77.9084     4.97901     11.8627     6.49236
thread14::sequence_conv           2           36.2657     16.6147     19.651      18.1328
thread14::elementwise_add         14          12.9545     0.257952    1.26544     0.925319
thread14::softmax                 2           7.64016     3.5673      4.07286     3.82008
thread14::sigmoid                 12          1.96554     0.124128    0.299232    0.163795
thread14::top_k                   2           1.21805     0.55424     0.663808    0.609024
thread14::mean                    2           0.863232    0.386592    0.47664     0.431616
thread14::accuracy                2           0.0688      0.033984    0.034816    0.0344
thread14::cross_entropy           2           0.028064    0.013088    0.014976    0.014032
thread13::lstmp                   5           651.73      130.092     130.724     130.346
thread13::mul                     6           40.175      3.01802     11.726      6.69583
thread13::batch_norm              6           33.9073     4.87251     9.29398     5.65122
thread13::sequence_conv           1           15.8169     15.8169     15.8169     15.8169
thread13::elementwise_add         7           5.56566     0.2464      0.984832    0.795095
thread13::softmax                 1           3.47907     3.47907     3.47907     3.47907
thread13::sigmoid                 6           0.860864    0.120512    0.237152    0.143477
thread13::top_k                   1           0.534528    0.534528    0.534528    0.534528
thread13::mean                    1           0.375584    0.375584    0.375584    0.375584
thread13::accuracy                1           0.033792    0.033792    0.033792    0.033792
thread13::cross_entropy           1           0.014016    0.014016    0.014016    0.014016
thread12::lstmp_grad              10          4494.78     374.18      524.555     449.478
thread12::elementwise_add_grad    14          213.281     4.63341     19.4181     15.2343
thread12::mul_grad                12          181.659     5.76125     27.9171     15.1383
thread12::sequence_conv_grad      2           74.3573     34.9572     39.4001     37.1787
thread12::batch_norm_grad         12          71.2463     4.39386     10.6972     5.93719
thread12::softmax_grad            2           2.80701     1.312       1.49501     1.4035
thread12::sigmoid_grad            12          2.5815      0.171776    0.386848    0.215125
thread12::cross_entropy_grad      2           0.419232    0.200352    0.21888     0.209616
thread12::fill_zeros_like         24          0.260832    0.008704    0.016608    0.010868
thread12::mean_grad               2           0.106432    0.043168    0.063264    0.053216
thread11::lstmp_grad              5           1721.61     343.383     344.752     344.321
thread11::lstmp                   5           650.254     129.906     130.277     130.051
thread11::elementwise_add_grad    7           100.703     4.54022     17.7577     14.3862
thread11::mul_grad                6           79.7965     5.45507     23.7718     13.2994
thread11::mul                     6           40.4323     3           11.9368     6.73871
thread11::batch_norm              6           33.5067     4.7999      9.25626     5.58445
thread11::sequence_conv_grad      1           33.1946     33.1946     33.1946     33.1946
thread11::batch_norm_grad         6           29.7041     4.22605     8.19472     4.95068
thread11::sequence_conv           1           15.8937     15.8937     15.8937     15.8937
thread11::elementwise_add         7           7.08608     0.249664    1.29142     1.0123
thread11::softmax                 1           3.48118     3.48118     3.48118     3.48118
thread11::softmax_grad            1           1.26173     1.26173     1.26173     1.26173
thread11::sigmoid_grad            6           1.1695      0.165024    0.33152     0.194917
thread11::sigmoid                 6           0.865888    0.122208    0.238752    0.144315
thread11::top_k                   1           0.53664     0.53664     0.53664     0.53664
thread11::mean                    1           0.374816    0.374816    0.374816    0.374816
thread11::cross_entropy_grad      1           0.186208    0.186208    0.186208    0.186208
thread11::fill_zeros_like         12          0.129856    0.008576    0.014752    0.0108213
thread11::mean_grad               1           0.063072    0.063072    0.063072    0.063072
thread11::accuracy                1           0.033632    0.033632    0.033632    0.033632
thread11::cross_entropy           1           0.012992    0.012992    0.012992    0.012992
thread10::lstmp                   5           597.074     119.287     119.466     119.415
thread10::mul                     6           38.5494     2.81581     11.2257     6.4249
thread10::batch_norm              6           31.4628     4.52278     8.69075     5.2438
thread10::sequence_conv           1           16.2245     16.2245     16.2245     16.2245
thread10::elementwise_add         7           6.8559      0.269376    1.24259     0.979415
thread10::softmax                 1           3.41926     3.41926     3.41926     3.41926
thread10::sigmoid                 6           0.829088    0.117088    0.228288    0.138181
thread10::top_k                   1           0.519264    0.519264    0.519264    0.519264
thread10::mean                    1           0.35856     0.35856     0.35856     0.35856
thread10::accuracy                1           0.033728    0.033728    0.033728    0.033728
thread10::cross_entropy           1           0.01408     0.01408     0.01408     0.01408
thread9::lstmp                    10          1713.3      167.015     175.661     171.33
thread9::mul                      12          90.4381     3.26115     13.5263     7.53651
thread9::batch_norm               12          75.7455     5.31622     10.5745     6.31213
thread9::sequence_conv            2           36.567      18.2439     18.3231     18.2835
thread9::elementwise_add          14          12.5791     0.275424    1.1271      0.898507
thread9::softmax                  2           7.49744     3.71837     3.77907     3.74872
thread9::sigmoid                  12          1.90979     0.134272    0.268192    0.159149
thread9::top_k                    2           1.16906     0.578048    0.591008    0.584528
thread9::mean                     2           0.839712    0.415296    0.424416    0.419856
thread9::accuracy                 2           0.068672    0.034208    0.034464    0.034336
thread9::cross_entropy            2           0.028608    0.014144    0.014464    0.014304
thread8::lstmp_grad               10          4272.52     407.504     452.02      427.252
thread8::elementwise_add_grad     14          208.04      4.58851     18.6556     14.86
thread8::mul_grad                 12          175.941     5.72528     25.6164     14.6617
thread8::sequence_conv_grad       2           69.9841     34.0546     35.9294     34.992
thread8::batch_norm_grad          12          65.3465     4.43088     9.55971     5.44554
thread8::softmax_grad             2           2.70944     1.31514     1.3943      1.35472
thread8::sigmoid_grad             12          2.47258     0.17136     0.362592    0.206048
thread8::cross_entropy_grad       2           0.396704    0.189504    0.2072      0.198352
thread8::fill_zeros_like          24          0.25536     0.00736     0.015296    0.01064
thread8::mean_grad                2           0.091424    0.045408    0.046016    0.045712
thread7::lstmp_grad               5           1873.11     374.212     375.857     374.623
thread7::lstmp                    5           770.056     153.795     154.203     154.011
thread7::elementwise_add_grad     7           101.73      4.532       17.9313     14.5328
thread7::mul_grad                 6           82.5767     5.70058     23.7289     13.7628
thread7::mul                      6           37.1124     2.70205     10.8231     6.18541
thread7::sequence_conv_grad       1           33.7955     33.7955     33.7955     33.7955
thread7::batch_norm_grad          6           29.9662     4.23654     8.39731     4.99437
thread7::batch_norm               6           29.7999     4.29379     8.25901     4.96665
thread7::sequence_conv            1           15.5201     15.5201     15.5201     15.5201
thread7::elementwise_add          7           5.21747     0.227936    0.906336    0.745353
thread7::softmax                  1           3.33162     3.33162     3.33162     3.33162
thread7::softmax_grad             1           1.29683     1.29683     1.29683     1.29683
thread7::sigmoid_grad             6           1.18394     0.167552    0.3416      0.197323
thread7::sigmoid                  6           0.79536     0.113248    0.223072    0.13256
thread7::top_k                    1           0.48112     0.48112     0.48112     0.48112
thread7::mean                     1           0.343712    0.343712    0.343712    0.343712
thread7::cross_entropy_grad       1           0.186592    0.186592    0.186592    0.186592
thread7::fill_zeros_like          12          0.12592     0.008704    0.014592    0.0104933
thread7::mean_grad                1           0.03648     0.03648     0.03648     0.03648
thread7::accuracy                 1           0.033856    0.033856    0.033856    0.033856
thread7::cross_entropy            1           0.013152    0.013152    0.013152    0.013152
thread6::lstmp                    5           663.257     132.347     133.169     132.651
thread6::mul                      6           39.0453     2.91165     11.5308     6.50755
thread6::batch_norm               6           31.9485     4.56381     8.80547     5.32475
thread6::sequence_conv            1           15.7017     15.7017     15.7017     15.7017
thread6::elementwise_add          7           7.20931     0.256736    1.28586     1.0299
thread6::softmax                  1           3.42019     3.42019     3.42019     3.42019
thread6::sigmoid                  6           0.840448    0.118976    0.236224    0.140075
thread6::top_k                    1           0.525472    0.525472    0.525472    0.525472
thread6::mean                     1           0.365344    0.365344    0.365344    0.365344
thread6::accuracy                 1           0.033568    0.033568    0.033568    0.033568
thread6::cross_entropy            1           0.013664    0.013664    0.013664    0.013664
thread5::lstmp                    10          1465.1      142.875     150.161     146.51
thread5::mul                      12          82.1036     2.88003     12.4691     6.84197
thread5::batch_norm               12          68.7075     4.62544     9.99229     5.72562
thread5::sequence_conv            2           32.5207     16.2078     16.3129     16.2604
thread5::elementwise_add          14          13.1465     0.249088    1.25325     0.939035
thread5::softmax                  2           7.08294     3.47363     3.60931     3.54147
thread5::sigmoid                  12          1.77165     0.120544    0.252896    0.147637
thread5::top_k                    2           1.08448     0.532832    0.551648    0.54224
thread5::mean                     2           0.765856    0.372032    0.393824    0.382928
thread5::accuracy                 2           0.068032    0.03344     0.034592    0.034016
thread5::cross_entropy            2           0.027104    0.013376    0.013728    0.013552
thread4::lstmp_grad               5           1843.66     368.073     369.404     368.732
thread4::elementwise_add_grad     7           100.204     4.41642     19.064      14.3149
thread4::mul_grad                 6           75.8968     5.16458     22.2484     12.6495
thread4::sequence_conv_grad       1           31.4607     31.4607     31.4607     31.4607
thread4::batch_norm_grad          6           27.4629     3.86765     7.68384     4.57716
thread4::softmax_grad             1           1.20938     1.20938     1.20938     1.20938
thread4::sigmoid_grad             6           1.12061     0.157696    0.317792    0.186768
thread4::cross_entropy_grad       1           0.188096    0.188096    0.188096    0.188096
thread4::fill_zeros_like          12          0.130976    0.008864    0.015296    0.0109147
thread4::mean_grad                1           0.06768     0.06768     0.06768     0.06768
thread3::lstmp                    5           644.434     128.569     129.843     128.887
thread3::mul                      6           37.0888     2.76954     10.8988     6.18146
thread3::batch_norm               6           29.8327     4.28163     8.33174     4.97212
thread3::sequence_conv            1           15.4733     15.4733     15.4733     15.4733
thread3::elementwise_add          7           5.88432     0.229024    1.09133     0.840617
thread3::softmax                  1           3.33139     3.33139     3.33139     3.33139
thread3::sigmoid                  6           0.791776    0.110528    0.22        0.131963
thread3::top_k                    1           0.500096    0.500096    0.500096    0.500096
thread3::mean                     1           0.34544     0.34544     0.34544     0.34544
thread3::accuracy                 1           0.033056    0.033056    0.033056    0.033056
thread3::cross_entropy            1           0.013632    0.013632    0.013632    0.013632
thread2::lstmp_grad               5           2332.32     465.976     467.515     466.464
thread2::elementwise_add_grad     7           103.286     4.64861     18.1984     14.7551
thread2::mul_grad                 6           84.4223     5.8079      25.5446     14.0704
thread2::sequence_conv_grad       1           34.3947     34.3947     34.3947     34.3947
thread2::batch_norm_grad          6           32.0733     4.44723     8.84685     5.34555
thread2::softmax_grad             1           1.3231      1.3231      1.3231      1.3231
thread2::sigmoid_grad             6           1.2119      0.171744    0.337504    0.201984
thread2::cross_entropy_grad       1           0.193344    0.193344    0.193344    0.193344
thread2::fill_zeros_like          12          0.126912    0.008672    0.014752    0.010576
thread2::mean_grad                1           0.04624     0.04624     0.04624     0.04624
thread1::lstmp                    5           733.427     146.601     146.797     146.685
thread1::mul                      6           41.0479     3.03094     12.1364     6.84132
thread1::batch_norm               6           34.2666     4.888       9.48128     5.71109
thread1::sequence_conv            1           16.586      16.586      16.586      16.586
thread1::elementwise_add          7           7.30218     0.286688    1.31834     1.04317
thread1::softmax                  1           3.54822     3.54822     3.54822     3.54822
thread1::sigmoid                  6           0.878048    0.124512    0.243936    0.146341
thread1::top_k                    1           0.53072     0.53072     0.53072     0.53072
thread1::mean                     1           0.380256    0.380256    0.380256    0.380256
thread1::accuracy                 1           0.034176    0.034176    0.034176    0.034176
thread1::cross_entropy            1           0.01472     0.01472     0.01472     0.01472
thread0::momentum                 369         39.0572     0.009856    0.534912    0.105846
thread0::sum                      369         32.347      0.017152    0.423776    0.0876612
thread0::elementwise_mul          369         4.77469     0.009056    0.070368    0.0129395
thread0::mean                     18          0.622048    0.018688    0.132256    0.0345582
thread0::mean_grad                9           0.235008    0.021024    0.038176    0.026112
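
The per-thread rows above are hard to compare at a glance. A minimal sketch of one way to collapse them, assuming the report has been saved as plain text in exactly the six-column layout shown (`aggregate_by_op` and `format_summary` are hypothetical helper names, not part of the profiler):

```python
from collections import defaultdict

# Three rows copied from the report above, used only as sample input.
SAMPLE = """\
Event                             Calls       Total       Min.        Max.        Ave.
thread24::lstmp_grad              5           2156.81     430.872     432.05      431.362
thread23::lstmp_grad              5           2490.33     496.883     498.61      498.066
thread0::momentum                 369         39.0572     0.009856    0.534912    0.105846
"""


def aggregate_by_op(report_text):
    """Sum calls and total time (ms) per op name across all threads."""
    totals = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "threads": 0})
    for line in report_text.splitlines():
        parts = line.split()
        # A data row has six columns and an event named "threadN::op".
        if len(parts) != 6 or not parts[0].startswith("thread") or "::" not in parts[0]:
            continue  # header, separator, or blank line
        op = parts[0].split("::", 1)[1]
        entry = totals[op]
        entry["calls"] += int(parts[1])
        entry["total_ms"] += float(parts[2])
        entry["threads"] += 1
    return dict(totals)


def format_summary(totals):
    """Render one line per op, sorted by summed total time, descending."""
    rows = sorted(totals.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)
    lines = ["{:<28}{:>8}{:>8}{:>14}".format("Op", "Threads", "Calls", "Total (ms)")]
    for op, e in rows:
        lines.append(
            "{:<28}{:>8}{:>8}{:>14.3f}".format(op, e["threads"], e["calls"], e["total_ms"])
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_summary(aggregate_by_op(SAMPLE)))
```

The summed total is time added up across threads, not wall-clock time, so it is only useful for ranking ops against each other.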

@qingqing01 qingqing01 left a comment

LGTM. In the future, though, we should make the multi-thread results more readable.

@kuke kuke merged commit ace512a into PaddlePaddle:develop Feb 28, 2018