
Low GPU utilization on python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py #7652

Closed
guru4elephant opened this issue Jan 18, 2018 · 5 comments

@guru4elephant
Member

guru4elephant commented Jan 18, 2018

  • Compile Version: 0.11.0

  • Device: GPU, a P40 card

  • Script: python/paddle/v2/fluid/tests/book/test_label_semantic_roles.py

  • I changed the script on line 178 from
    place = fluid.CPUPlace() to
    place = fluid.CUDAPlace(0)
    (a minimal sketch of this change follows the list)

  • The example runs normally, but GPU utilization is only about 20% on a single card. I suspect some ops are not running on the GPU. Is there any way to check whether all ops are running on GPU devices?
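
The change is just swapping the place the executor is constructed with; a minimal sketch, assuming the usual executor setup in the book examples (the surrounding training code is omitted):

    import paddle.v2.fluid as fluid

    # Original: run everything on the CPU
    # place = fluid.CPUPlace()
    # Changed: run on the first GPU instead
    place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)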

@jacquesqiao
Member

jacquesqiao commented Jan 18, 2018

Yes, currently the linear_chain_crf and crf_decoding operators can only run on the CPU. The framework automatically copies memory from GPU to CPU when it finds an operator that can only run on the CPU while its input tensors are on the GPU.
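
If you want to check which operators lack a GPU kernel, one option (a sketch, assuming the core.op_support_gpu binding that fluid exposes in this version) is to walk the ops of the program:

    import paddle.v2.fluid as fluid
    import paddle.v2.fluid.core as core

    prog = fluid.default_main_program()
    for op in prog.global_block().ops:
        if not core.op_support_gpu(op.type):
            # these ops will be scheduled on the CPU even with CUDAPlace(0)
            print("op '%s' has no GPU kernel" % op.type)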

@lcy-seso lcy-seso self-assigned this Jan 18, 2018
@jacquesqiao
Member

I have tested the performance: when running on CUDAPlace, the speed is about twice that of running on CPUPlace. You can check this yourself.
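
A rough way to reproduce such a comparison (a sketch; train_one_pass is a hypothetical helper standing in for one pass of the training loop in the test script, not part of it):

    import time
    import paddle.v2.fluid as fluid

    def timed_run(place, train_one_pass):
        # train_one_pass: hypothetical helper that runs one training pass
        # of test_label_semantic_roles.py with the given executor
        exe = fluid.Executor(place)
        start = time.time()
        train_one_pass(exe)
        return time.time() - start

    # cpu_time = timed_run(fluid.CPUPlace(), train_one_pass)
    # gpu_time = timed_run(fluid.CUDAPlace(0), train_one_pass)
    # print("speedup: %.2fx" % (cpu_time / gpu_time))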

@lcy-seso
Contributor

lcy-seso commented Jan 18, 2018

@jacquesqiao I will remove the copy operation in the linear_chain_crf op to avoid this repeated copy.

related to #7654

@guru4elephant guru4elephant assigned wangkuiyi and unassigned lcy-seso Jan 19, 2018
@lcy-seso lcy-seso assigned lcy-seso and unassigned wangkuiyi Jan 19, 2018
@lcy-seso
Contributor

The memory copy inside the linear_chain_crf_op is removed in PR #7675. This fixes the problem that, when this operator runs on the GPU, both the framework and the operator itself copy the inputs from GPU memory to CPU memory and then copy the results back to GPU memory.

@jshower
Contributor

jshower commented Mar 21, 2018

We used the profiling tool to look at the execution time of the different ops; the results are as follows.

Event                             Calls       Total       Min.        Max.        Ave.        
thread0::linear_chain_crf_grad    12          12908.2     128.992     1273.86     1075.68     
thread0::crf_decoding             12          3697        36.998      370.332     308.084     
thread0::linear_chain_crf         12          3407.43     33.9113     346.214     283.953     
thread0::elementwise_add_grad     216         2215.51     0.483488    13.0267     10.257      
thread0::mul_grad                 288         971.093     0.516384    8.40723     3.37185     
thread0::lstm_grad                72          924.668     10.1028     13.9453     12.8426     
thread0::lstm                     96          598.993     2.96118     11.5723     6.23952     
thread0::mul                      384         456.164     0.08368     3.00365     1.18793     
thread0::sum                      321         406.066     0.066144    5.77043     1.265       
thread0::scale                    360         168.86      0.045376    4.14739     0.469056    
thread0::tanh_grad                216         112.893     0.067488    0.622016    0.522653    
thread0::tanh                     288         110.25      0.016864    0.50464     0.382813    
thread0::elementwise_add          288         109.787     0.02304     0.568384    0.381203    
thread0::sgd                      678         30.7195     0.008288    0.347968    0.045309    
thread0::lookup_table_grad        72          23.3073     0.162976    0.73408     0.323712    
thread0::lookup_table             96          21.4618     0.06256     0.349184    0.223561    
thread0::chunk_eval               12          19.0456     0.26656     1.97776     1.58713     
thread0::mean                     9           2.16704     0.224384    0.268192    0.240782    
thread0::elementwise_mul          30          1.55699     0.009568    0.103264    0.0518997   
thread0::mean_grad                12          1.38957     0.100224    0.169664    0.115797

As shown above, the CRF-related ops account for most of the total execution time. This is because CRF runs on the CPU; a follow-up is to provide a GPU implementation of CRF to address this.
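
For reference, a per-op summary like the one above can be collected with the fluid profiler (a sketch, assuming the profiler.profiler context manager available in this version; the tiny program below is only a stand-in for the training program built by test_label_semantic_roles.py):

    import numpy
    import paddle.v2.fluid as fluid
    import paddle.v2.fluid.profiler as profiler

    # stand-in network; in practice, wrap the real training loop instead
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.fc(input=x, size=1)

    place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    # Profile a handful of mini-batches; a per-op summary sorted by
    # total time is printed when the block exits.
    with profiler.profiler('GPU', 'total'):
        for _ in range(10):
            exe.run(fluid.default_main_program(),
                    feed={'x': numpy.random.random((32, 13)).astype('float32')},
                    fetch_list=[y])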

@jshower jshower closed this as completed Mar 21, 2018