Dataprovider #1395
Comments
I'm not very familiar with this code; please give it a try.
@studyPaddle Once test.list/train.list is specified in trainer_config, the dataprovider will:
Does that mean a .list file cannot contain multiple files? @Z-TAO
@studyPaddle As I understand Z-TAO, the .list file can contain multiple lines, with each line being one file name.
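To make the reading above concrete, here is a minimal sketch in plain Python of how a multi-line train.list could be consumed, with `process` called once per listed file. It does not import PaddlePaddle: `iterate_file_list` and `demo` are hypothetical helpers written only for illustration, and the sample file names and contents are made up. It assumes the framework does something equivalent internally, as the thread suggests, so treat it as an illustration rather than PaddlePaddle's actual implementation.

```python
import os
import tempfile


def process(settings, filename):
    """Parse ONE data file; each line looks like 'label;p0 p1 p2 ...'."""
    with open(filename, "r") as f:
        for line in f:
            label, pixel = line.strip().split(";")
            yield [float(p) for p in pixel.split()], int(label)


def iterate_file_list(list_path, settings=None):
    """Hypothetical driver: call process() once per file name in the list.

    This only mimics the behaviour described in the thread; it is not
    PaddlePaddle code.
    """
    with open(list_path, "r") as list_file:
        for entry in list_file:
            filename = entry.strip()
            if not filename:  # skip blank lines
                continue
            for sample in process(settings, filename):
                yield sample


def demo():
    """Write two small data files plus a train.list, then read them back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        contents = {
            "part-0.txt": "3;0.0 1.0\n",
            "part-1.txt": "7;0.5 0.5\n1;1.0 0.0\n",
        }
        paths = []
        for name, text in contents.items():
            path = os.path.join(tmp_dir, name)
            with open(path, "w") as f:
                f.write(text)
            paths.append(path)

        # One file name per line, as suggested in the thread.
        list_path = os.path.join(tmp_dir, "train.list")
        with open(list_path, "w") as f:
            f.write("\n".join(paths) + "\n")

        return list(iterate_file_list(list_path))


if __name__ == "__main__":
    for features, label in demo():
        print(label, features)
```

If that reading is right, `process` itself never needs to handle more than one file: adding another data file is just a matter of appending one more line to train.list.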
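Hello, I'm a PaddlePaddle beginner. The PaddlePaddle documentation doesn't say how the DataProvider's process function should handle multiple data files. If train.list has many lines, so that several data files need to be passed in, how should that be handled?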
from paddle.trainer.PyDataProvider2 import *

# Define a py data provider
@provider(input_types=[dense_vector(28 * 28), integer_value(10)])
def process(settings, filename):  # settings is not used currently.
    with open(filename, 'r') as f:  # open one of the training files
        for line in f:  # read each line
            label, pixel = line.split(';')
            # get features and label
            pixels_str = pixel.split(' ')
            pixels_float = []
            for each_pixel_str in pixels_str:
                pixels_float.append(float(each_pixel_str))
            # give data to paddle.
            yield pixels_float, int(label)
The example in the documentation only covers passing in a single data file. How do I pass in multiple data files?