-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Non-blocking row_sparse_pull. Fix incorrect indices generated by device kvstore.row_sparse_pull #9887
Conversation
* draft * rm use_copy. fix dist kvstore. TODO: fix dtype * fix dtype, shape * remove reshape * cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just wanted to talk about if it is safe to use local variables for ndarrays. I see that at many places. The one I marked below is just one example. Is there a chance they can go out of scope, be freed, and cause segfaults?
} | ||
CHECK_EQ(num_vals, 1) << "RowSparsePull with multiple values is not supported yet"; | ||
NDArray& indices = target_val_rowids[0].second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to use local variable ndarray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine, they're passed by value when pushed to engine
@reminisce could you help review? |
IType *dptr = out.data().dptr<IType>(); | ||
common::ParallelSort(dptr, dptr + num_elements, | ||
engine::OpenMP::Get()->GetRecommendedOMPThreadCount()); | ||
const size_t num_selected_out = std::unique(dptr, dptr + num_elements) - dptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique
only removes duplicates of elements present consecutively. Is that what's expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes because inputs are sorted first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -76,7 +76,7 @@ def check_row_sparse_pull(kv, count): | |||
for i in range(count): | |||
vals.append(mx.nd.zeros(shape).tostype('row_sparse')) | |||
row_id = np.random.randint(num_rows, size=num_rows) | |||
row_ids.append(mx.nd.array(row_id)) | |||
row_ids.append(mx.nd.array(row_id).reshape((2,2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why reshape?
One suggestion: (2, 2) is too hard-coded. If shape is changed in the beginning, the test would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right.. Made it less hard-coded by calculating the dimension based on total number of elements.
I'm testing 2-D rowids explicitly because that seems to be a common use case, many users forget to reshape it to 1-D before passing it to kvstore. It is now supported with this PR.
This PR also fixed a bug in the original GPU "Unique" implementation, where the same pointer is passed as both the input/output to cub::unique. This results in incorrect output when the number of rowids is large. Fixed by storing the input/output in separate buffers. |
@ZiyueHuang please view the changes for gpu unique |
@eric-haibin-lin Thanks for the fix! Looks good. |
… by device kvstore.row_sparse_pull (apache#9887)" This reverts commit 02dd89a.
…ce kvstore.row_sparse_pull (apache#9887) * nonblocking Kvstore (apache#195) * draft * rm use_copy. fix dist kvstore. TODO: fix dtype * fix dtype, shape * remove reshape * cleanup * fix compilation * rsp draft * update param name * doc update and small refactoring * minor updates * enhance test case with 2-D rowids * update gpu tests * rewrite gpu unique kernels * update gpu tests * update reshape test/ * fix lint * update test for py3
…ce kvstore.row_sparse_pull (apache#9887) * nonblocking Kvstore (apache#195) * draft * rm use_copy. fix dist kvstore. TODO: fix dtype * fix dtype, shape * remove reshape * cleanup * fix compilation * rsp draft * update param name * doc update and small refactoring * minor updates * enhance test case with 2-D rowids * update gpu tests * rewrite gpu unique kernels * update gpu tests * update reshape test/ * fix lint * update test for py3
…ce kvstore.row_sparse_pull (apache#9887) * nonblocking Kvstore (apache#195) * draft * rm use_copy. fix dist kvstore. TODO: fix dtype * fix dtype, shape * remove reshape * cleanup * fix compilation * rsp draft * update param name * doc update and small refactoring * minor updates * enhance test case with 2-D rowids * update gpu tests * rewrite gpu unique kernels * update gpu tests * update reshape test/ * fix lint * update test for py3
Description
This PR adds async execution support for kv.row_sparse_pull.
The operation was blocking because it requires unique row_ids, whose shape cannot be inferred ahead of time. This PR stores the unique row_ids in a row_sparse NDArray, whose data shape can be changed at run time when executed asynchronously.
Checklist
Essentials
make lint
)Changes
use_copy
param inBroadcastRowSparse
- this is essentially the same as callingBroadcast
CopyRetainedRowsToGPU
and always useSparseRetain
becauseCopyRetainedRowsToGPU
has high invocation overhead andSparseRetain
has improved performanceComments