This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Add compile flag to choose data type for tensor size due to performance degradation #371

Merged
merged 13 commits on Apr 12, 2019
Conversation

apeforest
Contributor

Changing the data type of `index_t` from `uint32_t` to `int64_t` caused performance degradation in operators defined in the mshadow library.

I can think of three solutions to this problem:
(1) Add a compilation flag to choose data types for tensor size (This PR)
(2) Add an environment variable to choose data type for tensor size at runtime
(3) Choose data type for tensor size at runtime based on the size of the tensor

Due to the urgency of customer impact and the non-trivial change for approach (2) and (3), this PR is taking the quick fix of approach (1).

For more information and performance analysis, please refer to PR:
apache/mxnet#14570

@apeforest
Contributor Author

@TaoLv @eric-haibin-lin @tqchen Please help to review.

Contributor

@larroy larroy left a comment


LGTM

I had an issue also with tensor size types and overflow.

Member

@TaoLv TaoLv left a comment


Trying to understand: does this change only affect the total size of the tensor, not the size of each dimension?

@apeforest
Contributor Author

@TaoLv It affects the size of each dimension as well.

@TaoLv
Member

TaoLv commented Mar 30, 2019

@apeforest Do you mean size of each dimension will also be defined as int64 if MSHADOW_INT64_TENSOR_SIZE=1 and the size may exceed INT32_MAX? If so, I'm afraid some functions in mshadow will be broken. See those gemm functions in dot_engine-inl.h. M/N/K there are explicitly defined as int.
/~https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L281


@samskalicky samskalicky left a comment


LGTM

@apeforest
Contributor Author

@TaoLv Updated the gemm function signatures to use `index_t`. Please check whether this addresses your concern.

@TaoLv
Member

TaoLv commented Apr 3, 2019

Thank you @apeforest . Do we have any large dot/GEMM unit tests in MXNet for this change?

@apeforest
Contributor Author

apeforest commented Apr 4, 2019

@szha @eric-haibin-lin @tqchen Could you please help to review/merge this PR? Thanks!

@TaoLv
Member

TaoLv commented Apr 4, 2019

@apeforest Looks like M/N/K in that test are really small (e.g. 2/2/3). I'm afraid that's not enough to validate the changes in this PR.

@apeforest
Contributor Author

apeforest commented Apr 8, 2019

@TaoLv The main purpose of this PR is not to support large tensors in the gemm engine; it is to fall back to int32 by default via a compilation flag. We need a more thorough inspection of the performance impact of int64 before turning the flag on again.

index_t m, index_t n, index_t k, float alpha,
const float *A, index_t lda,
const float *B, index_t ldb, float beta,
float *C, index_t ldc) {
cublasStatus_t err = cublasSgemm(Stream<gpu>::GetBlasHandle(stream),
Member


Does lda, ldb, ldc support int64 in cublas?

cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float           *alpha,
                           const float           *A, int lda,
                           const float           *B, int ldb,
                           const float           *beta,
                           float           *C, int ldc)

Member


That's why I want to have a dot/GEMM unit test for large input tensors. The same problem may also exist for OpenBLAS and MKL.

@TaoLv
Member

TaoLv commented Apr 9, 2019

Hi @apeforest, I'm fine with this flag if it only impacts the total tensor size, as BLAS libraries should handle that well. But if it also impacts the size of each dimension, then we need more changes and validation to accommodate that, and the flag is not ready to be exposed to users.

@TaoLv
Member

TaoLv commented Apr 9, 2019

@pengzhao-intel

@apeforest
Contributor Author

Hi @apeforest, I'm fine with this flag if it only impacts the total tensor size, as BLAS libraries should handle that well. But if it also impacts the size of each dimension, then we need more changes and validation to accommodate that, and the flag is not ready to be exposed to users.

@TaoLv The data type of each dimension of the tensor is defined as index_t (int64 when the flag is on). However, that does not mean we support a size greater than INT32_MAX in any single dimension; only the total number of elements in the tensor may exceed INT32_MAX.

@TaoLv
Member

TaoLv commented Apr 9, 2019

Thank you for the explanation @apeforest. Then it seems we need to document this somewhere and prevent users from passing a tensor with a large dimension (dim[x] > INT32_MAX) to operators. If so, I think there is no need to change the API in dot_engine.

@apeforest
Contributor Author

Thank you for the explanation @apeforest. Then it seems we need to document this somewhere and prevent users from passing a tensor with a large dimension (dim[x] > INT32_MAX) to operators. If so, I think there is no need to change the API in dot_engine.

Makes sense. I have reverted the API signature in dot_engine.
