Add StringTensor #39830

Merged
merged 65 commits into from
Mar 26, 2022
Changes from 45 commits
Commits
e43bf93
add string tensor and case convert kernels
joey12300 Jan 27, 2022
ecc0cbc
Add strings empty kernel; Reorganize the structure of case convert ke…
joey12300 Jan 29, 2022
de898cd
Add string infermeta
joey12300 Jan 29, 2022
f1c8c4b
Update mutable_data of string tensor
joey12300 Jan 31, 2022
a4f528c
merge develop
joey12300 Feb 10, 2022
e99a9dd
rename kernel name
joey12300 Feb 15, 2022
381c7ff
add string copy tmp
joey12300 Feb 18, 2022
374f253
Fix strings copy device bug
joey12300 Feb 19, 2022
da218cd
add utf8 gpu converter
joey12300 Feb 20, 2022
2cc6bb0
add string tensor c++ api
joey12300 Feb 21, 2022
c87eabb
Merge remote-tracking branch 'upstream' into add_string_tensor
joey12300 Feb 23, 2022
d94b946
Remove mutable_data of string tensor
joey12300 Feb 25, 2022
b743556
update string tensor interface
joey12300 Mar 3, 2022
0d5b397
remove charcases_flag.h
joey12300 Mar 3, 2022
14eb07d
Merge branch 'develop' into add_string_tensor
joey12300 Mar 8, 2022
57402b8
remove some fluid headers
joey12300 Mar 8, 2022
ffe8705
Add make_ddim
joey12300 Mar 8, 2022
c16f1ba
__HIPCC__ -> PADDLE_WITH_HIP
joey12300 Mar 8, 2022
a9928ca
remove fluid headers
joey12300 Mar 8, 2022
f6d5520
fix cpu compile
joey12300 Mar 9, 2022
d7d16a8
remove std::hash
joey12300 Mar 9, 2022
f623b60
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 9, 2022
986a10b
fix cmakelist conflics
joey12300 Mar 9, 2022
fa82d16
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 10, 2022
0d5bcaa
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 10, 2022
618b4f7
Fix cudaMalloc
joey12300 Mar 12, 2022
ebacdbd
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 12, 2022
74b1428
Remove strings/impl directory
joey12300 Mar 12, 2022
27df753
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 13, 2022
b0b84a1
Fix infrt/get_phi_kernel_info.py;Add custom_kernels deps
joey12300 Mar 13, 2022
683f51b
Add empty kernel test
joey12300 Mar 14, 2022
e13f7f8
Remove some comments
joey12300 Mar 15, 2022
e7594c3
Modify lower/upper api encoding type: string->bool
joey12300 Mar 15, 2022
b710687
STRING->PSTRING; Add CreateInferLikeMeta
joey12300 Mar 15, 2022
57726f2
Add code gen for C++ String API
joey12300 Mar 16, 2022
def03bb
merge develop
joey12300 Mar 16, 2022
7ba9a5d
remove strings_api_utils.h
joey12300 Mar 16, 2022
d8ede63
Add ignore file (strings_api.h, strings_api.cc)
joey12300 Mar 16, 2022
00bfcd2
update strings gen script
joey12300 Mar 16, 2022
8777dad
merge develop
joey12300 Mar 17, 2022
40906f8
change args order of case convert kernels
joey12300 Mar 17, 2022
e810f7d
Add comments for pstring, StringTensor
joey12300 Mar 17, 2022
8aa1225
cpstring_internal.h -> cpstring_impl.h
joey12300 Mar 17, 2022
3966a89
Update accordding to comments:
joey12300 Mar 17, 2022
afd509a
Remove all singletons in strings kernels
joey12300 Mar 17, 2022
43b449b
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 18, 2022
cd5873a
fix rocm compile
joey12300 Mar 18, 2022
a326602
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 19, 2022
a9de65f
Fix py3 compile
joey12300 Mar 19, 2022
c32e589
Fix c++ coverage
joey12300 Mar 21, 2022
30f5661
1. Add pstring proto type
joey12300 Mar 21, 2022
be3d5eb
DataLayout::PSTRING -> DataLayout::PSTRING_UNION
joey12300 Mar 21, 2022
b771d39
Register pstring data type
joey12300 Mar 22, 2022
87eadf9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 22, 2022
23a6d2d
Fix strings api gen
joey12300 Mar 22, 2022
0d2845c
merge
joey12300 Mar 23, 2022
8d141eb
Fix dense tensor register pstring dtype
joey12300 Mar 23, 2022
6302677
Fix error messages
joey12300 Mar 23, 2022
bef8cbe
merge
joey12300 Mar 24, 2022
e2b962a
remove line
joey12300 Mar 25, 2022
5d22e58
add pstring unittest
joey12300 Mar 25, 2022
972ca57
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
joey12300 Mar 25, 2022
67f8fb7
remove test string api unitest
joey12300 Mar 25, 2022
6f8decf
remove empty line
joey12300 Mar 25, 2022
71a8d1a
Remove some headers to decrease the size of executable file
joey12300 Mar 26, 2022
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,10 +9,12 @@ paddle/phi/api/backward/backward_api.h
paddle/phi/api/backward/sparse_bw_api.h
paddle/phi/api/include/api.h
paddle/phi/api/include/sparse_api.h
paddle/phi/api/include/strings_api.h
paddle/phi/api/lib/api.cc
paddle/phi/api/lib/dygraph_api.*
paddle/phi/api/lib/backward_api.cc
paddle/phi/api/lib/sparse_api.cc
paddle/phi/api/lib/strings_api.cc
paddle/phi/api/lib/sparse_bw_api.cc
paddle/phi/extension.h
paddle/phi/include/*
12 changes: 11 additions & 1 deletion cmake/phi.cmake
@@ -173,6 +173,7 @@ function(kernel_library TARGET)
string(REGEX MATCHALL "#include \"paddle\/phi\/kernels\/[a-z0-9_]+_kernel.h\"" include_kernels ${target_content})
else()
string(REGEX MATCHALL "#include \"paddle\/phi\/kernels\/${kernel_library_SUB_DIR}\/[a-z0-9_]+_kernel.h\"" include_kernels ${target_content})
string(REGEX MATCHALL "#include \"paddle\/phi\/kernels\/[a-z0-9_]+_kernel.h\"" include_dense_kernels ${target_content})
endif()
foreach(include_kernel ${include_kernels})
if ("${kernel_library_SUB_DIR}" STREQUAL "")
@@ -183,6 +184,16 @@ function(kernel_library TARGET)
string(REGEX REPLACE ".h\"" "" kernel_name ${kernel_name})
list(APPEND kernel_deps ${kernel_name})
endforeach()

if (NOT "${kernel_library_SUB_DIR}" STREQUAL "")
foreach(include_dense_kernel ${include_dense_kernels})
string(REGEX REPLACE "#include \"paddle\/phi\/kernels\/" "" kernel_name ${include_dense_kernel})
string(REGEX REPLACE ".h\"" "" kernel_name ${kernel_name})
list(APPEND kernel_deps ${kernel_name})
endforeach()
endif()


endforeach()
list(REMOVE_DUPLICATES kernel_deps)
list(REMOVE_ITEM kernel_deps ${TARGET})
@@ -331,7 +342,6 @@ function(register_kernels)
file(GLOB KERNELS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "*_kernel.h")
string(REPLACE ".h" "" KERNELS "${KERNELS}")
list(LENGTH register_kernels_DEPS register_kernels_DEPS_len)

foreach(target ${KERNELS})
list(FIND register_kernels_EXCLUDES ${target} _index)
if (${_index} EQUAL -1)
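The `kernel_library` hunk above derives kernel dependencies by regex-matching `#include` lines, and for a non-empty `SUB_DIR` it now also collects includes of top-level (dense) kernel headers. The following is only a rough C++ sketch of that matching logic, not Paddle code: the function name `KernelDeps` and its parameters are hypothetical, and it assumes dependency names are simply the header base names.

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: collect kernel dependency names from the #include
// lines of a source file. `sub_dir` is empty for top-level kernels or a
// sub-directory name such as "strings"; `self` is the current target,
// which is dropped from the result (like list(REMOVE_ITEM ...)).
std::set<std::string> KernelDeps(const std::string& source,
                                 const std::string& sub_dir,
                                 const std::string& self) {
  const std::string prefix = R"re(#include "paddle/phi/kernels/)re";
  const std::string tail = R"re(([a-z0-9_]+_kernel)\.h")re";

  std::set<std::string> deps;  // a set also removes duplicates
  auto collect = [&](const std::string& pattern) {
    const std::regex re(pattern);
    for (std::sregex_iterator it(source.begin(), source.end(), re), end;
         it != end; ++it) {
      deps.insert((*it)[1].str());  // capture group 1 is the kernel name
    }
  };

  if (!sub_dir.empty()) {
    collect(prefix + sub_dir + "/" + tail);  // kernels in the sub-directory
  }
  collect(prefix + tail);  // top-level kernels, now matched in both cases

  deps.erase(self);
  return deps;
}
```

Because `[a-z0-9_]+` cannot match a `/`, the top-level pattern never picks up sub-directory headers, which is why the two patterns can be applied to the same file without double counting.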
2 changes: 2 additions & 0 deletions paddle/fluid/framework/convert_utils.cc
@@ -81,6 +81,8 @@ paddle::framework::proto::VarType::Type TransToProtoVarType(
return paddle::framework::proto::VarType::BF16;
case DataType::BOOL:
return paddle::framework::proto::VarType::BOOL;
case DataType::PSTRING:
return paddle::framework::proto::VarType::RAW;
Contributor:
Why does this return the RAW type here?

Contributor Author:
A PSTRING proto type has been added; changed to return paddle::framework::proto::VarType::RAW.

default:
PADDLE_THROW(paddle::platform::errors::Unimplemented(
"Unsupported data type `%s` when casting it into "
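The hunk above adds a `PSTRING` case to a switch that maps a data type to a proto variable type and throws for anything unmapped. A minimal, self-contained sketch of that pattern follows. The enums `DataType` and `ProtoVarType` and the function `ToProtoVarType` are hypothetical stand-ins rather than Paddle's real types, and `std::invalid_argument` is used here in place of `PADDLE_THROW`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for phi::DataType and proto::VarType::Type.
enum class DataType { BOOL, BFLOAT16, PSTRING, UNDEFINED };
enum class ProtoVarType { BOOL, BF16, RAW };

// Map a data type to its proto counterpart; unmapped values are an error.
ProtoVarType ToProtoVarType(DataType dtype) {
  switch (dtype) {
    case DataType::BFLOAT16:
      return ProtoVarType::BF16;
    case DataType::BOOL:
      return ProtoVarType::BOOL;
    case DataType::PSTRING:
      // As in the diff: the string type is reported as RAW.
      return ProtoVarType::RAW;
    default:
      throw std::invalid_argument(
          "Unsupported data type " +
          std::to_string(static_cast<int>(dtype)));
  }
}
```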
2 changes: 1 addition & 1 deletion paddle/phi/CMakeLists.txt
@@ -23,7 +23,7 @@ add_subdirectory(tools)
add_subdirectory(tests)

# make an unity target for compile deps
set(PHI_DEPS convert_utils dense_tensor phi_context kernel_factory kernel_context arg_map_context infermeta lod_utils op_compat_infos sparse_csr_tensor sparse_coo_tensor)
set(PHI_DEPS convert_utils dense_tensor phi_context kernel_factory kernel_context arg_map_context infermeta lod_utils op_compat_infos sparse_csr_tensor sparse_coo_tensor string_tensor)
get_property(phi_kernels GLOBAL PROPERTY PHI_KERNELS)
set(PHI_DEPS ${PHI_DEPS} ${phi_kernels})

2 changes: 1 addition & 1 deletion paddle/phi/api/CMakeLists.txt
@@ -1,2 +1,2 @@
add_subdirectory(lib)
cc_library(phi_api SRCS all.cc DEPS phi_function_api phi_bw_function_api sparse_api sparse_bw_api)
cc_library(phi_api SRCS all.cc DEPS phi_function_api phi_bw_function_api sparse_api sparse_bw_api strings_api)
23 changes: 22 additions & 1 deletion paddle/phi/api/lib/CMakeLists.txt
@@ -56,6 +56,14 @@ set(sparse_bw_api_source_file ${CMAKE_SOURCE_DIR}/paddle/phi/api/lib/sparse_bw_a
set(sparse_bw_api_header_file_tmp ${sparse_bw_api_header_file}.tmp)
set(sparse_bw_api_source_file_tmp ${sparse_bw_api_source_file}.tmp)

# strings api file
set(strings_api_gen_file ${CMAKE_SOURCE_DIR}/python/paddle/utils/code_gen/strings_api_gen.py)
set(strings_api_yaml_file ${CMAKE_SOURCE_DIR}/python/paddle/utils/code_gen/strings_api.yaml)
set(strings_api_header_file ${CMAKE_SOURCE_DIR}/paddle/phi/api/include/strings_api.h)
set(strings_api_source_file ${CMAKE_SOURCE_DIR}/paddle/phi/api/lib/strings_api.cc)
set(strings_api_header_file_tmp ${strings_api_header_file}.tmp)
set(strings_api_source_file_tmp ${strings_api_source_file}.tmp)

# wrapped infermeta file
set(wrapped_infermeta_gen_file ${CMAKE_SOURCE_DIR}/python/paddle/utils/code_gen/wrapped_infermeta_gen.py)
set(api_yaml_file ${CMAKE_SOURCE_DIR}/python/paddle/utils/code_gen/api.yaml)
@@ -123,6 +131,19 @@ add_custom_command(
DEPENDS ${sparse_bw_api_yaml_file} ${sparse_bw_api_gen_file} ${api_gen_base} ${api_gen_file} ${sparse_api_gen_file} ${bw_api_gen_file}
VERBATIM)

# generate strings api
add_custom_command(
OUTPUT ${strings_api_header_file} ${strings_api_source_file}
COMMAND ${PYTHON_EXECUTABLE} ${strings_api_gen_file}
--api_yaml_path ${strings_api_yaml_file}
--api_header_path ${strings_api_header_file_tmp}
--api_source_path ${strings_api_source_file_tmp}
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${strings_api_header_file_tmp} ${strings_api_header_file}
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${strings_api_source_file_tmp} ${strings_api_source_file}
COMMENT "copy_if_different ${strings_api_header_file} ${strings_api_source_file}"
DEPENDS ${strings_api_yaml_file} ${strings_api_gen_file} ${api_gen_base} ${api_gen_file}
VERBATIM)

# generate wrapped infermeta
add_custom_command(
OUTPUT ${wrapped_infermeta_header_file} ${wrapped_infermeta_source_file}
@@ -147,5 +168,5 @@ cc_library(phi_dygraph_api SRCS ${dygraph_api_source_file} DEPS phi_tensor_raw p
cc_library(phi_bw_function_api SRCS ${bw_api_source_file} DEPS phi_tensor_raw phi kernel_dispatch api_gen_utils backward_infermeta phi_data_transform phi_function_api api_custom_impl)
cc_library(sparse_api SRCS ${sparse_api_source_file} DEPS phi_tensor_raw phi kernel_dispatch api_gen_utils sparse_api_custom_impl)
cc_library(sparse_bw_api SRCS ${sparse_bw_api_source_file} DEPS phi_tensor_raw phi kernel_dispatch api_gen_utils sparse_api sparse_api_custom_impl)

cc_library(strings_api SRCS ${strings_api_source_file} DEPS phi_tensor_raw phi kernel_dispatch api_gen_utils phi_data_transform)
cc_library(phi_tensor SRCS tensor_method.cc DEPS phi_tensor_raw phi_function_api api_gen_utils kernel_dispatch infermeta)
1 change: 1 addition & 0 deletions paddle/phi/api/lib/api_declare.h
@@ -19,3 +19,4 @@ limitations under the License. */

PD_DECLARE_API(Math);
PD_DECLARE_API(SparseApi);
PD_DECLARE_API(StringsApi);
23 changes: 23 additions & 0 deletions paddle/phi/api/lib/api_gen_utils.cc
@@ -56,6 +56,10 @@ std::shared_ptr<phi::SelectedRows> TensorToSelectedRows(
return nullptr;
}

std::shared_ptr<phi::StringTensor> TensorToStringTensor(const Tensor& tensor) {
return std::dynamic_pointer_cast<phi::StringTensor>(tensor.impl());
}

Contributor:
I don't quite get what this function's main purpose is?

/* ----------------- for infer_meta --------------------- */

phi::MetaTensor MakeMetaTensor(const phi::DenseTensor& tensor) {
@@ -92,6 +96,10 @@ paddle::optional<phi::MetaTensor> MakeMetaTensor(
return {paddle::none};
}

phi::MetaTensor MakeMetaTensor(const phi::StringTensor& tensor) {
return phi::MetaTensor(tensor);
}

/* ------------------ for output ----------------------- */

phi::DenseTensor* SetKernelOutput(Backend backend, Tensor* out) {
@@ -148,5 +156,20 @@ phi::TensorBase* SetSparseKernelOutput(Tensor* out, TensorType type) {
return out->impl().get();
}

phi::TensorBase* SetStringsKernelOutput(Backend backend,
Tensor* out,
TensorType type) {
if (!out->initialized()) {
auto place = phi::TransToPhiPlace(backend);
if (type == TensorType::STRING_TENSOR) {
auto strings_tensor = std::make_shared<phi::StringTensor>(
paddle::memory::AllocShared(place, 0), phi::StringTensorMeta());
out->set_impl(strings_tensor);
return strings_tensor.get();
}
}
return out->impl().get();
}

} // namespace experimental
} // namespace paddle
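On the reviewer's question about `TensorToStringTensor`: as written in the diff, it downcasts the type-erased `impl()` of an API-level `Tensor` to a `StringTensor`, which yields `nullptr` when the tensor holds some other implementation. `SetStringsKernelOutput` in turn creates an empty `StringTensor` for an uninitialized output. The sketch below reproduces both patterns with hypothetical stand-in types. None of these classes are Paddle's, and the output helper is simplified: it omits the allocator, the backend/place, and the `TensorType` check.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for phi::TensorBase and two of its subclasses.
struct TensorBase {
  virtual ~TensorBase() = default;
};
struct DenseTensor : TensorBase {
  std::vector<float> data;
};
struct StringTensor : TensorBase {
  std::vector<std::string> data;
};

// Hypothetical stand-in for the API-level Tensor holding a type-erased impl.
class Tensor {
 public:
  Tensor() = default;
  explicit Tensor(std::shared_ptr<TensorBase> impl) : impl_(std::move(impl)) {}
  const std::shared_ptr<TensorBase>& impl() const { return impl_; }
  bool initialized() const { return impl_ != nullptr; }
  void set_impl(std::shared_ptr<TensorBase> impl) { impl_ = std::move(impl); }

 private:
  std::shared_ptr<TensorBase> impl_;
};

// Same shape as TensorToStringTensor: a checked downcast that returns
// nullptr if the tensor does not hold a StringTensor.
std::shared_ptr<StringTensor> ToStringTensor(const Tensor& tensor) {
  return std::dynamic_pointer_cast<StringTensor>(tensor.impl());
}

// Simplified shape of SetStringsKernelOutput: create an empty StringTensor
// for an uninitialized output, otherwise hand back the existing impl.
TensorBase* EnsureStringOutput(Tensor* out) {
  assert(out != nullptr);
  if (!out->initialized()) {
    auto strings_tensor = std::make_shared<StringTensor>();
    out->set_impl(strings_tensor);
    return strings_tensor.get();
  }
  return out->impl().get();
}
```

Callers of a downcast helper like this have to check the result before dereferencing it, since a dense input produces a null pointer rather than an error.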
11 changes: 10 additions & 1 deletion paddle/phi/api/lib/api_gen_utils.h
@@ -22,11 +22,12 @@ limitations under the License. */
#include "paddle/phi/core/selected_rows.h"
#include "paddle/phi/core/sparse_coo_tensor.h"
#include "paddle/phi/core/sparse_csr_tensor.h"
#include "paddle/phi/core/string_tensor.h"

namespace paddle {
namespace experimental {

enum class TensorType { DENSE_TENSOR, SPARSE_CSR, SPARSE_COO };
enum class TensorType { DENSE_TENSOR, SPARSE_CSR, SPARSE_COO, STRING_TENSOR };

/* ------------------ for input ----------------------- */

@@ -43,6 +44,8 @@ std::shared_ptr<phi::SelectedRows> TensorToSelectedRows(const Tensor& tensor);
std::shared_ptr<phi::SelectedRows> TensorToSelectedRows(
const paddle::optional<Tensor>& tensor);

std::shared_ptr<phi::StringTensor> TensorToStringTensor(const Tensor& tensor);

/* ----------------- for infer_meta --------------------- */

phi::MetaTensor MakeMetaTensor(const phi::DenseTensor& tensor);
@@ -58,6 +61,8 @@ phi::MetaTensor MakeMetaTensor(const phi::SelectedRows& tensor);
paddle::optional<phi::MetaTensor> MakeMetaTensor(
const paddle::optional<const phi::SelectedRows&>& tensor);

phi::MetaTensor MakeMetaTensor(const phi::StringTensor& tensor);

/* ------------------ for output ----------------------- */

phi::DenseTensor* SetKernelOutput(Backend backend, Tensor* out);
@@ -70,5 +75,9 @@ phi::SelectedRows* SetSelectedRowsKernelOutput(Backend backend, Tensor* out);

phi::TensorBase* SetSparseKernelOutput(Tensor* out, TensorType type);

phi::TensorBase* SetStringsKernelOutput(Backend backend,
Tensor* out,
TensorType type);

} // namespace experimental
} // namespace paddle
2 changes: 1 addition & 1 deletion paddle/phi/api/lib/utils/CMakeLists.txt
@@ -1,2 +1,2 @@
cc_library(phi_api_utils SRCS storage.cc tensor_utils.cc DEPS
tensor_base convert_utils dense_tensor lod_tensor selected_rows_utils place var_type_traits scalar)
tensor_base convert_utils dense_tensor lod_tensor selected_rows_utils place var_type_traits scalar string_tensor)