Add cross-compiling support for arm architecture. #1698
cmake/simd.cmake
Outdated
float32x4_t b = {1.0f, 2.0f, 3.0f, 4.0f};
float32x4_t c = vaddq_f32(a, b);
return 0;
}" NEON_FOUND)
This is odd: after adding the compiler check for NEON instructions, running cmake on a PC fails with the following error:
-- Looking for UINT64_MAX
-- Looking for UINT64_MAX - not found
-- Looking for UINT64_MAX
-- Looking for UINT64_MAX - not found
CMake Error at cmake/flags.cmake:82 (message):
Cannot find symbol UINT64_MAX
Fixed. The cause was that the last check left set(CMAKE_REQUIRED_FLAGS ${NEON_FLAG}) in effect, so every later check was also compiled with CMAKE_REQUIRED_FLAGS.
I tried this PR and still get the Cannot find symbol UINT64_MAX error.
Also, the error is raised at flags.cmake:83. Is it really related to this NEON check?
Take a look at build/CMakeFiles/CMakeError.log and see what error message it contains.
Right, it was caused by a stale CMakeCache.txt that had not been cleared. Still, CMAKE_REQUIRED_FLAGS should be restored to its original value here rather than cleared.
CMakeLists.txt
Outdated
@@ -65,6 +64,7 @@ include(external/openblas) # download, build, install openblas
include(external/swig) # download, build, install swig
include(external/warpctc) # download, build, install warpctc

include(simd) # set simd flag
AVX_FOUND is not available at line 31.
Fixed.
paddle/math/SIMDFunctions.cpp
Outdated
@@ -13,10 +13,12 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "SIMDFunctions.h"
#ifdef __SSE__
SSE3?
Macros such as __SSE__/AVX/__ARM_NEON__ appear in far too many places. This code needs to be sorted out separately.
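One way to read the comment above is that the architecture macros should be confined to a single place, with the rest of the code calling a plain function. The following is only a minimal sketch of that idea, not code from this PR: the namespace `simd_sketch` and the function `addTo` are hypothetical names, and the scalar loop doubles as the naive fallback when neither SSE nor NEON is available.

```cpp
#include <cstddef>

// The architecture macros are tested only in this one translation unit/header.
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

namespace simd_sketch {  // hypothetical namespace, not part of Paddle

// Element-wise addition: out[i] = a[i] + b[i] for i in [0, n).
inline void addTo(float* out, const float* a, const float* b, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE__)
  // Process four floats per iteration with unaligned SSE loads/stores.
  for (; i + 4 <= n; i += 4) {
    _mm_storeu_ps(out + i,
                  _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
  }
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
  // Process four floats per iteration with NEON.
  for (; i + 4 <= n; i += 4) {
    vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
  }
#endif
  // Naive version: handles the tail, or everything when no SIMD is available.
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}

}  // namespace simd_sketch
```

Callers would then use `simd_sketch::addTo(out, a, b, n)` without testing any architecture macro themselves.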
cmake/simd.cmake
Outdated
@@ -73,4 +76,26 @@ int main()
return 0;
}" AVX2_FOUND)

mark_as_advanced(MMX_FOUND SSE2_FOUND SSE3_FOUND AVX_FOUND AVX2_FOUND)
# Check NEON
set(CMAKE_REQUIRED_FLAGS ${NEON_FLAG})
Cross-compilation is the usual case here, isn't it? Is this check needed at all?
@@ -163,8 +168,12 @@ void initMain(int argc, char** argv) {

installProfilerSwitch();

#ifdef __SSE__
How is flush_to_zero handled on ARM?
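The thread does not record an answer to this question. As a rough sketch only, and not what this PR does: on x86 the SSE header offers `_MM_SET_FLUSH_ZERO_MODE`, while on ARM the flush-to-zero (FZ) control is bit 24 of the FPSCR register (32-bit VFP) or of FPCR (AArch64), which can be set with inline assembly. The function name `enableFlushToZero` is hypothetical, and the inline assembly assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cstdint>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: tries to enable flush-to-zero for the calling thread.
// Returns false when the target is not covered by this sketch.
inline bool enableFlushToZero() {
#if defined(__SSE__)
  // x86 with SSE: set the FTZ bit in MXCSR.
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  return true;
#elif defined(__aarch64__)
  // AArch64: FZ is bit 24 of the FPCR system register.
  std::uint64_t fpcr;
  __asm__ volatile("mrs %0, fpcr" : "=r"(fpcr));
  fpcr |= (std::uint64_t(1) << 24);
  __asm__ volatile("msr fpcr, %0" : : "r"(fpcr));
  return true;
#elif defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
  // 32-bit ARM with hardware VFP: FZ is bit 24 of FPSCR.
  std::uint32_t fpscr;
  __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
  fpscr |= (std::uint32_t(1) << 24);
  __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
  return true;
#else
  return false;
#endif
}
```

On ARMv7, NEON arithmetic always flushes denormals to zero regardless of this bit, so the setting mainly matters for scalar VFP code there.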
paddle/utils/arch/linux/Locks.cpp
Outdated
inline SpinLockPrivate() { pthread_spin_init(&lock_, 0); }
inline ~SpinLockPrivate() { pthread_spin_destroy(&lock_); }
inline SpinLockPrivate() {
#ifndef __ANDROID__
Adding a macro inside every function to disable code like this basically ruins the readability of the whole file. It would be better to comment out the entire Locks implementation here instead.
@@ -19,7 +19,7 @@ limitations under the License. */
/// for MSVC
#define CPUID(info, x) __cpuidex(info, x, 0)

#else
#elif !defined(__ANDROID__)
On ARM there is no need to compile CpuId.cpp at all, so the __ANDROID__ macro does not need to be introduced here either.
See the License for the specific language governing permissions and
limitations under the License. */

This file looks like it could be merged with hl_sse_matrix_kernel.cuh.
using the same implementation as mac os.
Following this PR I can build the ARM version, but a few issues still need to be confirmed.
- Building against Android API level 19 fails with missing symbols such as rand, while level 21 works. So will level 21 be the minimum Android version that Paddle supports from now on?
- cmake/external/gflags.cmake and similar files need to add -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER} and -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}; otherwise the cross-compilation environment specified with cmake .. -DCMAKE_C_COMPILER=... is not passed through to them.
- Because of the protobuf build problem, a plain cmake & make does not work yet.
Is this PR meant to ensure that PaddlePaddle can be built for ARM? If so, shouldn't the corresponding documentation be updated (or added)?
void SpinLock::lock() { m->lock(); }
void SpinLock::unlock() { m->unlock(); }

#ifdef PADDLE_USE_PTHREAD_BARRIER
Is this conditional compilation there to support ARM? I think a comment is needed here to explain why it was introduced.
@@ -36,36 +40,101 @@ void Semaphore::wait() { sem_wait(&m->sem); }

void Semaphore::post() { sem_post(&m->sem); }

#ifdef PADDLE_USE_PTHREAD_SPINLOCK
Is this conditional compilation there to support ARM? I think a comment is needed here to explain why it was introduced.
The pthread libraries on ARM and Mac provide neither the pthread_spinlock_t and pthread_barrier_t types nor the corresponding interface functions. I am considering two ways to handle this:
- Check in cmake whether pthread_spinlock_t and pthread_barrier_t exist. If they do, define the macros PADDLE_USE_PTHREAD_SPINLOCK and PADDLE_USE_PTHREAD_BARRIER; the #else ... #endif branch uses the implementation from paddle/utils/arch/osx/Locks.cpp. Later on, the two Locks.cpp files could be merged.
- Add a new paddle/utils/arch/android/Locks.cpp that uses the SemaphorePrivate implementation from paddle/utils/arch/linux/Locks.cpp and the SpinLockPrivate and ThreadBarrierPrivate implementations from paddle/utils/arch/osx/Locks.cpp.
Other suggestions are welcome.
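The first option above could look roughly like the sketch below, assuming the build system defines PADDLE_USE_PTHREAD_SPINLOCK only when pthread_spinlock_t is available. The class name `SpinLockSketch` is hypothetical; the real code lives in SpinLockPrivate and may differ in detail.

```cpp
#include <atomic>

#ifdef PADDLE_USE_PTHREAD_SPINLOCK
#include <pthread.h>
#endif

// Hypothetical class illustrating one type with two backends.
class SpinLockSketch {
public:
#ifdef PADDLE_USE_PTHREAD_SPINLOCK
  // Backend 1: the pthread spinlock, where the platform provides it.
  SpinLockSketch() { pthread_spin_init(&lock_, 0); }
  ~SpinLockSketch() { pthread_spin_destroy(&lock_); }
  void lock() { pthread_spin_lock(&lock_); }
  void unlock() { pthread_spin_unlock(&lock_); }
#else
  // Backend 2: a portable busy-wait lock built on std::atomic_flag.
  SpinLockSketch() {}
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Spin until the current holder clears the flag.
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }
#endif

  SpinLockSketch(const SpinLockSketch&) = delete;
  SpinLockSketch& operator=(const SpinLockSketch&) = delete;

private:
#ifdef PADDLE_USE_PTHREAD_SPINLOCK
  pthread_spinlock_t lock_;
#else
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
#endif
};
```

The conditional stays inside one class, so callers see the same lock()/unlock() interface on every platform.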
If SpinLockPrivate and ThreadBarrierPrivate can be implemented by ourselves without depending on pthread, could we use our own implementation in all cases?
If we have to pick one of the two options, the second seems easier to follow (as long as it does not duplicate much code from arch/osx/Locks.cpp).
One suggestion: the names arch/osx and arch/android are not appropriate, because osx and android are operating systems, not architectures; arm, x86 and x64 are architectures. If this does not belong in this PR, could you create an issue as a reminder to rename the directories?
Here is a list of what Locks.cpp implements and how:
| | SemaphorePrivate | SpinLockPrivate | ThreadBarrierPrivate |
|---|---|---|---|
| linux | sem_t | pthread_spinlock_t | pthread_barrier_t |
| android | sem_t | std::atomic_flag | pthread_mutex_t & pthread_cond_t |
| osx | dispatch_semaphore_t (Mac only) | std::atomic_flag | pthread_mutex_t & pthread_cond_t |
Of these, the SpinLockPrivate built on std::atomic_flag and the ThreadBarrierPrivate built on pthread_mutex_t & pthread_cond_t were implemented by @gangliao for Mac. I saw that they also work on Android, so I reused them.
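For readers unfamiliar with the pthread_mutex_t & pthread_cond_t approach to a barrier, here is a minimal sketch of the technique. It is an illustration only, not the code from arch/osx/Locks.cpp; the class name `ThreadBarrierSketch` and the generation counter are assumptions made for this example.

```cpp
#include <pthread.h>

// Hypothetical class: a reusable barrier built from a mutex and a condition
// variable, for platforms without pthread_barrier_t.
class ThreadBarrierSketch {
public:
  explicit ThreadBarrierSketch(int count)
      : count_(count), arrived_(0), generation_(0) {
    pthread_mutex_init(&mutex_, nullptr);
    pthread_cond_init(&cond_, nullptr);
  }
  ~ThreadBarrierSketch() {
    pthread_cond_destroy(&cond_);
    pthread_mutex_destroy(&mutex_);
  }
  ThreadBarrierSketch(const ThreadBarrierSketch&) = delete;
  ThreadBarrierSketch& operator=(const ThreadBarrierSketch&) = delete;

  // Blocks until `count` threads have called wait(). Returns true for the
  // thread that arrived last and released the others.
  bool wait() {
    pthread_mutex_lock(&mutex_);
    const unsigned long generation = generation_;
    bool last = false;
    if (++arrived_ == count_) {
      // Last arrival: reset for the next round and wake everyone up.
      arrived_ = 0;
      ++generation_;
      pthread_cond_broadcast(&cond_);
      last = true;
    } else {
      // The generation check guards against spurious wakeups.
      while (generation == generation_) {
        pthread_cond_wait(&cond_, &mutex_);
      }
    }
    pthread_mutex_unlock(&mutex_);
    return last;
  }

private:
  pthread_mutex_t mutex_;
  pthread_cond_t cond_;
  int count_;
  int arrived_;
  unsigned long generation_;
};
```

The generation counter lets the same barrier object be reused for several rounds without a fast thread slipping through the next round early.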
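> The names arch/osx and arch/android are not appropriate, because osx and android are operating systems, not architectures; arm, x86 and x64 are architectures. If this does not belong in this PR, could you create an issue as a reminder to rename the directories?
OK. #1728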
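Yes, a How to build document is needed; I believe 益群 (Yiqun) is already writing one? This PR actually addresses the ARM + Android build. Building for other ARM environments (Linux + ARM) will need further fixes on top of this PR.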
include(system)

if(ANDROID)
cmake_minimum_required(VERSION 3.7)
Does this really require at least 3.7? I can build with 3.2.2 on my side.
You are probably using build method 2. With build method 1, cmake 3.2.2 emits the following warning and does not set the compile options automatically:
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_ANDROID_ARCH_ABI
CMAKE_ANDROID_ARM_MODE
CMAKE_ANDROID_ARM_NEON
CMAKE_ANDROID_STANDALONE_TOOLCHAIN
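These cmake system variables were only added in version 3.7 (see the cmake-toolchains documentation).
I tried API level 19 (Android 4.4); the main problems are as follows:
Yes, protobuf and openblas both need to be built in advance and passed in at cmake time. If needed, the cmake files can later be changed to build them automatically. In addition, some cross-compilation options still have to be configured manually at cmake time.
Not written yet. In this PR I briefly describe two build methods; the documentation will be added later :-D
Further work related to this PR
Some of this work will be completed in follow-up PRs.
Currently only the WITH_PYTHON=OFF version can be built.
Prepare the Android cross-compilation environment:
Cross-compiling Paddle for Android mainly involves the following changes:
Build method 1: use cmake's built-in support for Android cross-compilation, which requires cmake 3.7 or later. cmake decides whether it is cross-compiling based on whether CMAKE_SYSTEM_NAME is set; it then adds the corresponding compile options automatically according to the configuration and sets the CMAKE_CROSSCOMPILING variable to TRUE. When Paddle's cmake files detect an Android cross-compile, they also automatically set WITH_AVX=OFF; WITH_GPU=OFF; WITH_RDMA=OFF; WITH_PYTHON=OFF.
Build method 2: configure the compile options manually. In this mode cmake itself does not consider the build a cross-compile; the user controls it manually through the compiler and the compiler options.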
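- … controlled by __SSE__; if this macro is not defined, the naive version is used directly
- paddle/math/SIMDFunctions.h
- … NEON instructions; later this can be implemented the way OpenBLAS does it, and the Android build can query the cpufeatures library
- pthread_spinlock_t and pthread_barrier_t
- … whether pthread_spinlock_t and pthread_barrier_t exist: … PADDLE_USE_PTHREAD_SPINLOCK and PADDLE_USE_PTHREAD_BARRIER, and the pthread version is used directly
- (… paddle/utils/arch/linux/Locks.cpp and paddle/utils/arch/osx/Locks.cpp merged into one)
- … std::to_string; a simple internal version was implemented (paddle/utils/StringUtil.h)
- … protoc and the library libprotobuf.a on the target
- … the NO_LAPACK version; the macro PADDLE_USE_LAPACK was added to Paddle to control calls to lapack functions
- make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc ARM_SOFTFP_ABI=1 NOFORTRAN=1 USE_THREAD=0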
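The list above mentions a simple internal replacement for std::to_string in paddle/utils/StringUtil.h, presumably because the C++ runtime shipped with the Android NDK at the time did not provide it. The actual implementation is not shown in this thread; a stream-based version along the following lines is one common way to write such a helper. The namespace `string_sketch` is hypothetical.

```cpp
#include <sstream>
#include <string>

namespace string_sketch {  // hypothetical namespace, not Paddle's

// Converts any streamable value to its string form using operator<<.
template <typename T>
inline std::string to_string(const T& value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

}  // namespace string_sketch
```

For integers this matches std::to_string. Floating-point values follow the default stream formatting instead, so the two can differ there.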