Enable MM_PREFETCH and MM_MALLOC on aarch64 #4124

Closed
StrikerRUS opened this issue Mar 27, 2021 · 4 comments

Comments

@StrikerRUS
Collaborator

StrikerRUS commented Mar 27, 2021

Refer to #3948 (comment).

Related source code:

LightGBM/CMakeLists.txt

Lines 243 to 271 in e98da99

include(CheckCXXSourceCompiles)
check_cxx_source_compiles("
#include <xmmintrin.h>
int main() {
  int a = 0;
  _mm_prefetch(&a, _MM_HINT_NTA);
  return 0;
}
" MM_PREFETCH)
if(${MM_PREFETCH})
  message(STATUS "Using _mm_prefetch")
  ADD_DEFINITIONS(-DMM_PREFETCH)
endif()
include(CheckCXXSourceCompiles)
check_cxx_source_compiles("
#include <mm_malloc.h>
int main() {
  char *a = (char*)_mm_malloc(8, 16);
  _mm_free(a);
  return 0;
}
" MM_MALLOC)
if(${MM_MALLOC})
  message(STATUS "Using _mm_malloc")
  ADD_DEFINITIONS(-DMM_MALLOC)
endif()

#if (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_AMD64))) || defined(__INTEL_COMPILER) || MM_PREFETCH
#include <xmmintrin.h>
#define PREFETCH_T0(addr) _mm_prefetch(reinterpret_cast<const char*>(addr), _MM_HINT_T0)
#elif defined(__GNUC__)
#define PREFETCH_T0(addr) __builtin_prefetch(reinterpret_cast<const char*>(addr), 0, 3)
#else
#define PREFETCH_T0(addr) do {} while (0)
#endif

#if defined(_MSC_VER)
#include <malloc.h>
#elif MM_MALLOC
#include <mm_malloc.h>
// https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
// https://www.oreilly.com/library/view/mac-os-x/0596003560/ch05s01s02.html
#elif defined(__GNUC__) && defined(HAVE_MALLOC_H)
#include <malloc.h>
#define _mm_malloc(a, b) memalign(b, a)
#define _mm_free(a) free(a)
#else
#include <stdlib.h>
#define _mm_malloc(a, b) malloc(a)
#define _mm_free(a) free(a)
#endif
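
Note that the final `#else` branch above maps `_mm_malloc(a, b)` to `malloc(a)` and so drops the alignment argument. As a rough sketch only (not LightGBM code), an aligned fallback for platforms without `mm_malloc.h` could look like the following. The names `AlignedAlloc`, `AlignedFree`, and `IsAligned` are hypothetical, and the sketch assumes a POSIX platform (for example Linux or macOS on aarch64) where `posix_memalign` is available:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX; assumed available)

// Hypothetical helper, not part of LightGBM: allocate `size` bytes aligned
// to `alignment`, which must be a power of two.
inline void* AlignedAlloc(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // posix_memalign also requires alignment to be a multiple of sizeof(void*).
  if (alignment < sizeof(void*)) {
    alignment = sizeof(void*);
  }
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;  // allocation failed
  }
  return ptr;
}

// Memory obtained from posix_memalign is released with plain free().
inline void AlignedFree(void* ptr) { free(ptr); }

// Hypothetical helper used only to check the result.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

A call such as `AlignedAlloc(8, 16)` mirrors the `_mm_malloc(8, 16)` call in the CMake probe. Whether LightGBM should take this route is a separate design decision for the maintainers.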

@StrikerRUS
Collaborator Author

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions for this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@jameslamb
Collaborator

jameslamb commented Jun 21, 2024

See the error at https://clang.llvm.org/doxygen/xmmintrin_8h_source.html.

error: "This header is only meant to be used on x86 and x64 architecture"

This can be reproduced like this:

// conftest.cpp
#include <xmmintrin.h>

int main() {
    int a = 0;
    _mm_prefetch(&a, _MM_HINT_NTA);
    return 0;
}

clang++ -arch arm64 -std=gnu++17 -o conftest conftest.cpp

@StrikerRUS
Collaborator Author

StrikerRUS commented Jul 9, 2024

I'm not an expert in all these things at all, but it looks like Arm Compiler supports prefetching via __builtin_prefetch.
https://developer.arm.com/documentation/101458/2404/Optimize/Prefetching-with---builtin-prefetch?lang=en

UPD: ... and gcc: https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
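
To illustrate the idea, here is a minimal sketch (not the LightGBM implementation) of a prefetch macro that only includes `xmmintrin.h` on x86 targets and otherwise falls back to `__builtin_prefetch`, which GCC and Clang provide on aarch64. The macro name `SKETCH_PREFETCH_T0`, the function `SumWithPrefetch`, and the prefetch distance `kAhead` are hypothetical and chosen only for this example:

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_IX86) || defined(_M_AMD64)
// x86 / x64: the SSE intrinsic header is available.
#include <xmmintrin.h>
#define SKETCH_PREFETCH_T0(addr) \
  _mm_prefetch(reinterpret_cast<const char*>(addr), _MM_HINT_T0)
#elif defined(__GNUC__)
// GCC and Clang on other targets (including aarch64):
// read access (0), highest temporal locality (3).
#define SKETCH_PREFETCH_T0(addr) \
  __builtin_prefetch(reinterpret_cast<const void*>(addr), 0, 3)
#else
// Unknown compiler: prefetching becomes a no-op.
#define SKETCH_PREFETCH_T0(addr) do {} while (0)
#endif

// Hypothetical usage: sum an array while hinting at elements a few
// iterations ahead. The distance is arbitrary, not a tuned value.
inline long long SumWithPrefetch(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  constexpr std::size_t kAhead = 16;
  long long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + kAhead < n) {
      SKETCH_PREFETCH_T0(data + i + kAhead);
    }
    total += data[i];
  }
  return total;
}
```

Because a prefetch is only a hint, the result of `SumWithPrefetch` is the same on every branch of the macro; only performance can differ.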

@StrikerRUS
Collaborator Author

Linking #4331 and #6514 (comment).
