-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CPU][ARM]Snippets MatMul via brgemm emitter and executor #28304
base: master
Are you sure you want to change the base?
[CPU][ARM]Snippets MatMul via brgemm emitter and executor #28304
Conversation
a5b829d
to
6ca4f1b
Compare
982e2c2
to
6e05cb1
Compare
@a-sidorova, Could you please review as well, as you are reviewing #28229. The test cases passed on arm for snippets MatMul. Thank you! |
# define SNIPPETS_REGISTER_PASS_ABSOLUTE_ARM64(PASS_PLACE, PASS, ...) \ | ||
backend_passes.emplace_back(PassPosition(PASS_PLACE), std::make_shared<PASS>(__VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to improve readability and since we don't use SNIPPETS_REGISTER_PASS_ABSOLUTE_ARM64
yet, we can lave only SNIPPETS_REGISTER_PASS_RELATIVE_ARM64
definition here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate why we need to have the separate aarch64-specific pass BrgemmTPPBlocking
? This pass already exists and used on x64. Can we use one pass for the both arhs?
void jit_brgemm_emitter::emit_impl(const std::vector<size_t>& in, const std::vector<size_t>& out) const { | ||
validate_arguments(in, out); | ||
std::unordered_set<size_t> exclude = {}; | ||
store_context(exclude); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that we will merge #27391 soon. This PR efficently provides efficient work with reg spills - we will able to spill only needed (live) registers
Just for information and to align with other our activities 😊
if (ENABLE_SNIPPETS_LIBXSMM_TPP) | ||
ov_add_compiler_flags(-Wno-missing-declarations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate why you need to add this flag? Can we avoid it?
@@ -52,7 +52,7 @@ ov_dependent_option (ENABLE_GPU_DEBUG_CAPS "enable GPU debug capabilities at run | |||
ov_dependent_option (ENABLE_CPU_DEBUG_CAPS "enable CPU debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS;ENABLE_INTEL_CPU" OFF) | |||
ov_dependent_option (ENABLE_SNIPPETS_DEBUG_CAPS "enable Snippets debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS" OFF) | |||
|
|||
ov_dependent_option (ENABLE_SNIPPETS_LIBXSMM_TPP "allow Snippets to use LIBXSMM Tensor Processing Primitives" OFF "ENABLE_INTEL_CPU AND X86_64" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also RISCV64, AArch32 etc. Can we add only supported archs to condition?
#ifndef OPENVINO_ARCH_X86_64 | ||
config.update(DIM_CAST(M), DIM_CAST(N), DIM_CAST(K), 0, 0, 0, beta); | ||
return; | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that there should be common cross-arch base class with method init_runtime_params(M,N,K,LDA,LDB,LDC)
.
Then x64 dnnl executors update LDB
and LDA
if needed. aarch64 (tpp) should call update_config(...)
with these parameters.
If we use ifdef
in common code, I think this is a sign of problematic code and we should resolve it.
size_t BrgemmKernelConfig::StaticParams::compute_hash(dnnl::impl::cpu::aarch64::cpu_isa_t aarch_isa) { | ||
return hash_combine(0, aarch_isa); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use aarch_isa
in this config/kernel/executor. Then this attribute should be missed in params
const auto num_ins = expr->get_node()->get_input_size(); | ||
const auto num_outs = expr->get_node()->get_output_size(); | ||
|
||
size_t io_strides[num_ins + num_outs]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it variable length array? If it is, as far as I know, ISO C++ forbids it. Can we use std::vector
at least here or just 3 size_t variables?
auto refreshed_compile_flag = | ||
config.get_beta() == 0 ? config.get_compile_flags() | LIBXSMM_GEMM_FLAG_BETA_0 : compile_flag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already update compile flags on L121 in update_config
. Then config.get_compile_flags()
will return already updated flags
std::shared_ptr<libxsmm_gemmfunction> brgemm_kernel = nullptr; | ||
}; | ||
|
||
class BrgemmKernelExecutor : public BrgemmBaseKernelExecutor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, is there any differences of config/executor/kernel between aarch64?
Is there any chance to have only one kernel executor for x64 and aarc64? As far as I know, all aarch-dependent params are hidden in TPP functions/kernels (this is pro of libxsmm).
6e05cb1
to
96e274c
Compare
96e274c
to
6b55e68
Compare
6b55e68
to
f336769
Compare
Details:
Tickets: