Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CPU][ARM]Snippets MatMul via brgemm emitter and executor #28304

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

chenhu-wang
Copy link
Contributor

Details:

  • Snippets MatMul via brgemm emitter and executor

Tickets:

@chenhu-wang chenhu-wang requested review from a team as code owners January 8, 2025 06:59
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 8, 2025
@chenhu-wang chenhu-wang marked this pull request as draft January 8, 2025 07:28
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Jan 9, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 3 times, most recently from a5b829d to 6ca4f1b Compare January 9, 2025 08:09
@chenhu-wang chenhu-wang marked this pull request as ready for review January 13, 2025 06:37
@chenhu-wang chenhu-wang requested a review from a team as a code owner January 13, 2025 06:37
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch 15 times, most recently from 982e2c2 to 6e05cb1 Compare January 15, 2025 05:33
@chenhu-wang
Copy link
Contributor Author

chenhu-wang commented Jan 15, 2025

@a-sidorova, Could you please review as well, as you are reviewing #28229. The test cases passed on arm for snippets MatMul. Thank you!

Comment on lines +463 to +469
# define SNIPPETS_REGISTER_PASS_ABSOLUTE_ARM64(PASS_PLACE, PASS, ...) \
backend_passes.emplace_back(PassPosition(PASS_PLACE), std::make_shared<PASS>(__VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to improve readability and since we don't use SNIPPETS_REGISTER_PASS_ABSOLUTE_ARM64 yet, we can lave only SNIPPETS_REGISTER_PASS_RELATIVE_ARM64 definition here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate why we need to have the separate aarch64-specific pass BrgemmTPPBlocking? This pass already exists and used on x64. Can we use one pass for the both arhs?

void jit_brgemm_emitter::emit_impl(const std::vector<size_t>& in, const std::vector<size_t>& out) const {
validate_arguments(in, out);
std::unordered_set<size_t> exclude = {};
store_context(exclude);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that we will merge #27391 soon. This PR efficently provides efficient work with reg spills - we will able to spill only needed (live) registers

Just for information and to align with other our activities 😊

if (ENABLE_SNIPPETS_LIBXSMM_TPP)
ov_add_compiler_flags(-Wno-missing-declarations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why you need to add this flag? Can we avoid it?

@@ -52,7 +52,7 @@ ov_dependent_option (ENABLE_GPU_DEBUG_CAPS "enable GPU debug capabilities at run
ov_dependent_option (ENABLE_CPU_DEBUG_CAPS "enable CPU debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS;ENABLE_INTEL_CPU" OFF)
ov_dependent_option (ENABLE_SNIPPETS_DEBUG_CAPS "enable Snippets debug capabilities at runtime" ON "ENABLE_DEBUG_CAPS" OFF)

ov_dependent_option (ENABLE_SNIPPETS_LIBXSMM_TPP "allow Snippets to use LIBXSMM Tensor Processing Primitives" OFF "ENABLE_INTEL_CPU AND X86_64" OFF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also RISCV64, AArch32 etc. Can we add only supported archs to condition?

Comment on lines +251 to +255
#ifndef OPENVINO_ARCH_X86_64
config.update(DIM_CAST(M), DIM_CAST(N), DIM_CAST(K), 0, 0, 0, beta);
return;
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that there should be common cross-arch base class with method init_runtime_params(M,N,K,LDA,LDB,LDC).
Then x64 dnnl executors update LDB and LDA if needed. aarch64 (tpp) should call update_config(...) with these parameters.

If we use ifdef in common code, I think this is a sign of problematic code and we should resolve it.

Comment on lines +47 to +49
size_t BrgemmKernelConfig::StaticParams::compute_hash(dnnl::impl::cpu::aarch64::cpu_isa_t aarch_isa) {
return hash_combine(0, aarch_isa);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use aarch_isa in this config/kernel/executor. Then this attribute should be missed in params

const auto num_ins = expr->get_node()->get_input_size();
const auto num_outs = expr->get_node()->get_output_size();

size_t io_strides[num_ins + num_outs];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it variable length array? If it is, as far as I know, ISO C++ forbids it. Can we use std::vector at least here or just 3 size_t variables?

Comment on lines +78 to +79
auto refreshed_compile_flag =
config.get_beta() == 0 ? config.get_compile_flags() | LIBXSMM_GEMM_FLAG_BETA_0 : compile_flag;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already update compile flags on L121 in update_config. Then config.get_compile_flags() will return already updated flags

std::shared_ptr<libxsmm_gemmfunction> brgemm_kernel = nullptr;
};

class BrgemmKernelExecutor : public BrgemmBaseKernelExecutor,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, is there any differences of config/executor/kernel between aarch64?
Is there any chance to have only one kernel executor for x64 and aarc64? As far as I know, all aarch-dependent params are hidden in TPP functions/kernels (this is pro of libxsmm).

@a-sidorova a-sidorova self-assigned this Jan 15, 2025
@v-Golubev v-Golubev self-assigned this Jan 16, 2025
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch from 6e05cb1 to 96e274c Compare January 22, 2025 06:36
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch from 96e274c to 6b55e68 Compare January 22, 2025 08:07
@chenhu-wang chenhu-wang force-pushed the chenhu/snipppets_matmul_via_executor_on_arm branch from 6b55e68 to f336769 Compare January 23, 2025 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants