Add cache rotation inputs and CPU kernel implementation for cache rotation #27088

Merged
merged 27 commits into from
Jan 9, 2025

Conversation

vshampor
Contributor

Tickets:
153783

@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: GPU OpenVINO GPU plugin category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: CPP API OpenVINO CPP API bindings labels Oct 16, 2024
src/core/src/op/paged_attention.cpp (outdated)
Comment on lines 418 to 419
pa_arguments.insert(pa_arguments.begin() + 13, v0::Constant::create(element::f32, Shape{0}, {}));
pa_arguments.insert(pa_arguments.begin() + 14, v0::Constant::create(element::i32, Shape{0}, {}));
Contributor

@slyalin slyalin Oct 28, 2024

If you make these inputs really optional, these two lines are not required.
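As a library-free illustration of the reviewer's point, the sketch below models a node whose trailing inputs are truly optional, so a consumer checks whether inputs 13 and 14 were actually supplied instead of requiring callers to insert placeholder `Shape{0}` constants. `Node`, `has_input` and `uses_rotation` are hypothetical names, not the OpenVINO API; only the indices 13 and 14 come from the snippet above, and the "both or neither" rule is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node (NOT the OpenVINO ov::Node API):
// it only records which inputs the caller actually supplied.
struct Node {
    std::vector<std::string> inputs;

    // An optional input counts as present only if it was actually passed.
    bool has_input(std::size_t idx) const { return idx < inputs.size(); }
};

// Indices taken from the snippet above.
constexpr std::size_t kRotationCoefficientsIdx = 13;
constexpr std::size_t kRotatedBlockIndicesIdx = 14;

// With truly optional inputs, a consumer checks for presence instead of
// requiring every caller to pad the argument list with Shape{0} constants.
bool uses_rotation(const Node& pa) {
    const bool has_coeffs = pa.has_input(kRotationCoefficientsIdx);
    const bool has_indices = pa.has_input(kRotatedBlockIndicesIdx);
    // Assumption: the two rotation inputs are supplied together or not at all.
    assert(has_coeffs == has_indices);
    return has_coeffs && has_indices;
}
```

A graph built before cache rotation existed (13 inputs) then needs no padding at all, while a rotating graph simply passes two extra inputs.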

Contributor Author

Done

get_input_partial_shape(13).rank().is_dynamic() ||
get_input_partial_shape(13).rank().get_length() == 0 ||
get_input_partial_shape(13).rank().get_length() == 1,
"Input `rotation_coefficients` should either have an empty shape or rank 1, but it has rank ",
Contributor

Suggested change (replace the first line with the second):
"Input `rotation_coefficients` should either have an empty shape or rank 1, but it has rank ",
"Input `rotation_coefficients` should either have rank 1 or omitted, but it has rank ",

An "empty" shape means [0] here, which has rank 1.
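The rank rule described above can be sketched without OpenVINO types. Here `PartialShape` is a hypothetical stand-in (`std::nullopt` models a dynamic rank), and `shape_size` only mimics the idea behind `ov::shape_size`. The element counts show why rank 0 is not an "empty" input: a rank-0 (scalar) shape holds one element, while `[0]` has rank 1 and zero elements.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal stand-ins (NOT ov::Shape / ov::PartialShape):
// std::nullopt models a dynamic rank; otherwise the static dimensions are stored.
using Shape = std::vector<std::int64_t>;
using PartialShape = std::optional<Shape>;

// Rule from the review: the rank must be dynamic or exactly 1.
// An "empty" input is Shape{0}, which already has rank 1, so rank 0 is rejected.
bool rotation_input_rank_ok(const PartialShape& ps) {
    return !ps.has_value() || ps->size() == 1;
}

// Number of elements in a static shape: Shape{0} yields 0, a scalar Shape{} yields 1.
std::int64_t shape_size(const Shape& s) {
    std::int64_t n = 1;
    for (std::int64_t d : s) {
        assert(d >= 0);
        n *= d;
    }
    return n;
}
```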

Contributor Author

Done

NODE_VALIDATION_CHECK(
this,
get_input_partial_shape(13).rank().is_dynamic() ||
get_input_partial_shape(13).rank().get_length() == 0 ||
Contributor

Suggested change (delete this line):
get_input_partial_shape(13).rank().get_length() == 0 ||

Contributor Author

Done

Comment on lines 167 to 169
get_input_partial_shape(14).rank().get_length() == 0 ||
get_input_partial_shape(14).rank().get_length() == 1,
"Input `rotated_block_indices` should either have an empty shape or rank 1 but it has rank ",
Contributor

The same comment applies here as for input 13 above.

Contributor Author

Done

@@ -1576,6 +1591,11 @@ struct AttentionExecutor : public PagedAttentionExecutor {
if (alibi_slopes) {
alibi_slopes.assert_dims({H});
}

if (rotated_block_indices) {
// Rotation, and cache eviction, is limited to cases when Q, K and V embedding sizes are equal, i.e. S == Sv
Contributor

@slyalin slyalin Oct 28, 2024

We already have cases where they are not equal: minicpm-3.

Contributor Author

Removed. I realized that we don't need that limitation for cache rotation, since we only rotate the K values.
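Why the S == Sv restriction is unnecessary can be shown with a scalar sketch: rotation reads and writes only the key cache, so the value head size never enters the computation. The flat `[block][token][head_size]` layout, the `BlockCache` type, the rotate-half pairing, and the use of one set of cos/sin coefficients for every token in a block are simplifying assumptions for illustration; the real layout of `rotation_coefficients` is not shown in this thread. The per-pair update follows the formula in the CPU kernel excerpt further below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat cache (an assumption, not the CPU plugin's real layout):
// key_cache is [block][token][S], value_cache is [block][token][Sv].
struct BlockCache {
    std::size_t block_size, S, Sv;
    std::vector<float> key_cache, value_cache;

    // Pointer to the S-element key vector of one token in one block.
    float* key(std::size_t block, std::size_t token) {
        const std::size_t offset = (block * block_size + token) * S;
        assert(offset + S <= key_cache.size());
        return key_cache.data() + offset;
    }
};

// Rotates every key in the listed blocks (rotate-half pairing: i <-> i + S/2).
// Simplification: one set of S/2 cos/sin coefficients is shared by all tokens.
// value_cache is never read or written, so Sv is free to differ from S.
void rotate_key_blocks(BlockCache& c,
                       const std::vector<std::size_t>& rotated_block_indices,
                       const std::vector<float>& cos_t,
                       const std::vector<float>& sin_t) {
    const std::size_t half = c.S / 2;
    assert(c.S % 2 == 0 && cos_t.size() == half && sin_t.size() == half);
    for (std::size_t block : rotated_block_indices) {
        for (std::size_t t = 0; t < c.block_size; ++t) {
            float* k = c.key(block, t);
            for (std::size_t i = 0; i < half; ++i) {
                const float x0 = k[i];
                const float x1 = k[i + half];
                k[i] = x0 * cos_t[i] - x1 * sin_t[i];
                k[i + half] = x0 * sin_t[i] + x1 * cos_t[i];
            }
        }
    }
}
```

For example, with S = 2 and Sv = 3, rotating block 1 by 90 degrees (cos = 0, sin = 1) turns its key (1, 0) into (0, 1), while block 0 and the whole value cache stay untouched.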

@@ -58,6 +59,10 @@ static void CreatePagedAttentionExtensionOp(ProgramBuilder& p, const std::shared
OPENVINO_ASSERT(alibi_const != nullptr);
prim.has_alibi = ov::shape_size(alibi_const->get_output_shape(0)) > 0;

std::shared_ptr<ov::op::v0::Constant> rotation_coefficients_const = std::dynamic_pointer_cast<ov::op::v0::Constant>(op->get_input_node_shared_ptr(rotation_coefficients_idx));
OPENVINO_ASSERT(rotation_coefficients_const != nullptr);
prim.has_rotation_coefficients = ov::shape_size(alibi_const->get_output_shape(0)) > 0;
Contributor

alibi_const shouldn't be used here. Bad copy-paste?

Contributor Author

Fixed, thanks.
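For reference, a minimal sketch of the corrected logic: each presence flag is computed from the shape of its own constant input, and an input whose element count is zero (e.g. `[0]`) is treated as absent. `PaConstShapes`, `PaFlags` and `derive_flags` are hypothetical names that only mirror the fields discussed above; this is not the GPU plugin's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Element count of a static shape (mirrors the idea behind ov::shape_size).
std::int64_t shape_size(const Shape& s) {
    std::int64_t n = 1;
    for (std::int64_t d : s) {
        assert(d >= 0);
        n *= d;
    }
    return n;
}

// Hypothetical bundle of the constant-input shapes the op builder inspects.
struct PaConstShapes {
    Shape alibi;
    Shape rotation_coefficients;
};

// Hypothetical mirror of the primitive's feature flags.
struct PaFlags {
    bool has_alibi = false;
    bool has_rotation_coefficients = false;
};

// Each flag is derived from its own input; deriving has_rotation_coefficients
// from the alibi shape would report rotation whenever ALiBi is present.
PaFlags derive_flags(const PaConstShapes& in) {
    PaFlags f;
    f.has_alibi = shape_size(in.alibi) > 0;
    f.has_rotation_coefficients = shape_size(in.rotation_coefficients) > 0;
    return f;
}
```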

@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Oct 30, 2024
@vshampor vshampor changed the title Add cache rotation inputs Add cache rotation inputs and CPU kernel implementation for cache rotation Nov 12, 2024
@dmitry-gorokhov
Contributor

@luo-cheng2021 Please review CPU PA changes.

CT cache_value_1 = *cache_value_1_ptr;

*cache_value_0_ptr = cache_value_0 * rotation_value_cos - cache_value_1 * rotation_value_sin;
*cache_value_1_ptr = cache_value_0 * rotation_value_sin + cache_value_1 * rotation_value_cos;
Contributor

Is the algorithm the same as the following code?

```cpp
auto src0 = src[i];
auto src1 = src[i + half_rotary_dims];
dst[i] = cos[i] * src0 - sin[i] * src1;
dst[i + half_rotary_dims] = cos[i + half_rotary_dims] * src1 + sin[i + half_rotary_dims] * src0;
```

If so, the following code can be used as a reference:
```cpp
static std::shared_ptr<kernel::JitKernelBase> createJitKernel(const jit_rotary_compile_params& param, bool check_vec_size2 = false) {
    std::shared_ptr<kernel::JitKernelBase> res;
    MAYBE_UNUSED(param);
    MAYBE_UNUSED(check_vec_size2);
#if defined(OPENVINO_ARCH_X86_64)
    if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) {
        bool flag = true;
        if (check_vec_size2) {
            auto vec_size = jit_rotary_kernel<dnnl::impl::cpu::x64::avx512_core>::vec_size;
            if (param.rotary_ndims % (vec_size * 2) != 0)
                flag = false;
        }
        if (flag)
            res = std::make_shared<jit_rotary_kernel<dnnl::impl::cpu::x64::avx512_core>>(param);
    } else if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx2)) {
        bool flag = true;
        if (check_vec_size2) {
            auto vec_size = jit_rotary_kernel<dnnl::impl::cpu::x64::avx2>::vec_size;
            if (param.rotary_ndims % (vec_size * 2) != 0)
                flag = false;
        }
        if (flag)
            res = std::make_shared<jit_rotary_kernel<dnnl::impl::cpu::x64::avx2>>(param);
    }
    if (res)
        res->create_kernel();
#endif // OPENVINO_ARCH_X86_64
    return res;
}

static void execJitKernel(const std::shared_ptr<kernel::JitKernelBase>& ker, const void* src, void* dst, const float* cos, const float* sin) {
    MAYBE_UNUSED(ker);
    MAYBE_UNUSED(src);
    MAYBE_UNUSED(dst);
    MAYBE_UNUSED(cos);
    MAYBE_UNUSED(sin);
#if defined(OPENVINO_ARCH_X86_64)
    jit_rotary_call_args call_args;
    call_args.src = src;
    call_args.cos = cos;
    call_args.sin = sin;
    call_args.dst = dst;
    (*ker)(&call_args);
#endif // OPENVINO_ARCH_X86_64
}

template <typename T>
struct RoPE::RoPEExecutorRotateHalf : public RoPE::Executor {
    const op::internal::RoPE::Config& m_config;
    std::shared_ptr<kernel::JitKernelBase> m_rotaryKernel;

    RoPEExecutorRotateHalf(const op::internal::RoPE::Config& config) : m_config(config) {
        jit_rotary_compile_params jcp;
        jcp.src_prc = precision_of<T>::value;
        jcp.dst_prc = precision_of<T>::value;
        jcp.rotary_ndims = config.rotary_ndims;
        jcp.interleave = false;
        m_rotaryKernel = createJitKernel(jcp);
    }
    // ... (rest of the executor not included in the comment)
```
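One way to check the reviewer's question is to compare both update rules numerically. The sketch below is a scalar reference, not either production kernel. It assumes the rotate-half layout (element `i` pairs with element `i + half`) and that the reviewer's full-length `cos`/`sin` tables are the half-length per-pair tables duplicated as `[t, t]`, a common RoPE layout. Under that assumption the two forms compute the same values; with tables whose halves differ, they do not. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pair form (as in the cache-rotation excerpt): one (cos, sin) per pair.
std::vector<float> rotate_pairwise(const std::vector<float>& x,
                                   const std::vector<float>& c,
                                   const std::vector<float>& s) {
    const std::size_t half = x.size() / 2;
    assert(x.size() % 2 == 0 && c.size() == half && s.size() == half);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < half; ++i) {
        const float x0 = x[i], x1 = x[i + half];
        y[i] = x0 * c[i] - x1 * s[i];
        y[i + half] = x0 * s[i] + x1 * c[i];
    }
    return y;
}

// Reviewer's form: full-length cos/sin tables indexed per element.
std::vector<float> rotate_half_tables(const std::vector<float>& src,
                                      const std::vector<float>& cos_t,
                                      const std::vector<float>& sin_t) {
    const std::size_t half = src.size() / 2;
    assert(cos_t.size() == src.size() && sin_t.size() == src.size());
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < half; ++i) {
        const float src0 = src[i], src1 = src[i + half];
        dst[i] = cos_t[i] * src0 - sin_t[i] * src1;
        dst[i + half] = cos_t[i + half] * src1 + sin_t[i + half] * src0;
    }
    return dst;
}

// Duplicates a half-length table into [t, t], the layout under which both forms agree.
std::vector<float> duplicate(const std::vector<float>& t) {
    std::vector<float> out(t);
    out.insert(out.end(), t.begin(), t.end());
    return out;
}

// Largest element-wise absolute difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float m = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::fmax(m, std::fabs(a[i] - b[i]));
    return m;
}
```

If the existing RoPE tables follow a different layout, the equivalence would have to be re-checked against that layout before reusing the JIT kernel.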

Contributor Author

I have already written and tested my implementation. Besides, the code you've sent probably cannot be reused without modifications or bulky instantiations.

@vshampor vshampor requested a review from slyalin November 18, 2024 10:11
@github-actions github-actions bot added category: Python API OpenVINO Python bindings category: TF FE OpenVINO TensorFlow FrontEnd category: PyTorch FE OpenVINO PyTorch Frontend category: JAX FE OpenVINO JAX FrontEnd labels Nov 18, 2024
@github-actions github-actions bot added category: CI OpenVINO public CI github_actions Pull requests that update GitHub Actions code category: NPU OpenVINO NPU plugin labels Nov 19, 2024
@vshampor vshampor marked this pull request as ready for review November 19, 2024 12:22
@vshampor vshampor requested review from a team as code owners November 19, 2024 12:22
Contributor

@p-durandin p-durandin left a comment

OK from the GPU point of view.

@vshampor vshampor enabled auto-merge January 9, 2025 11:53
@vshampor vshampor added this pull request to the merge queue Jan 9, 2025
Merged via the queue into openvinotoolkit:master with commit 98192e9 Jan 9, 2025
187 checks passed
@vshampor vshampor deleted the token_rotation branch January 9, 2025 14:08
github-merge-queue bot pushed a commit that referenced this pull request Jan 14, 2025
### Details:
- This PR adds cache rotation support for PagedAttention and related
tests
- Should be merged after
#27088

### Tickets:
 - *ticket-id*

---------

Co-authored-by: Vasily Shamporov <[email protected]>
Co-authored-by: Pavel Durandin <[email protected]>
Co-authored-by: cecilia peng <[email protected]>
MirceaDan99 pushed a commit to MirceaDan99/openvino that referenced this pull request Jan 22, 2025
### Details:
- This PR adds cache rotation support for PagedAttention and related
tests
- Should be merged after
openvinotoolkit#27088

### Tickets:
 - *ticket-id*

---------

Co-authored-by: Vasily Shamporov <[email protected]>
Co-authored-by: Pavel Durandin <[email protected]>
Co-authored-by: cecilia peng <[email protected]>
Labels
category: build OpenVINO cmake script / infra category: CI OpenVINO public CI category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings category: CPU OpenVINO CPU plugin category: GPU OpenVINO GPU plugin category: JAX FE OpenVINO JAX FrontEnd category: Python API OpenVINO Python bindings category: PyTorch FE OpenVINO PyTorch Frontend category: TF FE OpenVINO TensorFlow FrontEnd category: transformations OpenVINO Runtime library - Transformations github_actions Pull requests that update GitHub Actions code
9 participants