Switch NPU LLM execution to ov::genai::StatefulLLMPipeline #1677
base: master
Conversation
Force-pushed from bd7ca15 to fc14f8e
Force-pushed from 2b519fd to a9c60e3
Depends on: #1748
Force-pushed from e7dcacf to f7861fe
Force-pushed from 4dcd044 to 949825e
Force-pushed from 949825e to b300855
src/cpp/src/utils.cpp
Outdated
    // NB: OPTIMIZE_SIZE is only possible when model_path is defined!
    // Otherwise, switch to OPTIMIZE_SPEED
    if (cache_mode.has_value() && *cache_mode == CacheMode::OPTIMIZE_SIZE && !model_path.empty()) {
        compiled = ov::genai::utils::singleton_core().compile_model(model_path, "NPU", properties);
This way we will read the model from model_path twice, won't we?
I believe your point is the same as @ilya-lavrenov's. @smirnov-alexey please explain :)
@AsyaPronina Actually, we read the model once in the OPTIMIZE_SPEED case and don't explicitly read it by default.
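To make the control flow being discussed concrete, here is a minimal sketch of the branch (the else arm is an assumption inferred from the comments, not the actual patch; `model` is the ov::Model already read by the common code):

    // Sketch only: how the two cache modes diverge, per the discussion above.
    if (cache_mode.has_value() && *cache_mode == CacheMode::OPTIMIZE_SIZE && !model_path.empty()) {
        // Weightless path: compile straight from disk so a cached blob can
        // be imported without an extra read_model() pass.
        compiled = ov::genai::utils::singleton_core().compile_model(model_path, "NPU", properties);
    } else {
        // OPTIMIZE_SPEED (or no model_path): reuse the ov::Model that the
        // common code already read and transformed.
        compiled = ov::genai::utils::singleton_core().compile_model(model, "NPU", properties);
    }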
LGTM, thanks! Minor comments left.
src/cpp/src/utils.cpp
Outdated
    kv_desc.min_response_len = pop_int_and_cast(properties, "MIN_RESPONSE_LEN").value_or(128u);
    update_npu_config(properties, model, kv_pos, kv_desc);
    auto cache_mode = get_option<CacheMode>(config, "CACHE_MODE");
    // NB: OPTIMIZE_SIZE is only possible when model_path is defined!
@smirnov-alexey will this limitation be fixed in the near future? It would be great to avoid a custom path for NPU (passing an extra models_path) through a chain of calls.
@ilya-lavrenov As I understand it, this is how it works on the OpenVINO side: https://docs.openvino.ai/2025/api/c_cpp_api/classov_1_1_core.html

"This can be more efficient than using the Core::read_model + Core::compile_model(model_in_memory_object) flow, especially for cases when caching is enabled and a cached model is available."
I suppose those are different things. You need to check CACHE_DIR instead of CACHE_MODE, then?
compile_model with a path is definitely preferable, but the CACHE_MODE optimization (is it about the weightless cache? I suppose it should work even with CACHE_MODE = OPTIMIZE_SPEED for NPU) should also work with an ov::Model, IMO.
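For reference, a minimal sketch of how the two properties mentioned here are set (standard ov::Core API; the model and cache paths are placeholders):

    #include <openvino/openvino.hpp>

    int main() {
        ov::Core core;
        // CACHE_DIR turns on blob caching for subsequent compile_model() calls.
        core.set_property(ov::cache_dir("model_cache"));
        // CACHE_MODE picks the blob flavor; OPTIMIZE_SIZE is the "weightless"
        // variant debated in this thread.
        auto compiled = core.compile_model("model.xml", "NPU",
                                           ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE));
        return 0;
    }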
BTW, the model has already been read via read_model using the "common" code, and some transformations are applied. So it looks strange that you ignore this model and read your own one inside the plugin.
@ilya-lavrenov You're right, it's an artifact of copy-paste, will fix, thanks!
@ilya-lavrenov I believe read_model() + compile_model(model) doesn't work. I previously discussed it with the GPU plugin team, and I think the only way to import a model via CACHE_DIR is using compile_model(model_path). It has something to do with https://github.com/openvinotoolkit/openvino/pull/27162/files
Please correct me if I'm wrong, but I couldn't achieve weightless caching with CACHE_DIR and an in-memory model. If I recall correctly, WEIGHTS_PATH is not set in this case.
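To spell out the two flows being contrasted (a sketch with placeholder paths; whether flow B can hit the weightless cache is exactly what is in question here):

    ov::Core core;
    core.set_property(ov::cache_dir("model_cache"));  // placeholder path

    // Flow A: compile directly from the path. The core knows where the
    // weights live, so a weightless blob can be restored on import.
    auto compiled_a = core.compile_model("model.xml", "NPU");

    // Flow B: read first, then compile the in-memory ov::Model. Per the
    // comment above, WEIGHTS_PATH is not set here, so the weightless cache
    // cannot re-attach the weights on import.
    auto model = core.read_model("model.xml");
    auto compiled_b = core.compile_model(model, "NPU");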
I think we can enable the weightless blob for an already-read ov::Model. See openvinotoolkit/openvino#29101; could you please try that PR?
I'll try
Unfortunately we get increased memory consumption
Yet another attempt openvinotoolkit/openvino#29107 😄
Co-authored-by: Ilya Lavrenov <[email protected]>
…envino.genai into at/uniform-llm
    static_config = { **default_config, 'STATIC_PIPELINE': 'STATEFUL' }

    # Test both, static and generic pipelines
    pipeline_configs = [default_config, static_config]
Perhaps it will be renamed to NPU tests.
        'NPUW_ONLINE_PIPELINE': 'NONE'
    } | get_default_llm_properties()

    static_config = { **default_config, 'STATIC_PIPELINE': 'STATEFUL' }
Do we still need 'STATIC_PIPELINE': 'STATEFUL'?
src/cpp/src/utils.cpp
Outdated
    auto cache_mode = get_option<CacheMode>(config, ov::cache_mode.name());
    // NB: OPTIMIZE_SIZE is only possible when model_path is defined!
    // Otherwise, switch to OPTIMIZE_SPEED
    if (cache_mode.has_value() && *cache_mode == CacheMode::OPTIMIZE_SIZE && !model_path.empty()) {
@TolyaTalamanov the previous behavior was: in the default case we just use model_path (weightless), and the user would need to explicitly specify OPTIMIZE_SPEED. Please return to this behavior; there is no need for the user to specify OPTIMIZE_SIZE here, it's our default option.
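A sketch of the requested default (an assumption of intent, not the final patch): treat an unset CACHE_MODE like OPTIMIZE_SIZE, so only OPTIMIZE_SPEED has to be spelled out:

    // Sketch: weightless compilation from model_path is the default;
    // OPTIMIZE_SPEED must be requested explicitly to opt out of it.
    auto cache_mode = get_option<CacheMode>(config, ov::cache_mode.name());
    const bool speed = cache_mode.has_value() && *cache_mode == CacheMode::OPTIMIZE_SPEED;
    if (!speed && !model_path.empty()) {
        compiled = ov::genai::utils::singleton_core().compile_model(model_path, "NPU", properties);
    } else {
        compiled = ov::genai::utils::singleton_core().compile_model(model, "NPU", properties);
    }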
Sorry, changed it now, could you check?
    @@ -139,8 +139,8 @@ ov::genai::LLMPipeline::LLMPipeline(
             m_pimpl = std::make_unique<ContinuousBatchingAdapter>(models_path, tokenizer, scheduler_config, device, device_properties);
         }

    -    if (m_pimpl == nullptr && device == "NPU") {
    +    if (m_pimpl == nullptr && device == "NPU" && properties.count("STATIC_PIPELINE")) {
             m_pimpl = static_llm::LLMPipelineFactory::create(models_path, tokenizer, device, properties);
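With this change the static pipeline becomes opt-in; a minimal usage sketch (the model path is a placeholder):

    #include <iostream>
    #include "openvino/genai/llm_pipeline.hpp"

    int main() {
        // Without STATIC_PIPELINE, NPU now runs the generic StatefulLLMPipeline;
        // setting the property requests the legacy static implementation.
        ov::genai::LLMPipeline pipe("TinyLlama", "NPU",
                                    ov::AnyMap{{"STATIC_PIPELINE", "STATEFUL"}});
        std::cout << pipe.generate("Hello") << std::endl;
        return 0;
    }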
Following my previous review: can we just get rid of STATIC_PIPELINE, since our only option is STATEFUL?
Most likely it's not the case, but what if we have another static pipeline implementation? I plan to remove it anyway next week, preferably once switching to the generic pipeline doesn't bring any perf regressions.
My only concern here is the OPTIMIZE_SPEED and OPTIMIZE_SIZE behavior.
@TolyaTalamanov please wait for merge until I test Ilya's changes. Likely we could remove …
Tested - doesn't work out-of-the-box. I suggest we implement it separately.