
Add a choice of how to end streaming from callback: STOP or CANCEL #1476

Open
sbalandi wants to merge 2 commits into master from callback

Conversation
Conversation

@sbalandi (Contributor) commented Jan 3, 2025

No description provided.
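The PR carries no description; as a reader aid, here is a minimal self-contained sketch of the behaviour the title describes. All names here are hypothetical stand-ins modeled on the `ov::genai::StreamerRunningStatus` naming that appears in the review threads of this PR, not the real pipeline API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for ov::genai::StreamerRunningStatus (naming taken
// from this PR's review threads; the real enum lives in the GenAI headers).
enum class StreamerRunningStatus { RUNNING, STOP, CANCEL };

// Toy generation loop: feeds tokens to the streamer callback until it asks
// to end. STOP keeps the partial result (e.g. in chat history), CANCEL
// discards it -- that is the choice this PR adds.
StreamerRunningStatus run_generation(
        const std::vector<std::string>& tokens,
        const std::function<StreamerRunningStatus(const std::string&)>& streamer,
        std::string& history) {
    std::string partial;
    for (const auto& token : tokens) {
        partial += token;
        const auto status = streamer(token);
        if (status == StreamerRunningStatus::STOP) {
            history += partial;  // STOP: keep what was generated so far
            return status;
        }
        if (status == StreamerRunningStatus::CANCEL) {
            return status;       // CANCEL: drop the partial result
        }
    }
    history += partial;          // generation finished normally
    return StreamerRunningStatus::RUNNING;
}
```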

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) category: speculative decoding Speculative decoding category: GenAI C++ API Changes in GenAI C++ public headers no-match-files category: prompt lookup labels Jan 3, 2025
@sbalandi (Contributor, Author) commented Jan 3, 2025

TODO: add CANCEL for ContinuousBatching

@ilya-lavrenov ilya-lavrenov added this to the 2025.0 milestone Jan 4, 2025
@ilya-lavrenov ilya-lavrenov self-assigned this Jan 6, 2025
@sbalandi sbalandi force-pushed the callback branch 5 times, most recently from 454cdd9 to 1592ed0 Compare January 8, 2025 19:38
@github-actions github-actions bot added category: Python API Python API for GenAI category: samples GenAI samples labels Jan 8, 2025
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 10a755b to d18fe16 Compare January 8, 2025 22:19
@sbalandi (Contributor, Author) commented Jan 8, 2025

> TODO: add CANCEL for ContinuousBatching

done

@sbalandi sbalandi marked this pull request as ready for review January 8, 2025 22:43
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 2758f6b to 03ca3ce Compare January 9, 2025 21:56
@ilya-lavrenov (Contributor) left a comment

Please add tests for the new functionality.

Review threads on:
- samples/cpp/chat_sample/chat_sample.cpp (outdated, resolved)
- src/cpp/include/openvino/genai/generation_handle.hpp (outdated, resolved)
- src/python/openvino_genai/__init__.py (resolved)
- src/cpp/src/text_callback_streamer.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/python/openvino_genai/py_openvino_genai.pyi (outdated, resolved)
@andrei-kochin andrei-kochin modified the milestones: 2025.0, 2025.1 Jan 13, 2025
@sbalandi sbalandi force-pushed the callback branch 11 times, most recently from 17a9501 to 8975221 Compare January 21, 2025 13:16
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 591c81a to 8c6ff44 Compare January 24, 2025 17:31
@github-actions github-actions bot added the category: whisper Whisper pipeline label Jan 24, 2025
@sbalandi sbalandi force-pushed the callback branch 4 times, most recently from cac1834 to 408e4a3 Compare January 27, 2025 12:53
Review threads on:
- samples/python/text_generation/chat_sample.py (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
@sbalandi (Contributor, Author) commented

@ilya-lavrenov could you please take a look?

Review threads on:
- src/cpp/src/utils.hpp (outdated, resolved)
- src/cpp/src/continuous_batching_adapter.hpp (outdated, resolved)
@@ -23,7 +23,7 @@ def streamer(subword: str) -> bool:
print(subword, end='', flush=True)

# No value is returned as in this example we don't want to stop the generation in this method.
# "return None" will be treated the same as "return False".
# "return None" will be treated the same as "return ov::genai::StreamerRunningStatus::RUNNING;".
Suggested change
# "return None" will be treated the same as "return ov::genai::StreamerRunningStatus::RUNNING;".
# "return None" will be treated the same as "return openvino_genai.StreamerRunningStatus.RUNNING".
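For illustration, a hedged sketch of how the three Python return conventions described in that comment could relate. `StreamerRunningStatus` here is a local stand-in for the `openvino_genai` enum named in the suggestion, and `normalize_callback_result` is a hypothetical helper for this sketch, not pipeline API:

```python
from enum import Enum

class StreamerRunningStatus(Enum):
    """Local stand-in for openvino_genai.StreamerRunningStatus."""
    RUNNING = 0
    STOP = 1
    CANCEL = 2

def normalize_callback_result(result):
    """Map legacy bool/None streamer returns onto the status enum.

    None and False mean "keep generating" (RUNNING), and True maps to STOP,
    matching the old "return True to stop" convention; enum values pass
    through unchanged.
    """
    if result is None or result is False:
        return StreamerRunningStatus.RUNNING
    if result is True:
        return StreamerRunningStatus.STOP
    return result
```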

@@ -15,8 +15,8 @@ enum class GenerationStatus {
RUNNING = 0, // Default status for ongoing generation
FINISHED = 1, // Status set when generation has been finished
IGNORED = 2, // Status set when generation run into out-of-memory condition and could not be continued
DROPPED_BY_PIPELINE = 3, // Currently not used, TODO: implement abort functionality
DROPPED_BY_HANDLE = 4 // Status set when generation handle is dropped
@ilya-lavrenov (Contributor) commented Feb 3, 2025
let's deprecate DROPPED_BY_HANDLE via OPENVINO_ENUM_DEPRECATED and assign DROPPED_BY_HANDLE = STOP
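A sketch of what that deprecation could look like, using the portable `[[deprecated]]` attribute as a stand-in for `OPENVINO_ENUM_DEPRECATED` (the numeric values of CANCEL and STOP below are assumptions for illustration, not taken from the merged headers):

```cpp
#include <cassert>

// Sketch only: OPENVINO_ENUM_DEPRECATED expands to a compiler-specific
// attribute in OpenVINO; the standard [[deprecated]] is used here instead.
enum class GenerationStatus {
    RUNNING = 0,   // Default status for ongoing generation
    FINISHED = 1,  // Generation has finished
    IGNORED = 2,   // Generation hit an out-of-memory condition
    CANCEL = 3,    // Streamer callback asked to cancel (drop the result)
    STOP = 4,      // Streamer callback asked to stop (keep the result)
    // Deprecated alias so existing code keeps compiling, as suggested:
    DROPPED_BY_HANDLE [[deprecated("use GenerationStatus::STOP")]] = STOP,
};
```

Using the old enumerator then emits a compiler warning while still comparing equal to STOP.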


bool is_stopped();

bool is_canceled();

Just want to highlight which variant we want to use: cancelled or canceled.
@Wovchena @sbalandi I see that both spellings are OK, but want to draw your attention to it additionally.

@@ -4,16 +4,29 @@
#pragma once

#include "openvino/genai/tokenizer.hpp"
#include "openvino/genai/generation_handle.hpp"
looks like this header file is not required here anymore

@@ -22,6 +35,12 @@ class OPENVINO_GENAI_EXPORTS StreamerBase {
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one
virtual void end() = 0;

/// @brief get_streaming_status() is called by the pipeline to get more detailed information about the streaming status. m_streaming_finish_status, which contains the streaming status info, can be set in put().
/// @return ov::genai::StreamerRunningStatus to determine the streaming status of generation: whether generation is running, stopped or cancelled
virtual StreamerRunningStatus get_streaming_status() {
Suggested change
virtual StreamerRunningStatus get_streaming_status() {
virtual StreamerRunningStatus get_streaming_status() const {
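The `const` qualifier fits because the getter only reads state recorded by `put()`. A compilable sketch of that shape, simplified from the real `StreamerBase` (the subclass and its behaviour are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Simplified sketch of the interface under review; the real class lives in
// openvino/genai/streamer_base.hpp and has more members.
enum class StreamerRunningStatus { RUNNING, STOP, CANCEL };

class StreamerBase {
public:
    // put() may record the caller's decision in m_streaming_finish_status...
    virtual bool put(int64_t token) = 0;
    virtual void end() = 0;
    // ...so the getter only reads state and can be const, as suggested.
    virtual StreamerRunningStatus get_streaming_status() const {
        return m_streaming_finish_status;
    }
    virtual ~StreamerBase() = default;

protected:
    StreamerRunningStatus m_streaming_finish_status = StreamerRunningStatus::RUNNING;
};

// Example subclass: cancels generation on the first token it sees.
struct CancellingStreamer : StreamerBase {
    bool put(int64_t) override {
        m_streaming_finish_status = StreamerRunningStatus::CANCEL;
        return true;  // legacy bool return: true means "stop"
    }
    void end() override {}
};
```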

@@ -171,7 +171,7 @@ std::pair<ov::genai::EncodedResults, bool> decode(std::shared_ptr<ov::genai::Whi

sampler.clear_request_info(sequence_group->get_request_id());

return {results, sequence_group->handle_dropped()};
return {results, sequence_group->handle_stopped()};
I think we need to handle cancel() as well.

As Whisper does not have a chat scenario, cancel() and stop() behave the same.

@@ -217,6 +222,106 @@ def test_callback_kwargs_batch_throws(callback):
pipe.generate(['1', '2'], max_new_tokens=10, streamer=callback)


@pytest.mark.precommit
@pytest.mark.nightly
def test_callback_terminate_by_bool_sampler():
Suggested change
def test_callback_terminate_by_bool_sampler():
def test_callback_terminate_by_bool():

Why do we need sampler in the test name? IMO we don't need such an implementation detail here.

If you are OK with it, we should drop the _sampler postfix from the other tests as well.

current_iter += 1
return current_iter == num_iters

ov_generation_config = GenerationConfig(max_new_tokens=100)
I think we need to use ignore_eos=True, since in theory generation can finish by itself on the num_iters-th iteration.

The same applies to the other tests.
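The flakiness the reviewer describes comes from the counting pattern visible in the diff: the callback only requests a stop on its num_iters-th call, so if the model emits EOS earlier, the stop never fires. A self-contained sketch of that pattern (the helper name is illustrative, not test code from the PR):

```python
def make_counting_callback(num_iters):
    """Build a streamer callback that requests a stop on call number num_iters.

    If generation ends on its own (e.g. at EOS) before num_iters tokens are
    streamed, the callback never returns True -- which is why the reviewer
    suggests ignore_eos=True in the GenerationConfig for these tests.
    """
    state = {"current_iter": 0}

    def callback(subword):
        state["current_iter"] += 1
        return state["current_iter"] == num_iters  # True -> stop generation

    return callback, state
```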

@@ -114,7 +114,7 @@ int main(int argc, char* argv[]) try {
print_generation_result(generation_result);
}
break;
case ov::genai::GenerationStatus::DROPPED_BY_PIPELINE:
case ov::genai::GenerationStatus::CANCEL:
Suggested change
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::STOP:
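Applied to the sample's switch, the suggestion means both early-termination statuses get reported. A self-contained sketch (the enum values and the `describe` helper are illustrative, not the sample's actual code):

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for ov::genai::GenerationStatus after this PR.
enum class GenerationStatus { RUNNING, FINISHED, IGNORED, CANCEL, STOP };

// Once DROPPED_BY_PIPELINE is split into CANCEL and STOP, a sample that
// reports why generation ended has to handle both, typically by letting
// the CANCEL case fall through to STOP as the suggested change does.
std::string describe(GenerationStatus status) {
    switch (status) {
        case GenerationStatus::FINISHED:
            return "generation finished";
        case GenerationStatus::IGNORED:
            return "generation ignored (out of memory)";
        case GenerationStatus::CANCEL:  // fall through: both end streaming early
        case GenerationStatus::STOP:
            return "generation ended by streamer callback";
        default:
            return "generation running";
    }
}
```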

@@ -124,7 +124,7 @@ int main(int argc, char* argv[]) try {
print_cb_generation_result(generation_result);
}
break;
case ov::genai::GenerationStatus::DROPPED_BY_PIPELINE:
case ov::genai::GenerationStatus::CANCEL:
Suggested change
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::STOP:
