
Add a choice of how to end streaming from callback: STOP or CANCEL #1476

Open
sbalandi wants to merge 2 commits into master from callback

Conversation
Conversation

@sbalandi (Contributor) commented Jan 3, 2025

No description provided.
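The PR carries no description; as a reader aid, here is a minimal self-contained sketch of the behaviour the title describes. All names here are hypothetical stand-ins modeled on the `ov::genai::StreamerRunningStatus` naming that appears in the review threads of this PR, not the real pipeline API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for ov::genai::StreamerRunningStatus (naming taken
// from this PR's review threads; the real enum lives in the GenAI headers).
enum class StreamerRunningStatus { RUNNING, STOP, CANCEL };

// Toy generation loop: feeds tokens to the streamer callback until it asks
// to end. STOP keeps the partial result (e.g. in chat history), CANCEL
// discards it -- that is the choice this PR adds.
StreamerRunningStatus run_generation(
        const std::vector<std::string>& tokens,
        const std::function<StreamerRunningStatus(const std::string&)>& streamer,
        std::string& history) {
    std::string partial;
    for (const auto& token : tokens) {
        partial += token;
        const auto status = streamer(token);
        if (status == StreamerRunningStatus::STOP) {
            history += partial;  // STOP: keep what was generated so far
            return status;
        }
        if (status == StreamerRunningStatus::CANCEL) {
            return status;       // CANCEL: drop the partial result
        }
    }
    history += partial;          // generation finished normally
    return StreamerRunningStatus::RUNNING;
}
```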

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) category: speculative decoding Speculative decoding category: GenAI C++ API Changes in GenAI C++ public headers no-match-files category: prompt lookup labels Jan 3, 2025
@sbalandi (Contributor, Author) commented Jan 3, 2025

TODO: add CANCEL for ContinuousBatching

@ilya-lavrenov ilya-lavrenov added this to the 2025.0 milestone Jan 4, 2025
@ilya-lavrenov ilya-lavrenov self-assigned this Jan 6, 2025
@sbalandi sbalandi force-pushed the callback branch 5 times, most recently from 454cdd9 to 1592ed0 Compare January 8, 2025 19:38
@github-actions github-actions bot added category: Python API Python API for GenAI category: samples GenAI samples labels Jan 8, 2025
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 10a755b to d18fe16 Compare January 8, 2025 22:19
@sbalandi (Contributor, Author) commented Jan 8, 2025

> TODO: add CANCEL for ContinuousBatching

done

@sbalandi sbalandi marked this pull request as ready for review January 8, 2025 22:43
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 2758f6b to 03ca3ce Compare January 9, 2025 21:56
@ilya-lavrenov (Contributor) left a comment

Please add tests for the new functionality.

Review threads on:
- samples/cpp/chat_sample/chat_sample.cpp (outdated, resolved)
- src/cpp/include/openvino/genai/generation_handle.hpp (outdated, resolved)
- src/python/openvino_genai/__init__.py (resolved)
- src/cpp/src/text_callback_streamer.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/python/openvino_genai/py_openvino_genai.pyi (outdated, resolved)
@andrei-kochin andrei-kochin modified the milestones: 2025.0, 2025.1 Jan 13, 2025
@sbalandi sbalandi force-pushed the callback branch 11 times, most recently from 17a9501 to 8975221 Compare January 21, 2025 13:16
@sbalandi sbalandi force-pushed the callback branch 3 times, most recently from 591c81a to 8c6ff44 Compare January 24, 2025 17:31
@github-actions github-actions bot added the category: whisper Whisper pipeline label Jan 24, 2025
@sbalandi sbalandi force-pushed the callback branch 4 times, most recently from cac1834 to 408e4a3 Compare January 27, 2025 12:53
Review threads on:
- samples/python/text_generation/chat_sample.py (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
- src/cpp/include/openvino/genai/streamer_base.hpp (outdated, resolved)
@sbalandi (Contributor, Author) commented

@ilya-lavrenov could you please take a look?

Review threads on:
- src/cpp/src/utils.hpp (outdated, resolved)
- src/cpp/src/continuous_batching_adapter.hpp (outdated, resolved)
@@ -23,7 +23,7 @@ def streamer(subword: str) -> bool:
print(subword, end='', flush=True)

# No value is returned as in this example we don't want to stop the generation in this method.
# "return None" will be treated the same as "return False".
# "return None" will be treated the same as "return ov::genai::StreamerRunningStatus::RUNNING;".
Suggested change
# "return None" will be treated the same as "return ov::genai::StreamerRunningStatus::RUNNING;".
# "return None" will be treated the same as "return openvino_genai.StreamerRunningStatus.RUNNING".
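For illustration, a hedged sketch of how the three Python return conventions described in that comment could relate. `StreamerRunningStatus` here is a local stand-in for the `openvino_genai` enum named in the suggestion, and `normalize_callback_result` is a hypothetical helper for this sketch, not pipeline API:

```python
from enum import Enum

class StreamerRunningStatus(Enum):
    """Local stand-in for openvino_genai.StreamerRunningStatus."""
    RUNNING = 0
    STOP = 1
    CANCEL = 2

def normalize_callback_result(result):
    """Map legacy bool/None streamer returns onto the status enum.

    None and False mean "keep generating" (RUNNING), and True maps to STOP,
    matching the old "return True to stop" convention; enum values pass
    through unchanged.
    """
    if result is None or result is False:
        return StreamerRunningStatus.RUNNING
    if result is True:
        return StreamerRunningStatus.STOP
    return result
```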

@@ -15,8 +15,8 @@ enum class GenerationStatus {
RUNNING = 0, // Default status for ongoing generation
FINISHED = 1, // Status set when generation has been finished
IGNORED = 2, // Status set when generation run into out-of-memory condition and could not be continued
DROPPED_BY_PIPELINE = 3, // Currently not used, TODO: implement abort functionality
DROPPED_BY_HANDLE = 4 // Status set when generation handle is dropped
@ilya-lavrenov (Contributor) commented Feb 3, 2025
let's deprecate DROPPED_BY_HANDLE via OPENVINO_ENUM_DEPRECATED and assign DROPPED_BY_HANDLE = STOP
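A sketch of what that deprecation could look like, using the portable `[[deprecated]]` attribute as a stand-in for `OPENVINO_ENUM_DEPRECATED` (the numeric values of CANCEL and STOP below are assumptions for illustration, not taken from the merged headers):

```cpp
#include <cassert>

// Sketch only: OPENVINO_ENUM_DEPRECATED expands to a compiler-specific
// attribute in OpenVINO; the standard [[deprecated]] is used here instead.
enum class GenerationStatus {
    RUNNING = 0,   // Default status for ongoing generation
    FINISHED = 1,  // Generation has finished
    IGNORED = 2,   // Generation hit an out-of-memory condition
    CANCEL = 3,    // Streamer callback asked to cancel (drop the result)
    STOP = 4,      // Streamer callback asked to stop (keep the result)
    // Deprecated alias so existing code keeps compiling, as suggested:
    DROPPED_BY_HANDLE [[deprecated("use GenerationStatus::STOP")]] = STOP,
};
```

Using the old enumerator then emits a compiler warning while still comparing equal to STOP.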


bool is_stopped();

bool is_canceled();

Just want to highlight which variant we want to use: cancelled or canceled.
@Wovchena @sbalandi I see that both spellings are OK, but want to draw your attention to it additionally.

@@ -4,16 +4,29 @@
#pragma once

#include "openvino/genai/tokenizer.hpp"
#include "openvino/genai/generation_handle.hpp"
looks like this header file is not required here anymore

@@ -22,6 +35,12 @@ class OPENVINO_GENAI_EXPORTS StreamerBase {
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one
virtual void end() = 0;

/// @brief get_streaming_status() is called by the pipeline to get more detailed information about the streaming status. m_streaming_finish_status, which contains the streaming status info, can be set in put().
/// @return ov::genai::StreamerRunningStatus to determine the streaming status of generation: whether generation is running, stopped or cancelled
virtual StreamerRunningStatus get_streaming_status() {
Suggested change
virtual StreamerRunningStatus get_streaming_status() {
virtual StreamerRunningStatus get_streaming_status() const {
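The `const` qualifier fits because the getter only reads state recorded by `put()`. A compilable sketch of that shape, simplified from the real `StreamerBase` (the subclass and its behaviour are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Simplified sketch of the interface under review; the real class lives in
// openvino/genai/streamer_base.hpp and has more members.
enum class StreamerRunningStatus { RUNNING, STOP, CANCEL };

class StreamerBase {
public:
    // put() may record the caller's decision in m_streaming_finish_status...
    virtual bool put(int64_t token) = 0;
    virtual void end() = 0;
    // ...so the getter only reads state and can be const, as suggested.
    virtual StreamerRunningStatus get_streaming_status() const {
        return m_streaming_finish_status;
    }
    virtual ~StreamerBase() = default;

protected:
    StreamerRunningStatus m_streaming_finish_status = StreamerRunningStatus::RUNNING;
};

// Example subclass: cancels generation on the first token it sees.
struct CancellingStreamer : StreamerBase {
    bool put(int64_t) override {
        m_streaming_finish_status = StreamerRunningStatus::CANCEL;
        return true;  // legacy bool return: true means "stop"
    }
    void end() override {}
};
```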

@@ -171,7 +171,7 @@ std::pair<ov::genai::EncodedResults, bool> decode(std::shared_ptr<ov::genai::Whi

sampler.clear_request_info(sequence_group->get_request_id());

return {results, sequence_group->handle_dropped()};
return {results, sequence_group->handle_stopped()};
I think we need to handle cancel() as well.

As Whisper does not have a chat scenario, cancel() and stop() behave the same.

@@ -217,6 +222,106 @@ def test_callback_kwargs_batch_throws(callback):
pipe.generate(['1', '2'], max_new_tokens=10, streamer=callback)


@pytest.mark.precommit
@pytest.mark.nightly
def test_callback_terminate_by_bool_sampler():
Suggested change
def test_callback_terminate_by_bool_sampler():
def test_callback_terminate_by_bool():

Why do we need sampler in the test name? IMO we don't need such an implementation detail here.

If you are OK with it, we should drop the _sampler postfix from the other tests as well.

current_iter += 1
return current_iter == num_iters

ov_generation_config = GenerationConfig(max_new_tokens=100)
I think we need to use ignore_eos=True, since in theory generation can finish by itself on the num_iters-th iteration.

The same applies to the other tests.
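The flakiness the reviewer describes comes from the counting pattern visible in the diff: the callback only requests a stop on its num_iters-th call, so if the model emits EOS earlier, the stop never fires. A self-contained sketch of that pattern (the helper name is illustrative, not test code from the PR):

```python
def make_counting_callback(num_iters):
    """Build a streamer callback that requests a stop on call number num_iters.

    If generation ends on its own (e.g. at EOS) before num_iters tokens are
    streamed, the callback never returns True -- which is why the reviewer
    suggests ignore_eos=True in the GenerationConfig for these tests.
    """
    state = {"current_iter": 0}

    def callback(subword):
        state["current_iter"] += 1
        return state["current_iter"] == num_iters  # True -> stop generation

    return callback, state
```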

@@ -114,7 +114,7 @@ int main(int argc, char* argv[]) try {
print_generation_result(generation_result);
}
break;
case ov::genai::GenerationStatus::DROPPED_BY_PIPELINE:
case ov::genai::GenerationStatus::CANCEL:
Suggested change
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::STOP:
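Applied to the sample's switch, the suggestion means both early-termination statuses get reported. A self-contained sketch (the enum values and the `describe` helper are illustrative, not the sample's actual code):

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for ov::genai::GenerationStatus after this PR.
enum class GenerationStatus { RUNNING, FINISHED, IGNORED, CANCEL, STOP };

// Once DROPPED_BY_PIPELINE is split into CANCEL and STOP, a sample that
// reports why generation ended has to handle both, typically by letting
// the CANCEL case fall through to STOP as the suggested change does.
std::string describe(GenerationStatus status) {
    switch (status) {
        case GenerationStatus::FINISHED:
            return "generation finished";
        case GenerationStatus::IGNORED:
            return "generation ignored (out of memory)";
        case GenerationStatus::CANCEL:  // fall through: both end streaming early
        case GenerationStatus::STOP:
            return "generation ended by streamer callback";
        default:
            return "generation running";
    }
}
```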

@@ -124,7 +124,7 @@ int main(int argc, char* argv[]) try {
print_cb_generation_result(generation_result);
}
break;
case ov::genai::GenerationStatus::DROPPED_BY_PIPELINE:
case ov::genai::GenerationStatus::CANCEL:
Suggested change
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::CANCEL:
case ov::genai::GenerationStatus::STOP:
