causal_lm: add stateful group beam search #81
Conversation
int main(int argc, char* argv[]) try {
    if (argc != 5) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0]
            + " <openvino_model.xml> <tokenizer.xml> <detokenizer.xml> '<prompt>'");
Can we simplify the command line for model passing: expect one parameter, the model directory containing all three IRs, and hardcode the IR names in the code? This is something we would expect from future development of optimum-intel, where both the tokenizer and the detokenizer will be exported alongside the model. We already have a name for the main model, openvino_model.xml, and we can introduce openvino_tokenizer.xml and openvino_detokenizer.xml correspondingly. Now they are called tokenizer.xml and detokenizer.xml, but it is better to align with the optimum-intel naming for the main model.
Please collect this as a potential improvement.
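For illustration, a minimal sketch of the single-directory command line this asks for; the fixed IR names follow the optimum-intel convention mentioned above, and the structure mirrors the sample rather than the PR's actual code:

// Sketch only: pass one model directory, hardcode IR names inside.
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <string>
#include <openvino/openvino.hpp>

int main(int argc, char* argv[]) try {
    if (argc != 3) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <model_dir> '<prompt>'");
    }
    std::filesystem::path model_dir{argv[1]};
    ov::Core core;
    core.add_extension(USER_OV_EXTENSIONS_PATH);  // defined in root CMakeLists.txt, as in the sample
    // IR file names are fixed, so the user only passes the export directory.
    ov::InferRequest tokenizer = core.compile_model((model_dir / "openvino_tokenizer.xml").string(), "CPU").create_infer_request();
    ov::InferRequest detokenizer = core.compile_model((model_dir / "openvino_detokenizer.xml").string(), "CPU").create_infer_request();
    ov::InferRequest lm = core.compile_model((model_dir / "openvino_model.xml").string(), "CPU").create_infer_request();
    // ... generation loop as in the sample ...
    return 0;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return 1;
}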
Done
}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
ov::InferRequest detokenizer = core.compile_model(argv[3], "CPU").create_infer_request();
ov::InferRequest ireq = core.compile_model(argv[1], "CPU").create_infer_request();
ireq.set_tensor("input_ids", input_ids);
ireq.set_tensor("attention_mask", mask);
ov::Tensor position_ids = ireq.get_tensor("position_ids");
position_ids.set_shape(input_ids.get_shape());
std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
ireq.get_tensor("beam_idx").set_shape({1});
ireq.get_tensor("beam_idx").data<int32_t>()[0] = 0;
Parameters parameters;
const int64_t* prompt_data = input_ids.data<const int64_t>();
parameters.prompt = std::vector<int64_t>{prompt_data, prompt_data + input_ids.get_size()};
GroupBeamSearcher group_beam_searcher{parameters};
std::vector<int64_t> next_tokens;
std::vector<int32_t> next_beams;
Please break this wall of code into more organized stages with brief comments. When looking at this region, I just don't want to read it.
I came up with a better idea instead of comments: function names are scoped comments, so I used them.
}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
It is nice to combine everything in a single statement, because the tokenizer needs to be called only once in this particular sample. But let's make it more practical for code reuse. In a real application the tokenizer will most likely be used multiple times, and if the user doesn't really understand what is happening in this statement, this line will be copied without much modification and used multiple times, recompiling the model and recreating and disposing of the infer request on every call, which is not really what we want to teach users.
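For illustration, a minimal sketch of the reuse-friendly structure this suggests, assuming the sample's tokenize() helper is adjusted to take the infer request by reference:

// Sketch only: compile the tokenizer once and reuse the infer request.
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH);  // defined in root CMakeLists.txt, as in the sample
// The compiled tokenizer lives for the whole application instead of being
// recompiled inside a one-liner.
ov::InferRequest tokenizer = core.compile_model(argv[2], "CPU").create_infer_request();
// tokenize() is assumed here to accept the request by reference, so the same
// compiled model serves every prompt.
auto [input_ids, mask] = tokenize(tokenizer, argv[4]);
auto [more_ids, more_mask] = tokenize(tokenizer, "another prompt");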
Done
size_t batch_size = next_tokens.size();
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), shape_size(mask_shape), 1);
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
Please provide a description in a comment of what is happening here.
Updated
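For reference, a commented sketch of what this block does; the comments are explanatory only and not necessarily the wording that landed in the PR:

// One generation step: feed each ongoing beam its newly selected token.
size_t batch_size = next_tokens.size();
// input_ids holds one new token per beam, shape [batch_size, 1].
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
// Grow the attention mask by one column so the model attends to the new token,
// and mark every position as attended (all ones).
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), ov::shape_size(mask_shape), 1);
// Every beam's new token sits at the same position: current sequence length - 1.
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
// beam_idx tells the stateful model which cached KV entry each beam continues from.
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});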
if (!group.done) {
    for (Beam& beam : group.ongoing) {
        group.finish(std::move(beam), parameters);
    }
}
Looks like a GroupBeamSearcher internal detail that should be exposed as a function, finish or something like that, which calls group.finish for all groups.
min_heap also looks like an internal detail. Maybe we can wrap it with an accessor like get_beams() which will return the accumulated beams?
Added
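For illustration, a rough sketch of the kind of wrapper being asked for, building on the sample's Beam, Group, and Parameters types; the name finalize, the explicit parameters argument, and the assumption that min_heap is a std::vector<Beam> reachable through GroupBeamSearcher's groups are illustrative, not necessarily what the PR added:

// Hypothetical wrapper: finish every group's ongoing beams and hand back the
// accumulated results, so main() never touches group.finish or min_heap.
std::vector<std::vector<Beam>> finalize(GroupBeamSearcher&& searcher, const Parameters& parameters) {
    std::vector<std::vector<Beam>> result;
    for (Group& group : searcher.groups) {
        if (!group.done) {
            for (Beam& beam : group.ongoing) {
                group.finish(std::move(beam), parameters);
            }
        }
        // min_heap holds the finished beams accumulated for this group.
        result.push_back(std::move(group.min_heap));
    }
    return result;
}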
)
else()
    target_compile_options(beam_search_causal_lm PRIVATE -Wall) # Display all warnings
endif()
BTW, do we need to set all these compiler options? I suppose our cmake files should be as simple as possible.
Removed
b65d944 to 244c669
void initialize_inputs(ov::InferRequest& lm, const ov::Tensor& input_ids, const ov::Tensor& attention_mask) {
    lm.set_tensor("input_ids", input_ids);
    lm.set_tensor("attention_mask", attention_mask);
    ov::Tensor position_ids = lm.get_tensor("position_ids");
    position_ids.set_shape(input_ids.get_shape());
    std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
    lm.get_tensor("beam_idx").set_shape({1});
    lm.get_tensor("beam_idx").data<int32_t>()[0] = 0;
}

void set_pointers(
    ov::InferRequest& lm, std::vector<int64_t>& next_tokens, std::vector<int32_t>& next_beams) {
    size_t batch_size = next_tokens.size();
    lm.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
    lm.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
}

void set_auxiliary_inputs(ov::InferRequest& lm) {
    size_t batch_size = lm.get_tensor("input_ids").get_shape().front();
    ov::Tensor attention_mask = lm.get_tensor("attention_mask");
    ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
    attention_mask.set_shape(mask_shape);
    std::fill_n(attention_mask.data<int64_t>(), ov::shape_size(mask_shape), 1);
    lm.get_tensor("position_ids").set_shape({batch_size, 1});
    std::fill_n(lm.get_tensor("position_ids").data<int64_t>(), batch_size, mask_shape.at(1) - 1);
Code without comments was just moved into three functions, still without comments. It is not better; it is even worse, because the context disappeared. Please leave the code in the main function, but provide more information in comments about what is happening.
Updated

How can context disappear?

My belief is that the better the implementation, the fewer comments are required. Replacing function names with comments does the reverse. Is it my belief which is wrong here?

> provide more information

I don't have more information to state apart from the function names converted to comments. You need to be more specific.
python ./convert_tokenizers.py ./open_llama_3b_v2/
./build/causal_lm ./open_llama_3b_v2/openvino_model.xml ./tokenizer.xml ./detokenizer.xml "return 0"
python ./convert_tokenizers.py ./open_llama_3b_v2/pytorch/dldt/FP16/ --streaming-detokenizer
./build/causal_lm ./open_llama_3b_v2/pytorch/dldt/FP16/ "return 0"
Similar comments as elsewhere about dldt and the removal of openvino installed via pip.
It breaks the version check somehow: huggingface/optimum-intel#486 (comment)

dldt is generated by convert.py. I didn't do it :)
> It breaks version check somehow: huggingface/optimum-intel#486 (comment)

Looks strange. Ok, let's fix it later.
Let's resolve conflicts.
Ticket 123782