
causal_lm: add stateful group beam search #81

Conversation

@Wovchena (Collaborator) commented on Dec 16, 2023

Ticket 123782

int main(int argc, char* argv[]) try {
    if (argc != 5) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0]
            + " <openvino_model.xml> <tokenizer.xml> <detokenizer.xml> '<prompt>'");
Collaborator:

Can we simplify the command line for model passing: expect one parameter with a model directory containing all three IRs, and hardcode the names of those IRs in the code? That is something we would expect from future development of optimum-intel, where both the tokenizer and detokenizer will be exported alongside the model. We already have a name for the main model, openvino_model.xml, and we can introduce openvino_tokenizer.xml and openvino_detokenizer.xml respectively. Right now they are called tokenizer.xml and detokenizer.xml, but it is better to align with the optimum-intel naming used for the main model.

Please collect this as a potential improvement.
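
A minimal sketch of what that simplified CLI could look like in this sample, assuming the three IRs sit in one directory under the hardcoded names the reviewer proposes (openvino_model.xml, openvino_tokenizer.xml, openvino_detokenizer.xml); the rest of main is unchanged and omitted here:

int main(int argc, char* argv[]) try {
    if (argc != 3) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <model_dir> '<prompt>'");
    }
    std::string model_dir = argv[1];
    ov::Core core;
    core.add_extension(USER_OV_EXTENSIONS_PATH);  // defined in root CMakeLists.txt
    // Hardcoded IR names inside the model directory
    ov::InferRequest tokenizer = core.compile_model(model_dir + "/openvino_tokenizer.xml", "CPU").create_infer_request();
    ov::InferRequest detokenizer = core.compile_model(model_dir + "/openvino_detokenizer.xml", "CPU").create_infer_request();
    ov::InferRequest lm = core.compile_model(model_dir + "/openvino_model.xml", "CPU").create_infer_request();
    // ... the rest of the sample stays the same
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return 1;
}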

Collaborator Author:

Done

Comment on lines 32 to 50
}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
ov::InferRequest detokenizer = core.compile_model(argv[3], "CPU").create_infer_request();
ov::InferRequest ireq = core.compile_model(argv[1], "CPU").create_infer_request();
ireq.set_tensor("input_ids", input_ids);
ireq.set_tensor("attention_mask", mask);
ov::Tensor position_ids = ireq.get_tensor("position_ids");
position_ids.set_shape(input_ids.get_shape());
std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
ireq.get_tensor("beam_idx").set_shape({1});
ireq.get_tensor("beam_idx").data<int32_t>()[0] = 0;
Parameters parameters;
const int64_t* prompt_data = input_ids.data<const int64_t>();
parameters.prompt = std::vector<int64_t>{prompt_data, prompt_data + input_ids.get_size()};
GroupBeamSearcher group_beam_searcher{parameters};
std::vector<int64_t> next_tokens;
std::vector<int32_t> next_beams;
Collaborator:

Please break this wall of code into more organized stages with brief comments. Looking at this region, I just don't want to read it.

Collaborator Author:

I came up with a better idea than comments: function names are scoped comments, so I use them instead.

}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
Collaborator:

It is nice to combine everything in a single statement, because the tokenizer only needs to be called once in this particular sample. But let's make it more practical for code reuse. In a real application the tokenizer will most likely be used multiple times, and if the user doesn't really understand what's happening in this statement, this line -- without much modification -- will be copied and used multiple times, recompiling the model and creating and disposing of the infer request every time, which is not really what we want to teach users.
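
A minimal sketch of the reuse pattern being suggested here: compile the tokenizer once, keep its infer request, and call it once per prompt. It assumes the sample's tokenize helper can take a reference to an existing infer request; the prompts loop is purely illustrative.

ov::InferRequest tokenizer = core.compile_model(argv[2], "CPU").create_infer_request();
for (const std::string& prompt : prompts) {  // hypothetical list of prompts
    // Reuse the same infer request instead of recompiling the tokenizer each time
    auto [input_ids, attention_mask] = tokenize(tokenizer, prompt);
    // ... feed input_ids and attention_mask to the LM infer request
}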

Collaborator Author:

Done

Comment on lines 57 to 65
size_t batch_size = next_tokens.size();
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), shape_size(mask_shape), 1);
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
Collaborator:

Please provide a description in a comment of what's happening here.
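
For readers of the thread, one possible reading of the quoted lines, with the explanation as comments (the comment wording is illustrative, not necessarily what was added to the sample):

size_t batch_size = next_tokens.size();  // one new token per ongoing beam
// Feed the freshly sampled token of every beam as the next single-token step
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
// Extend the attention mask by one column and mark every position as attended
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), shape_size(mask_shape), 1);
// All beams generate a token at the same (last) position
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
// beam_idx tells the stateful model which KV cache row each beam continues from
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});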

Collaborator Author:

Updated

Comment on lines 68 to 72
if (!group.done) {
    for (Beam& beam : group.ongoing) {
        group.finish(std::move(beam), parameters);
    }
}
Collaborator:

This looks like a GroupBeamSearcher internal detail that should be exposed as a function, finish or something like that, which calls group.finish for all groups.

Contributor:

min_heap also looks like an internal detail.
Maybe we can wrap it with a get_beams() that returns the accumulated beams?
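
A rough sketch of the kind of accessor being proposed; the GroupBeamSearcher members referenced here (done, ongoing, finish) follow the snippet quoted above, while the groups container, the parameters member, and the return type are hypothetical:

std::vector<Group>& GroupBeamSearcher::get_beams() {
    // Finalize every group that is still running, then hand back the accumulated beams
    for (Group& group : groups) {
        if (!group.done) {
            for (Beam& beam : group.ongoing) {
                group.finish(std::move(beam), parameters);
            }
        }
    }
    return groups;
}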

Collaborator Author:

Added

)
else()
    target_compile_options(beam_search_causal_lm PRIVATE -Wall) # Display all warnings
endif()
Contributor:

BTW, do we need to set all these compiler options?
I think our CMake files should be as simple as possible.

Collaborator Author:

Removed

@Wovchena force-pushed the causal_lm-add-stateful-group-beam-search branch from b65d944 to 244c669 on December 18, 2023 14:01
Comment on lines 37 to 61
void initialize_inputs(ov::InferRequest& lm, const ov::Tensor& input_ids, const ov::Tensor& attention_mask) {
    lm.set_tensor("input_ids", input_ids);
    lm.set_tensor("attention_mask", attention_mask);
    ov::Tensor position_ids = lm.get_tensor("position_ids");
    position_ids.set_shape(input_ids.get_shape());
    std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
    lm.get_tensor("beam_idx").set_shape({1});
    lm.get_tensor("beam_idx").data<int32_t>()[0] = 0;
}

void set_pointers(
    ov::InferRequest& lm, std::vector<int64_t>& next_tokens, std::vector<int32_t>& next_beams) {
    size_t batch_size = next_tokens.size();
    lm.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
    lm.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
}

void set_auxiliary_inputs(ov::InferRequest& lm) {
    size_t batch_size = lm.get_tensor("input_ids").get_shape().front();
    ov::Tensor attention_mask = lm.get_tensor("attention_mask");
    ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
    attention_mask.set_shape(mask_shape);
    std::fill_n(attention_mask.data<int64_t>(), ov::shape_size(mask_shape), 1);
    lm.get_tensor("position_ids").set_shape({batch_size, 1});
    std::fill_n(lm.get_tensor("position_ids").data<int64_t>(), batch_size, mask_shape.at(1) - 1);
Collaborator:

Code without comments has just been moved into three functions without comments. That is not better, even worse, because the context has disappeared. Please leave the code in the main function, but provide more information about what is happening in comments.

Collaborator Author:

Updated

  1. How can context disappear?

  2. My belief is that the better the implementation, the fewer comments are required. Replacing function names with comments does the reverse. Is my belief wrong here?

  3. "provide more information"

     I don't have more information to state apart from the function names converted to comments. You need to be more specific.

python ./convert_tokenizers.py ./open_llama_3b_v2/
./build/causal_lm ./open_llama_3b_v2/openvino_model.xml ./tokenizer.xml ./detokenizer.xml "return 0"
python ./convert_tokenizers.py ./open_llama_3b_v2/pytorch/dldt/FP16/ --streaming-detokenizer
./build/causal_lm ./open_llama_3b_v2/pytorch/dldt/FP16/ "return 0"
Contributor:

Similar comments about dldt and the removal of openvino via pip.

Collaborator Author:

It breaks the version check somehow: huggingface/optimum-intel#486 (comment)
dldt is generated by convert.py. I didn't do it :)

Contributor:

> It breaks the version check somehow: huggingface/optimum-intel#486 (comment)

Looks strange.
OK, let's fix it later.

@ilya-lavrenov (Contributor) left a comment:

Let's resolve conflicts.

@ilya-lavrenov ilya-lavrenov merged commit 5b3496d into openvinotoolkit:master Dec 20, 2023
3 checks passed