causal_lm: add stateful group beam search #81
Conversation
int main(int argc, char* argv[]) try {
    if (argc != 5) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0]
            + " <openvino_model.xml> <tokenizer.xml> <detokenizer.xml> '<prompt>'");
Can we simplify the command line for model passing: expect one parameter, the model directory containing all three IRs, and hardcode the IR names in the code? This is something we would expect from future development of optimum-intel, where both the tokenizer and the detokenizer will be exported alongside the model. We already have a name for the main model, openvino_model.xml, and we can introduce openvino_tokenizer.xml and openvino_detokenizer.xml correspondingly. Now they are called tokenizer.xml and detokenizer.xml, but it is better to align with the optimum-intel naming for the main model.
Please collect this as a potential improvement.
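For illustration, a minimal sketch of the single-directory command line this asks for; the fixed IR names follow the optimum-intel convention mentioned above, and the structure mirrors the sample rather than the PR's actual code:

// Sketch only: pass one model directory, hardcode IR names inside.
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <string>
#include <openvino/openvino.hpp>

int main(int argc, char* argv[]) try {
    if (argc != 3) {
        throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <model_dir> '<prompt>'");
    }
    std::filesystem::path model_dir{argv[1]};
    ov::Core core;
    core.add_extension(USER_OV_EXTENSIONS_PATH);  // defined in root CMakeLists.txt, as in the sample
    // IR file names are fixed, so the user only passes the export directory.
    ov::InferRequest tokenizer = core.compile_model((model_dir / "openvino_tokenizer.xml").string(), "CPU").create_infer_request();
    ov::InferRequest detokenizer = core.compile_model((model_dir / "openvino_detokenizer.xml").string(), "CPU").create_infer_request();
    ov::InferRequest lm = core.compile_model((model_dir / "openvino_model.xml").string(), "CPU").create_infer_request();
    // ... generation loop as in the sample ...
    return 0;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return 1;
}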
Done
}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
ov::InferRequest detokenizer = core.compile_model(argv[3], "CPU").create_infer_request();
ov::InferRequest ireq = core.compile_model(argv[1], "CPU").create_infer_request();
ireq.set_tensor("input_ids", input_ids);
ireq.set_tensor("attention_mask", mask);
ov::Tensor position_ids = ireq.get_tensor("position_ids");
position_ids.set_shape(input_ids.get_shape());
std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
ireq.get_tensor("beam_idx").set_shape({1});
ireq.get_tensor("beam_idx").data<int32_t>()[0] = 0;
Parameters parameters;
const int64_t* prompt_data = input_ids.data<const int64_t>();
parameters.prompt = std::vector<int64_t>{prompt_data, prompt_data + input_ids.get_size()};
GroupBeamSearcher group_beam_searcher{parameters};
std::vector<int64_t> next_tokens;
std::vector<int32_t> next_beams;
Please break this wall of code into more organized stages with brief comments. When looking at this region, I just don't want to read it.
I came up with a better idea instead of comments: function names are scoped comments, so I used them.
}
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH); // USER_OV_EXTENSIONS_PATH is defined in root CMakeLists.txt
auto [input_ids, mask] = tokenize(core.compile_model(argv[2], "CPU").create_infer_request(), argv[4]);
It is nice to combine everything in a single statement, because the tokenizer needs to be called only once in this particular sample. But let's make it more practical for code reuse. In a real application the tokenizer will most likely be used multiple times, and if the user doesn't really understand what is happening in this statement, this line will be copied without much modification and used multiple times, recompiling the model and recreating and disposing of the infer request on every call, which is not really what we want to teach users.
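For illustration, a minimal sketch of the reuse-friendly structure this suggests, assuming the sample's tokenize() helper is adjusted to take the infer request by reference:

// Sketch only: compile the tokenizer once and reuse the infer request.
ov::Core core;
core.add_extension(USER_OV_EXTENSIONS_PATH);  // defined in root CMakeLists.txt, as in the sample
// The compiled tokenizer lives for the whole application instead of being
// recompiled inside a one-liner.
ov::InferRequest tokenizer = core.compile_model(argv[2], "CPU").create_infer_request();
// tokenize() is assumed here to accept the request by reference, so the same
// compiled model serves every prompt.
auto [input_ids, mask] = tokenize(tokenizer, argv[4]);
auto [more_ids, more_mask] = tokenize(tokenizer, "another prompt");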
Done
size_t batch_size = next_tokens.size();
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), shape_size(mask_shape), 1);
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
Please provide a description in a comment of what is happening here.
Updated
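For reference, a commented sketch of what this block does; the comments are explanatory only and not necessarily the wording that landed in the PR:

// One generation step: feed each ongoing beam its newly selected token.
size_t batch_size = next_tokens.size();
// input_ids holds one new token per beam, shape [batch_size, 1].
ireq.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
// Grow the attention mask by one column so the model attends to the new token,
// and mark every position as attended (all ones).
ov::Tensor attention_mask = ireq.get_tensor("attention_mask");
ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
attention_mask.set_shape(mask_shape);
std::fill_n(attention_mask.data<int64_t>(), ov::shape_size(mask_shape), 1);
// Every beam's new token sits at the same position: current sequence length - 1.
position_ids.set_shape({batch_size, 1});
std::fill_n(position_ids.data<int64_t>(), batch_size, mask_shape.at(1) - 1);
// beam_idx tells the stateful model which cached KV entry each beam continues from.
ireq.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});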
if (!group.done) {
    for (Beam& beam : group.ongoing) {
        group.finish(std::move(beam), parameters);
    }
}
Looks like a GroupBeamSearcher internal detail that should be exposed as a function, finish or something like that, which calls group.finish for all groups.
min_heap also looks like an internal detail. Maybe we can wrap it with an accessor like get_beams() which will return the accumulated beams?
Added
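For illustration, a rough sketch of the kind of wrapper being asked for, building on the sample's Beam, Group, and Parameters types; the name finalize, the explicit parameters argument, and the assumption that min_heap is a std::vector<Beam> reachable through GroupBeamSearcher's groups are illustrative, not necessarily what the PR added:

// Hypothetical wrapper: finish every group's ongoing beams and hand back the
// accumulated results, so main() never touches group.finish or min_heap.
std::vector<std::vector<Beam>> finalize(GroupBeamSearcher&& searcher, const Parameters& parameters) {
    std::vector<std::vector<Beam>> result;
    for (Group& group : searcher.groups) {
        if (!group.done) {
            for (Beam& beam : group.ongoing) {
                group.finish(std::move(beam), parameters);
            }
        }
        // min_heap holds the finished beams accumulated for this group.
        result.push_back(std::move(group.min_heap));
    }
    return result;
}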
)
else()
    target_compile_options(beam_search_causal_lm PRIVATE -Wall) # Display all warnings
endif()
BTW, do we need to set all these compiler options? I suppose our cmake files should be as simple as possible.
Removed
b65d944 to 244c669
void initialize_inputs(ov::InferRequest& lm, const ov::Tensor& input_ids, const ov::Tensor& attention_mask) {
    lm.set_tensor("input_ids", input_ids);
    lm.set_tensor("attention_mask", attention_mask);
    ov::Tensor position_ids = lm.get_tensor("position_ids");
    position_ids.set_shape(input_ids.get_shape());
    std::iota(position_ids.data<int64_t>(), position_ids.data<int64_t>() + position_ids.get_size(), 0);
    lm.get_tensor("beam_idx").set_shape({1});
    lm.get_tensor("beam_idx").data<int32_t>()[0] = 0;
}

void set_pointers(
    ov::InferRequest& lm, std::vector<int64_t>& next_tokens, std::vector<int32_t>& next_beams) {
    size_t batch_size = next_tokens.size();
    lm.set_tensor("input_ids", ov::Tensor{ov::element::i64, {batch_size, 1}, next_tokens.data()});
    lm.set_tensor("beam_idx", ov::Tensor{ov::element::i32, {batch_size}, next_beams.data()});
}

void set_auxiliary_inputs(ov::InferRequest& lm) {
    size_t batch_size = lm.get_tensor("input_ids").get_shape().front();
    ov::Tensor attention_mask = lm.get_tensor("attention_mask");
    ov::Shape mask_shape{batch_size, attention_mask.get_shape().at(1) + 1};
    attention_mask.set_shape(mask_shape);
    std::fill_n(attention_mask.data<int64_t>(), ov::shape_size(mask_shape), 1);
    lm.get_tensor("position_ids").set_shape({batch_size, 1});
    std::fill_n(lm.get_tensor("position_ids").data<int64_t>(), batch_size, mask_shape.at(1) - 1);
Code without comments was just moved into three functions, still without comments. It is not better; it is even worse, because the context disappeared. Please leave the code in the main function, but provide more information in comments about what is happening.
Updated

How can context disappear?

My belief is that the better the implementation, the fewer comments are required. Replacing function names with comments does the reverse. Is it my belief which is wrong here?

> provide more information

I don't have more information to state apart from the function names converted to comments. You need to be more specific.
python ./convert_tokenizers.py ./open_llama_3b_v2/
./build/causal_lm ./open_llama_3b_v2/openvino_model.xml ./tokenizer.xml ./detokenizer.xml "return 0"
python ./convert_tokenizers.py ./open_llama_3b_v2/pytorch/dldt/FP16/ --streaming-detokenizer
./build/causal_lm ./open_llama_3b_v2/pytorch/dldt/FP16/ "return 0"
Similar comments as elsewhere about dldt and the removal of openvino installed via pip.
It breaks the version check somehow: huggingface/optimum-intel#486 (comment)

dldt is generated by convert.py. I didn't do it :)
> It breaks version check somehow: huggingface/optimum-intel#486 (comment)

Looks strange. Ok, let's fix it later.
Let's resolve conflicts.
Ticket 123782