eos token is truncated when max_length is shorter than the total input seq length #1145

Answered by jklj077
Znull-1220 asked this question in Q&A

I believe the safest approach is to drop entire messages that would exceed the context length, so that the sequence contains no incomplete messages. However, as long as the truncation is done correctly (that is, the target for the last token kept after truncation is still the token that followed it in the sequence before truncation), it shouldn't matter.
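
A minimal sketch of both options, in plain Python with no tokenizer library. The function names and the token ids (`1` standing in for the start-of-turn token, `2` for the end-of-turn token) are hypothetical and only illustrate the idea. Many training frameworks shift labels inside the model, so the explicit shift here is an assumption made to keep the example self-contained.

```python
from typing import List, Tuple


def drop_overflowing_messages(messages: List[List[int]], max_length: int) -> List[int]:
    """Keep whole messages only, stopping before the first one that would overflow."""
    kept: List[int] = []
    for msg in messages:
        if len(kept) + len(msg) > max_length:
            break  # this message would be cut in the middle, so drop it and the rest
        kept.extend(msg)
    return kept


def shift_then_truncate(token_ids: List[int], max_length: int) -> Tuple[List[int], List[int]]:
    """Build next-token targets from the FULL sequence, then truncate both lists.

    Because the targets are taken before truncation, the target of the last
    kept input token is the token that actually followed it in the original.
    """
    inputs = token_ids[:-1]
    targets = token_ids[1:]
    return inputs[:max_length], targets[:max_length]


# Hypothetical ids: 1 = start-of-turn, 2 = end-of-turn, others = content tokens.
messages = [
    [1, 10, 11, 2],
    [1, 20, 21, 22, 2],
    [1, 30, 31, 32, 33, 2],
]
full = [tok for msg in messages for tok in msg]

whole_only = drop_overflowing_messages(messages, max_length=10)
inputs, targets = shift_then_truncate(full, max_length=10)
```

With `max_length=10`, the first function keeps the first two messages (9 tokens, ending on the end-of-turn id), while the second keeps 10 input tokens and pairs the last one with the token that came next in the untruncated sequence.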

Additionally, since you are effectively using <im_start> as an end-of-turn token, I don't think you need to worry about truncation affecting <im_end>.


Answer selected by Znull-1220