Accelerate the index construction process #795
Labels
enhancement
optimization
structured generation
What behavior of the library made you think about the improvement?
My understanding of the index construction process is that for each state in the FSM, we need to iterate through all tokens in the vocabulary (e.g., 32000) and find all valid tokens (i.e., those that can reach valid states).
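As I understand it, the naive construction can be sketched roughly like this (the function names and the FSM representation below are made up for illustration, not the library's actual code):

```python
# Illustrative sketch of the naive index construction: for every FSM state,
# scan the entire vocabulary and keep the tokens whose characters all lead
# to valid states. `transitions` maps (state, char) -> next_state.

def walk_token(transitions, state, token):
    """Follow the FSM character by character; return the end state or None."""
    for ch in token:
        state = transitions.get((state, ch))
        if state is None:
            return None
    return state

def build_index(transitions, states, vocabulary):
    index = {}
    for state in states:
        valid = {}
        for token_id, token in enumerate(vocabulary):
            end_state = walk_token(transitions, state, token)
            if end_state is not None:
                valid[token_id] = end_state
        index[state] = valid
    return index
```

With a vocabulary of ~32000 tokens, the inner loop runs 32000 times per state, which is the cost I would like to reduce.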
How would you like it to behave?
First, let me give an example:
Consider the current state and its outgoing edges. If the outgoing edges are limited, for example only a Number edge, then a state transition happens only if the first character of the next token is a digit. We can exploit this to avoid iterating over the huge vocabulary for this state, i.e., we only iterate over tokens whose first character is a digit.
We can apply this idea to other first characters. To do this, we need to preprocess the vocabulary, i.e., find the corresponding tokens for each first character. There are two ways:
The first way can save a little memory, and it depends on whether the order of the original vocabulary matters.
The cost introduced by this method (let's call it method1) is the sorting cost. If the number of states is large, or if we can cache the sorted vocabulary, this cost is amortized and can almost be ignored.
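A hedged sketch of the first-character preprocessing (a dict-based variant rather than sorting; all names here are illustrative, not outlines' actual API): group token ids by their first character once, then for each state only scan the buckets whose character has an outgoing edge.

```python
from collections import defaultdict

def bucket_by_first_char(vocabulary):
    """One-time preprocessing: map first character -> list of token ids."""
    buckets = defaultdict(list)
    for token_id, token in enumerate(vocabulary):
        if token:
            buckets[token[0]].append(token_id)
    return buckets

def candidates_for_state(transitions, state, buckets):
    """Union of the buckets for characters with an out edge from `state`."""
    token_ids = []
    for (src, ch) in transitions:
        if src == state and ch in buckets:
            token_ids.extend(buckets[ch])
    return token_ids
```

Only the candidate tokens then need the full character-by-character FSM walk, instead of the whole vocabulary.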
Can we extend this idea further?
We can build a trie over the vocabulary. Then we traverse the trie with BFS while maintaining the corresponding FSM state. Thus, for tokens sharing a prefix, we don't need to walk the FSM multiple times. For example, for the tokens "hel", "hell", and "hello", the "h->e->l" transition path only needs to be traversed once.
The real performance gain from method2 depends on the implementation, because the original implementation uses numba.jit, which may already compensate for the cost of the naive algorithm.
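Here is a minimal sketch of method2, assuming a plain dict-of-dicts trie and the same (state, char) transition map as above (names are made up for the example, not the library's internals):

```python
from collections import deque

def build_trie(vocabulary):
    """Nested-dict trie; "$" marks token ids that end at a node."""
    root = {}
    for token_id, token in enumerate(vocabulary):
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(token_id)
    return root

def valid_tokens(transitions, start_state, trie):
    """Return {token_id: end_state} for tokens valid from start_state.

    BFS over the trie carries the FSM state along, so a shared prefix
    like "hel" is matched against the FSM exactly once.
    """
    result = {}
    queue = deque([(trie, start_state)])
    while queue:
        node, state = queue.popleft()
        for token_id in node.get("$", []):
            result[token_id] = state
        for ch, child in node.items():
            if ch == "$":
                continue
            next_state = transitions.get((state, ch))
            if next_state is not None:
                queue.append((child, next_state))
    return result
```

Subtrees whose edge character has no valid FSM transition are pruned entirely, which is where the savings beyond prefix sharing come from.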
Note: For method1 and method2, we still need to consider all states. Are there methods to skip some states entirely? I'm still figuring that out. (Maybe it's impossible, because in the worst case the LLM generates one character at a time.)
I might have gotten something wrong. I am looking forward to your advice if you are available. Thank you! If you don't mind, I would like to implement this when the method is mature.