Edit create_states_mapping function from disk cache to memory cache. #1033

XDeviation · 2024-07-12T08:43:30Z

I found that when using a fixed output format via response_format in vLLM, the same regex_string and tokenizer are used repeatedly to create CFGGuide and RegexGuide instances and to call their get_next_instruction methods. During creation, create_states_mapping uses the cache decorator, which reads the cached result from disk. However, reading the cache from disk accounts for most of the total time spent in get_next_instruction. I believe we can speed this up by keeping the create_states_mapping cache in memory instead.
Here are some logs:
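To illustrate the idea, here is a minimal sketch of an in-memory cache in front of a slower lookup. It is not the actual change in this PR: read_mapping_from_disk and states_mapping_in_memory are hypothetical names, the tokenizer is represented by a plain string so that the arguments are hashable, and the disk read is only simulated by a counter.

```python
import functools

disk_reads = 0  # counts how often the simulated disk cache is consulted


def read_mapping_from_disk(regex_string: str, tokenizer_name: str) -> dict:
    """Hypothetical stand-in for the disk-cached create_states_mapping call."""
    global disk_reads
    disk_reads += 1
    return {"regex": regex_string, "tokenizer": tokenizer_name}


@functools.lru_cache(maxsize=128)
def states_mapping_in_memory(regex_string: str, tokenizer_name: str) -> dict:
    """Keep recent results in process memory; go to disk only on a miss."""
    return read_mapping_from_disk(regex_string, tokenizer_name)


# The same (regex, tokenizer) pair is requested twice, as in the vLLM case.
first = states_mapping_in_memory(r"(\{)|(\[)", "example-tokenizer")
second = states_mapping_in_memory(r"(\{)|(\[)", "example-tokenizer")
print(disk_reads, first is second)
```

functools.lru_cache requires hashable arguments and holds results only for the lifetime of the process, so a real tokenizer object would need a stable hash, and the first request after a restart would still go to disk.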

2024-07-12 06:40:18.351486 Masking took 0.0003364086151123047
2024-07-12 06:40:18.352973 Use time: 0.0003237724304199219
2024-07-12 06:40:18.356375 FSM state computation took 9.226799011230469e-05
2024-07-12 06:40:18.356978 options={'(?:[ \t\x0c\r\n])+', '\\{', '\\['}
2024-07-12 06:40:18.357009 Cache enabled, args=('(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))', CachedQwen2TokenizerFast(name_or_path='/models/Qwen2-72B-Instruct-GPTQ-Int4', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}), kwargs={}
2024-07-12 06:40:18.357068 Cache key=('outlines.fsm.guide.create_states_mapping', '(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))', CachedQwen2TokenizerFast(name_or_path='/models/Qwen2-72B-Instruct-GPTQ-Int4', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}, None)
2024-07-12 06:40:18.442886 Cache hit
2024-07-12 06:40:18.442927 create states mapping done.
2024-07-12 06:40:18.442953 eos token id: 151645
2024-07-12 06:40:18.442973 final states: frozenset({1, 2, -1})
2024-07-12 06:40:18.443003 regex_string='(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))'
2024-07-12 06:40:18.443921 Instruction computation took 0.0875241756439209
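
As a rough check on the claim above, the share of time spent on the cache lookup can be computed from the timestamps in these logs. This sketch assumes that the interval between the "Cache key" line and the "Cache hit" line is dominated by the disk read; the logs do not break that interval down further.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Cache key=" and "Cache hit" log lines.
key_built = datetime.strptime("2024-07-12 06:40:18.357068", FMT)
cache_hit = datetime.strptime("2024-07-12 06:40:18.442886", FMT)

# Duration reported by the "Instruction computation took" log line.
total_seconds = 0.0875241756439209

lookup_seconds = (cache_hit - key_built).total_seconds()
share = lookup_seconds / total_seconds
print(f"lookup: {lookup_seconds:.6f} s, share of total: {share:.1%}")
```

Under that assumption, the lookup takes about 0.086 s, which is roughly 98% of the 0.0875 s instruction computation.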

RobinPicard · 2025-06-19T14:50:47Z

Hi @XDeviation! I'm sorry your PR was never discussed. We recently released Outlines v1, which changes a lot in the library. There is still a create_states_mapping function, though: we import it from outlines_core and wrap it in the cached function in outlines/processors/guide.py. Would you mind rerunning your test with v1?

Edit disk cache to memory cache.

0bee392
