Edit create_states_mapping function from disk cache to memory cache. #1033

Open
wants to merge 1 commit into
base: main
@XDeviation XDeviation commented Jul 12, 2024

When vLLM is used with a fixed output format via response_format, the same regex_string and tokenizer are used repeatedly to create CFGGuide and RegexGuide instances and to call their get_next_instruction methods. Each time a guide is created, create_states_mapping goes through the cache decorator, which reads the cached result from disk. That disk read accounts for most of the time spent in get_next_instruction: in the logs below, roughly 0.086 s elapse between computing the cache key and the cache hit, out of the 0.0875 s reported for instruction computation. I believe we can speed this up by keeping the create_states_mapping cache in memory instead.
Here are some logs:

2024-07-12 06:40:18.351486 Masking took 0.0003364086151123047
2024-07-12 06:40:18.352973 Use time: 0.0003237724304199219
2024-07-12 06:40:18.356375 FSM state computation took 9.226799011230469e-05
2024-07-12 06:40:18.356978 options={'(?:[ \t\x0c\r\n])+', '\\{', '\\['}
2024-07-12 06:40:18.357009 Cache enabled, args=('(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))', CachedQwen2TokenizerFast(name_or_path='/models/Qwen2-72B-Instruct-GPTQ-Int4', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}), kwargs={}
2024-07-12 06:40:18.357068 Cache key=('outlines.fsm.guide.create_states_mapping', '(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))', CachedQwen2TokenizerFast(name_or_path='/models/Qwen2-72B-Instruct-GPTQ-Int4', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}, None)
2024-07-12 06:40:18.442886 Cache hit
2024-07-12 06:40:18.442927 create states mapping done.
2024-07-12 06:40:18.442953 eos token id: 151645
2024-07-12 06:40:18.442973 final states: frozenset({1, 2, -1})
2024-07-12 06:40:18.443003 regex_string='(((?:[ \t\x0c\r\n])+)|(\\{)|(\\[))'
2024-07-12 06:40:18.443921 Instruction computation took 0.0875241756439209
