Use a trie to speed up index construction #887

Closed
wants to merge 2 commits into from


@lapp0 (Contributor) commented May 10, 2024

Fixes #795

Awaiting #1070

Problem

In regex.py, Outlines compiles an index of the tokens that are legal in each state of the FSM. The per-state scan is done by state_scan_tokens.

On the main branch, we use a naive approach:

  • for each (state, token) pair drawn from the FSM's states and the vocabulary, simulate the FSM traversal character by character and save the (state, token) pairs that successfully walk the FSM.

Walking the FSM num_tokens * num_states times is inefficient, and it is the current bottleneck in index construction.
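A minimal sketch of the naive scan described above. It assumes a toy FSM stored as a plain dict (transitions[state][char] -> next_state) instead of the interegular/Numba structures Outlines really uses; walk_token and naive_scan are hypothetical names, not functions from regex.py.

```python
from typing import Dict, Optional, Set, Tuple

# Toy FSM: transitions[state][char] -> next_state (hypothetical stand-in for
# the FSM representation Outlines actually uses).
Transitions = Dict[int, Dict[str, int]]


def walk_token(transitions: Transitions, state: int, token: str) -> Optional[int]:
    """Walk `token` one character at a time; return the end state, or None if the FSM dies."""
    current: Optional[int] = state
    for char in token:
        current = transitions.get(current, {}).get(char)
        if current is None:
            return None
    return current


def naive_scan(transitions: Transitions, vocabulary: Dict[str, int]) -> Set[Tuple[int, int]]:
    """Return every (state, token_id) pair whose token can be fully walked from that state."""
    legal = set()
    for state in transitions:  # num_states iterations...
        for token, token_id in vocabulary.items():  # ...each walking all num_tokens tokens
            if walk_token(transitions, state, token) is not None:
                legal.add((state, token_id))
    return legal


# Usage: toy FSM for the regex "ab*c" (state 2 is final).
TRANSITIONS = {0: {"a": 1}, 1: {"b": 1, "c": 2}, 2: {}}
VOCAB = {"a": 0, "ab": 1, "abc": 2, "b": 3, "bc": 4, "c": 5, "x": 6}
print(sorted(naive_scan(TRANSITIONS, VOCAB)))
# [(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
```

Note that "ab" and "abc" each re-walk the "a" prefix from scratch; that repeated work is what the trie is meant to avoid.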

Solution

We can improve this by using a trie.

If the token foo is illegal at a state, the token foobar is also illegal: any walk of foobar must first walk foo, so it fails at the same point. Therefore we construct a trie over the vocabulary: only if foo is legal do we check foobar, and only if foobar is legal do we check foobarbaz.

This skips unnecessary work and reduces runtime on average.
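A minimal sketch of the trie-based scan, assuming a toy FSM stored as a plain dict (transitions[state][char] -> next_state). TrieNode, build_trie and trie_scan are hypothetical names for illustration, not the PR's code. Each trie edge advances the FSM by one character, so a shared prefix such as foo is walked once for foo, foobar and foobarbaz, and a dead transition prunes the whole subtree below it.

```python
from typing import Dict, List, Set, Tuple

# Toy FSM: transitions[state][char] -> next_state (hypothetical stand-in for
# the FSM representation Outlines actually uses).
Transitions = Dict[int, Dict[str, int]]


class TrieNode:
    """One node per token prefix; token_ids holds the tokens that end exactly here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.token_ids: List[int] = []


def build_trie(vocabulary: Dict[str, int]) -> TrieNode:
    root = TrieNode()
    for token, token_id in vocabulary.items():
        node = root
        for char in token:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.token_ids.append(token_id)
    return root


def trie_scan(transitions: Transitions, root: TrieNode) -> Set[Tuple[int, int]]:
    """Return every (state, token_id) pair whose token can be fully walked from that state."""
    legal = set()
    for start in transitions:
        # Depth-first search: each entry pairs a trie node with the FSM state reached
        # after consuming that node's prefix, so shared prefixes are walked only once.
        stack = [(root, start)]
        while stack:
            node, state = stack.pop()
            for token_id in node.token_ids:
                legal.add((start, token_id))
            for char, child in node.children.items():
                next_state = transitions.get(state, {}).get(char)
                # Prune: if this prefix kills the FSM, every longer token below it is illegal too.
                if next_state is not None:
                    stack.append((child, next_state))
    return legal


# Usage: toy FSM for the regex "ab*c" (state 2 is final).
TRANSITIONS = {0: {"a": 1}, 1: {"b": 1, "c": 2}, 2: {}}
VOCAB = {"a": 0, "ab": 1, "abc": 2, "b": 3, "bc": 4, "c": 5, "x": 6}
print(sorted(trie_scan(TRANSITIONS, build_trie(VOCAB))))
# [(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
```

The trie is built once and reused for every state. Building and traversing it is extra work of its own, which is consistent with the slight slowdown reported below for simple patterns.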

Result

The trie adds some overhead, which slightly degrades performance for simple patterns, but complex patterns see a large speedup.

  • Worst case (ssn): 352ms -> 374ms
  • Best case (complex_schema): 3.09s -> 1.30s
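The Ratio column in the benchmark tables below is after / before, so values under 1 are speedups. A quick check of the two cases above (ratio is a hypothetical helper, used only for the arithmetic):

```python
def ratio(before_s: float, after_s: float) -> float:
    """After / before: below 1.0 means the change is faster."""
    return after_s / before_s


ssn = ratio(0.352, 0.374)           # worst case
complex_schema = ratio(3.09, 1.30)  # best case
print(f"ssn: {ssn:.2f}, complex_schema: {complex_schema:.2f} ({1 / complex_schema:.1f}x faster)")
# ssn: 1.06, complex_schema: 0.42 (2.4x faster)
```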

Analysis

This analysis reviews the profiling details of multiple runs using complex_schema.

This PR primarily optimizes state_scan_tokens, whose runtime drops from an average of 6.52s to 1.77s for complex_schema.

New profiling breakdown (timings vary by pattern):

  • RegexGuide(): 4.55s
    • create_states_mapping(): 4.15s
      • Pattern.to_fsm().reduce(): 0.961s
      • create_fsm_index_tokenizer(): 3.34s
        • (overhead): 0.14s
        • reduced_vocabulary(): 0.55s
        • create_fsm_index_end_to_end(): 2.29s
          • state_scan_tokens(...): 1.77s (359 calls)
          • (overhead): 0.47s

Benchmarks

(Benchmarks are relative to #1070)

Benchmarks that have improved:

| Change   | Before [65ae1585]    | After [a8075e9c]    |   Ratio | Benchmark (Parameter)                                                                                     |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------------------------------------------------------|
| -        | 3.09±0.02s           | 1.30±0.01s          |    0.42 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('complex_schema')                           |
| -        | 1.58±0s              | 837±4ms             |    0.53 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('simple_schema')                            |
| -        | 662±7ms              | 407±6ms             |    0.61 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_phone')                                |
| -        | 4.23±0s              | 2.06±0s             |    0.49 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_span_constrained_relation_extraction') |
| -        | 504±2ms              | 391±6ms             |    0.78 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('date')                                         |
| -        | 472±1ms              | 407±3ms             |    0.86 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('email')                                        |
| -        | 423±1ms              | 377±4ms             |    0.89 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ip')                                           |
| -        | 571±4ms              | 475±5ms             |    0.83 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('url')                                          |

Benchmarks that have stayed the same:

| Change   | Before [65ae1585]    | After [a8075e9c]    |   Ratio | Benchmark (Parameter)                                                                                              |
|----------|----------------------|---------------------|---------|--------------------------------------------------------------------------------------------------------------------|
|          | 92.3±0.5μs           | 91.7±1μs            |    0.99 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('complex_schema')                                  |
|          | 50.5±0.9μs           | 50.0±0.5μs          |    0.99 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('simple_schema')                                   |
|          | 3.78±0.01s           | 4.11±0.01s          |    1.09 | bench_numba_compile.NumbaCompileBenchmark.time_compile_numba                                                       |
|          | 183±0.2μs            | 183±0.9μs           |    1    | bench_processors.LogitsProcessorPassthroughBenchmark.time_passthrough('numpy')                                     |
|          | 181±0.4μs            | 181±3μs             |    1    | bench_processors.LogitsProcessorPassthroughBenchmark.time_passthrough('torch')                                     |
|          | 247±4μs              | 247±3μs             |    1    | bench_processors.LogitsProcessorStructuredBenchmark.time_structured_generation('numpy', 'Z*')                      |
|          | 1.04±0.01ms          | 1.06±0.02ms         |    1.02 | bench_processors.LogitsProcessorStructuredBenchmark.time_structured_generation('numpy', '[^Z]*')                   |
|          | 230±0.8μs            | 224±5μs             |    0.97 | bench_processors.LogitsProcessorStructuredBenchmark.time_structured_generation('torch', 'Z*')                      |
|          | 1.03±0.01ms          | 1.04±0.01ms         |    1    | bench_processors.LogitsProcessorStructuredBenchmark.time_structured_generation('torch', '[^Z]*')                   |
|          | 17.0±0.06ms          | 17.6±0.4ms          |    1.04 | bench_regex_fsm.RegexReducedVocabularyBenchmark.time_reduced_vocabulary(10000)                                     |
|          | 178±2ms              | 175±0.7ms           |    0.98 | bench_regex_fsm.RegexReducedVocabularyBenchmark.time_reduced_vocabulary(100000)                                    |
|          | 1.87±0.02s           | 1.85±0.01s          |    0.99 | bench_regex_fsm.RegexReducedVocabularyBenchmark.time_reduced_vocabulary(1000000)                                   |
|          | 640M                 | 671M                |    1.05 | bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('complex_span_constrained_relation_extraction') |
|          | 542M                 | 586M                |    1.08 | bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('simple_phone')                                 |
|          | 389±0.9ms            | 384±3ms             |    0.99 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('simple_phone')                                          |
|          | 352±1ms              | 374±4ms             |    1.06 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ssn')                                                   |
|          | 346±3ms              | 338±2ms             |    0.98 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('time')                                                  |

Profile

by tottime

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      359    1.414    0.004    1.414    0.004 /home/andrew/p/outlines/outlines/fsm/regex.py:662(state_scan_tokens)
        1    1.381    1.381    4.081    4.081 /home/andrew/p/outlines/outlines/caching.py:114(wrapper)
        1    0.358    0.358    1.761    1.761 /home/andrew/p/outlines/outlines/fsm/regex.py:783(create_fsm_index_end_to_end)
    67256    0.267    0.000    0.267    0.000 {method 'index' of 'list' objects}
        1    0.231    0.231    0.231    0.231 <string>:2(ctor)
  1665873    0.205    0.000    0.205    0.000 {method 'add' of 'set' objects}
        1    0.136    0.136    1.897    1.897 /home/andrew/p/outlines/outlines/fsm/regex.py:1016(create_fsm_index_tokenizer)
      523    0.134    0.000    0.724    0.001 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:969(crawl)
        1    0.113    0.113    0.113    0.113 /home/andrew/p/outlines/outlines/fsm/regex.py:960(vocab_dict_to_inverted_vocab_list)
      359    0.109    0.000    0.109    0.000 {built-in method torch.tensor}
        8    0.105    0.013    0.105    0.013 {method '__reduce_ex__' of 'object' objects}
  1126880    0.101    0.000    0.101    0.000 {method 'setdefault' of 'dict' objects}
   198170    0.095    0.000    0.107    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:347(follow)
   114135    0.082    0.000    0.082    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:925(follow)
        1    0.082    0.082    0.082    0.082 /home/andrew/p/outlines/outlines/fsm/regex.py:739(get_vocabulary_transition_keys)
   128256    0.074    0.000    0.074    0.000 {method 'decode' of 'tokenizers.decoders.Decoder' objects}
   128256    0.054    0.000    0.177    0.000 /home/andrew/p/outlines/outlines/models/transformers.py:96(convert_token_to_string)
   128420    0.050    0.000    0.196    0.000 /nix/store/yd3jmjzhalskwdbcn5c6sxnsyql9a21w-python3.12-numba-0.60.0/lib/python3.12/site-packages/numba/core/typing/typeof.py:27(typeof)
```

by cumtime

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.395    5.395 {built-in method builtins.exec}
        1    0.000    0.000    5.395    5.395 <string>:1(<module>)
        1    0.000    0.000    5.395    5.395 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/contextlib.py:78(inner)
        1    0.029    0.029    5.394    5.394 /home/andrew/p/outlines/profile_regex_index.py:37(construct_regex_index)
        1    0.000    0.000    5.365    5.365 /home/andrew/p/outlines/outlines/fsm/guide.py:179(__init__)
        1    0.001    0.001    5.197    5.197 /home/andrew/p/outlines/outlines/caching.py:114(wrapper)
        1    0.000    0.000    5.197    5.197 /home/andrew/p/outlines/outlines/fsm/guide.py:119(create_states_mapping)
        1    0.158    0.158    4.141    4.141 /home/andrew/p/outlines/outlines/fsm/regex.py:1016(create_fsm_index_tokenizer)
        1    0.543    0.543    3.082    3.082 /home/andrew/p/outlines/outlines/fsm/regex.py:783(create_fsm_index_end_to_end)
      359    1.765    0.005    1.766    0.005 /home/andrew/p/outlines/outlines/fsm/regex.py:662(state_scan_tokens)
      523    0.178    0.000    0.957    0.002 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:969(crawl)
     10/1    0.001    0.000    0.911    0.911 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/patterns.py:447(to_fsm)
    128/2    0.001    0.000    0.777    0.388 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/patterns.py:453(<genexpr>)
    118/1    0.002    0.000    0.777    0.777 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/patterns.py:370(to_fsm)
        1    0.002    0.002    0.734    0.734 /home/andrew/p/outlines/outlines/fsm/regex.py:999(reduced_vocabulary)
     13/1    0.000    0.000    0.691    0.691 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/patterns.py:280(to_fsm)
      131    0.001    0.000    0.405    0.003 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:310(concatenate)
    67256    0.354    0.000    0.354    0.000 {method 'index' of 'list' objects}
       10    0.000    0.000    0.315    0.032 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:451(union)
       10    0.000    0.000    0.315    0.031 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:913(parallel)
        1    0.053    0.053    0.312    0.312 /home/andrew/p/outlines/outlines/fsm/regex.py:903(get_normalized_vocab)
        1    0.038    0.038    0.278    0.278 /home/andrew/p/outlines/outlines/fsm/regex.py:938(to_numba_dict)
        1    0.000    0.000    0.267    0.267 /nix/store/yd3jmjzhalskwdbcn5c6sxnsyql9a21w-python3.12-numba-0.60.0/lib/python3.12/site-packages/numba/experimental/jitclass/base.py:119(__call__)
        1    0.267    0.267    0.267    0.267 <string>:2(ctor)
  1665867    0.256    0.000    0.256    0.000 {method 'add' of 'set' objects}
   128256    0.077    0.000    0.251    0.000 /home/andrew/p/outlines/outlines/models/transformers.py:96(convert_token_to_string)
   128420    0.063    0.000    0.247    0.000 /nix/store/yd3jmjzhalskwdbcn5c6sxnsyql9a21w-python3.12-numba-0.60.0/lib/python3.12/site-packages/numba/core/typing/typeof.py:27(typeof)
   128256    0.059    0.000    0.174    0.000 /nix/store/dvqgbyv0i8bh4ddqay6cpsv9i48xi3ic-python3.12-transformers-4.43.3/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py:637(convert_tokens_to_string)
        1    0.015    0.015    0.168    0.168 /home/andrew/p/outlines/outlines/fsm/guide.py:285(_cache_state_to_token_tensor)
        1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/outlines/models/transformers.py:118(__hash__)
        1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/fingerprint.py:226(hash)
        1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/utils/_dill.py:106(dumps)
        1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/utils/_dill.py:101(dump)
        1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/dill/_dill.py:418(dump)
        1    0.000    0.000    0.156    0.156 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/pickle.py:470(dump)
    179/1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/utils/_dill.py:31(save)
    179/1    0.000    0.000    0.156    0.156 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/dill/_dill.py:367(save)
    179/1    0.000    0.000    0.155    0.155 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/pickle.py:529(save)
        1    0.000    0.000    0.155    0.155 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/utils/_dill.py:202(_save_transformersPreTrainedTokenizerBase)
      9/1    0.000    0.000    0.155    0.155 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/pickle.py:615(save_reduce)
      9/1    0.000    0.000    0.155    0.155 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/dill/_dill.py:1195(save_module_dict)
      9/1    0.000    0.000    0.155    0.155 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/pickle.py:959(save_dict)
      9/1    0.000    0.000    0.155    0.155 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/datasets/utils/_dill.py:72(_batch_setitems)
      9/1    0.000    0.000    0.155    0.155 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/pickle.py:970(_batch_setitems)
      359    0.153    0.000    0.153    0.000 {built-in method torch.tensor}
        8    0.153    0.019    0.153    0.019 {method '__reduce_ex__' of 'object' objects}
       13    0.000    0.000    0.145    0.011 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:364(__add__)
128744/128420    0.054    0.000    0.143    0.000 /nix/store/z7xxy35k7620hs6fn6la5fg2lgklv72l-python3-3.12.4/lib/python3.12/functools.py:904(wrapper)
   198170    0.125    0.000    0.139    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.12/site-packages/interegular/fsm.py:347(follow)
        1    0.139    0.139    0.139    0.139 /home/andrew/p/outlines/outlines/fsm/regex.py:960(vocab_dict_to_inverted_vocab_list)
```
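The profiles above are standard cProfile/pstats output, sorted by tottime and by cumtime. A minimal sketch of collecting the same two views with the standard library, assuming a hypothetical toy_workload in place of the real index construction (the profile_regex_index.py script itself is not shown in this PR):

```python
import cProfile
import io
import pstats
from typing import Callable


def toy_workload() -> int:
    # Hypothetical stand-in for the real work being profiled (index construction).
    return sum(len(str(i)) for i in range(200_000))


def profile_report(func: Callable[[], object], sort_key: str, limit: int = 10) -> str:
    """Run `func` under cProfile and return the top `limit` rows sorted by `sort_key`."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats(sort_key).print_stats(limit)
    return stream.getvalue()


for key in ("tottime", "cumtime"):
    print(f"by {key}")
    print(profile_report(toy_workload, key))
```

tottime excludes time spent in callees, so it points at hot loops such as state_scan_tokens; cumtime includes callees, so it shows which call chain dominates overall.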

@rlouf (Member) commented May 10, 2024

Could you add a high-level description of what the PR does so the PR is self-contained? Is there an issue we could link to this PR? Also there is no need to add "WIP" to the title, this is what "Draft PR" means :)

@rlouf changed the title from "WIP: Vocab Trie To Speed Up regex.py" to "Use a trie to speed up index construction" on May 10, 2024.
@lapp0 (Contributor, Author) commented May 15, 2024

I believe I've found a bug in regex.py's reduced_vocabulary().

For token 188 in the gpt2 tokenizer ('\x00'), token_tuple_np contains only an empty string (array([''], dtype='<U2')), but the token isn't added to empty_token_ids.

Edit: this appears to be addressed in #904.

@lapp0 (Contributor, Author) commented May 31, 2024

Seeing great results with this so far!

state_scan_tokens (cumulative time):

  • before (main): 10.391s
  • after (trie): 1.925s

We are now close to the point where interegular's to_fsm (1.763s) accounts for the majority of index compilation time.

Full results

trie:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    5.326    5.326 /home/andrew/p/outlines/profile_null_byte_fix.py:24(profile_email_guide)
        1    0.000    0.000    5.317    5.317 /home/andrew/p/outlines/outlines/fsm/guide.py:140(__init__)
        1    0.001    0.001    5.317    5.317 /home/andrew/p/outlines/outlines/caching.py:113(wrapper)
        1    0.001    0.001    5.316    5.316 /home/andrew/p/outlines/outlines/fsm/guide.py:108(create_states_mapping)
        1    0.032    0.032    3.369    3.369 /home/andrew/p/outlines/outlines/fsm/regex.py:885(create_fsm_index_tokenizer)
        1    0.518    0.518    3.254    3.254 /home/andrew/p/outlines/outlines/fsm/regex.py:732(create_fsm_index_end_to_end)
      389    1.924    0.005    1.925    0.005 /home/andrew/p/outlines/outlines/fsm/regex.py:647(state_scan_tokens)
     10/1    0.001    0.000    1.763    1.763 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:447(to_fsm)
      523    0.268    0.001    1.567    0.003 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:969(crawl)
    128/2    0.001    0.000    1.545    0.772 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:453(<genexpr>)
    118/1    0.003    0.000    1.545    1.545 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:370(to_fsm)
     13/1    0.000    0.000    1.398    1.398 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:280(to_fsm)
      131    0.002    0.000    0.868    0.007 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:310(concatenate)
       10    0.000    0.000    0.600    0.060 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:451(union)
       10    0.000    0.000    0.600    0.060 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:913(parallel)
    67256    0.585    0.000    0.585    0.000 {method 'index' of 'list' objects}
        1    0.000    0.000    0.465    0.465 /nix/store/sc2wsadi9mk4kq5r1h6gvi8z2r9c1cpq-python3.11-numba-0.59.1/lib/python3.11/site-packages/numba/experimental/jitclass/base.py:119(__call__)
        1    0.465    0.465    0.465    0.465 <string>:2(ctor)
      269    0.018    0.000    0.291    0.001 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:112(union)
   114135    0.267    0.000    0.267    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:925(follow)
       13    0.000    0.000    0.224    0.017 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:364(__add__)
   198170    0.202    0.000    0.223    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:347(follow)
      930    0.211    0.000    0.211    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:93(__init__)
      125    0.000    0.000    0.182    0.001 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:374(star)
        1    0.000    0.000    0.150    0.150 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:249(reduce)
        2    0.007    0.003    0.149    0.075 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:558(reversed)
  1153581    0.135    0.000    0.135    0.000 {method 'add' of 'set' objects}
        1    0.122    0.122    0.122    0.122 /home/andrew/p/outlines/outlines/fsm/regex.py:716(get_all_token_transitions)
    24185    0.086    0.000    0.118    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:580(follow)
   743602    0.077    0.000    0.077    0.000 {method 'setdefault' of 'dict' objects}
    65660    0.057    0.000    0.059    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:384(follow)
      269    0.011    0.000    0.055    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:114(<dictcomp>)
      255    0.001    0.000    0.044    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:409(times)
    57272    0.017    0.000    0.044    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:114(<genexpr>)
    801/1    0.002    0.000    0.040    0.040 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:69(get_alphabet)
     10/1    0.000    0.000    0.040    0.040 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:423(_get_alphabet)
    128/2    0.000    0.000    0.040    0.020 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:425(<genexpr>)
    118/1    0.001    0.000    0.040    0.040 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:330(_get_alphabet)
    787/2    0.000    0.000    0.040    0.020 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:331(<genexpr>)
     13/1    0.000    0.000    0.040    0.040 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:270(_get_alphabet)
        1    0.040    0.040    0.040    0.040 /home/andrew/p/outlines/outlines/fsm/regex.py:911(<dictcomp>)
       19    0.000    0.000    0.032    0.002 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:445(__mul__)
        1    0.000    0.000    0.030    0.030 /home/andrew/p/outlines/outlines/models/transformers.py:113(__hash__)
        1    0.000    0.000    0.030    0.030 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/datasets/fingerprint.py:226(hash)
        1    0.000    0.000    0.030    0.030 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/datasets/utils/_dill.py:106(dumps)
```

main:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010   13.219   13.219 /home/andrew/p/outlines/profile_null_byte_fix.py:23(profile_email_guide)
        1    0.000    0.000   13.208   13.208 /home/andrew/p/outlines/outlines/fsm/guide.py:140(__init__)
        1    0.001    0.001   13.208   13.208 /home/andrew/p/outlines/outlines/caching.py:113(wrapper)
        1    0.000    0.000   13.208   13.208 /home/andrew/p/outlines/outlines/fsm/guide.py:108(create_states_mapping)
        1    0.016    0.016   11.280   11.280 /home/andrew/p/outlines/outlines/fsm/regex.py:829(create_fsm_index_tokenizer)
        1    0.545    0.545   11.185   11.185 /home/andrew/p/outlines/outlines/fsm/regex.py:684(create_fsm_index_end_to_end)
      389   10.389    0.027   10.391    0.027 /home/andrew/p/outlines/outlines/fsm/regex.py:651(state_scan_tokens)
     10/1    0.001    0.000    1.740    1.740 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:447(to_fsm)
      523    0.275    0.001    1.594    0.003 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:969(crawl)
    128/2    0.001    0.000    1.520    0.760 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:453(<genexpr>)
    118/1    0.003    0.000    1.519    1.519 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:370(to_fsm)
     13/1    0.000    0.000    1.371    1.371 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:280(to_fsm)
      131    0.002    0.000    0.836    0.006 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:310(concatenate)
       10    0.000    0.000    0.599    0.060 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:451(union)
       10    0.000    0.000    0.599    0.060 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:913(parallel)
    67256    0.581    0.000    0.581    0.000 {method 'index' of 'list' objects}
   114135    0.273    0.000    0.273    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:925(follow)
      269    0.174    0.001    0.244    0.001 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:112(union)
       13    0.000    0.000    0.242    0.019 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:364(__add__)
   198170    0.211    0.000    0.232    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:347(follow)
      125    0.000    0.000    0.190    0.002 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:374(star)
        1    0.000    0.000    0.152    0.152 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:249(reduce)
        2    0.005    0.003    0.152    0.076 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:558(reversed)
  1154007    0.136    0.000    0.136    0.000 {method 'add' of 'set' objects}
    24185    0.089    0.000    0.121    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:580(follow)
   743602    0.075    0.000    0.075    0.000 {method 'setdefault' of 'dict' objects}
    65660    0.061    0.000    0.063    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:384(follow)
      269    0.011    0.000    0.056    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:114(<dictcomp>)
        1    0.047    0.047    0.047    0.047 /home/andrew/p/outlines/outlines/fsm/regex.py:855(<dictcomp>)
    57272    0.018    0.000    0.045    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:114(<genexpr>)
      255    0.000    0.000    0.045    0.000 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/fsm.py:409(times)
    801/1    0.002    0.000    0.043    0.043 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:69(get_alphabet)
     10/1    0.000    0.000    0.043    0.043 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:423(_get_alphabet)
    128/2    0.000    0.000    0.043    0.021 /home/andrew/p/outlines/.myenv2/lib/python3.11p/site-packages/interegular/patterns.py:425(<genexpr>)
    118/1    0.001    0.000    0.043    0.043 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:330(_get_alphabet)
    787/2    0.000    0.000    0.042    0.021 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:331(<genexpr>)
     13/1    0.000    0.000    0.042    0.042 /home/andrew/p/outlines/.myenv2/lib/python3.11/site-packages/interegular/patterns.py:270(_get_alphabet)
      359    0.003    0.000    0.034    0.000 /nix/store/qd7h3vn2bff6jjigdvq0xh91q49sm1ng-python3.11-tqdm-4.66.4/lib/python3.11/site-packages/tqdm/std.py:1198(update)
```

@lapp0 force-pushed the trie branch 3 times, most recently from e189254 to 79c67ea, on July 30, 2024 05:58.
Linked issue: Accelerate the index construction process