
[BUG] max_seq_len can not be <= 2047 #240

Closed
4 tasks done
Originalimoc opened this issue Nov 15, 2024 · 1 comment · Fixed by #243
Labels
bug Something isn't working

Comments

Originalimoc commented Nov 15, 2024

OS

Linux

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

Setting max_seq_len <= 2047 (or to a value that is not a multiple of 256, possibly depending on whether cache_size is set?) triggers a logic bug(?) in https://github.com/turboderp/exllamav2/blob/master/exllamav2/generator/dynamic.py#L392:

  File "tabbyAPI/git/backends/exllamav2/model.py", line 716, in create_generator
    self.generator = ExLlamaV2DynamicGeneratorAsync(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/imoc/miniconda3/envs/tabbyAPI/lib/python3.11/site-packages/exllamav2/generator/dynamic_async.py", line 16, in __init__
    self.generator = ExLlamaV2DynamicGenerator(*args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/imoc/miniconda3/envs/tabbyAPI/lib/python3.11/site-packages/exllamav2/generator/dynamic.py", line 401, in __init__
    assert self.max_chunk_size % self.page_size == 0, \
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: max_chunk_size must be multiple of 256, received None
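For reference, a minimal sketch of the page-alignment arithmetic behind the failing check, assuming the 256-token page size quoted in the assertion message (values are illustrative, not taken from the library code):

  page_size = 256                       # page size quoted in the assertion message
  max_seq_len = 2047

  print(max_seq_len % page_size)        # 255 -> 2047 is not page-aligned, so the check fails
  print(7 * page_size, 8 * page_size)   # 1792 2048 -> nearest page-aligned sizes around 2047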

Reproduction steps

Launch a default server with max_seq_len set to 2047 or lower, or to a value that is not a multiple of 256(?).
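A hypothetical standalone sketch that approximates the server's call chain with direct exllamav2 calls; the model path is a placeholder, and whether it hits the exact same assertion depends on how the caller resolves max_chunk_size:

  from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
  from exllamav2.generator import ExLlamaV2DynamicGenerator

  config = ExLlamaV2Config("/path/to/any/exl2/model")   # placeholder path
  config.max_seq_len = 2047                             # not a multiple of the 256-token page size
  model = ExLlamaV2(config)
  cache = ExLlamaV2Cache(model, max_seq_len = 2047, lazy = True)
  model.load_autosplit(cache)
  tokenizer = ExLlamaV2Tokenizer(config)

  # May trigger the same assertion path as in the traceback above
  generator = ExLlamaV2DynamicGenerator(model = model, cache = cache, tokenizer = tokenizer)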

Expected behavior

All backend parameters are set correctly, even with the warning:
"WARNING: The given cache size (2047) is not a multiple of 256.
WARNING: Overriding cache_size with an overestimated value of 2048 tokens."
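What that warning describes amounts to rounding the requested size up to the next multiple of the page size; a minimal sketch of that behavior (not the actual tabbyAPI/exllamav2 code):

  def round_up_to_page(n: int, page_size: int = 256) -> int:
      # Overestimate to the next multiple of the page size, as the warning describes
      return ((n + page_size - 1) // page_size) * page_size

  print(round_up_to_page(2047))  # 2048 -> matches the "overestimated value" in the warning
  print(round_up_to_page(1337))  # 1536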

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
Originalimoc added the bug label Nov 15, 2024
DocShotgun (Member) commented

Can you try this? #243

I was able to load a model with a max_seq_len of 1337 and have the cache_size and chunk_size be autocorrected.
