File "tabbyAPI/git/backends/exllamav2/model.py", line 716, in create_generator
self.generator = ExLlamaV2DynamicGeneratorAsync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/imoc/miniconda3/envs/tabbyAPI/lib/python3.11/site-packages/exllamav2/generator/dynamic_async.py", line 16, in __init__
self.generator = ExLlamaV2DynamicGenerator(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/imoc/miniconda3/envs/tabbyAPI/lib/python3.11/site-packages/exllamav2/generator/dynamic.py", line 401, in __init__
assert self.max_chunk_size % self.page_size == 0, \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: max_chunk_size must be multiple of 256, received None
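The assertion fires with `None`, meaning `max_chunk_size` was never populated before the divisibility check ran. A minimal sketch of the kind of guard that would surface the real cause (hypothetical helper, not the actual exllamav2 code; the 256-token page size is taken from the error message):

```python
PAGE_SIZE = 256  # the page size named in the assertion message

def validate_chunk_size(max_chunk_size):
    # Hypothetical helper, not the actual exllamav2 code: catch an
    # unpopulated value before the divisibility assert, so the error
    # points at the missing default instead of "received None".
    if max_chunk_size is None:
        raise ValueError("max_chunk_size was never set by the backend")
    assert max_chunk_size % PAGE_SIZE == 0, \
        f"max_chunk_size must be multiple of {PAGE_SIZE}, received {max_chunk_size}"
    return max_chunk_size
```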
Reproduction steps
Launch a default server with max_seq_len set to 2047 or lower (or possibly any value that is not a multiple of 256?)
Expected behavior
All backend parameters should be set correctly, even when the following warning is emitted:
"WARNING: The given cache size (2047) is not a multiple of 256.
WARNING: Overriding cache_size with an overestimated value of 2048 tokens."
Acknowledgements
I have looked for similar issues before submitting this one.
I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.
OS
Linux
GPU Library
CUDA 12.x
Python version
3.11
Describe the bug
Setting max_seq_len <= 2047 (or to a value that is not a multiple of 256, possibly depending on whether cache_size is set?) triggers what appears to be a logic bug in "https://github.com/turboderp/exllamav2/blob/master/exllamav2/generator/dynamic.py#L392":
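The override warning quoted above ("Overriding cache_size with an overestimated value of 2048 tokens") implies a round-up to the next multiple of 256. A minimal sketch of that rounding (the 256-token page size is taken from the warning; the helper name is ours, not tabbyAPI's):

```python
PAGE_SIZE = 256  # page size cited in the override warning

def round_up_to_page(tokens: int, page_size: int = PAGE_SIZE) -> int:
    # Ceiling-divide, then multiply back: 2047 -> 2048, 2048 -> 2048.
    return -(-tokens // page_size) * page_size
```

If max_chunk_size were passed through the same rounding (or defaulted from the already-rounded cache size) before reaching ExLlamaV2DynamicGenerator, the assertion would never see None or an unaligned value.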