Exllamav2 Integration #1010
base: main
Conversation
Some questions I had for maintainers were:
Sorry for the late reply and request for refactor.

We've been moving towards using `SequenceGeneratorAdapter` and `outlines.processors` in `outlines.generate`. Currently the only local outlines model which doesn't have a `SequenceGeneratorAdapter`-based implementation is `exllamav2`.

Would you be able to refactor this to use `SequenceGeneratorAdapter` instead?
This would involve:

- Best starting point: adding an `ExLlamaV2` fixture to `tests/generate/test_generate.py`, which will automatically test all generation methods (structured, batch, stream, etc.) against the model here
- Adding `ExLlamaV2` to the `*_unified` dispatchers: https://github.com/outlines-dev/outlines/blob/main/outlines/generate/regex.py#L42-L53
- Ensuring the passed `OutlinesLogitsProcessor` is applied when the exllamav2 `generator.generate(prompt)` is called
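The last step, applying the passed logits processor on every decoding step, can be sketched in plain Python. This is a toy illustration only: `mask_logits` and `greedy_generate` are invented stand-ins, not the actual outlines or exllamav2 APIs.

```python
import math


def mask_logits(logits, allowed_ids):
    """What a structured-generation logits processor effectively does:
    set the logits of disallowed token ids to -inf before sampling."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]


def greedy_generate(step_logits, logits_processor):
    """Toy decode loop: the processor must run before each token is picked."""
    tokens = []
    for logits in step_logits:
        processed = logits_processor(logits)
        tokens.append(max(range(len(processed)), key=processed.__getitem__))
    return tokens


# Token 1 has the highest raw logit at step 0, but it is masked out.
fake_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9]]
processor = lambda logits: mask_logits(logits, allowed_ids={0, 2})
print(greedy_generate(fake_logits, processor))  # [2, 0]
```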
@lapp0 makes sense! Let me try doing this tomorrow

Thanks so much, please let me know if you have any questions!
@lapp0 sorry for the delay! Two questions:

Sorry for the delayed response, and let me know if I'm going in the right direction!
Great questions! Converting it to a filter is a bit hacky IMO, but it may be the simplest solution and doesn't require an upstream change. Alternatively, we could apply logits processing directly.

The first option makes more sense to me; it is generator-class agnostic. Tbh, I'm not sure how well
@lapp0 Sounds good! I think I'll go with option 1. For this, I think the steps needed are:
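For reference, the filter idea behind option 1 can be sketched as follows. The class and method names here are hypothetical, loosely modeled on the notion of an ExLlamaV2-style filter that restricts the candidate token set *before* sampling, rather than masking logits afterwards.

```python
class ToyStructureFilter:
    """Hypothetical filter: at each generation step, report which token
    ids are legal, so the engine can restrict sampling up front."""

    def __init__(self, allowed_by_state):
        # allowed_by_state maps an FSM state to the set of legal token ids.
        self.allowed_by_state = allowed_by_state
        self.state = 0

    def next_allowed(self):
        return self.allowed_by_state[self.state]

    def feed(self, token_id):
        # Advance the state machine once a token is accepted.
        if token_id not in self.allowed_by_state[self.state]:
            raise ValueError("token not allowed by the filter")
        self.state += 1


f = ToyStructureFilter({0: {1, 2}, 1: {3}})
print(f.next_allowed())  # {1, 2}
f.feed(2)
print(f.next_allowed())  # {3}
```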
Rather than implementing a new logits processor, I'm awaiting correspondence with the ExLlamaV2 maintainer, turboderp, regarding whether a
@lapp0 Interesting! The main reason I was thinking of a new logits processor is that I thought we do some steps that are redundant relative to exllamav2's code base. For them, they first

while in our case, we start from the assumption that the logits have already been computed and then construct the mask, etc. So I thought some of the steps here overlap with our current logits processor. But yeah, very happy to get advice here, since this is just making the exllamav2 filter. Also happy to hear what turboderp thinks.
Yes, they will have multiple methods of filtering, but given Outlines' singular logits processor implementation, which is tested against all inference engines, it's likely better to follow the same pattern with ExLlamaV2. This will ensure bug fixes, optimizations, enhancements, and new features present in one integration are available to all integrations!

I spoke with turboderp on their discord server; he is open to having a

Here are the steps I think we should take, let me know what you think:
Let me know if you think this is the right path. Thanks so much for your great work on this PR. The users in the ExLlamaV2 discord were excited to hear about it!

@lapp0 Wow, I didn't know exllamav2 had a discord server! And that makes perfect sense.
@isamu-isozaki Can you please take a look at this changeset and the provided example? I believe it should provide a sufficient basis for implementing

Let me know if you see anything that should be changed in my implementation. If you have any questions, please do not hesitate! Good luck!

Edit: Also please add "Fixes #807" to the PR description.

@lapp0 Sounds good. And sorry, I got a bit sidetracked by some work. I'll try to get to this at least by the weekend. Sorry for the delay!
Sorry for the delay. I finally got the exllamav2 fork built, and I was able to run the current PR's code with the snippet below, which worked!

```python
import sys

sys.path.append("../outlines-dev")

from enum import Enum

from pydantic import BaseModel, constr

import outlines

model = outlines.models.exl2(
    model_path="turboderp/TinyLlama-1B-32k-exl2",
    cache_q4=True,
    paged=False,
)

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
print(answer)

prompt = "<s>result of 9 + 9 = 18</s><s>result of 1 + 2 = "
answer = outlines.generate.format(model, int)(prompt, max_tokens=1)
print(answer)

generator = outlines.generate.format(model, float)
answer = generator(prompt, max_tokens=10)
print(answer)

generator = outlines.generate.text(model)
unstructured = generator(prompt, max_tokens=30)

generator = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
structured = generator(prompt, max_tokens=30)

print(unstructured)
# What is the IP address of the Google DNS servers?
#
# Passive DNS servers are at DNS servers that are private.
# In other words, both IP servers are private. The database
# does not contain Chelsea Manning
print(structured)


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int


# Construct structured sequence generator
generator = outlines.generate.json(model, Character)

# Draw a sample
seed = 789001

character = generator("Give me a character description", seed=seed)
print(repr(character))
# Character(name='Anderson', age=28, armor=<Armor.chainmail: 'chainmail'>, weapon=<Weapon.sword: 'sword'>, strength=8)

character = generator("Give me an interesting character description", seed=seed)
print(repr(character))
# Character(name='Vivian Thr', age=44, armor=<Armor.plate: 'plate'>, weapon=<Weapon.crossbow: 'crossbow'>, strength=125)
```
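As a quick sanity check (not part of the PR itself), the IPv4 pattern used in the regex example above can be exercised with the standard library's `re` module:

```python
import re

# Same pattern as in the example above.
IPV4 = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

print(bool(re.fullmatch(IPV4, "8.8.8.8")))    # True
print(bool(re.fullmatch(IPV4, "256.1.1.1")))  # False: octets are capped at 255
```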
The current main issue is that I can't seem to run the tests, due to an error with pyairports. @lapp0, do you have some advice on how to fix this?

```text
$ pytest -s tests/generate/test_generate.py -k exllamav2
======================== test session starts =========================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0
rootdir: /mnt/d/personal_projects/whiterabbitneo-pentestgpt/outlines-dev
configfile: pyproject.toml
plugins: anyio-3.6.2
collected 0 items / 1 error

=============================== ERRORS ===============================
__________ ERROR collecting tests/generate/test_generate.py __________
tests/generate/test_generate.py:6: in <module>
    import outlines.generate as generate
outlines/__init__.py:6: in <module>
    import outlines.types
outlines/types/__init__.py:1: in <module>
    from . import airports, countries
outlines/types/airports.py:4: in <module>
    from pyairports.airports import AIRPORT_LIST
/home/isamu/miniconda3/lib/python3.10/site-packages/pyairports/airports.py:1: in <module>
    from pkg_resources import resource_string
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:3663: in <module>
    def _initialize_master_working_set():
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:3646: in _call_aside
    f(*args, **kwargs)
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:3687: in _initialize_master_working_set
    tuple(dist.activate(replace=False) for dist in working_set)
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:3687: in <genexpr>
    tuple(dist.activate(replace=False) for dist in working_set)
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:3144: in activate
    declare_namespace(pkg)
/home/isamu/miniconda3/lib/python3.10/site-packages/pkg_resources/__init__.py:2542: in declare_namespace
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
E   DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
E   Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
====================== short test summary info =======================
ERROR tests/generate/test_generate.py - DeprecationWarning: Deprecated call to `pkg_resources.declare_nam...
!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!
========================= 1 error in 17.91s ==========================
```
@isamu-isozaki Sorry for the delayed response. A quick and easy hack is to remove the import and run the tests again.
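Another option, if removing the import is undesirable (this is an assumption on my part, not something tested in this thread), would be to silence that specific deprecation in the pytest configuration, since it is pytest's warning filtering that promotes the `pkg_resources` `DeprecationWarning` to a collection error:

```toml
# pyproject.toml -- hypothetical addition; assumes filterwarnings is what
# turns this warning into an error in this project's pytest setup
[tool.pytest.ini_options]
filterwarnings = [
    "ignore::DeprecationWarning:pkg_resources",
]
```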
@lapp0 Thanks for your reply! I think I was a bit confused about how to add models, but I'm guessing blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 is already in the test environment?

Yes, it's already in the test environment. If you're interested, after this change, a good upstream change would be allowing model loading via hub URI with
Don't bother. The hardware isn't supported. Please just add exllamav2 to https://github.com/outlines-dev/outlines/blob/main/tests/generate/conftest.py#L23-L33. This will skip the tests on any machine without CUDA.
@lapp0 Got it, and thanks! I think I'm only missing coverage, which I'll try writing tests for once I get time.

Hi, just wanted to pop by and see how it is going. Will this feature be released soon? If there is some dev branch, I can try it as well.
Great, please let me know when you're ready for review!
You might be able to get it working with the installation commands below. Please report back with any issues or feedback, it will help with this PR!
@remichu-ai Hi! If you had an issue building exllamav2 like me, you can just install outlines from my initial commit to this PR and use the code examples; it should work.

@lapp0 Hi! Sorry for more questions. I did write some tests to try to fill out the coverage of exllamav2.py. The coverage is 100% locally for exllamav2.py, but it seems that if the tests are skipped they don't count towards coverage (which is the case for this pipeline). Do you happen to know a simple way to fix this, by any chance?
My example script works with the code. Minor change requests, great work!
@lapp0 Thanks for the review! Let me check it out tomorrow.
@lapp0 Thanks for the review. I made all the changes, and all my tests pass locally (including pre-commit):

```text
(base) outlines-dev$ pytest -s tests/generate/test_integration_exllamav2.py --cov=outlines.models
============================================ test session starts =============================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0
rootdir: /mnt/d/personal_projects/whiterabbitneo-pentestgpt/outlines-dev
configfile: pyproject.toml
plugins: anyio-3.6.2, cov-5.0.0
collected 19 items

Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:31 0:00:00
Loading tokenizer...
Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
Loading tokenizer...
Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01 0:00:00
Loading tokenizer...
Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
Loading tokenizer...
Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:01 0:00:00
Loading tokenizer...
Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
Loading tokenizer...
.

---------- coverage: platform linux, python 3.10.12-final-0 ----------
Name                                    Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------------------------------
outlines/models/__init__.py                 9      0      0      0   100%
outlines/models/exllamav2.py              140      0     62      0   100%
outlines/models/llamacpp.py               154    110     60      0    21%   27-53, 56-57, 62-73, 76-84, 87-89, 92-94, 98, 107, 142, 146, 160-239, 277-293, 332-355, 358-362, 386-407
outlines/models/mlxlm.py                   81     72     30      0     8%   25-27, 38-41, 70-122, 147-196, 230-247
outlines/models/openai.py                 176    134     58      0    19%   97-105, 138-155, 158, 183-251, 255, 258, 261, 292-313, 318-322, 349-364, 381-388, 394-415, 420, 429-452, 461-484
outlines/models/tokenizer.py               12      0      0      0   100%
outlines/models/transformers.py           168    140     52      0    13%   28-56, 68-82, 87-90, 93-94, 97-106, 109-116, 119, 122-123, 126, 137-138, 163-184, 192-195, 225-253, 268-297, 309-340, 349-368, 371-381, 415-435, 444-452
outlines/models/transformers_vision.py     38     30     14      0    15%   12-13, 46-63, 73, 109-138
outlines/models/vllm.py                    78     66     42      0    10%   24-27, 30-42, 87-149, 159, 164-169, 184-188, 208-226
------------------------------------------------------------------------------------
TOTAL                                     856    552    318      0    31%

================================== 18 passed, 1 skipped in 72.95s (0:01:12) ==================================
```

```text
(base) outlines-dev$ pytest -s tests/generate/test_generate.py -k exllamav2
============================================ test session starts =============================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0
rootdir: /mnt/d/personal_projects/whiterabbitneo-pentestgpt/outlines-dev
configfile: pyproject.toml
plugins: anyio-3.6.2, cov-5.0.0
collected 320 items / 288 deselected / 32 selected

Loading: blockblockblock/TinyLlama-1.1B-Chat-v1.0-bpw4-exl2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:37 0:00:00
Loading tokenizer...
Compiling FSM index for all state transitions: 100%|██████████████████████████| 10/10 [00:00<00:00, 45.03it/s]
Compiling FSM index for all state transitions: 100%|██████████████████████████| 25/25 [00:00<00:00, 95.85it/s]
Compiling FSM index for all state transitions: 100%|██████████████████████████| 21/21 [00:00<00:00, 95.23it/s]
Compiling FSM index for all state transitions: 100%|██████████████████████████| 10/10 [00:00<00:00, 96.69it/s]
Compiling FSM index for all state transitions: 100%|█████████████████████████| 25/25 [00:00<00:00, 139.23it/s]
Compiling FSM index for all state transitions: 100%|██████████████████████████| 21/21 [00:00<00:00, 95.51it/s]
Compiling FSM index for all state transitions: 100%|████████████████████████████| 6/6 [00:00<00:00, 73.53it/s]
Compiling FSM index for all state transitions: 100%|████████████████████████████| 8/8 [00:00<00:00, 92.24it/s]
Compiling FSM index for all state transitions: 100%|██████████████████████████| 10/10 [00:00<00:00, 92.73it/s]
...................
========================== 31 passed, 1 skipped, 288 deselected in 85.01s (0:01:25) ==========================
```

```text
outlines-dev> pre-commit run --all-files
check for merge conflicts................................................Passed
debug statements (python)................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
isort....................................................................Passed
pyupgrade................................................................Passed
flake8...................................................................Passed
black....................................................................Passed
mypy.....................................................................Passed
```
Great job @isamu-isozaki! I've opened the EXL2 PR for logits processors.

@lapp0 awesome!
This fixes #1009

Also fixes #807

The tests I did were:

For loading:

Choices test:

Returns

Json test:

Returns