update to outlines010 #1092

Merged: 42 commits merged into develop from feat/1081-feature-update-to-outlines010 on Jan 10, 2025

davidberenstein1957 (Member) commented Jan 9, 2025

github-actions bot commented Jan 9, 2025

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1092/

codspeed-hq bot commented Jan 9, 2025

CodSpeed Performance Report

Merging #1092 will improve performance by ×4.1

Comparing feat/1081-feature-update-to-outlines010 (399154e) with develop (d9fd15c)

Summary

⚡ 1 improvement

Benchmarks breakdown

| Benchmark | develop | feat/1081-feature-update-to-outlines010 | Change |
|---|---|---|---|
| test_cache_time | 2,277 ms | 550.4 ms | ×4.1 |
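The ×4.1 figure is simply the ratio of the two `test_cache_time` timings. A quick check of that arithmetic (our own calculation, not part of the CodSpeed report):

```python
# test_cache_time timings from the CodSpeed report, in milliseconds.
develop_ms = 2277.0
feature_ms = 550.4

# Speedup = old time / new time.
speedup = develop_ms / feature_ms
print(f"x{speedup:.1f}")  # prints "x4.1"
```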

sdiazlor (Contributor) left a comment

Looks good as it's the approach I had started for LlamaCpp.

davidberenstein1957 marked this pull request as ready for review January 9, 2025 17:36

burtenshaw (Contributor) left a comment

This looks good functionally, but I found it difficult to follow. With community maintainability in mind, I think you could localise the outlines version-handling logic in one place.

src/distilabel/steps/tasks/structured_outputs/outlines.py: 4 review threads (outdated, resolved)

plaguss (Contributor) left a comment

Looks good! I left some comments. Have you generated a dataset with each of the 3 integrations to check that it works?

tests/unit/pipeline/.DS_Store (outdated, resolved)
tests/unit/.DS_Store (outdated, resolved)
vllm (outdated, resolved)
src/distilabel/steps/tasks/structured_outputs/outlines.py (outdated, resolved)

…ct; delete unnecessary .DS_Store files from unit tests
- Introduced a helper function to check whether the 'outlines' package is installed and, if so, which version it is.
- Updated the logic in `_get_logits_processor` to use the new version check, simplifying the processor selection based on the outlines version.
- Adjusted the handling of tokenizers in `_get_tokenizer_from_model` to streamline the integration with different frameworks.
- Modified `prepare_guided_output` to differentiate processing based on the outlines version, ensuring compatibility with both pre-0.1.0 and post-0.1.0 versions of the outlines package.
- Replaced the `_set_logits_processor` method with direct assignment of `_logits_processor` using `_prepare_structured_output`.
- Simplified the logic for setting the logits processor in both the `load` and generation methods, enhancing code clarity and maintainability.
…sLLM

- Updated the import statement for outlines to use the new helper function `_outlines_version_below_0_1_0`.
- Simplified the logic for setting the `_logits_processor` based on the outlines version check, enhancing code clarity and maintainability.
- Renamed the helper function from `_outlines_version_below_0_1_0` to `_is_outlines_version_below_0_1_0` for clarity.
- Updated all references to the renamed function across the codebase, ensuring consistent usage in the `TransformersLLM` class and related functions.
- Enhanced code readability and maintainability by standardizing function naming conventions.
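Replacing `_set_logits_processor` with a direct assignment leaves `load` with a single line of processor setup. A toy sketch of that shape; `ToyLLM`, its placeholder processor, and the `structured_output` dict are hypothetical and are not distilabel's classes:

```python
from typing import Any, Callable, Dict, List, Optional, Union

Processor = Union[Callable[..., Any], List[Callable[..., Any]]]


class ToyLLM:
    """Hypothetical stand-in for an LLM class such as TransformersLLM."""

    def __init__(self, structured_output: Optional[Dict[str, Any]] = None) -> None:
        self.structured_output = structured_output
        self._logits_processor: Optional[Processor] = None

    def _prepare_structured_output(
        self, structured_output: Optional[Dict[str, Any]]
    ) -> Optional[Processor]:
        if structured_output is None:
            return None

        # Placeholder: a real implementation would build an outlines processor here.
        def passthrough(input_ids: Any, scores: Any) -> Any:
            return scores

        return passthrough

    def load(self) -> None:
        # Direct assignment instead of a separate `_set_logits_processor` method.
        self._logits_processor = self._prepare_structured_output(self.structured_output)
```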
…on outlines version

- Introduced version check for outlines in both LlamaCppLLM and TransformersLLM to determine processor return type.
- Updated `prepare_guided_output` to handle processor initialization differently for outlines versions below and above 0.1.0.
- Enhanced tokenizer handling in `_get_tokenizer_from_model` to support multiple frameworks, ensuring compatibility and improved functionality.
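The version-dependent initialization in `prepare_guided_output` amounts to a two-way branch. The sketch below keeps outlines out of the picture by injecting the constructors as callables. The function name and parameters are hypothetical, and so is the assumption that the legacy path receives the model object while the 0.1.0+ path receives a tokenizer extracted from it; this is not distilabel's actual signature.

```python
from typing import Any, Callable, Dict, Literal

Framework = Literal["transformers", "vllm", "llamacpp"]
Schema = Dict[str, Any]


def prepare_guided_output_sketch(
    schema: Schema,
    framework: Framework,
    llm: Any,
    below_0_1_0: bool,
    legacy_factory: Callable[[Schema, Any], Any],
    processor_factory: Callable[[Schema, Any], Any],
    get_tokenizer: Callable[[Any, Framework], Any],
) -> Any:
    """Build a guided-generation processor for either outlines API generation."""
    if below_0_1_0:
        # Assumed: pre-0.1.0 framework integrations wrap the model object directly.
        return legacy_factory(schema, llm)
    # Assumed: 0.1.0+ processors are framework-agnostic and need a tokenizer
    # extracted from the framework-specific model first.
    tokenizer = get_tokenizer(llm, framework)
    return processor_factory(schema, tokenizer)
```

Injecting the factories lets both branches be exercised in tests without installing either outlines release.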
…ransformersLLM

- Updated return types of `_prepare_structured_output` methods to reflect changes in processor handling.
- Changed return type in LlamaCppLLM from `Union["LogitsProcessorList", None]` to `Union["LogitsProcessorList", "LogitsProcessor"]`.
- Modified MlxLLM and TransformersLLM to return `Union[List[Callable], Callable]` instead of `Union[Callable, None]`, ensuring consistency across implementations.
- Enhanced code clarity and maintainability by standardizing output handling in structured output preparation.
- Added support for the 'mlx' framework in the outlines processing logic.
- Updated the `prepare_guided_output` function to utilize `TransformerTokenizer` for 'mlx' framework.
- Modified the `_get_logits_processor` and `_get_tokenizer_from_model` functions to include 'mlx' as a valid framework option, ensuring consistent handling across different frameworks.
- Improved code clarity and maintainability by standardizing framework handling in the structured output preparation process.
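Adding 'mlx' as a valid framework option means every framework check has to accept the new name. One way to keep such checks in sync is to derive them from a single `Literal`. This is a hypothetical helper, not code from the PR (and 'mlx' support was removed again in later commits):

```python
from typing import Literal, Tuple, get_args

# Single source of truth for accepted framework names at this stage of the PR.
Frameworks = Literal["transformers", "vllm", "llamacpp", "mlx"]
SUPPORTED_FRAMEWORKS: Tuple[str, ...] = get_args(Frameworks)


def validate_framework(framework: str) -> str:
    """Return `framework` unchanged, or raise if it is not a supported name."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(
            f"Unsupported framework {framework!r}; expected one of {SUPPORTED_FRAMEWORKS}."
        )
    return framework
```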

burtenshaw (Contributor) left a comment

LGTM

- Simplified return types in LlamaCppLLM and MlxLLM by removing version checks and directly returning the processor.
- Enhanced code clarity and maintainability by standardizing the output structure across both classes.
- Updated `prepare_guided_output` usage to ensure consistent handling of structured outputs.
- Removed the `structured_output` attribute and related processing logic from MlxLLM to simplify the class structure.
- Updated the `load` and generation methods to eliminate references to structured output, enhancing clarity and maintainability.
- Adjusted imports and type hints in `outlines.py` to reflect the removal of 'mlx' framework support, streamlining the framework handling.
- Improved code readability by cleaning up unnecessary complexity in structured output preparation.
- Changed the assignment of `_logits_processor` to always use a list, ensuring consistent handling across different outlines versions.
- Removed the version check for outlines in the `load` method, simplifying the logic and enhancing maintainability.
- Updated the return type in the structured output preparation to directly return the processor, improving code clarity.
- Updated type hints for the `llm` parameter in `_get_tokenizer_from_model` and `prepare_guided_output` functions to use `_vLLM` instead of `LLM`, enhancing code readability.
- Adjusted imports to reflect the new alias for `LLM`, streamlining the code structure.
- Updated type hint imports to include `# noqa` comments, enhancing code readability and maintaining consistency with type checking.
- No functional changes were made; this commit focuses on code structure and clarity.
- Updated the return statement in the `prepare_guided_output` function to use `model or tokenizer` instead of `llm`, improving clarity and consistency in processor assignment.
- This change enhances the function's flexibility in handling different input types while maintaining existing functionality.
- Removed the upper version limit for the `transformers` package, allowing for updates beyond version 4.47.0.
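The `_vLLM` alias and `# noqa` comments match a common pattern for type-only imports: import the name under `typing.TYPE_CHECKING` so type checkers see it while vLLM is never imported at runtime. A sketch, with a hypothetical helper standing in for `_get_tokenizer_from_model`:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; vLLM is not imported at runtime.
    from vllm import LLM as _vLLM  # noqa


def llm_class_name(llm: "_vLLM") -> str:
    """Hypothetical helper; the quoted annotation is never evaluated at runtime."""
    return type(llm).__name__
```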

davidberenstein1957 merged commit 9506930 into develop Jan 10, 2025
6 of 7 checks passed
davidberenstein1957 deleted the feat/1081-feature-update-to-outlines010 branch January 10, 2025 16:59
Labels: None yet
Projects: None yet

4 participants