
Feat (llm/learned_round): fast block update #1110

Merged · 12 commits into Xilinx:dev on Dec 5, 2024

Conversation

Giuseppe5 (Collaborator)

Reason for this PR

The inter-block update in learned round can be very slow for large models.

Changes Made in this PR

We assume that blocks are sequential, so the output of each block is the input to the next.
Furthermore, we assume that all kwargs stay unchanged across blocks (typical in LLMs).

Under these assumptions, we can run two block-level forwards per block instead of going through the entire model twice.
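A minimal sketch of the idea, assuming each block is a module that maps a tensor to a tensor; the names `propagate_block_outputs`, `fp_args`, `quant_args`, and `block_kwargs` are hypothetical, not the actual Brevitas API:

```python
import torch

@torch.no_grad()
def propagate_block_outputs(blocks, fp_args, quant_args, block_kwargs):
    # Assumes blocks are strictly sequential, so each block's output is
    # the next block's only input, and that kwargs (e.g. attention mask,
    # position ids) are shared by every block, as is typical in LLMs.
    for block in blocks:
        # ... optimize the learned rounding parameters of `block` here ...

        # Two block-level forwards replace two full-model passes: one to
        # advance the cached floating-point inputs, one to advance the
        # (partially) quantized inputs.
        fp_args = [block(x, **block_kwargs) for x in fp_args]
        quant_args = [block(x, **block_kwargs) for x in quant_args]
    return fp_args, quant_args
```

By contrast, the slow path re-runs the entire model from the start twice per block (once in floating point, once quantized), which is where the saving comes from for deep LLMs.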

Testing Summary

NA

Risk Highlight

The limitations are described above. The flag should be left set to False unless the user knows what they are doing; this could potentially be improved in the future.

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@pablomlago (Collaborator) left a comment:

LGTM. I'd open an issue to refactor and rely on save_inputs_output as much as possible, to prevent duplicating the block forward code.

@Giuseppe5 requested a review from @pablomlago on December 5, 2024, at 10:27.
@@ -602,26 +603,28 @@ def apply_learned_round(

# Initialize cache to store partial inputs and outputs for each block
cache.initialize_cache()

floating_point_datasets = []
Inline review comment:
floating_point_datasets is no longer used after the changes, right?

@Giuseppe5 merged commit 72b7f66 into Xilinx:dev on Dec 5, 2024.
23 checks passed