Memory estimate fixes #720
Conversation
magdyksaleh commented on Dec 16, 2024 (edited)
- Update the memory wiggle room to 0.9 (from 0.8)
- Ensure `free_memory` is never negative (see the sketch after this list)
- Lazily load graphs for larger compile batch sizes
- Add a new argument, `compile_batch_size` (default 32), which is the batch size we will initially use for CUDA compilation of models
- Make the pre-commit hooks use ruff instead of flake8
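A minimal sketch of the intended free-memory estimate (the constant and helper names here, `MEMORY_WIGGLE_ROOM` and `estimate_free_memory`, are illustrative and may not match the actual code in the server):

```python
import torch

# Illustrative constant: this PR bumps the wiggle-room factor from 0.8 to 0.9,
# i.e. 90% of the measured free memory is now treated as safely usable.
MEMORY_WIGGLE_ROOM = 0.9


def estimate_free_memory(reserved_memory: int = 0) -> int:
    """Estimate usable free CUDA memory, clamped so it is never negative."""
    free_memory, _total = torch.cuda.mem_get_info()
    usable = int(free_memory * MEMORY_WIGGLE_ROOM) - reserved_memory
    # Guard against a negative estimate when reservations exceed free memory.
    return max(usable, 0)
```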
or not graph.input_state.traced_adapter_layer_names.issuperset(adapter_data.layer_names())
# This is the case where COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE so
# we just retrace the graph for that new size
or batch_size > self.batch_size
lorax/server/lorax_server/utils/graph.py
Lines 478 to 498 in 12e530a
def can_use_graph(
    self,
    batch: "FlashCausalLMBatch",
    adapter_data: AdapterBatchData,
) -> bool:
    ranks = adapter_data.ranks()
    nranks = len(ranks)
    max_rank = max(ranks) if len(ranks) > 0 else 0

    batch_size = batch.input_ids.shape[0]
    max_s = batch.max_current_length

    # TODO(travis): allow using CUDA graphs with multi-rank batches
    return (
        torch.cuda.is_available()
        and batch_size <= MAX_BATCH_SIZE
        and max_s <= self.max_total_tokens
        and max_rank <= MAX_RANK
        and nranks <= 1
        and max_rank in _allowed_ranks
    )
This ensures that we stay within the range COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE when deciding whether to retrace the graph.
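A hypothetical sketch of the lazy-retrace path described above; the actual `GraphWrapper`/graph-cache logic in `graph.py` differs in its details, and the `LazyGraphCache` name and the MAX_BATCH_SIZE value below are illustrative only:

```python
COMPILE_BATCH_SIZE = 32   # default of the new compile_batch_size argument
MAX_BATCH_SIZE = 256      # illustrative upper bound


class LazyGraphCache:
    def __init__(self):
        # Graphs are initially traced only up to COMPILE_BATCH_SIZE.
        self.batch_size = COMPILE_BATCH_SIZE
        self.graph = None

    def get(self, batch_size: int):
        if batch_size > MAX_BATCH_SIZE:
            return None  # too large for CUDA graphs, fall back to eager
        if self.graph is None or batch_size > self.batch_size:
            # COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE:
            # retrace the graph for the larger size on first use.
            self.batch_size = max(batch_size, self.batch_size)
            self.graph = self._trace(self.batch_size)
        return self.graph

    def _trace(self, batch_size: int):
        ...  # CUDA graph capture for the given batch size
```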
LGTM!
Actually, I think we need changes to warmup.
LGTM!