
Memory estimate fixes #720
Merged (4 commits into main, Dec 17, 2024)
Conversation

@magdyksaleh (Collaborator) commented Dec 16, 2024

  • Update the memory wiggle room factor to 0.9 (from 0.8)
  • Ensure free_memory is never negative (a sketch of both memory changes follows this list)
  • Lazily load the CUDA graph for batch sizes larger than the compile batch size (see the retrace sketch after the review thread below)
  • Add a new argument, compile_batch_size (default 32): the batch size initially used for CUDA graph compilation of models
  • Make pre-commit use ruff instead of flake8
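A minimal sketch of the first two bullets (the wiggle room factor plus non-negative clamping), assuming hypothetical names MEMORY_WIGGLE_ROOM and estimate_free_memory rather than the repository's actual API:

```python
import torch

# Fraction of total GPU memory treated as usable (raised from 0.8 to 0.9
# in this PR); the constant name here is illustrative only.
MEMORY_WIGGLE_ROOM = 0.9

def estimate_free_memory(device: torch.device, reserved_bytes: int) -> int:
    """Sketch only: estimate the usable memory budget, clamped at zero."""
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    free = int(total * MEMORY_WIGGLE_ROOM) - allocated - reserved_bytes
    # Clamp so downstream sizing logic never sees a negative budget.
    return max(free, 0)
```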

@magdyksaleh marked this pull request as ready for review December 16, 2024 22:02
Review comment on the following lines of the diff:

```python
or not graph.input_state.traced_adapter_layer_names.issuperset(adapter_data.layer_names())
# This is the case where COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE, so
# we just retrace the graph for that new size.
or batch_size > self.batch_size
```

@magdyksaleh (Collaborator, Author) replied, quoting the existing can_use_graph check:

```python
def can_use_graph(
    self,
    batch: "FlashCausalLMBatch",
    adapter_data: AdapterBatchData,
) -> bool:
    ranks = adapter_data.ranks()
    nranks = len(ranks)
    max_rank = max(ranks) if len(ranks) > 0 else 0

    batch_size = batch.input_ids.shape[0]
    max_s = batch.max_current_length

    # TODO(travis): allow using CUDA graphs with multi-rank batches
    return (
        torch.cuda.is_available()
        and batch_size <= MAX_BATCH_SIZE
        and max_s <= self.max_total_tokens
        and max_rank <= MAX_RANK
        and nranks <= 1
        and max_rank in _allowed_ranks
    )
```

This ensures that we stay within the supported range.
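To make the lazy-load path concrete, here is a hedged sketch of retracing on demand when COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE; the class and method names below are assumptions for illustration, not the code under review:

```python
COMPILE_BATCH_SIZE = 32   # default of the new compile_batch_size argument
MAX_BATCH_SIZE = 256      # hypothetical ceiling enforced by can_use_graph

class GraphWrapper:
    """Illustrative stand-in for a CUDA graph holder."""

    def __init__(self) -> None:
        # Capture once at the small compile batch size so startup stays cheap.
        self.batch_size = COMPILE_BATCH_SIZE

    def trace(self, batch_size: int) -> None:
        # Stand-in for the actual CUDA graph capture at `batch_size`.
        self.batch_size = batch_size

    def maybe_retrace(self, batch_size: int) -> None:
        # COMPILE_BATCH_SIZE < batch_size <= MAX_BATCH_SIZE: the batch passed
        # can_use_graph but exceeds the traced size, so lazily retrace at the
        # new size and reuse that trace for subsequent batches.
        if self.batch_size < batch_size <= MAX_BATCH_SIZE:
            self.trace(batch_size)
```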

@tgaddair (Contributor) left a comment:

LGTM!

@tgaddair (Contributor) left a comment:

Actually, I think we need changes to warmup.

@magdyksaleh requested a review from @tgaddair December 16, 2024 22:36
@tgaddair (Contributor) left a comment:

LGTM!

@magdyksaleh merged commit 6dfb215 into main Dec 17, 2024
3 checks passed
@magdyksaleh deleted the memory-estimate-fixes branch December 17, 2024 16:03