[platform] Refactor current_memory_usage() function in DeviceMemoryProfiler into Platform #11369

Draft · wants to merge 1 commit into main
5 changes: 5 additions & 0 deletions vllm/platforms/cuda.py
@@ -140,6 +140,11 @@
        cache_config = vllm_config.cache_config
        if cache_config and cache_config.block_size is None:
            cache_config.block_size = 16
+
+    @classmethod
+    def get_current_memory_usage(cls, device: Optional[torch.types.Device] = None) -> float:
Check failure (GitHub Actions / ruff 3.12): vllm/platforms/cuda.py:145:81: E501 Line too long (92 > 80)
+        torch.cuda.reset_peak_memory_stats(device)
+        return torch.cuda.max_memory_allocated(device)


# NVML utils
7 changes: 7 additions & 0 deletions vllm/platforms/interface.py
@@ -237,6 +237,13 @@
                       "This may slow down the performance.")
            return False
        return True
+
+    @classmethod
+    def get_current_memory_usage(cls, device: Optional[torch.types.Device] = None) -> float:
Check failure (GitHub Actions / ruff 3.12): vllm/platforms/interface.py:242:81: E501 Line too long (92 > 80)
+        """
+        Return the memory usage in bytes.
+        """
+        raise NotImplementedError


class UnspecifiedPlatform(Platform):
5 changes: 5 additions & 0 deletions vllm/platforms/xpu.py
@@ -87,3 +87,8 @@
    def is_pin_memory_available(cls):
        logger.warning("Pin memory is not supported on XPU.")
        return False
+
+    @classmethod
+    def get_current_memory_usage(cls, device: Optional[torch.types.Device] = None) -> float:
Check failure (GitHub Actions / ruff 3.12): vllm/platforms/xpu.py:92:81: E501 Line too long (92 > 80)
+        torch.xpu.reset_peak_memory_stats(device)
+        return torch.xpu.max_memory_allocated(device)
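For context, here is a minimal usage sketch (not part of this diff) showing how a caller can exercise the new hook through the platform abstraction instead of branching on is_cuda_alike() / is_xpu(). It assumes current_platform resolves to the platform class for the active device (e.g. CudaPlatform on an NVIDIA GPU); the device index is illustrative.

```python
# Sketch only: calling the new Platform hook directly.
import torch

from vllm.platforms import current_platform  # resolves to the active device's platform

# The hook resets the peak-memory counter and then reads max_memory_allocated(),
# so right after the reset the returned value is the bytes currently allocated.
mem_bytes = current_platform.get_current_memory_usage(torch.device("cuda:0"))
print(f"Currently allocated: {mem_bytes / 2**20:.1f} MiB")
```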
14 changes: 2 additions & 12 deletions vllm/utils.py
@@ -680,23 +680,13 @@
    def __init__(self, device: Optional[torch.types.Device] = None):
        self.device = device

-    def current_memory_usage(self) -> float:
-        # Return the memory usage in bytes.
-        if current_platform.is_cuda_alike():
-            torch.cuda.reset_peak_memory_stats(self.device)
-            mem = torch.cuda.max_memory_allocated(self.device)
-        elif current_platform.is_xpu():
-            torch.xpu.reset_peak_memory_stats(self.device)  # type: ignore
-            mem = torch.xpu.max_memory_allocated(self.device)  # type: ignore
-        return mem
-
    def __enter__(self):
-        self.initial_memory = self.current_memory_usage()
+        self.initial_memory = current_platform.get_current_memory_usage(self.device)
Check failure (GitHub Actions / ruff 3.12): vllm/utils.py:684:81: E501 Line too long (84 > 80)
        # This allows us to call methods of the context manager if needed
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
-        self.final_memory = self.current_memory_usage()
+        self.final_memory = current_platform.get_current_memory_usage(self.device)
Check failure (GitHub Actions / ruff 3.12): vllm/utils.py:689:81: E501 Line too long (82 > 80)
        self.consumed_memory = self.final_memory - self.initial_memory

        # Force garbage collection
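Likewise, a minimal sketch (also not part of the diff) of how DeviceMemoryProfiler is typically used after this refactor: the context manager now delegates to current_platform.get_current_memory_usage() on both enter and exit, so supporting a new accelerator only requires overriding that hook. The tensor allocation below is a hypothetical stand-in for a real workload such as loading model weights.

```python
# Sketch only: measuring device memory consumed inside a block of work.
import torch

from vllm.utils import DeviceMemoryProfiler

with DeviceMemoryProfiler(torch.device("cuda:0")) as profiler:
    # Stand-in workload: ~2 MiB of float16 data on the device.
    weights = torch.empty(1024, 1024, dtype=torch.float16, device="cuda:0")

# consumed_memory is final_memory - initial_memory, in bytes.
print(f"Consumed: {profiler.consumed_memory / 2**20:.1f} MiB")
```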