No memory reduction observed in a simple sparse-dense multiplication #15

x-zho14 opened this issue Mar 26, 2021 · 0 comments

Hi, I experimented with the following code:

```python
import sys
import time

import torch
from pytorch_block_sparse import BlockSparseLinear

n_iters = int(sys.argv[1])
density = float(sys.argv[2])

fc = BlockSparseLinear(1024, 256, density=density)
fc_dense = torch.nn.Linear(1024, 256).cuda()
input = torch.ones(3, 1024).cuda()

# --- sparse layer ---
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
t1 = time.time()
for _ in range(n_iters):
    output = fc(input)
end.record()
torch.cuda.synchronize()  # wait for queued kernels before reading the timers
t2 = time.time()
print("cpu time:", t2 - t1)
print("gpu time (ms):", start.elapsed_time(end))
print(torch.cuda.memory_summary())

# --- dense layer ---
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
t1 = time.time()
for _ in range(n_iters):
    output = fc_dense(input)
end.record()
torch.cuda.synchronize()
t2 = time.time()
print("cpu time:", t2 - t1)
print("gpu time (ms):", start.elapsed_time(end))
print(torch.cuda.memory_summary())
```

I find that the running time decreases when the iteration count is small, but the memory consumption does not decrease.
sparse:

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|---------------------------------------------------------------------------|
| Active memory         |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 B  |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |     800 KB |    2047 KB |    8080 KB |    7280 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |     800 KB |    2047 KB |    8080 KB |    7280 KB |
|---------------------------------------------------------------------------|
| Allocations           |      12    |      15    |    2066    |    2054    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      12    |      15    |    2066    |    2054    |
|---------------------------------------------------------------------------|
| Active allocs         |      12    |      15    |    2066    |    2054    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      12    |      15    |    2066    |    2054    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       1    |       1    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       1    |       1    |       1    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       5    |       5    |    1033    |    1028    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       5    |       5    |    1033    |    1028    |
|===========================================================================|

dense:

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|---------------------------------------------------------------------------|
| Active memory         |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 B  |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |     800 KB |    2047 KB |    5080 KB |    4280 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |     800 KB |    2047 KB |    5080 KB |    4280 KB |
|---------------------------------------------------------------------------|
| Allocations           |      12    |      15    |    1066    |    1054    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      12    |      15    |    1066    |    1054    |
|---------------------------------------------------------------------------|
| Active allocs         |      12    |      15    |    1066    |    1054    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |      12    |      15    |    1066    |    1054    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       1    |       1    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       1    |       1    |       1    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       5    |       5    |     533    |     528    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       5    |       5    |     533    |     528    |
|===========================================================================|

Could you please help me find the problem? The total allocated memory is actually even higher for the sparse layer. Thanks in advance.
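For reference, here is the weight footprint I would naively expect from each layer (a rough back-of-envelope sketch only; it ignores the bias, the activations, the CUDA caching allocator's rounding, and whatever block-index metadata BlockSparseLinear keeps on top of the values):

```python
# Expected fp32 weight storage for a 1024 -> 256 linear layer,
# dense vs. block-sparse at a given density.
in_features, out_features = 1024, 256
bytes_per_float = 4  # fp32

# Dense weight matrix: in_features * out_features floats.
dense_weight_kb = in_features * out_features * bytes_per_float / 1024
print(f"dense weight: {dense_weight_kb:.0f} KB")  # 1024 KB

# A block-sparse layer at density d stores roughly d * dense values.
for density in (0.5, 0.25, 0.1):
    sparse_kb = dense_weight_kb * density
    print(f"density={density}: ~{sparse_kb:.0f} KB of weight values")
```

So at density 0.5 I would expect roughly half the weight memory of the dense layer, which is not what the summaries above show.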
