I would like to know what hardware you used for your experiments. I ran into an out-of-memory (OOM) error while running the Reddit dataset. Is this normal?
(pyg) xiongxunbin@graph-1:~/SGDD$ python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0
WARNING:root:The OGB package is out of date. Your version is 1.3.5, while the latest version is 1.3.6.
Namespace(beta=0.1, dataset='reddit', debug=0, dis_metric='ours', dropout=0.0, ep_ratio=0.5, epochs=2000, gpu_id=0, hidden=256, ignr_epochs=400, inner=0, keep_ratio=1.0, lr_adj=0.0001, lr_feat=0.0001, lr_model=0.01, mode='disabled', nlayers=2, normalize_features=True, one_step=0, opt_scale=1e-10, option=0, outer=20, reduction_rate=0.5, save=1, seed=15, sgc=1, sinkhorn_iter=5, weight_decay=0.0)
adj_syn: (76966, 76966) feat_syn: torch.Size([76966, 602])
0%| | 0/2001 [03:00<?, ?it/s]
Traceback (most recent call last):
File "train_SGDD.py", line 79, in <module>
agent.train()
File "/home/xiongxunbin/SGDD/SGDD_agent.py", line 218, in train
adj_syn, opt_loss = IGNR(self.feat_syn, Lx=adj[random_nodes].to_dense()[:, random_nodes])
File "/home/xiongxunbin/miniconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xiongxunbin/SGDD/models/IGNR.py", line 93, in forward
c = torch.cat([c[self.edge_index[0]],
RuntimeError: CUDA out of memory. Tried to allocate 44.14 GiB (GPU 0; 79.20 GiB total capacity; 69.65 GiB already allocated; 8.24 GiB free; 69.91 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
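As a side note, the allocator hint at the end of the error refers to the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of setting it when rerunning (128 MB is an arbitrary illustrative value; this only mitigates fragmentation and cannot help when a single 44 GiB allocation exceeds the free memory):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0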
Yes, Reddit is a challenging dataset for our method. You could try decreasing mx_size to 2000 or 1000 and running again. The OOM is most likely caused by the variable number of edges when sampling from Reddit, so changing the random seed may also help. Please let us know how your experiments turn out.
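A hedged sketch of how to act on this: mx_size does not appear in the printed Namespace above, so it is presumably set where the IGNR module is constructed in the code rather than on the command line (an assumption to verify in SGDD_agent.py or models/IGNR.py). The seed, however, does appear in the Namespace (seed=15), so trying a different one should only require changing that argument, e.g.:

python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0 --seed 42

Here 42 is an arbitrary value; a seed that samples fewer edges may shrink the torch.cat allocation that fails in IGNR.forward.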
Our compute resource is a cluster with a mix of A100 and V100 GPUs.