I would like to know what hardware you used for your experiments. I ran into an out-of-memory (OOM) error while running the Reddit dataset. Is this normal?
(pyg) xiongxunbin@graph-1:~/SGDD$ python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0
WARNING:root:The OGB package is out of date. Your version is 1.3.5, while the latest version is 1.3.6.
Namespace(beta=0.1, dataset='reddit', debug=0, dis_metric='ours', dropout=0.0, ep_ratio=0.5, epochs=2000, gpu_id=0, hidden=256, ignr_epochs=400, inner=0, keep_ratio=1.0, lr_adj=0.0001, lr_feat=0.0001, lr_model=0.01, mode='disabled', nlayers=2, normalize_features=True, one_step=0, opt_scale=1e-10, option=0, outer=20, reduction_rate=0.5, save=1, seed=15, sgc=1, sinkhorn_iter=5, weight_decay=0.0)
adj_syn: (76966, 76966) feat_syn: torch.Size([76966, 602])
0%| | 0/2001 [03:00<?, ?it/s]
Traceback (most recent call last):
File "train_SGDD.py", line 79, in <module>
agent.train()
File "/home/xiongxunbin/SGDD/SGDD_agent.py", line 218, in train
adj_syn, opt_loss = IGNR(self.feat_syn, Lx=adj[random_nodes].to_dense()[:, random_nodes])
File "/home/xiongxunbin/miniconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xiongxunbin/SGDD/models/IGNR.py", line 93, in forward
c = torch.cat([c[self.edge_index[0]],
RuntimeError: CUDA out of memory. Tried to allocate 44.14 GiB (GPU 0; 79.20 GiB total capacity; 69.65 GiB already allocated; 8.24 GiB free; 69.91 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
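As a side note, the allocator hint at the end of the error refers to the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of setting it when rerunning (128 MB is an arbitrary illustrative value; this only mitigates fragmentation and cannot help when a single 44 GiB allocation exceeds the free memory):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0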
Yes, Reddit is a challenging dataset for our method. You could try decreasing mx_size to 2000 or 1000 and running again. The OOM is most likely caused by the variable number of edges when sampling from Reddit, so changing the random seed may also help. Please let us know how your experiments turn out.
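A hedged sketch of how to act on this: mx_size does not appear in the printed Namespace above, so it is presumably set where the IGNR module is constructed in the code rather than on the command line (an assumption to verify in SGDD_agent.py or models/IGNR.py). The seed, however, does appear in the Namespace (seed=15), so trying a different one should only require changing that argument, e.g.:

python train_SGDD.py --dataset reddit --nlayers=2 --beta 0.1 --r=0.5 --gpu_id=0 --seed 42

Here 42 is an arbitrary value; a seed that samples fewer edges may shrink the torch.cat allocation that fails in IGNR.forward.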
Our compute resource is a cluster with a mix of A100 and V100 GPUs.