[Pytorch] Inf1, neuron-cc stuck and memory keeps increasing to 100G #978

Open
PigletOS opened this issue Sep 11, 2024 · 0 comments
@PigletOS

Hi,
I tried to trace StyleGAN2 with torch-neuron, but neuron-cc got stuck and its memory usage kept growing until it reached 100 GB.
The attached graph.zip contains a subgraph that reproduces the issue.
graph.zip
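
For anyone reproducing this, it may help to cap the memory of the tracing process so a runaway compile fails early instead of exhausting the host. The following is only a sketch under assumptions: it is Linux-only (it relies on `RLIMIT_AS`), `run_with_memory_cap` is a hypothetical helper rather than part of torch-neuron or neuron-cc, and it assumes the compiler runs as a child of the capped process so that it inherits the limit.

```python
import resource
import subprocess


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd in a child process whose address space is capped at max_bytes.

    Returns the child's exit code. Linux-only: relies on RLIMIT_AS.
    """

    def limit_address_space():
        # Runs in the child between fork and exec. Resource limits are
        # inherited by any process the child spawns later (assumed here
        # to include the compiler).
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    completed = subprocess.run(cmd, preexec_fn=limit_address_space)
    return completed.returncode
```

A call such as `run_with_memory_cap(["python", "trace_stylegan2.py"], 16 * 1024**3)` would then stop the run once allocations beyond 16 GiB are refused. The script name and the 16 GiB figure are placeholders for illustration only.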

neuron-cc version:

neuron-cc -V
Neuron Compiler version 1.23.5.0+1c9806b3e

HWM version 1.17.1.0-fbcd6c853
NEFF version Dynamic
TVM version 1.19.1.0+0
NumPy version 1.23.0
MXNet not available
TF not available

When neuron-cc gets stuck, it keeps printing log lines like this:

09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288309 has a write instruction that is neither AbstractCopy nor Load: I-457-288309 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288310
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288310 has a write instruction that is neither AbstractCopy nor Load: I-457-288310 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288311
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288311 has a write instruction that is neither AbstractCopy nor Load: I-457-288311 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288312
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288312 has a write instruction that is neither AbstractCopy nor Load: I-457-288312 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288313
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288313 has a write instruction that is neither AbstractCopy nor Load: I-457-288313 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288314
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288314 has a write instruction that is neither AbstractCopy nor Load: I-457-288314 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288315
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288315 has a write instruction that is neither AbstractCopy nor Load: I-457-288315 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288316
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288316 has a write instruction that is neither AbstractCopy nor Load: I-457-288316 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288317
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288317 has a write instruction that is neither AbstractCopy nor Load: I-457-288317 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288318
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288318 has a write instruction that is neither AbstractCopy nor Load: I-457-288318 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288319
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288319 has a write instruction that is neither AbstractCopy nor Load: I-457-288319 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288320
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288320 has a write instruction that is neither AbstractCopy nor Load: I-457-288320 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288321
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288321 has a write instruction that is neither AbstractCopy nor Load: I-457-288321 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288322
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288322 has a write instruction that is neither AbstractCopy nor Load: I-457-288322 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288323
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288323 has a write instruction that is neither AbstractCopy nor Load: I-457-288323 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288324
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288324 has a write instruction that is neither AbstractCopy nor Load: I-457-288324 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288325
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288325 has a write instruction that is neither AbstractCopy nor Load: I-457-288325 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288326
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288326 has a write instruction that is neither AbstractCopy nor Load: I-457-288326 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288327
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288327 has a write instruction that is neither AbstractCopy nor Load: I-457-288327 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288328
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288328 has a write instruction that is neither AbstractCopy nor Load: I-457-288328 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288329
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288329 has a write instruction that is neither AbstractCopy nor Load: I-457-288329 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288330
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288330 has a write instruction that is neither AbstractCopy nor Load: I-457-288330 is a TensorCopy instruction.
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: INFO (ShrinkDN): ANALYZE SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288331
09/11/2024 09:18:21 AM DEBUG [WalrusDriver.0]: DEBUG (ShrinkDN): Can only shrink DNs that're copy nodes. DN SynthesisNetwork_2/SynthesisBlock_1/SynthesisLayer_19/aten_mul_1/mul_t458_i288331 has a write instruction that is neither AbstractCopy nor Load: I-457-288331 is a TensorCopy instruction.
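
Because the output is long and repetitive, a small script can summarize it and show whether the numeric suffix is still advancing. This is a minimal sketch that assumes the log format matches the excerpt above; `summarize_shrinkdn` is a hypothetical helper and not part of neuron-cc, and treating the trailing `_i<number>` as a per-node index is my reading of the log, not documented behavior.

```python
import re
from collections import Counter

# Matches the "INFO (ShrinkDN): ANALYZE <name>_i<number>" lines from the log.
ANALYZE_RE = re.compile(r"INFO \(ShrinkDN\): ANALYZE (\S+)_i(\d+)\s*$")


def summarize_shrinkdn(lines):
    """Group ANALYZE lines by data-node prefix.

    Returns {prefix: (count, lowest_suffix, highest_suffix)}.
    """
    counts = Counter()
    suffix_range = {}
    for line in lines:
        match = ANALYZE_RE.search(line)
        if match is None:
            continue  # skip DEBUG lines and anything else
        prefix, suffix = match.group(1), int(match.group(2))
        counts[prefix] += 1
        low, high = suffix_range.get(prefix, (suffix, suffix))
        suffix_range[prefix] = (min(low, suffix), max(high, suffix))
    return {p: (counts[p], *suffix_range[p]) for p in counts}
```

Feeding it the lines of the saved compiler log (for example `summarize_shrinkdn(open(path))`, with `path` pointing at wherever the log was saved) gives one entry per data-node prefix. A count that keeps growing for a single prefix would suggest the pass is iterating over a very large number of near-identical nodes rather than looping on one.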

@PigletOS PigletOS changed the title Inf1, neuron-cc stuck and memory keep incresing to 100G [Pytorch] Inf1, neuron-cc stuck and memory keep incresing to 100G Sep 11, 2024