CenterPoint Training Error: RuntimeError: (PreconditionNotMet) The meta data must be valid when call the mutable data function. #466

Open
yaobaishen opened this issue May 24, 2024 · 3 comments

@yaobaishen

The error is reported from apollo-model-centerpoint, which uses Paddle as its backend. I found a similar issue here, but it does not look like the same root cause: #118

Below is my error log:

Traceback (most recent call last):
  File "tools/train.py", line 207, in <module>
    main(args)
  File "tools/train.py", line 202, in main
    trainer.train()
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/apis/trainer.py", line 290, in train
    output = training_step(
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/apis/pipeline.py", line 66, in training_step
    outputs = model(sample)
  File "/home/nsoft/anaconda3/envs/cp_paddle_cu11/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/models/base/base_model.py", line 70, in forward
    return self.train_forward(samples, *args, **kwargs)
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/models/detection/centerpoint/centerpoint.py", line 146, in train_forward
    x = self.extract_feat(data)
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/models/detection/centerpoint/centerpoint.py", line 120, in extract_feat
    voxels, coordinates, num_points_in_voxel = self.voxelizer(
  File "/home/nsoft/anaconda3/envs/cp_paddle_cu11/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/models/voxelizers/voxelize.py", line 75, in forward
    voxels, coors_pad, num_points_per_voxel = self.single_forward(
  File "/home/nsoft/Documents/github_code/apollo-model-centerpoint/paddle3d/models/voxelizers/voxelize.py", line 57, in single_forward
    coors = coors.reshape([1, -1, 3])
  File "/home/nsoft/anaconda3/envs/cp_paddle_cu11/lib/python3.8/site-packages/paddle/tensor/manipulation.py", line 3543, in reshape
    out = _C_ops.reshape(x, shape)
RuntimeError: (PreconditionNotMet) The meta data must be valid when call the mutable data function. (at /paddle/paddle/phi/core/dense_tensor.cc:111)
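The failing call is `coors.reshape([1, -1, 3])`. Because the error is PreconditionNotMet raised from dense_tensor.cc, it seems likely that the tensor handed to reshape carries unusable metadata (for example a negative or uninitialized dimension, or an undefined dtype) rather than simply a wrong element count. The pure-Python sketch below encodes a simplified, assumed version of that kind of check; it is not Paddle's actual rule. `meta_is_valid` and `check_coors` are hypothetical helpers that could be fed with `list(coors.shape)` and `str(coors.dtype)` just before the reshape to see which condition fails.

```python
from math import prod
from typing import List, Optional, Sequence


def meta_is_valid(dims: Sequence[int], dtype: Optional[str]) -> bool:
    """Simplified, ASSUMED validity rule: dtype is known and no dim is negative."""
    return dtype is not None and all(d >= 0 for d in dims)


def check_coors(dims: Sequence[int], dtype: Optional[str],
                target: Sequence[int] = (1, -1, 3)) -> List[int]:
    """Validate coors metadata, then return the shape reshape(target) would give.

    Raises ValueError with a descriptive message instead of failing deep
    inside the framework.
    """
    if not meta_is_valid(dims, dtype):
        raise ValueError(f"invalid coors metadata: dims={list(dims)}, dtype={dtype}")
    if list(target).count(-1) > 1:
        raise ValueError("reshape target may contain at most one -1")

    total = prod(dims)                            # number of elements in coors
    known = prod(d for d in target if d != -1)    # product of the fixed target dims

    if -1 in target:
        # The -1 slot absorbs whatever is left; it must divide evenly.
        if known == 0 or total % known != 0:
            raise ValueError(
                f"cannot reshape {list(dims)} ({total} elements) to {list(target)}")
        return [total // known if d == -1 else d for d in target]

    if known != total:
        raise ValueError(f"cannot reshape {list(dims)} to {list(target)}")
    return list(target)
```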

I think the root cause is that hard_voxelize() returns invalid coors, so the reshape() operation fails.
But when I look into hard_voxelize(), it actually runs the code below (sorry, I couldn't find the source code location in the GitHub project):

core.eager._run_custom_op(ctx, "hard_voxelize", True)

I am wondering what "_run_custom_op" inside hard_voxelize() actually does. Can anyone give some hints? Many thanks.
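As far as I understand, `hard_voxelize` is a custom C++/CUDA operator compiled from the project's .cc/.cu sources, and the Python function containing `core.eager._run_custom_op(ctx, "hard_voxelize", True)` is a generated wrapper: it packs the inputs and attributes into `ctx` and asks Paddle's core to run the kernel registered under that name. The pure-Python analogy below only illustrates that dispatch-by-name pattern. The registry, `register_op`, `run_custom_op`, and the toy kernel are hypothetical; they are neither Paddle's API nor the real voxelization algorithm.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical registry standing in for the compiled (C++) side, keyed by op name.
_KERNELS: Dict[str, Callable] = {}


def register_op(name: str):
    """Register a kernel under `name` (in Paddle this happens at build/load time)."""
    def deco(fn: Callable) -> Callable:
        _KERNELS[name] = fn
        return fn
    return deco


def run_custom_op(name: str, *inputs):
    """Look up the kernel by name and run it; the wrapper never sees its body."""
    try:
        kernel = _KERNELS[name]
    except KeyError:
        raise RuntimeError(f"no kernel registered for {name!r}") from None
    return kernel(*inputs)


@register_op("hard_voxelize")
def _toy_hard_voxelize(points: Sequence[Tuple[float, float, float]],
                       voxel_size: float) -> List[Tuple[int, int, int]]:
    """TOY kernel: map each point to an integer voxel index (x, y, z order)."""
    return [(int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
            for x, y, z in points]


def hard_voxelize(points, voxel_size):
    """Python-facing wrapper: no algorithm here, only the dispatch call."""
    return run_custom_op("hard_voxelize", points, voxel_size)
```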

@LielinJiang
Collaborator

@yaobaishen
Author

@LielinJiang thanks! I overlooked this piece of code because I am using VS Code's Python debug mode, and breakpoints added in the .cc file don't seem to take effect. Anyway, I can now dig further into the root cause of my training failure, maybe by adding more logging to the .cc files.
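Before instrumenting the .cc files, a cheaper first step might be to log what the custom op returns on the Python side, right before the reshape in voxelize.py. The sketch below assumes the outputs expose `.shape` and `.dtype` attributes, as Paddle tensors do; `log_outputs` and `summarize` are hypothetical helpers, not part of Paddle3D or Paddle.

```python
import functools
import logging

logger = logging.getLogger("voxelize_debug")


def summarize(obj):
    """Return (shape, dtype) for tensor-like objects, otherwise the type name."""
    shape = getattr(obj, "shape", None)
    if shape is None:
        return type(obj).__name__
    return (list(shape), str(getattr(obj, "dtype", "?")))


def log_outputs(fn):
    """Decorator: log the shape/dtype of every value returned by `fn` at DEBUG level."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        items = result if isinstance(result, tuple) else (result,)
        for i, item in enumerate(items):
            logger.debug("%s output[%d]: %s", fn.__name__, i, summarize(item))
        return result  # returned unchanged, so the caller behaves as before
    return wrapper
```

Wrapping the op (for example `hard_voxelize = log_outputs(hard_voxelize)` in voxelize.py, assuming it is imported there under that name) and enabling DEBUG logging would show whether coors already has a negative or empty shape when it leaves the custom op.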

@yaobaishen
Author

Please keep this issue open, as I am still investigating why the coors returned by hard_voxelize() are invalid. Thanks.
