
Great work! a few questions for the sake of reproducibility #6

Open
ericaweng opened this issue Feb 22, 2025 · 11 comments

@ericaweng

Thanks for the great work and the very thorough documentation. It has made using and building off your code very easy! I appreciate it a lot :)
I had a few questions, just to make sure I'm interpreting and reproducing your results accurately.

  • Did you try an ablation study on HiVT with no map features (neither BEV features nor a decoded vector map) at all?
  • I notice the numbers you report in Table 1 of your paper are only the results for testing on the ego agent. Did you compare numbers for testing all the agents? I tried this and noticed that the performance improvement from your method isn't as great when evaluating all agents, probably because most agents are static. In addition, the map seems to be less useful when evaluating on all agents: when evaluating your pretrained model without BEV features (setting vit_embed in LocalEncoder.forward to 0 before feeding into the next module; see the sketch after this list), I get 0.254 vs. 0.245 for your method (MapTRv2_CL + BEV). On the other hand, the improvement is much more noticeable when evaluating only on ego agents: 0.417 vs. 0.369 (in your paper you report 0.365; I got 0.369 when I ran your pretrained model with the code).
  • How many seeds did you try for each trajectory experiment (each cell in Table 1)? Just 1 each, right?
  • Are the hyperparameter settings in the argparse defaults (in hivt.py and train.py) the ones used to produce the reported results (embed_dim=128 being the only change, as you say in the traj.md doc)?
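For concreteness, here is roughly how I zeroed out the BEV contribution (a minimal, self-contained sketch with illustrative shapes; the actual attribute names inside LocalEncoder.forward may differ):

import torch

# Stand-in for the per-node BEV/ViT embedding computed in LocalEncoder.forward;
# (num_nodes, embed_dim) = (64, 128) is an assumed, illustrative shape.
vit_embed = torch.randn(64, 128)
vit_embed = torch.zeros_like(vit_embed)  # ablation: remove all BEV information
# vit_embed is then fused with the agent embedding exactly as in the unmodified code
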
@alfredgu001324
Owner

Thanks for the compliments!

  1. Not really, I don't think I have tried this.

  2. Yes, the evaluation is done on the ego vehicle only, per the standard practice of trajectory prediction papers. Also, since DenseTNT is fairly old, it can only make ego-agent predictions (to get multi-agent predictions, I guess you would need to run the model once per agent, but I am not sure that would work, as it is also only trained on ego trajectories IIRC). In contrast, HiVT can make predictions for all the agents in the scene.

Actually, may I know if the released checkpoints work fine? I tidied up the code a bit, which I think might break checkpoint loading, but I never got a chance to try loading them.

  3. Yes, that is correct.

  4. For the specific hyperparameter settings, you can take a look at the appendix of the paper. I did a grid search to make the performance optimal for each combination (unfortunately, I could not find a universal configuration). This is actually something I want to explore more once I finish the paper: digging into the BEV features to see why they are helpful, whether there are better ways to encode them, etc.

@alfredgu001324
Owner

If you want to use the BEV features to enhance other vehicles' predictions, you probably need to make some changes to the encoding mechanism. Currently it uses the center of the BEV grid (where the ego vehicle is located) as the query (IIRC; I don't quite remember anymore...), but to enhance other vehicles' predictions, you should use their corresponding BEV patches as the query.
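To sketch what I mean (a hypothetical helper, not the repo's actual code; grid resolution, patch size, and coordinate conventions are all assumptions):

import torch

def agent_bev_patch(bev, xy, patch=8, meters_per_cell=0.5):
    """Crop a local BEV patch around one agent to use as its query.

    bev: (H, W, C) BEV feature map centered on the ego vehicle.
    xy:  (2,) agent position in meters, ego-centric (assumed x-right, y-up).
    """
    H, W, _ = bev.shape
    col = int(W / 2 + xy[0] / meters_per_cell)  # metric offset -> grid index
    row = int(H / 2 - xy[1] / meters_per_cell)
    r0 = max(0, min(H - patch, row - patch // 2))
    c0 = max(0, min(W - patch, col - patch // 2))
    return bev[r0:r0 + patch, c0:c0 + patch]    # (patch, patch, C) query tokens

# Example: an agent 10 m ahead of the ego vehicle on a 200x100x256 map.
query = agent_bev_patch(torch.randn(200, 100, 256), torch.tensor([0.0, 10.0]))
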

@ericaweng
Author

ericaweng commented Feb 26, 2025

Thank you for your prompt response! The released checkpoints work fine if I set strict=False in HiVT.load_from_checkpoint. So it seems there are some additional, unneeded args in the checkpoint, which you later removed from the torch model. Otherwise, the model loads fine, and I am able to get close to the numbers in your paper using the pretrained model.
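For anyone else hitting this, the workaround looks roughly like this (the import path and checkpoint path are placeholders for your local setup):

from models.hivt import HiVT  # adjust to wherever HiVT lives in HiVT_modified

# strict=False ignores keys that exist in the checkpoint but not in the
# current model definition (the unneeded args mentioned above).
model = HiVT.load_from_checkpoint("path/to/checkpoint.ckpt", strict=False)
model.eval()
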

I was attempting to retrain your HiVT models to reproduce your results, using the pretrained map models to generate bev_features from scratch for both the training and val sets. Then, using the optimal hparams you specified in Appendix A, I was able to train the model to a reasonable number, but still pretty far from your reported results (0.391 vs. the 0.365 ADE you reported for MapTRv2_CL + BEV).

  1. Did you have another file, apart from MapTRv2_modified/tools/test.py, to generate bev_features for the train samples? The provided code seems to only generate them for the val samples. I had to modify the file to generate the train samples: I took the test-time dataset params specified in cfg.data.test (e.g. here) and swapped out the val ann files ('nuscenes_map_infos_temporal_val.pkl' and 'nuscenes_map_anns_val.json') for the associated train files ('nuscenes_map_infos_temporal_train.pkl' and 'nuscenes_map_anns_train.json'), to make sure I wasn't adding train-time augmentations to the train set while generating the BEV features (see the sketch after this list). However, I'm not sure this is what you did.
  2. I would like to download your dataset and try using that, so I can check that the issue is not with my BEV feature generation for the training data. Could you please help me verify that the access permissions on the AWS bucket are set to public? When I try to read from the bucket (aws s3 sync s3://mapbevprediction/maptrv2_cent_bev .), I get: fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied. I have aws configured with appropriate credentials.
  3. Just to confirm: you used the provided map config files to train the provided pretrained map models, correct? (e.g. here) In those config files, I see you use only 1 frame per sample (queue_length = 1). I am guessing you tried using more temporal information but it did not yield better map prediction performance, is that correct?
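For reference, the change from item 1 looked roughly like this (a sketch of an MMDetection3D-style config override; the exact key names and train filenames are my reading of the MapTRv2 configs and may not match your setup):

# Point the test-time dataset at the train annotations so tools/test.py dumps
# BEV features for the training split without train-time augmentation.
data = dict(
    test=dict(
        ann_file='data/nuscenes/nuscenes_map_infos_temporal_train.pkl',  # was ..._val.pkl
        map_ann_file='data/nuscenes/nuscenes_map_anns_train.json',       # was ..._val.json
    ),
)
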

Finally, regarding your response about using the BEV features to enhance other vehicles' predictions: I believe the query is already set to use the local patches corresponding to each individual agent (here! :) )

Thank you so much!

@alfredgu001324
Owner

Hi Erica,

Sorry for the late reply, and thanks for checking the checkpoint for me. I was a bit busy these past couple of days.

  1. Not really; what you did is exactly what I did. I just changed the config files so that evaluation runs on the training set, in order to generate the BEV features for it.

  2. I thought I had set it to public access, but I have just tried editing the bucket policy and setting it to public read again; can you please check once more? In addition, I have also just uploaded two combinations to Hugging Face, which should be easier to use than AWS (the bucket is indeed a bit messy...). I recommend downloading from Hugging Face instead.

Also, one difference between the AWS and Hugging Face versions: I tidied up the dimensions of the BEV features in the Hugging Face ones. In AWS they are (feature_dim, height, width); in Hugging Face they are (height, width, feature_dim), which is more intuitive.
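Converting between the two layouts is just a permute (illustrative shapes):

import torch

bev_chw = torch.randn(256, 200, 100)  # AWS layout: (feature_dim, height, width)
bev_hwc = bev_chw.permute(1, 2, 0)    # Hugging Face layout: (height, width, feature_dim)
assert bev_hwc.shape == (200, 100, 256)
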

  3. I actually did not modify this part; it is inherited from the original MapTR repo.

Ah, I see; thanks for checking. Yeah, sorry, it's been a while... and I am no longer familiar with my own code :(
Hope this helps!

@alfredgu001324
Owner

Hmm, to try retraining for the checkpoint, maybe you can use the uploaded dataset and retrain from there?

To check whether the BEV features make sense, you can try visualizing them using the vis script (especially the StreamMapNet ones; their BEV features look pretty well aligned with the road structure).
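If you don't want to wire up the vis script, a quick way to eyeball a feature map (assuming the Hugging Face (height, width, feature_dim) layout) is to collapse the channels and plot the result:

import torch
import matplotlib.pyplot as plt

bev = torch.randn(200, 100, 256)  # stand-in for a loaded BEV feature tensor
heat = bev.abs().mean(dim=-1)     # one activation-magnitude value per cell
plt.imshow(heat.numpy(), cmap='viridis')
plt.title('BEV feature magnitude')
plt.colorbar()
plt.savefig('bev_check.png')
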

The reason this is crucial is that in the MapTR line of work, they simply reshape the BEV features from (height, width, feature_dim) to (height*width, feature_dim) and then do their map decoding. In StreamMapNet, they use a flipped view of the map BEV features before decoding (which is quite weird, tbh, but in their repo issues I remember them saying this is for compatibility with a framework used in their approach, maybe DETR or something?), so their features are more like an image where (0, 0) starts at the top-left corner. So in my file here, I need to do some flipping of the BEV features.
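In code, the two conventions differ roughly like this (a sketch from memory, not the exact lines from either repo):

import torch

bev = torch.randn(200, 100, 256)     # (height, width, feature_dim)

# MapTR-style: flatten directly for map decoding.
maptr_tokens = bev.reshape(-1, 256)  # (height*width, feature_dim)

# StreamMapNet-style: features follow an image-like convention with (0, 0) at
# the top-left corner, so flip vertically first to align with the road frame.
stream_tokens = torch.flip(bev, dims=[0]).reshape(-1, 256)
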

@alfredgu001324
Owner

Let me know if you have any other questions; I am very happy to help. I think this line of work still has a lot of room for improvement, but unfortunately I switched areas for my master's research and no longer have time to work on it... Hopefully you can build a cleaner codebase out of this, haha.

@ericaweng
Author

Thank you so much for your response, and for uploading some of the data to the Hugging Face repo! I was still getting the AccessDenied error with AWS, but downloading from Hugging Face works fine. By any chance, are you able to also upload the other datasets (maptrv2_bev, maptrv2_cent_bev) to the Hugging Face repo, as those are the ones I'm interested in?
Thanks so much :)

@alfredgu001324
Owner

Of course! I have just uploaded maptrv2_bev as well; maptrv2_cent_bev should be done tomorrow. It takes a while to upload, lol. Thanks for your patience.

@alfredgu001324
Owner

alfredgu001324 commented Mar 4, 2025

I have uploaded them all. Thanks for the support!

@ericaweng
Author

ericaweng commented Mar 7, 2025

Thank you for uploading the dataset! I just want to report on my attempt to reproduce your results on HiVT so far. I am using your HiVT_modified code (downloaded a fresh copy from scratch) and your provided Hugging Face dataset with 15113 train sequences and 4519 val sequences on nuScenes. The only difference between my setup and yours is my environment: because I had difficulty installing the environment according to the HiVT instructions on my machine (equipped with an NVIDIA A100, CUDA 12.5), I used torch 1.9.1 (instead of torch 1.8.0) but the same versions of lightning and torch-geometric. However, I am not able to reproduce your results after retraining HiVT from scratch with BEV features (maptrv2_cent_bev). I am getting about the same performance as when I trained on my own generated data (using the pretrained mapping models). Using lr 3.5e-4 and weight decay 1e-2, the best hparams you report in Appendix A of your paper, I get a minADE of 0.395 (your paper reports 0.365; evaluating your provided pretrained model, I get 0.369). Using lr 5e-4 and weight decay 1e-3, I get slightly better performance at 0.393. As this is almost 10% worse than your reported results, I wanted to see where I may be going wrong (and to document reproducibility attempts for the community).

  • When I process the data using the pretrained maptrv2_cent mapping model, I end up with 28130 train samples and 6019 val scenes (this matches the numbers reported by the nuScenes devkit indexing verbose printout: 28130 + 6019 = 34149 samples). After I use your adaptor code (out of the box, with no changes made), I get 15191 train sequences and 4519 val sequences. I have the same number of val sequences as you, but 78 more train sequences than your Hugging Face dataset. I just want to confirm: did you make any changes to the indexing within the adaptor files?
  • In your experience, have the PyTorch version, seed, float precision used for training, the GPU, or the CUDA environment affected your performance?
  • Did you ever try running your models with a constant seed?

Thank you for your help.

@alfredgu001324
Owner

Hey Erica, thanks for letting me know your progress!

  1. Good point. I actually also noticed this while uploading the dataset, but I no longer remember what caused it... I think 15113 sounds like the more reasonable number, because maptr_bev and maptrv2_bev also have 15113 training samples. For reference, here is what I have in my AWS bucket:
(base) guxunjia@tisl-ws23-0:~$ aws s3 ls s3://mapbevprediction/maptr_bev/train/data --recursive | wc -l
15113
(base) guxunjia@tisl-ws23-0:~$ aws s3 ls s3://mapbevprediction/maptrv2_bev/train/data --recursive | wc -l
15113
(base) guxunjia@tisl-ws23-0:~$ aws s3 ls s3://mapbevprediction/maptrv2_cent_bev/train/data --recursive | wc -l
15191
(base) guxunjia@tisl-ws23-0:~$ aws s3 ls s3://mapbevprediction/stream_bev/train/data --recursive | wc -l
15069

For StreamMapNet, because they use a custom split for training (to avoid overlapping areas between train and val), they have fewer samples than the MapTR series. But I do not recall what causes the difference between MapTRv2_cent and the others. I suspect one potential cause is the try/except block in my adaptor code. That is a very sketchy way of handling errors, because the index files are extracted from TrajData, and I do not know much about how it actually processes nuScenes into its canonical format. (The reason I used TrajData is that it very easily resamples the 2Hz nuScenes data to 10Hz.) I occasionally saw errors from index mismatches, and I just decided to skip those frames. I roughly remember the skipped frames were not that many (fewer than 100/50 samples or something), which is why I thought it wasn't a big issue at the time. Maybe there are just some mismatches between the TrajData repo itself and the original nuScenes data. Also, my work is more of an intra-comparison of each mapping+prediction combination rather than a comparison of MapTR vs. MapTRv2 vs. StreamMapNet, etc., so I did not spend time ensuring they all had the same training samples (this is definitely an improvement that should be made).

One suggestion I have is to look at the frames that are skipped, specifically the places where the continue is triggered, to see what happened to those frames.
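Something like this around the skip site would surface them (hypothetical names; the adaptor's actual loop and error types differ):

def build_sample(frame):
    """Stand-in for the adaptor's per-frame processing (hypothetical)."""
    return frame["index"]  # raises KeyError for a malformed frame

frames = [{"index": 0}, {}, {"index": 2}]  # toy data with one malformed frame

skipped = []
for frame_idx, frame in enumerate(frames):
    try:
        sample = build_sample(frame)
    except (KeyError, IndexError) as err:
        skipped.append((frame_idx, repr(err)))  # record the drop instead of a bare continue
        continue

print(f"skipped {len(skipped)} frame(s): {skipped}")
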

As for your question ("Did you make any changes to the indexing within the adaptor files?"): I don't think I did. The indexing files I extracted from TrajData are consistent throughout; I only performed this extraction once, at the start of the two projects. So I believe it is more an issue in the adaptor writing (maybe I changed some code between the MapTR and MapTR_cent processing) than in the indexing files.

  2. From my experience, these factors can affect performance, but they should not lead to a 10% variation. For example, a small gap like your 0.369 vs. my 0.365 seems reasonable given slight setup differences, but I do not think such differences would cause a 10% gap.

  3. It is indeed using a constant seed, as seen from this line.

Maybe I can also try reproducing the results on my side, but I would need some time to set up the environment and code again... Thank you so much for your patience!
