Computing ADE/FDE when compared with other methods #47
Comments
Hi @pedro-mgb, thanks for this. First, Social-GAN cannot use our evaluation method because of the way it generates its results. Social-GAN is a generative model whose generated samples are correlated, so judging it by the best sample for the whole scene is suitable. In our case, we generate the parameters of a distribution and then sample from it. The CV model might be valid for these datasets (I am not fully aware of it) because the datasets are old and not complex enough. I'd prefer that any upcoming work use https://www.aicrowd.com/challenges/trajnet-a-trajectory-forecasting-challenge, which is richer, with more complex situations and better annotations. I think this answers your first bullet point. Let me know if you have more questions.
Thank you for the response. I agree with what you said. Multimodal CV may have the "best" prediction among 20 samples, but if we look at the errors of the other predictions (e.g. the average over the top-X samples), or use an NLL loss, as discussed in that article, we will see that CV looks much worse than your model, Social GAN, or an LSTM. Regarding Trajnet++, I think it's a step in the right direction towards having some form of standard. But I believe the trajectory forecasting problem using data-driven models is still just taking its first steps. I don't really have any other questions. Thank you, once again.
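To make the "top-X" idea concrete, here is a minimal sketch of averaging the ADE over the X lowest-error samples for one trajectory, instead of keeping only the single best one. The function names `ade` and `top_x_mean_ade` are hypothetical and are not taken from any of the repositories discussed here.

```python
import math


def ade(pred, truth):
    """Average displacement error between two equal-length lists of (x, y) points."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def top_x_mean_ade(samples, truth, x):
    """Mean ADE of the x lowest-error samples for a single trajectory.

    With x == 1 this reduces to the usual best-of-K ADE; larger x also
    penalises a model whose remaining samples are far from the ground truth.
    """
    errors = sorted(ade(s, truth) for s in samples)
    return sum(errors[:x]) / x
```

With X = 1 this is the usual best-of-K number, so comparing X = 1 against larger X shows how much a model relies on a single lucky sample.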
This issue has been present in the past (#14 #27 #30), but I felt like it would be best to create another issue rather than commenting on closed ones.
I made some changes to the Social GAN code to compute the ADE and FDE metrics in the same way Social-STGCNN does (see this issue on sgan repo): picking the smallest error among all the samples per trajectory, instead of the overall smallest error for the entire scene/sequence.
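The difference between the two selection rules can be sketched as follows. This is an illustrative reimplementation under my own assumptions (plain lists of (x, y) points, hypothetical function names), not the actual code of either repository.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory of (x, y) points."""
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def best_of_k_per_trajectory(samples, truths):
    """samples[k][i] is sample k's prediction for pedestrian i.

    Each pedestrian independently keeps its own lowest ADE and lowest FDE
    over the K samples; the results are then averaged over pedestrians.
    """
    n = len(truths)
    ade_total = fde_total = 0.0
    for i in range(n):
        errs = [displacement_errors(s[i], truths[i]) for s in samples]
        ade_total += min(e[0] for e in errs)
        fde_total += min(e[1] for e in errs)
    return ade_total / n, fde_total / n


def best_of_k_per_scene(samples, truths):
    """Each sample is scored on the whole scene (mean over pedestrians),
    and the single lowest scene-level ADE and FDE over the K samples are kept."""
    n = len(truths)
    scene_ades, scene_fdes = [], []
    for s in samples:
        errs = [displacement_errors(s[i], truths[i]) for i in range(n)]
        scene_ades.append(sum(e[0] for e in errs) / n)
        scene_fdes.append(sum(e[1] for e in errs) / n)
    return min(scene_ades), min(scene_fdes)
```

Because the per-trajectory rule lets every pedestrian pick a different sample, its result can never be higher than the per-scene result on the same predictions, which is why numbers computed under the two rules are not directly comparable.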
I leave below a table comparing Social-STGCNN (results from the paper) with SGAN-P-20 (as in the paper), and also a simpler baseline: a 'multimodal' constant velocity. I can explain it in more detail if you want, but basically the constant velocity model outputs 20 samples of trajectories with constant velocity, where for each sample the magnitude of the velocity is weighted using a normal distribution based on the velocities of the observed trajectory.
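For readers who want something concrete, below is one possible reading of such a baseline. It is only a sketch under several assumptions that are not stated above: the direction is taken from the last observed step, the speed of each sample is drawn from a normal distribution fitted to the observed step speeds, and the prediction length defaults to 12 steps. The function name `multimodal_cv` and its parameters are hypothetical.

```python
import math
import random


def multimodal_cv(observed, pred_len=12, num_samples=20, seed=0):
    """Sample constant-velocity futures for one observed trajectory.

    observed: list of (x, y) points, at least two.
    Assumption: direction comes from the last observed step, and each
    sample's speed is drawn from N(mean, std) of the observed step speeds.
    """
    rng = random.Random(seed)
    steps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(observed, observed[1:])]
    speeds = [math.hypot(dx, dy) for dx, dy in steps]
    mean_speed = sum(speeds) / len(speeds)
    std_speed = math.sqrt(sum((s - mean_speed) ** 2 for s in speeds) / len(speeds))

    dx, dy = steps[-1]
    last_speed = math.hypot(dx, dy)
    # A pedestrian standing still has no direction; predict no movement.
    direction = (dx / last_speed, dy / last_speed) if last_speed > 0 else (0.0, 0.0)

    x_last, y_last = observed[-1]
    samples = []
    for _ in range(num_samples):
        speed = max(0.0, rng.gauss(mean_speed, std_speed))  # no negative speeds
        samples.append([(x_last + direction[0] * speed * t,
                         y_last + direction[1] * speed * t)
                        for t in range(1, pred_len + 1)])
    return samples
```

Every sample here is a straight line in the same direction, so the "multimodality" is only in how far the pedestrian travels; the baseline described in the issue may differ in these details.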
According to this, not only does SGAN-P outperform Social-STGCNN, but a multimodal constant velocity seems to outperform both. This was also touched on in another issue in the sgan repository, originating from the paper What the Constant Velocity Model Can Teach Us About Pedestrian Motion Prediction (https://arxiv.org/abs/1903.079339). Although the multimodal constant velocity they employ is different from mine, it also outperforms Social GAN.
I'd like to get someone's opinion on this matter, because as of right now a multimodal version of constant velocity is achieving competitive results with the state-of-the-art. This leads to many questions, many of which have been discussed, but I fear no consensus has been reached. I'll leave a few here:
Thank you for reading this. Have a good day!