Computing ADE/FDE when compared with other methods #47

Closed
pedro-mgb opened this issue Apr 10, 2021 · 2 comments

Comments

@pedro-mgb

pedro-mgb commented Apr 10, 2021

This topic has come up before (#14 #27 #30), but I felt it would be better to open a new issue than to comment on closed ones.

I made some changes to the Social GAN code to compute the ADE and FDE metrics the same way Social-STGCNN does (see this issue on sgan repo): picking the smallest error among all the samples per trajectory, instead of the overall smallest error for the entire scene/sequence.
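To make the difference between the two aggregation schemes concrete, here is a minimal NumPy sketch. It is not the code from either repository: the function name `best_of_n_errors` and the array layout are assumptions made for illustration.

```python
import numpy as np

def best_of_n_errors(pred, gt):
    """Best-of-N ADE/FDE under two aggregation schemes (illustrative only).

    pred: (n_samples, n_peds, t_pred, 2) sampled future positions
    gt:   (n_peds, t_pred, 2) ground-truth future positions
    Returns ((ade, fde) per trajectory, (ade, fde) per scene).
    """
    # Euclidean distance at every timestep: (n_samples, n_peds, t_pred)
    dist = np.linalg.norm(pred - gt[None], axis=-1)
    ade = dist.mean(axis=-1)   # (n_samples, n_peds)
    fde = dist[..., -1]        # (n_samples, n_peds)

    # Per trajectory: every pedestrian picks its own best sample.
    per_traj = (float(ade.min(axis=0).mean()), float(fde.min(axis=0).mean()))
    # Per scene: a single sample is picked for all pedestrians together.
    per_scene = (float(ade.mean(axis=1).min()), float(fde.mean(axis=1).min()))
    return per_traj, per_scene
```

Because the per-trajectory minimum is taken independently for each pedestrian, it can never be larger than the per-scene minimum, so numbers computed under the two schemes are not directly comparable.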

Below is a table comparing Social-STGCNN (results from the paper) with SGAN-P-20 (as in the paper) and a simpler baseline, a 'multimodal' constant velocity model. I can explain it in more detail if you want, but basically the constant velocity model outputs 20 sample trajectories with constant velocity, where for each sample the magnitude of the velocity is weighted using a normal distribution based on the velocities of the observed trajectory.

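Each cell is ADE / FDE.

| Model         | ETH         | HOTEL       | UNIV        | ZARA1       | ZARA2       | AVG         |
|---------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Const vel     | 0.46 / 0.70 | 0.14 / 0.23 | 0.31 / 0.59 | 0.28 / 0.54 | 0.20 / 0.40 | 0.28 / 0.49 |
| SGAN-P        | 0.59 / 0.92 | 0.34 / 0.66 | 0.33 / 0.60 | 0.23 / 0.42 | 0.22 / 0.39 | 0.34 / 0.60 |
| Social-STGCNN | 0.64 / 1.11 | 0.49 / 0.85 | 0.44 / 0.79 | 0.34 / 0.53 | 0.30 / 0.48 | 0.44 / 0.75 |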
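For readers who want a picture of the constant velocity baseline, the following is one possible reading of the description above, not the exact implementation behind the table. The choices here are assumptions: the direction is taken from the last observed step, the speed is drawn from a normal distribution fitted to the observed per-step speeds, and `multimodal_const_vel`, `t_pred=12` and the clipping at zero are all hypothetical.

```python
import numpy as np

def multimodal_const_vel(obs, t_pred=12, n_samples=20, rng=None):
    """Sample constant-velocity futures for one pedestrian (illustrative only).

    obs: (t_obs, 2) observed positions, t_obs >= 2
    Returns: (n_samples, t_pred, 2) predicted positions
    """
    rng = np.random.default_rng() if rng is None else rng
    vel = np.diff(obs, axis=0)               # per-step displacements
    speeds = np.linalg.norm(vel, axis=1)     # per-step speeds

    # Assumption: keep the heading of the last observed step.
    last_speed = np.linalg.norm(vel[-1])
    direction = vel[-1] / last_speed if last_speed > 0 else np.zeros(2)

    # Assumption: speed ~ N(mean, std) of the observed speeds, clipped at 0.
    sampled = rng.normal(speeds.mean(), speeds.std(), size=n_samples)
    sampled = np.clip(sampled, 0.0, None)

    steps = np.arange(1, t_pred + 1)[None, :, None]   # (1, t_pred, 1)
    return obs[-1] + sampled[:, None, None] * steps * direction
```

Each sample is a straight line from the last observed position, so the 20 samples differ only in how far the pedestrian travels, not in where they turn.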

According to this, not only does SGAN-P outperform Social-STGCNN, but a multimodal constant velocity model seems to outperform both. This was also touched on in another issue in the sgan repository, originating from the paper What the Constant Velocity Model Can Teach Us About Pedestrian Motion Prediction (https://arxiv.org/abs/1903.07933). Although the multimodal constant velocity model they employ differs from mine, it also outperforms Social GAN.

I'd like to get someone's opinion on this matter, because as of right now a multimodal version of constant velocity is achieving results competitive with the state of the art. This raises many questions, several of which have already been discussed, but I fear no consensus has been reached. I'll leave a few here:

  • Are the datasets on which these models are based representative of the huge complexity of human motion and human interactions?
  • Are the models actually learning meaningful information about interactions between humans, or are they just "making things worse"?
  • Is this evaluation process enough to compare the different models? For instance, some models and benchmarks have been using metrics that take into account collisions between pedestrians. I assume (or hope) that the social models will perform better on such metrics than the constant velocity method, but I have not done enough experiments in that regard.
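As an illustration of what a collision-aware metric could look like, here is a minimal sketch. It does not reproduce the definition used by any particular benchmark: the function name `collision_rate` and the `radius=0.1` threshold are hypothetical.

```python
import numpy as np

def collision_rate(pred, radius=0.1):
    """Fraction of pedestrians that get closer than `radius` to another
    pedestrian at any predicted timestep (illustrative only).

    pred: (n_peds, t_pred, 2) one predicted scene
    """
    n = pred.shape[0]
    if n < 2:
        return 0.0
    # Pairwise distances at every timestep: (n_peds, n_peds, t_pred)
    dist = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    # Ignore the distance of each pedestrian to itself.
    idx = np.arange(n)
    dist[idx, idx] = np.inf
    # A pedestrian collides if its closest approach is below the radius.
    return float((dist.min(axis=(1, 2)) < radius).mean())
```

A constant velocity model ignores other pedestrians entirely, so a metric of this kind is one place where interaction-aware models might be expected to show an advantage.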

Thank you for reading this. Have a good day!

@abduallahmohamed
Owner

Hi @pedro-mgb

Thanks for this.

First, Social GAN cannot use our evaluation method because of the way it generates its results. Social GAN is a generative model whose generated samples are correlated, so judging it by the best sample for the whole scene is suitable. In our case, we generate distribution parameters and then sample from them. The CV model might be valid for these datasets (I'm not fully aware of it) because the datasets are old and not complex enough. I'd prefer any upcoming work to use https://www.aicrowd.com/challenges/trajnet-a-trajectory-forecasting-challenge which is rich enough, with more complex situations and better annotations. I think this answers your first bullet point.
For the second point, I think the only way to evaluate this is by qualitative analysis. Also, if you want to use these models in real-life applications, you will need a lot of conditions around them. I don't believe they make things worse; all of them are approaches to a complex problem, and each method has its own shortcomings.
For the third point, the best-of-N metrics (FDE-20, ADE-20) are not suitable for judging performance. Why 20, etc.?
This article http://ai.stanford.edu/blog/trajectory-forecasting/ discusses this point extensively.

Let me know if you have more questions

@pedro-mgb
Author

Thank you for the response. I agree with what you said.

Multimodal CV may have the "best" prediction among 20 samples, but if we look at the errors of the other predictions (e.g. the average over the top-X samples) or use an NLL loss, as discussed in that article, we will see that CV looks much worse than your model, Social GAN, or an LSTM.
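A minimal sketch of the "average over the top-X samples" idea, assuming a single pedestrian and a hypothetical helper named `top_k_ade`:

```python
import numpy as np

def top_k_ade(pred, gt, k):
    """Mean ADE over the k best samples for one pedestrian (illustrative only).

    pred: (n_samples, t_pred, 2) sampled future positions
    gt:   (t_pred, 2) ground-truth future positions
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)   # (n_samples, t_pred)
    ade = np.sort(dist.mean(axis=-1))                 # ascending per-sample ADE
    return float(ade[:k].mean())
```

With `k=1` this reduces to the usual best-of-N ADE, and with `k` equal to the number of samples it becomes the mean over all samples, so a model that gets one lucky sample among many poor ones is penalized as `k` grows.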

Regarding Trajnet++, I think it's a step in the right direction towards having some form of standard. But I believe the trajectory forecasting problem using data-driven models is still just taking its first steps.

I don't really have any other questions. Thank you once again.
Feel free to close the issue if you want.
