First of all, thank you for providing a valuable contribution to the Computer Vision community. It was interesting to read your paper, and thank you also for providing your implementation. I have a few questions to clarify:
In Table 7 of your paper, you show that the OOS strategy does not yield much improvement on DanceTrack but gives a significant boost in HOTA on MOT17. Could you provide a reason for this? I am trying to understand the impact of the three strategies you propose (OOS, OCM, OCR).
I was able to visually understand the impact of OCM and OCR by removing each component and evaluating on DanceTrack, but I could not do the same for OOS: I did not see any visual difference with OOS enabled or disabled, even though I understand its impact intuitively (theoretically). Do you have example videos where the impact of OOS is visible?
What are your thoughts on using a less powerful detector? I tried MobileNet+SSD (TFLite version) from TFHub and the performance was not as expected; the same was true for SORT. What do you suggest when the detector is not very powerful?
I had some discussion about experimental observations in a previous issue; you can probably find some insight and a deeper explanation there.

For your questions:
OOS also shows effectiveness on DanceTrack (HOTA 47.8 -> 48.5). But consider that OOS is designed to correct the error accumulated by linear trajectory propagation during occlusion: when the motion is highly complicated and non-linear, as on DanceTrack, the linearly interpolated trajectory used for the correction can still deviate from the true trajectory. This is likely why it is relatively less effective on DanceTrack than on MOT17.
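To make that concrete, here is a minimal sketch of the linear back-fill idea behind OOS; the helper name and the `kf`/`track` objects in the usage comment are hypothetical illustrations, not the repository's actual API:

```python
import numpy as np

def virtual_observations(last_box, new_box, gap):
    """Linearly interpolate virtual boxes across an occlusion gap of
    `gap` frames. This mirrors the OOS assumption of linear motion
    during occlusion; if the true motion is non-linear (e.g. dancers),
    these virtual observations can deviate from the real trajectory."""
    last = np.asarray(last_box, dtype=float)  # [x1, y1, x2, y2] before occlusion
    new = np.asarray(new_box, dtype=float)    # [x1, y1, x2, y2] at re-association
    return [last + (new - last) * (k / gap) for k in range(1, gap)]

# Hypothetical usage with a Kalman-filter track object:
# for box in virtual_observations(track.last_observation, detection,
#                                 track.time_since_update):
#     kf.predict()
#     kf.update(box)  # re-run the filter along the interpolated path
```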
I don't have any ablation visualization videos for OOS at the moment. I will recheck the implementation and make some in the near future, after I finish an upcoming deadline.
I haven't tried many detectors. What I have tried is: (1) detection by FairMOT and HeadHunter on the CroHD dataset; (2) detection by PermaTrack on the KITTI dataset; (3) detection by YOLOX and TransTrack on MOT17/DanceTrack/MOT20. In these experiments, OC-SORT was quite effective. But yes, SORT and OC-SORT are tracking-by-detection algorithms, so they rely heavily on the quality of detection, especially when you use IoU as the only cue to match detections to predictions and there is heavy occlusion.

A heuristic that was helpful in my experience is to add appearance similarity over the center area of the bounding boxes to the cost function, i.e., a customized version of DeepSORT; a rough sketch of this idea is given below. Another potential help is to set higher thresholds for IoU matching and detection confidence. I hope these rough insights are helpful. Please feel free to follow up on this discussion.
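A rough sketch of that center-crop appearance heuristic, assuming boxes in [x1, y1, x2, y2] format and some external embedding network to featurize the crops; the crop ratio and blend weight are placeholder values, not settings from the paper or the repository:

```python
import numpy as np

def center_crop(img, box, shrink=0.5):
    """Crop the central region of a bounding box; `shrink` controls how
    much of the box is kept (placeholder value, to be tuned)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * shrink / 2, (y2 - y1) * shrink / 2
    return img[int(cy - h):int(cy + h), int(cx - w):int(cx + w)]

def cosine_sim(a, b):
    """Cosine similarity between two (flattened) feature vectors, e.g.
    embeddings of center crops from some re-ID network."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def iou(box_a, box_b):
    """Standard IoU between two [x1, y1, x2, y2] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def match_cost(pred_box, det_box, pred_feat, det_feat, w_app=0.3):
    """Blend IoU distance with center-crop appearance distance, in the
    spirit of DeepSORT; w_app=0.3 is an arbitrary placeholder weight."""
    return ((1 - w_app) * (1 - iou(pred_box, det_box))
            + w_app * (1 - cosine_sim(pred_feat, det_feat)))
```

The resulting cost matrix would then be fed to the same Hungarian assignment step that SORT-style trackers already use, so this drops in without changing the rest of the pipeline.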