OCSORT + ByteTrack? #12

Closed
HanGuangXin opened this issue Apr 10, 2022 · 16 comments

@HanGuangXin
Contributor

Thanks for the amazing work again!

After replacing the SORT Kalman filter in ocsort.py with the JDE Kalman filter, I got higher HOTA and faster speed, which may indicate that OC-SORT with the SORT settings can be improved.

So, do you plan to provide a version of ocsort with BYTE?

@noahcao
Owner

noahcao commented Apr 10, 2022

Yes. Actually we find similar results. OC-SORT is provided as a new baseline for more advanced study. You can feel free to improve it by integrating new components.

Combining OC-SORT and BYTE should be incremental. I will try to make it once I have the bandwidth. My current high priority is to support mmtracking first. Thank you for your suggestion!

@HanGuangXin
Contributor Author

I will run more tests on different settings, and maybe provide my results to discuss.

@HanGuangXin
Contributor Author

HanGuangXin commented Apr 12, 2022

[Image: ocsort]

Here are my results from evaluating the pretrained models on MOT17_val_half and DanceTrack_val. For each metric, red > blue > green.

Observations that do not confuse me:

  1. The relative performance of ByteTrack and OC-SORT differs between MOT17 and DanceTrack. On MOT17, ByteTrack consistently outperforms OC-SORT across settings. I think this is reasonable, because OC-SORT is meant to improve performance under occlusion and non-linear motion.
  2. For the original ByteTrack and OC-SORT on DanceTrack, ByteTrack has higher MOTA, but OC-SORT has much higher HOTA. I think this is reasonable too, because ByteTrack uses BYTE to improve MOTA.

Observations that do confuse me:

  1. About the JDE Kalman filter and the SORT Kalman filter: the JDE Kalman filter performs better on MOT17 but worse on DanceTrack. Why is that?
  2. Questions about NSA can wait for now.
  3. BYTE benefits both MOTA and HOTA for OC-SORT on MOT17, which I think is reasonable. But on DanceTrack, BYTE only benefits MOTA for OC-SORT and harms HOTA a lot, which confuses me. Why does BYTE harm HOTA in OC-SORT?
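
For readers less familiar with the two metrics, here is a minimal sketch of how they are composed, which shows why they can move in opposite directions. MOTA follows the standard CLEAR-MOT definition and HOTA is shown at a single localization threshold (the full metric averages over thresholds). All counts and scores below are hypothetical and are not taken from the results above.

```python
import math


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (fn + fp + idsw) / num_gt


def hota_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a * ass_a)


# Hypothetical counts: recalling more detections removes misses (FN)
# but adds a few false positives and ID switches.
baseline = mota(fn=120, fp=30, idsw=10, num_gt=1000)
more_dets = mota(fn=90, fp=35, idsw=15, num_gt=1000)
print(f"MOTA: {baseline:.3f} -> {more_dets:.3f}")

# Hypothetical scores: detection accuracy rises slightly, association accuracy drops.
print(f"HOTA: {hota_alpha(0.80, 0.50):.3f} -> {hota_alpha(0.82, 0.42):.3f}")
```

MOTA counts errors frame by frame, so fewer misses raise it directly. HOTA multiplies detection accuracy by association accuracy, so a drop in association quality can outweigh a small detection gain.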

Looking forward to your reply.

@noahcao
Owner

noahcao commented Apr 14, 2022

Hi @HanGuangXin ,

You have done a really wonderful study! It is quite impressive to me. I have some experience and thoughts on the observations you provide.

  1. Why OC-SORT is inferior to ByteTrack on MOT17 val_half: here are some insights:
  • MOT17 is a dataset where objects (pedestrians) usually move very linearly. So with a linear-motion-based Kalman filter, most failures are caused by missing "observations" (detections). If you run an ablation study over the IoU threshold in OCR or the IoU threshold in the general association, you may get the impression that the key to boosting performance on such a dataset is to "recall more observations!". BYTE is designed to use low-confidence detections, so it may perform well in such a situation by recalling more detections.
  • Another key point is that the MOT17 train_half/val_half split is a compromise forced by the limited data. It may not be perfectly reasonable that the two parts come from the same video sequences (the first half and the second half). During training, the detector (or even the tracker, for some joint-detection-and-tracking methods) has actually seen the objects in the val_half subset. As a consequence, it is quite safe to trust the detections predicted on val_half even when their confidence score / IoU score is not that high. A "recall more" strategy can be even more successful given this background.
  • Since the motion pattern of objects on MOT17 is simple, the overall performance (even when measured by HOTA) is highly influenced by the detection part. We actually have a study in the DanceTrack paper showing that, given oracle detections, even the most naive IoU matching achieves nearly perfect tracking performance on MOT17 (HOTA=98.1). Given this bias, MOT17 may encourage methods to focus more on detection quality, which complements the first two points above.
  2. Why JDE is inferior to KF on DanceTrack: many variations in the JDE implementation can influence the results. For example: (1) have you considered the influence of OOS from OC-SORT in the comparison? (2) how do you generate the embeddings for JDE? And so on. But one potential reason comes to mind first: JDE is designed to incorporate object appearance features. Since object appearance on MOT17 is usually distinguishable, appearance embeddings are usually helpful in association. But object appearance on DanceTrack is quite similar, so appearance embeddings are very noisy in association. I recommend reading the original DanceTrack paper for more details on the dataset characteristics.
  3. Why BYTE even hurts HOTA on DanceTrack: this also comes from the nature of the DanceTrack dataset. Detection on DanceTrack is very simple (refer to Table 3 in the DanceTrack paper; all detection-focused metrics are much higher on DanceTrack than on MOT17), so the detection confidence of true targets is usually very high. The typical situation in which a detection's confidence is very low is when it heavily overlaps another object. Therefore, BYTE's strategy of bringing in more detections is likely to introduce more noisy observations than it does on MOT17. With one more detection, there is likely one fewer FN during evaluation, giving higher MOTA. But the extra detection may overlap heavily with other targets, making association harder and ID switches more likely. So lower HOTA may be expected, since HOTA evaluates tracking performance at the tracklet level instead of the frame level.
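
The "recall more observations" idea discussed above can be sketched as a two-stage association: match tracks to high-confidence detections first, then try the remaining tracks against low-confidence detections. This is a simplified illustration and not the ByteTrack or OC-SORT code. It uses greedy IoU matching, whereas ByteTrack uses linear assignment on Kalman-predicted boxes, and the function names and thresholds (`high`, `low`, `iou_thresh`) are assumptions chosen for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, iou_thresh):
    """Greedily pair track boxes with detection boxes by descending IoU.

    tracks, dets: dicts mapping an id to a box. Returns {track_id: det_id}.
    """
    pairs = sorted(
        ((iou(tb, db), t, d) for t, tb in tracks.items() for d, db in dets.items()),
        reverse=True,
    )
    matches, used = {}, set()
    for score, t, d in pairs:
        if score < iou_thresh:
            break  # all remaining pairs overlap even less
        if t not in matches and d not in used:
            matches[t] = d
            used.add(d)
    return matches


def two_stage_associate(tracks, detections, high=0.6, low=0.1, iou_thresh=0.3):
    """detections: list of (box, score). Returns {track_id: detection_index}."""
    high_dets = {i: b for i, (b, s) in enumerate(detections) if s >= high}
    low_dets = {i: b for i, (b, s) in enumerate(detections) if low <= s < high}
    # Stage 1: confident detections only.
    matches = greedy_match(tracks, high_dets, iou_thresh)
    # Stage 2: tracks still unmatched get a chance with low-score detections.
    leftover = {t: b for t, b in tracks.items() if t not in matches}
    matches.update(greedy_match(leftover, low_dets, iou_thresh))
    return matches


tracks = {1: (0, 0, 10, 20), 2: (30, 0, 40, 20)}
detections = [((1, 0, 11, 20), 0.9), ((31, 0, 41, 20), 0.3)]
print(two_stage_associate(tracks, detections))
```

In this toy case track 2 is only recovered through the second stage. On a dataset where low scores mostly come from heavily overlapping boxes, that same second stage is where the noisy matches would enter.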

I have shared some intuitions and experience from my own study above. I hope they are helpful. Again, dataset bias is always important when we consider an algorithm. I highly recommend reading the DanceTrack paper for more details.

To make our discussion helpful to a broad community, let's discuss here instead of via private message platforms.

@Mobu59

Mobu59 commented Apr 14, 2022

> Thanks for the amazing work again!
>
> After replacing the SORT kalman filter in ocsort.py with the JDE kalman filter, I got higher HOTA and faster speed, which may indicate that ocsort with SORT settings can be improved.
>
> So, do you plan to provide a version of ocsort with BYTE?

Hello, I am new to the field of MOT. Could you tell me what the JDE Kalman filter is (does it mean the Kalman filter combined with ReID in JDE?), or where I can find relevant information? Thanks in advance!

@HanGuangXin
Contributor Author

Thanks a lot for your detailed and enlightening explanation, truly! It deepened my understanding of the algorithms and the task of MOT. I will review the DanceTrack paper more thoroughly.

About the JDE Kalman filter: my description was confusing, and that is on me.
The Kalman filter in SORT has 7 states, [x, y, s, r, \dot{x}, \dot{y}, \dot{s}], where s is the box area and r is the aspect ratio. The Kalman filter in JDE has 8 states, [x, y, r, h, \dot{x}, \dot{y}, \dot{r}, \dot{h}], where h is the box height; it adds a state for the velocity of the aspect ratio r.
So when I replaced the SORT Kalman filter with the JDE Kalman filter, I only replaced the KF states, the KF covariance, and some interfaces, without incorporating appearance embeddings. I expected the JDE Kalman filter to perform better than the SORT Kalman filter on DanceTrack because it predicts the additional velocity of the aspect ratio, and bbox aspect ratios change more frequently in DanceTrack.
As for your second question, I am embarrassed to say that I could not find the code for OOS. It would be great if you could point it out to me.
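
To make the difference between the two state layouts concrete, here is a minimal sketch of how a box maps to each measurement and how the state mean moves in one constant-velocity step. It is only an illustration under the state definitions above: the function names are made up for this example, and covariance propagation and the update step are left out.

```python
def to_sort_measurement(box):
    """(x1, y1, x2, y2) -> [x, y, s, r]: center, area, aspect ratio w/h."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [box[0] + w / 2, box[1] + h / 2, w * h, w / h]


def to_jde_measurement(box):
    """(x1, y1, x2, y2) -> [x, y, r, h]: center, aspect ratio w/h, height."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [box[0] + w / 2, box[1] + h / 2, w / h, h]


def predict_sort(state):
    """One step for [x, y, s, r, dx, dy, ds]; r has no velocity, so it stays fixed."""
    x, y, s, r, dx, dy, ds = state
    return [x + dx, y + dy, s + ds, r, dx, dy, ds]


def predict_jde(state):
    """One step for [x, y, r, h, dx, dy, dr, dh]; r drifts with its own velocity dr."""
    x, y, r, h, dx, dy, dr, dh = state
    return [x + dx, y + dy, r + dr, h + dh, dx, dy, dr, dh]


box = (10.0, 20.0, 30.0, 60.0)   # width 20, height 40
print(to_sort_measurement(box))  # [20.0, 40.0, 800.0, 0.5]
print(to_jde_measurement(box))   # [20.0, 40.0, 0.5, 40.0]

print(predict_sort([20.0, 40.0, 800.0, 0.5, 1.0, 2.0, 10.0]))
print(predict_jde([20.0, 40.0, 0.5, 40.0, 1.0, 2.0, 0.05, 1.0]))
```

The only structural difference in the prediction is the `r + dr` term: the 7-state model carries the aspect ratio forward unchanged, while the 8-state model extrapolates it linearly.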

Finally, we are lucky to have researchers like you working on MOT and bringing us awesome work like OC-SORT.

@HanGuangXin
Contributor Author

And I can provide the code using JDE kalman filter in ocsort, if needed.
Maybe there is something I missed out.

@noahcao
Owner

noahcao commented Apr 14, 2022

Hi @Mobu59 ,

I thought that "JDE Kalman Filter" meant using the embeddings from the famous JDE model together with a canonical Kalman filter. But in the following post, @HanGuangXin clarified that he meant:

> Kalman filter in SORT has 7 states, [x, y, s, r, \dot{x}, \dot{y}, \dot{s}]. Kalman filter in JDE has 8 states, [x, y, r, h, \dot{x}, \dot{y}, \dot{r}, \dot{h}], adding a state for the velocity of aspect ratio r.

So I think that is still a canonical Kalman filter; the only difference from the popular KF implementation in SORT is that it no longer assumes the box aspect ratio is constant. Still, I believe that in the MOT community the term JDE usually refers to the work Joint Detection and Embedding [1].

[1]: Wang, Z., Zheng, L., Liu, Y., Li, Y., & Wang, S. "Towards real-time multi-object tracking". ECCV 2020

@noahcao
Owner

noahcao commented Apr 14, 2022

Hi @HanGuangXin , thank you for clarifying and providing more details. I always enjoy sharing my ideas with the community!

  1. influence of allowing non-fixed aspect ratio:

    I don't know whether the advantage of allowing the aspect ratio to change linearly generalizes, even when limited to situations where objects do not change body pose much. But I know it can better handle an object moving into or out of occlusion, where the aspect ratio of the bounding box changes.

    On DanceTrack, objects change body pose aggressively. Many dance movements make the aspect ratio larger and then suddenly smaller. Assuming a linear change of aspect ratio in such cases is like using a linear motion model to predict an object that plunges forward on one side and dashes in on the other (Chinese: "左冲右突"). You are not likely to get reliable predictions in this situation. Incorporating a non-fixed aspect ratio is likely to introduce more noise than signal here.

    Still, we need more experimental support to be sure. It would be great if you could show some cases where JDE is right but OC-SORT is wrong on MOT17, and the reverse on DanceTrack.

  2. the OOS is realized by the freeze/unfreeze of parameters in my customized Kalman Filter.

  3. You can provide your implementation in a forked repo of your own or make a PR to this repo. That would be a good practice to share your intelligence with the community.
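
The aspect-ratio argument in point 1 can be illustrated with a toy comparison of one-step predictions. This is not a Kalman filter and not code from either tracker: it just compares a fixed-ratio assumption with a linear extrapolation on two made-up aspect-ratio sequences, one changing gradually and one oscillating.

```python
def mean_abs_error(series, predictor):
    """Average one-step-ahead error; the predictor sees the last two values."""
    errors = [
        abs(predictor(series[t - 1], series[t]) - series[t + 1])
        for t in range(1, len(series) - 1)
    ]
    return sum(errors) / len(errors)


def constant(prev, cur):
    """Fixed aspect ratio: the next value equals the current one."""
    return cur


def linear(prev, cur):
    """Constant-velocity aspect ratio: extrapolate the last change."""
    return cur + (cur - prev)


# Made-up sequences for illustration only.
steady = [0.40, 0.42, 0.44, 0.46, 0.48, 0.50]       # gradual change, e.g. entering occlusion
oscillating = [0.40, 0.60, 0.40, 0.60, 0.40, 0.60]  # abrupt back-and-forth change

for name, series in [("steady", steady), ("oscillating", oscillating)]:
    print(
        name,
        round(mean_abs_error(series, constant), 3),
        round(mean_abs_error(series, linear), 3),
    )
```

On the gradual sequence the linear extrapolation wins, and on the oscillating one it overshoots in the wrong direction and does worse than simply holding the ratio fixed, which is the intuition described above.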

@HanGuangXin
Copy link
Contributor Author

Thanks! You convinced me again. There are a lot of things I have to do and to learn.

I will make a fork or PR as soon as possible, after I finish other deadlines :(

@noahcao
Owner

noahcao commented Apr 18, 2022

I am keeping this issue open as many others may be interested in the combination of OC-SORT and BYTE. I wish the posts here could be helpful to them.

@HanGuangXin
Contributor Author

@noahcao Sorry for the delay! I have made a PR that combines OC-SORT and BYTE, getting both higher MOTA and HOTA.
Maybe you can check it or merge it?

It is an honor for me to contribute to this repository!

@noahcao
Owner

noahcao commented Apr 27, 2022

OC-SORT has supported BYTE from PR #19. Thanks @HanGuangXin for the contribution.

@noahcao noahcao pinned this issue May 5, 2022
@abhigoku10

@HanGuangXin thanks for the detailed explanation. Can you please provide the code for BYTE_OCsort?

@Umar1998

> And I can provide the code using JDE kalman filter in ocsort, if needed. Maybe there is something I missed out.

Yes please, can you provide the code implementation of the JDE Kalman filter, @HanGuangXin?

@noahcao
Owner

noahcao commented Aug 29, 2022

@abhigoku10 Please refer to the code contribution from PR #19
