OCSORT + ByteTrack? #12

HanGuangXin · 2022-04-10T14:09:05Z

Thanks for the amazing work again!

After replacing the SORT Kalman filter in ocsort.py with the JDE Kalman filter, I got higher HOTA and faster speed, which may indicate that OC-SORT with the SORT settings can be improved.

So, do you plan to provide a version of ocsort with BYTE?


noahcao · 2022-04-10T15:11:48Z

Yes. Actually we find similar results. OC-SORT is provided as a new baseline for more advanced study. You can feel free to improve it by integrating new components.

Combining OC-SORT and BYTE should be incremental. I will try to make it once I have the bandwidth. My current high priority is to support mmtracking first. Thank you for your suggestion!

HanGuangXin · 2022-04-11T03:43:33Z

I will run more tests with different settings, and maybe share my results for discussion.

HanGuangXin · 2022-04-12T09:07:48Z

Here are my results, using the pretrained models to run evaluation on MOT17_val_half and DanceTrack_val. For each metric, red > blue > green.

Some observations do not confuse me:

The relative performance of ByteTrack and OC-SORT differs between MOT17 and DanceTrack. On MOT17, ByteTrack consistently performs better than OC-SORT across settings. I think this is reasonable, because OC-SORT aims to improve performance under occlusion and non-linear motion.
For the original ByteTrack and OC-SORT on DanceTrack, ByteTrack has higher MOTA but OC-SORT has much higher HOTA. I think this is reasonable too, because ByteTrack uses BYTE to improve MOTA.
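
As a rough illustration of why recovering detections moves MOTA so directly, here is a minimal sketch of the standard CLEAR-MOT formula, MOTA = 1 - (FN + FP + IDSW) / GT. The counts are hypothetical and are not taken from the results in this thread.

```python
def mota(num_gt: int, fn: int, fp: int, idsw: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


# Hypothetical counts, chosen only to show the direction of the effect.
baseline = mota(num_gt=1000, fn=100, fp=50, idsw=10)

# Suppose low-score detections recover 20 misses but cause 5 extra ID switches.
with_low_score = mota(num_gt=1000, fn=80, fp=50, idsw=15)

print(f"MOTA: {baseline:.3f} -> {with_low_score:.3f}")
```

Every error type has the same weight in MOTA, so removing many false negatives can outweigh a smaller number of new ID switches. HOTA also scores association along whole trajectories, so the same change can affect it differently.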

Some observations do confuse me:

About the JDE Kalman filter and the SORT Kalman filter: the JDE Kalman filter performs better on MOT17 but worse on DanceTrack. Why is that?
Questions about NSA can wait for now.
BYTE benefits both MOTA and HOTA for OC-SORT on MOT17, which I think is reasonable. But on DanceTrack, BYTE only benefits MOTA for OC-SORT and harms HOTA a lot, which confuses me. Why does BYTE harm HOTA in OC-SORT at all?

Looking forward to your reply.

noahcao · 2022-04-14T01:13:34Z

Hi @HanGuangXin ,

You have done a really wonderful study! It is quite impressive to me. I have some experience and thoughts on the observations you provided.

Why OC-SORT is inferior to ByteTrack on MOT17 val_half: there are a few insights:

MOT17 is a dataset where objects (pedestrians) usually move very linearly. So, given a linear-motion-based Kalman filter, most failures are caused by missing "observations" (detections). If you run an ablation study over the IoU threshold in OCR or the IoU threshold in general association, you might get the impression that the key to boosting performance on such a dataset is to "recall more observations!". BYTE is designed to use low-confidence detections, so it can perform well in this situation by recalling more detections.
Another key point is that the MOT17 train_half/val_half split is a compromise forced by the limited data. It is not entirely reasonable that the two parts come from the same video sequences (the first half and the second half). During training, the detector (or even the tracker, for some joint-detection-and-tracking methods) has effectively seen the objects in the val_half subset. As a consequence, it is very safe to trust the detections predicted on val_half even when their confidence score or IoU score is not that high. A "recall more" strategy can be even more successful given this background.
Because the motion pattern of objects on MOT17 is simple, the overall performance (even when measured by HOTA) is heavily influenced by the detection part. We have a study in the DanceTrack paper showing that, given oracle detections, even the most naive IoU matching yields nearly perfect tracking performance on MOT17 (HOTA=98.1). Given this bias, MOT17 may favor methods that focus more on detection quality, which complements the first two points above.

Why JDE is inferior to KF on DanceTrack: many implementation details of JDE can influence the results. For example: (1) have you considered the influence of OOS from OC-SORT in the comparison? (2) how do you generate the embeddings for JDE? And so on. But one potential reason comes to mind first: JDE is designed to incorporate object appearance features. Because object appearance on MOT17 is usually distinguishable, appearance embeddings are usually helpful in association. But object appearance on DanceTrack is very similar, so appearance embeddings are very noisy in association. I recommend reading the original DanceTrack paper for more details about the dataset characteristics.
Why BYTE even hurts HOTA on DanceTrack: this also comes from the nature of the DanceTrack dataset, where detection is very easy (refer to Table 3 in the DanceTrack paper: all detection-focused metrics are much higher on DanceTrack than on MOT17). So the detection confidence of true targets is usually very high. The typical situation in which a detection's confidence is very low is when it overlaps heavily with another object. Therefore, BYTE's strategy of bringing in more detections is likely to introduce more noisy observations than it does on MOT17. One more detection likely means one fewer FN during evaluation, giving higher MOTA. But that extra detection may overlap heavily with other targets, making association harder and ID switches more likely. So lower HOTA may be expected, since HOTA evaluates tracking performance at the tracklet level instead of the frame level.
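
To make the "recall more observations" idea concrete, here is a minimal sketch of a BYTE-style two-stage association. It is not the ByteTrack or OC-SORT code: greedy IoU matching stands in for the Hungarian algorithm, and the function names and thresholds (`high`, `low`, `iou_thresh`) are hypothetical choices for illustration.

```python
import numpy as np


def iou_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) given as [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(bottom_right - top_left, 0, None).prod(axis=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(axis=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(axis=1)
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def greedy_match(iou: np.ndarray, iou_thresh: float) -> list:
    """Greedy one-to-one matching by descending IoU (simplification of Hungarian)."""
    iou = iou.copy()
    matches = []
    while iou.size and iou.max() >= iou_thresh:
        t, d = np.unravel_index(iou.argmax(), iou.shape)
        matches.append((int(t), int(d)))
        iou[t, :] = -1.0  # this track is taken
        iou[:, d] = -1.0  # this detection is taken
    return matches


def two_stage_associate(tracks, dets, scores, high=0.6, low=0.1, iou_thresh=0.3):
    """Return (track_index, detection_index) pairs using high-score boxes first."""
    tracks, dets, scores = np.asarray(tracks), np.asarray(dets), np.asarray(scores)
    high_idx = np.flatnonzero(scores >= high)
    low_idx = np.flatnonzero((scores >= low) & (scores < high))

    # Stage 1: match all tracks against high-score detections.
    first = greedy_match(iou_matrix(tracks, dets[high_idx]), iou_thresh)
    pairs = [(t, int(high_idx[d])) for t, d in first]

    # Stage 2: match the leftover tracks against low-score detections only.
    taken = {t for t, _ in first}
    rest = np.array([t for t in range(len(tracks)) if t not in taken], dtype=int)
    second = greedy_match(iou_matrix(tracks[rest], dets[low_idx]), iou_thresh)
    pairs += [(int(rest[t]), int(low_idx[d])) for t, d in second]
    return pairs


# Toy frame: track 1 only has a low-score detection (e.g. a partly occluded person).
tracks = np.array([[0, 0, 10, 20], [30, 0, 40, 20]])
dets = np.array([[1, 0, 11, 20], [31, 0, 41, 20], [6, 0, 16, 20]])
scores = np.array([0.9, 0.3, 0.2])
pairs = two_stage_associate(tracks, dets, scores)
print(pairs)
```

In this toy frame the second stage recovers track 1 from a detection that a single high threshold would have discarded. The third detection, a low-score box overlapping track 0's target, is the kind of observation that can become noise when targets crowd together as they do on DanceTrack.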

Above I have shared some intuitions and experience from my own study in response to your questions. I hope they are helpful. Again, dataset bias is always important when we evaluate an algorithm. I highly recommend reading the DanceTrack paper for more details.

To make our discussion helpful to a broad community, let's discuss here instead of via private message platforms.

Mobu59 · 2022-04-14T02:40:18Z

Thanks for the amazing work again!

After replacing the SORT Kalman filter in ocsort.py with the JDE Kalman filter, I got higher HOTA and faster speed, which may indicate that OC-SORT with the SORT settings can be improved.

So, do you plan to provide a version of ocsort with BYTE?
Hello, I am a novice in the field of MOT. Can you tell me what the JDE Kalman filter is (does it mean the Kalman filter combined with ReID in JDE?), or where I can find relevant information? Thanks in advance!

HanGuangXin · 2022-04-14T04:43:55Z

Thanks a lot for your detailed and enlightening explanation, truly! It deepened my understanding of the algorithms and the task of MOT. I will review the DanceTrack paper more thoroughly.

About the JDE Kalman filter: it is my fault for giving a confusing description.
The Kalman filter in SORT has 7 states, [x, y, s, r, \dot{x}, \dot{y}, \dot{s}] (box center, area s, aspect ratio r, and their velocities, with no velocity for r). The Kalman filter in JDE has 8 states, [x, y, r, h, \dot{x}, \dot{y}, \dot{r}, \dot{h}], which uses the height h instead of the area s and adds a state for the velocity of the aspect ratio r.
So, when I replaced the SORT Kalman filter with the JDE Kalman filter, I only replaced the KF states, the KF covariance, and some interfaces, without incorporating appearance embeddings. I expected the JDE Kalman filter to perform better than the SORT Kalman filter on DanceTrack because it predicts the additional velocity of the aspect ratio, and the bbox aspect ratio changes more frequently in DanceTrack.
As for your second question, I am embarrassed to say that I could not find the code for OOS. It would be kind of you to point it out for me.
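
The difference between the two state layouts can be sketched as follows. This is an illustration under stated assumptions, not code from either repository: the helper names are hypothetical, r is taken as width / height, and both filters are assumed to use a plain constant-velocity transition with a unit time step.

```python
import numpy as np


def to_sort_measurement(box):
    """[x1, y1, x2, y2] -> [x, y, s, r]: center, area s, aspect ratio r = w / h."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + w / 2, y1 + h / 2, w * h, w / h])


def to_jde_measurement(box):
    """[x1, y1, x2, y2] -> [x, y, r, h]: center, aspect ratio r = w / h, height h."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + w / 2, y1 + h / 2, w / h, h])


def constant_velocity_F(n_state, n_velocity, dt=1.0):
    """Transition matrix in which the first n_velocity entries each have a velocity."""
    F = np.eye(n_state)
    n_measured = n_state - n_velocity
    for i in range(n_velocity):
        F[i, n_measured + i] = dt
    return F


# 7 states: x, y, s get velocities; r does not, so it is predicted as constant.
F_sort = constant_velocity_F(7, 3)
# 8 states: x, y, r, h all get velocities, so r can drift linearly.
F_jde = constant_velocity_F(8, 4)

sort_state = np.array([50.0, 100.0, 2000.0, 0.5, 1.0, 0.0, 10.0])
jde_state = np.array([50.0, 100.0, 0.5, 60.0, 1.0, 0.0, 0.02, 0.0])

print(F_sort @ sort_state)  # aspect ratio (index 3) is unchanged by prediction
print(F_jde @ jde_state)    # aspect ratio (index 2) moves by its velocity
```

The only structural difference in the prediction step is the extra velocity term for the aspect ratio. Whether that term helps depends on how smoothly the aspect ratio actually changes between frames.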

Finally, we are lucky to have researchers like you working on MOT and bringing us awesome work like OC-SORT.

HanGuangXin · 2022-04-14T04:45:43Z

And I can provide the code using JDE kalman filter in ocsort, if needed.
Maybe there is something I missed out.

noahcao · 2022-04-14T04:58:29Z

Hi @Mobu59 ,

I thought that "JDE Kalman filter" meant using the embeddings from the well-known JDE model together with a canonical Kalman filter. But in his reply, @HanGuangXin clarified that he meant:

Kalman filter in SORT has 7 states, [x, y, s, r, \dot{x}, \dot{y}, \dot{s}]. Kalman filter in JDE has 8 states, [x, y, r, h, \dot{x}, \dot{y}, \dot{r}, \dot{h}], adding a state for the velocity of aspect ratio r.

So I think it is still a canonical Kalman filter; the only difference from SORT's popular KF implementation is that it no longer assumes the box aspect ratio is constant. Still, I believe that in the MOT community the term JDE usually refers to the work Joint Detection and Embedding [1].

[1]: Wang, Z., Zheng, L., Liu, Y., Li, Y., & Wang, S. "Towards real-time multi-object tracking". ECCV 2020

noahcao · 2022-04-14T05:42:03Z

Hi @HanGuangXin , thank you for clarifying and providing more details. I always enjoy sharing my ideas with the community!

Influence of allowing a non-fixed aspect ratio:

I don't know whether the advantage of letting the aspect ratio change linearly generalizes, even when limited to situations where objects do not change body pose much. But I do know it can better handle an object moving into or out of occlusion, where the aspect ratio of the bounding box is changing.

On DanceTrack, objects change body pose aggressively. Many dance movements make the aspect ratio larger and then suddenly smaller. Assuming a linear change of aspect ratio in such cases is like using a linear motion model to predict an object that lunges one way and then darts the other (Chinese: "左冲右突"). You are unlikely to get reliable predictions in this situation. Allowing a non-fixed aspect ratio is likely to introduce more noise than signal here.

Still, we need more experiments to be sure. It would be great if you could show some cases where JDE is right but OC-SORT is wrong on MOT17, and cases where the opposite holds on DanceTrack.
OOS is implemented by freezing and unfreezing parameters in my customized Kalman filter.
You can provide your implementation in a fork of your own or make a PR to this repo. That would be a good way to share your work with the community.
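
The aspect-ratio argument above can be checked on toy numbers. The sketch below is not a Kalman filter: it compares two one-step predictors, "aspect ratio stays constant" and "aspect ratio keeps its last velocity", on two made-up sequences. The frame-by-frame zigzag is deliberately exaggerated to stand in for abrupt pose changes.

```python
import numpy as np


def one_step_errors(r):
    """Mean absolute one-step error of a constant and a linear predictor."""
    r = np.asarray(r, dtype=float)
    target = r[2:]
    constant_pred = r[1:-1]              # r[t+1] is predicted as r[t]
    linear_pred = 2 * r[1:-1] - r[:-2]   # r[t+1] is predicted as r[t] + (r[t] - r[t-1])
    constant_err = np.abs(target - constant_pred).mean()
    linear_err = np.abs(target - linear_pred).mean()
    return constant_err, linear_err


# Slow, steady drift, e.g. a pedestrian gradually entering occlusion.
steady = np.linspace(0.40, 0.60, 11)
# Direction reverses every frame, an exaggerated stand-in for dance motion.
zigzag = np.array([0.4, 0.6, 0.4, 0.6, 0.4, 0.6])

steady_const, steady_lin = one_step_errors(steady)
zigzag_const, zigzag_lin = one_step_errors(zigzag)
print(f"steady: constant={steady_const:.3f}, linear={steady_lin:.3f}")
print(f"zigzag: constant={zigzag_const:.3f}, linear={zigzag_lin:.3f}")
```

On the steady drift the velocity term removes almost all of the error, while on the zigzag it overshoots at every reversal and does worse than assuming a constant ratio. Real sequences sit somewhere between these two extremes, which is why per-dataset experiments are needed.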

HanGuangXin · 2022-04-14T06:04:21Z

Thanks! You convinced me again. There are a lot of things I have to do and learn.

I will make a fork or PR as soon as possible, after I finish other deadlines :(

noahcao · 2022-04-18T19:08:14Z

I am keeping this issue open as many others may be interested in the combination of OC-SORT and BYTE. I wish the posts here could be helpful to them.

HanGuangXin · 2022-04-26T15:17:01Z

@noahcao Sorry for the delay! I have made a PR that combines OC-SORT and BYTE, achieving both higher MOTA and higher HOTA.
Could you check it or merge it?

It is an honor for me to contribute to this repository!

noahcao · 2022-04-27T04:56:46Z

OC-SORT now supports BYTE as of PR #19. Thanks @HanGuangXin for the contribution.

abhigoku10 · 2022-06-16T04:22:35Z

@HanGuangXin Thanks for the detailed explanation. Can you please provide the code for BYTE_OCsort?

Umar1998 · 2022-08-29T05:14:16Z

And I can provide the code using JDE kalman filter in ocsort, if needed. Maybe there is something I missed out.

Yes please, can you provide the code implementation of the JDE Kalman filter, @HanGuangXin?

noahcao · 2022-08-29T14:43:13Z

@abhigoku10 Please refer to the code contribution from PR #19

noahcao pinned this issue May 5, 2022

noahcao mentioned this issue Jul 2, 2022

Performance improvement with Online smoothing (OOS) #45

Closed

noahcao closed this as completed Aug 16, 2022
