
Problem with the code: Spearman's and Kendall's coefficients on TVSum come out as 0.5849 and 0.6403 #5

Open
sunguoquan1005 opened this issue Jun 9, 2021 · 5 comments

Comments

@sunguoquan1005

I can't reproduce the results. When I run the code, the Spearman's and Kendall's coefficients on TVSum are 0.5849 and 0.6403 respectively, which are much higher than the reported results.

@Junaid112
Collaborator

Junaid112 commented Jun 15, 2021

Did you take the average over all k-folds, or are these numbers for only one 80-20 split? Averaging over all validation parts should bring the values close to the original numbers.

@sunguoquan1005
Author

I took the average over all folds, but I still get that result.

@mpalaourg

Hello, first of all thank you for your contribution to video summarization research and for making your work open-source.

I am also trying to compute the correlation coefficients myself, and I am stuck in a weird loop. @sunguoquan1005, I get the result reported in the paper if I first take the mean of the user summaries (so that there is only one user/ground-truth summary per video), then compute the coefficients (ρ and τ) for each video, take the mean over videos to get ρ and τ for each split, and finally take the mean over the splits.

I get your result if I skip that first step of averaging the user summaries (so I keep all N user/ground-truth summaries): compute the coefficients (ρ and τ) against each ground-truth summary, take the mean of those to get ρ and τ for each video, and so on.
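For concreteness, here is a minimal sketch of the two accumulation orders (hypothetical function names and array shapes, not the repository's actual evaluation code), assuming frame-level importance arrays:

```python
import numpy as np
from scipy import stats

def coeffs_mean_user_first(pred, user_scores):
    """Order that reproduces the paper's numbers (per my experiments above).
    pred: (num_frames,); user_scores: (num_users, num_frames)."""
    reference = user_scores.mean(axis=0)  # collapse annotators into one curve
    rho = stats.spearmanr(pred, reference).correlation
    tau = stats.kendalltau(pred, reference).correlation
    return rho, tau

def coeffs_per_user_then_mean(pred, user_scores):
    """Order that yields the inflated 0.5849/0.6403 numbers:
    correlate against each annotator separately, then average."""
    rhos = [stats.spearmanr(pred, u).correlation for u in user_scores]
    taus = [stats.kendalltau(pred, u).correlation for u in user_scores]
    return float(np.mean(rhos)), float(np.mean(taus))
```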

The weird thing is that the results are too good to be true! Reading the paper that introduced this evaluation protocol, the authors discuss how strongly the F1 value is tied to the use of the knapsack step. I think (and I would like your opinion on this) that the coefficients must not be computed on the (binary) user summaries produced by the knapsack, but rather on the (real-valued) user scores. That would mean this evaluation protocol is applicable only to TVSum and not to SumMe, which the original paper implicitly confirms by reporting the ρ and τ coefficients only for TVSum.
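As a purely synthetic toy example (random numbers, not the actual datasets) of why correlating against a tie-heavy binary summary behaves very differently from correlating against raw scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.random(200)                                  # stand-in for raw user scores
binary = (raw > np.quantile(raw, 0.85)).astype(float)  # knapsack-like ~15% summary
pred = raw + 0.3 * rng.standard_normal(200)            # noisy prediction

print(stats.kendalltau(pred, raw).correlation)     # tau against raw scores
print(stats.kendalltau(pred, binary).correlation)  # tau against binary summary (many ties)
```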

Sorry for the late response in this issue, but I only now found this discussion, and I would love to hear your opinion on the matter.

@xings19

xings19 commented Jan 26, 2022

I get that result, too. How should I modify the code in the project to achieve the results in the paper?

@Junaid112
Collaborator

Junaid112 commented Jan 27, 2022

> I get that result, too. How should I modify the code in the project to achieve the results in the paper?

In this paper, we follow the evaluation protocol of "Video Summarization with Long Short-term Memory" and the TVSum paper. There, the average is taken among users, and then we average over the k-folds. The weird part is that for SumMe, the maximum is taken among per-user scores before we average over the k-folds. If you use the averaging scheme I just described, and correlate the predictions with the original TVSum scores, you will get the reported results. I do not agree with the evaluation criterion where the max is taken for SumMe, but we can address this in future research by comparing both criteria.
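A minimal sketch of that accumulation for TVSum, with hypothetical function and variable names (the repository's real loaders and variables may differ):

```python
import numpy as np
from scipy import stats

def tvsum_video_coeffs(pred, user_scores):
    """pred: (num_frames,); user_scores: (num_users, num_frames),
    the raw TVSum annotations (not knapsack-binarized summaries)."""
    reference = user_scores.mean(axis=0)  # average among users first
    rho = stats.spearmanr(pred, reference).correlation
    tau = stats.kendalltau(pred, reference).correlation
    return rho, tau

def kfold_average(splits):
    """splits: one list per fold, each holding (rho, tau) per test video."""
    per_split = [np.mean(videos, axis=0) for videos in splits]  # mean within each fold
    return np.mean(per_split, axis=0)                           # then mean across the k folds
```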
