Cast inputs from uint8 to float before computing metrics #3
This PR fixes a bug in the evaluation script for TokenBench. Videos are read in as uint8 arrays, so if the ground truth and reconstruction are subtracted from each other before being cast to float, the difference can underflow or overflow (wrapping around modulo 256), which changes the resulting metric value.
For a demonstration of this effect, consider the following simple example:
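A minimal NumPy sketch of the wraparound is given here. The pixel values are made up for illustration and are not taken from TokenBench or any dataset:

```python
import numpy as np

# Illustrative pixel values only (assumed, not from any real video).
gt = np.array([10, 200, 50], dtype=np.uint8)
recon = np.array([20, 100, 60], dtype=np.uint8)

# Subtracting uint8 arrays wraps modulo 256, so -10 becomes 246.
wrapped = gt - recon
print(wrapped.tolist())  # [246, 100, 246]

# Casting to float first preserves the sign of the difference.
correct = gt.astype(np.float64) - recon.astype(np.float64)
print(correct.tolist())  # [-10.0, 100.0, -10.0]

# The squared error differs substantially between the two versions.
mse_wrapped = np.mean(wrapped.astype(np.float64) ** 2)
mse_correct = np.mean(correct ** 2)
```

Whenever a reconstruction pixel is larger than the corresponding ground-truth pixel, the wrapped difference becomes a large positive number, so the mean squared error computed from it no longer reflects the true error.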
This manifests in TokenBench's PSNR calculation here:
For example, for the DV 4x8x8 model, the original vs. fixed PSNR results for the DAVIS dataset are as follows.
To fix the bug, we cast the input videos to float immediately upon loading. The SSIM and rFVD results do not appear to be affected by this bug.
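The cast-before-subtract pattern can be sketched as follows. The function name `psnr_float` and the `max_val` parameter are hypothetical and chosen for illustration. This is not TokenBench's actual implementation, only a sketch of the idea under the assumption that frames arrive as uint8 arrays in the range 0 to 255:

```python
import numpy as np

def psnr_float(gt, recon, max_val=255.0):
    """Hypothetical PSNR helper that casts to float before subtracting."""
    # Cast first so the difference can be negative without wrapping.
    gt = np.asarray(gt).astype(np.float64)
    recon = np.asarray(recon).astype(np.float64)
    mse = np.mean((gt - recon) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Illustrative uint8 frames (assumed values): reconstruction is 10 levels brighter.
gt_frame = np.full((2, 2), 10, dtype=np.uint8)
recon_frame = np.full((2, 2), 20, dtype=np.uint8)
value = psnr_float(gt_frame, recon_frame)
```

Doing the cast once at load time, as this PR does, has the same effect and also protects any other metric that subtracts the arrays later.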
We really appreciate the work on TokenBench and believe standardizing the evaluation of these video models is incredibly valuable for the community! We kindly request that the reported values in the benchmark be updated to reflect the fixed results. Thank you for all your hard work!