Commit

Update
willi-menapace committed Feb 22, 2024
1 parent 8abbcd0 commit 6d9d747
Showing 11 changed files with 54 additions and 54 deletions.
10 changes: 5 additions & 5 deletions gen2_pikalab_floor33.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,9 +81,9 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Comparison to Gen-2, Floor33 and PikaLab</h2>

<p class="lead">We compare results produced by Snap Video against the publicly accessible Gen-2, Floor33 and PikaLab video generators on a selection of 65 prompts from the Evalcrafter benchmark eliciting dynamic scenes.</p>
<p class="lead text-justify">We compare results produced by Snap Video against the publicly accessible Gen-2, Floor33 and PikaLab video generators on a selection of 65 prompts from the Evalcrafter benchmark eliciting dynamic scenes.</p>

<p class="lead">When evaluated on a user study, our method shows increased photorealism with respect to PikaLab and Floor33, has significantly better video-text alignment and outperforms the baselines on all motion metrics. Results are expressed in percentage of votes in favor of our method.</p>
<p class="lead text-justify">When evaluated on a user study, our method shows increased photorealism with respect to PikaLab and Floor33, has significantly better video-text alignment and outperforms the baselines on all motion metrics. Results are expressed in percentage of votes in favor of our method.</p>

<table id="example" class="table table-striped mt-5" style="width:100%">
<thead>
@@ -121,7 +121,7 @@ <h2 class="pt-4">Comparison to Gen-2, Floor33 and PikaLab</h2>
</table>


<p class="lead pt-2">Hover the cursor on the video to reveal the prompt. Note that a selection of prompts issued to Gen-2 resulted in input prompt filtering issues and no output was generated for them.</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt. Note that a selection of prompts issued to Gen-2 resulted in input prompt filtering issues and no output was generated for them.</p>



@@ -4229,7 +4229,7 @@ <h4>PikaLab</h4>
<!-- Container -->

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_hierarchical.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="imagen_video.html" role="button">Next</a>
</div>
10 changes: 5 additions & 5 deletions imagen_video.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,7 +81,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Comparison to Imagen Video</h2>

<p class="lead">We compare Snap Video against publicly available samples released by the authors and perform a user study evaluating photorealism, video-text alignment, and motion quantity and quality. While the public samples may have been chosen to showcase the method's strengths, our method features improved photorealism, video-text alignment and quality of motion. Results are expressed in percentage of votes in favor of our method.</p>
<p class="lead text-justify">We compare Snap Video against publicly available samples released by the authors and perform a user study evaluating photorealism, video-text alignment, and motion quantity and quality. While the public samples may have been chosen to showcase the method's strengths, our method features improved photorealism, video-text alignment and quality of motion. Results are expressed in percentage of votes in favor of our method.</p>

<table id="example" class="table table-striped mt-5" style="width:100%">
<thead>
@@ -104,8 +104,8 @@ <h2 class="pt-4">Comparison to Imagen Video</h2>
</tbody>
</table>

<p class="lead mt-5">We compare results produced by our method (left) with results produced by Imagen Video (right)</p>
<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify mt-5">We compare results produced by our method (left) with results produced by Imagen Video (right)</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row pt-3 nopadding">
@@ -1038,7 +1038,7 @@ <h4 class="pt-3">Imagen Video</h4>
<!-- Container -->

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="gen2_pikalab_floor33.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="pyoco.html" role="button">Next</a>
</div>
14 changes: 7 additions & 7 deletions index.html
@@ -66,7 +66,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -158,13 +158,13 @@ <h3 class="pt-5">The Snap Video Model</h3>

</div>

<p class="lead pt-2">The widely adopted U-Net architecture is required to fully process each video frame. This increases computational overhead compared to purely text-to-image models, posing a very practical limit on model scalability. In addition, extending U-Net-based architectures to naturally support spatial and temporal dimensions requires volumetric attention operations, which have prohibitive computational demands.</p>
<p class="lead text-justify pt-2">The widely adopted U-Net architecture is required to fully process each video frame. This increases computational overhead compared to purely text-to-image models, posing a very practical limit on model scalability. In addition, extending U-Net-based architectures to naturally support spatial and temporal dimensions requires volumetric attention operations, which have prohibitive computational demands.</p>

<p class="lead pt-2">Inspired by FITs, we propose to leverage redundant information between frames and introduce a scalable transformer architecture that treats spatial and temporal dimensions as a single, compressed, 1D latent vector. This highly compressed representation allows us to perform spatio-temporal computation jointly and enables modelling of complex motions.</p>
<p class="lead text-justify pt-2">Inspired by FITs, we propose to leverage redundant information between frames and introduce a scalable transformer architecture that treats spatial and temporal dimensions as a single, compressed, 1D latent vector. This highly compressed representation allows us to perform spatio-temporal computation jointly and enables modelling of complex motions.</p>

<p class="lead pt-2">Thanks to joint spatiotemporal video modeling, Snap Video can synthesize temporally coherent videos with large motion (left) while retaining the semantic control capabilities typical of large-scale text-to-video generators (right).</p>
<p class="lead text-justify pt-2">Thanks to joint spatiotemporal video modeling, Snap Video can synthesize temporally coherent videos with large motion (left) while retaining the semantic control capabilities typical of large-scale text-to-video generators (right).</p>

<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row mt-5 pt-5 nopadding">
@@ -238,14 +238,14 @@ <h3 class="pt-5">The Snap Video Model</h3>
<h3 class="pt-5">Acknowledgements</h3>


<p class="lead pt-2">We would like to thank Oleksii Popov, Artem Sinitsyn, Anton Kuzmenko, Vitalii Kravchuk, Vadym Hrebennyk, Grygorii Kozhemiak, Tetiana Shcherbakova, Svitlana Harkusha, Oleksandr Yurchak, Andrii Buniakov, Maryna Marienko, Maksym Garkusha, Brett Krong, Anastasiia Bondarchuk for their help in the realization of video presentations, stories and graphical assets, Colin Eles, Dhritiman Sagar, Vitalii Osykov, Eric Hu for their supporting technical activities, Maryna Diakonova for her assistance with annotation tasks.</p>
<p class="lead text-justify pt-2">We would like to thank Oleksii Popov, Artem Sinitsyn, Anton Kuzmenko, Vitalii Kravchuk, Vadym Hrebennyk, Grygorii Kozhemiak, Tetiana Shcherbakova, Svitlana Harkusha, Oleksandr Yurchak, Andrii Buniakov, Maryna Marienko, Maksym Garkusha, Brett Krong, Anastasiia Bondarchuk for their help in the realization of video presentations, stories and graphical assets, Colin Eles, Dhritiman Sagar, Vitalii Osykov, Eric Hu for their supporting technical activities, Maryna Diakonova for her assistance with annotation tasks.</p>


</div>


<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="make_a_video.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="stories.html" role="button">Next</a>
</div>
10 changes: 5 additions & 5 deletions make_a_video.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,7 +81,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Comparison to Make-A-Video</h2>

<p class="lead">We compare Snap Video against publicly available samples released by the authors and perform a user study evaluating photorealism, video-text alignment, and motion quantity and quality. While the public samples may have been chosen to showcase the method's strengths, our method outperforms the baseline on all metrics. Results are expressed in percentage of votes in favor of our method.</p>
<p class="lead text-justify">We compare Snap Video against publicly available samples released by the authors and perform a user study evaluating photorealism, video-text alignment, and motion quantity and quality. While the public samples may have been chosen to showcase the method's strengths, our method outperforms the baseline on all metrics. Results are expressed in percentage of votes in favor of our method.</p>

<table id="example" class="table table-striped mt-5" style="width:100%">
<thead>
@@ -104,8 +104,8 @@ <h2 class="pt-4">Comparison to Make-A-Video</h2>
</tbody>
</table>

<p class="lead mt-5">We compare results produced by our method (left) with results produced by Make-A-Video (right)</p>
<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify mt-5">We compare results produced by our method (left) with results produced by Make-A-Video (right)</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row pt-3 nopadding">
@@ -308,7 +308,7 @@ <h4 class="pt-3">Make-A-Video</h4>
<!-- Container -->

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="video_ldm.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Next</a>
</div>
10 changes: 5 additions & 5 deletions our_samples.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,11 +81,11 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Snap Video Samples</h2>

<p class="lead">We show a collection of samples produced by our model on a set of gathered prompts.</p>
<p class="lead text-justify">We show a collection of samples produced by our model on a set of gathered prompts.</p>

<p class="lead">Snap Video can synthesize a large number of different concepts. Most importantly, thanks to joint spatiotemporal modeling, it can produce videos with challenging motion including large camera movement, POV videos, and videos of fast moving objects. Notably, the method maintains temporal consistency and avoids video flickering artifacts.</p>
<p class="lead text-justify">Snap Video can synthesize a large number of different concepts. Most importantly, thanks to joint spatiotemporal modeling, it can produce videos with challenging motion including large camera movement, POV videos, and videos of fast moving objects. Notably, the method maintains temporal consistency and avoids video flickering artifacts.</p>

<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row pt-3 nopadding">
@@ -1378,7 +1378,7 @@ <h2 class="pt-4">Snap Video Samples</h2>
</div>

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="stories.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_3d.html" role="button">Next</a>
</div>
8 changes: 4 additions & 4 deletions our_samples_3d.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,10 +81,10 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Novel View Generation</h2>

<p class="lead">We show a collection of samples obtained from Snap Video with prompts eliciting circular camera movement around different object categories. We find that the model is capable of generating plausible novel views of objects, suggesting that the model possesses an understanding of the 3D object geometry.</p>
<p class="lead text-justify">We show a collection of samples obtained from Snap Video with prompts eliciting circular camera movement around different object categories. We find that the model is capable of generating plausible novel views of objects, suggesting that the model possesses an understanding of the 3D object geometry.</p>


<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row pt-3 nopadding">
@@ -1295,7 +1295,7 @@ <h2 class="pt-4">Novel View Generation</h2>
</div>

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_diversity.html" role="button">Next</a>
</div>
8 changes: 4 additions & 4 deletions our_samples_diversity.html
@@ -44,7 +44,7 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<div class="container-md">

<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">

<a class="sm-1 mx-1 btn btn-primary mt-2" href="" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
@@ -81,10 +81,10 @@ <h4 class="font-italic pt-2" style="font-weight: normal">

<h2 class="pt-4">Samples Diversity</h2>

<p class="lead">To show the capabilities of Snap Video of producing varied outputs, we select a set of prompts and sample 3 videos from each, showing the results in each row. Our model is capable of producing diverse outputs for each prompt.</p>
<p class="lead text-justify">To show the capabilities of Snap Video of producing varied outputs, we select a set of prompts and sample 3 videos from each, showing the results in each row. Our model is capable of producing diverse outputs for each prompt.</p>


<p class="lead pt-2">Hover the cursor on the video to reveal the prompt.</p>
<p class="lead text-justify pt-2">Hover the cursor on the video to reveal the prompt.</p>

<!-- Grid row -->
<div class="row pt-3 nopadding">
@@ -955,7 +955,7 @@ <h2 class="pt-4">Samples Diversity</h2>
</div>

<div class="container-md mt-4">
<div class="row pt-1 justify-content-sm-center">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_3d.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_hierarchical.html" role="button">Next</a>
</div>