<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<!-- Bootstrap CSS -->
<link href="resources/bootstrap.min.css" rel="stylesheet">
<link href="resources/stylesheet.css" rel="stylesheet">
<title>Snap Video</title>
<link rel="icon" type="image/x-icon" href="resources/cvr_logo_notext.png">
</head>
<body>
<section class="jumbotron text-center pb-2">
<div class="container">
<h1 class="jumbotron-heading">Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis</h1>
<h4 class="font-italic pt-2" style="font-weight: normal">
<a href="https://www.willimenapace.com/">Willi Menapace</a><sup>1,2,*</sup>
<a href="https://github.com/AliaksandrSiarohin">Aliaksandr Siarohin</a><sup>1</sup>
<a href="https://universome.github.io/">Ivan Skorokhodov</a><sup>1</sup>
<a href="https://edeyneka.github.io/">Ekaterina Deyneka</a><sup>1</sup>
<a href="https://tsaishien-chen.github.io/">Tsai-Shien Chen</a><sup>1,3,*</sup><br>
<a href="https://anilkagak2.github.io/">Anil Kag</a><sup>1</sup>
<a href="https://yuwfan.github.io/">Yuwei Fang</a><sup>1</sup>
<a href="https://scholar.google.com/citations?user=PVZ0-dEAAAAJ&hl=en">Aleksei Stoliar</a><sup>1</sup>
<a href="http://elisaricci.eu/">Elisa Ricci</a><sup>2,4</sup>
<a href="https://alanspike.github.io/">Jian Ren</a><sup>1</sup>
<a href="http://www.stulyakov.com/">Sergey Tulyakov</a><sup>1</sup>
</h4>
<p class="pb-4 mt-3">
Snap Inc.<sup>1</sup> University of Trento<sup>2</sup> UC Merced<sup>3</sup> Fondazione Bruno Kessler<sup>4</sup>
Work performed while interning at Snap Inc.<sup>*</sup><br>
</p>
</div>
</section>
<div class="container-md">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="http://arxiv.org/abs/2402.14797" role="button">Paper</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="index.html" role="button">Overview</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="stories.html" role="button">Stories</a>
<div class="btn-group">
<a class="sm-1 mx-1 btn btn-primary mt-2 dropdown-toggle" href="#" role="button" id="dropdownMenuLink1" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Our Samples</a>
<div class="dropdown-menu" aria-labelledby="dropdownMenuLink1">
<a class="dropdown-item" href="our_samples.html" role="button">More Samples</a>
<a class="dropdown-item" href="our_samples_3d.html" role="button">Novel Views</a>
<a class="dropdown-item" href="our_samples_diversity.html" role="button">Samples Diversity</a>
<a class="dropdown-item" href="our_samples_hierarchical.html" role="button">Hierarchical Generation</a>
</div>
</div>
<div class="btn-group">
<a class="sm-1 mx-1 btn btn-primary mt-2 dropdown-toggle" href="#" role="button" id="dropdownMenuLink2" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Comparisons</a>
<div class="dropdown-menu" aria-labelledby="dropdownMenuLink2">
<a class="dropdown-item" href="gen2_pikalab_floor33.html" role="button">Gen2-PikaLab-Floor33</a>
<a class="dropdown-item" href="imagen_video.html" role="button">Imagen Video</a>
<a class="dropdown-item" href="pyoco.html" role="button">PYoCo</a>
<a class="dropdown-item" href="video_ldm.html" role="button">Video LDM</a>
<a class="dropdown-item" href="make_a_video.html" role="button">Make-A-Video</a>
</div>
</div>
</div>
</div>
<div class="container">
<hr class="mt-5">
<h2 class="pt-4">Hierarchical Generation</h2>
<p class="lead text-justify">To increase video duration and framerate, we devise a hierarchical generation strategy that adopts the reconstruction guidance method of "Video Diffusion Models" to condition the video generator on previously generated frames. We define a hierarchy of progressively increasing framerates and start by autoregressively generating a video of the desired length at the lowest framerate, using the last generated frame as conditioning at each step. Then, for each successive framerate in the hierarchy, we autoregressively generate a video of the same length, conditioning the model on all frames already generated at the lower framerates.</p>
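<p class="lead text-justify">The ordering of generation steps described above can be sketched in a few lines of Python. This is only an illustration of the schedule, not the implementation used for these samples: the function name, the strides, and the <code>window</code> parameter are hypothetical, and the sketch assumes that each model call is conditioned only on the already generated frames that fall inside its own window.</p>
<pre>
```python
def plan_hierarchical_generation(num_frames, strides, window):
    """Return a list of (level, new_frames, conditioning_frames) steps.

    num_frames: length of the final video in frames
    strides:    frame strides from lowest to highest framerate, e.g. [4, 2, 1]
                (hypothetical values, not taken from the paper)
    window:     frames handled per model call, at least 2 (hypothetical)
    """
    generated = set()
    steps = []
    for level, stride in enumerate(strides):
        # Frame indices that exist at this level's framerate.
        positions = list(range(0, num_frames, stride))
        # Consecutive windows share one frame, so each call can be
        # conditioned on the last frame produced by the previous call.
        for start in range(0, len(positions), window - 1):
            chunk = positions[start:start + window]
            new = [f for f in chunk if f not in generated]
            cond = [f for f in chunk if f in generated]
            if new:
                steps.append((level, new, cond))
                generated.update(new)
    return steps


if __name__ == "__main__":
    # Assumed setup: 32 frames, three framerate levels, 5 frames per call.
    for level, new, cond in plan_hierarchical_generation(32, [4, 2, 1], 5):
        print(f"level {level}: generate {new} given {cond}")
```
</pre>
<p class="lead text-justify">At the lowest framerate every call after the first is conditioned on a single frame, the last one generated so far. At the higher framerates the conditioning set also contains the frames inherited from the lower levels, so the new frames are filled in between frames that already exist.</p>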
<p class="lead text-justify">We show a selection of 32-frame videos sampled at 12 fps.</p>
<p class="lead text-justify pt-2">Hover the cursor over a video to reveal its prompt.</p>
<!-- Grid row -->
<div class="row pt-3 nopadding">
</div>
</div>
<div class="container" style="max-width: 100%; ">
<!-- Grid row -->
<div class="row pt-3 nopadding">
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/4a33ccea-862d-4fa4-a2f5-267b64484f8f.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">A chihuahua in astronaut suit floating in space, photo realistic, 8k, cinematic lighting, hd, atmospheric, hyperdetailed, photography, glow effect.</div>
</div>
</div>
<!-- Grid column -->
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/fa47a3d9-6af6-4eb6-b8ae-caf642c08780.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">In a high-tech control room, otters operate an imaginary spaceship console, embarking on an interstellar adventure. Cinematic lighting effects enhance the futuristic setting, and the camera executes quick cuts to showcase the excitement of their space journey.</div>
</div>
</div>
<!-- Grid column -->
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/8b8e19c1-2b98-4bf8-a3b9-8a269bf9b519.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">In Macro len style, a photograph of a knight in shining armor holding a basketball</div>
</div>
</div>
<!-- Grid column -->
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/d9ebdc1a-1bc1-4465-a999-b9af0c0cc20c.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">A corgi ice skating in winter wonderland, photorealistic</div>
</div>
</div>
<!-- Grid column -->
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/a41e90f1-91b6-4b05-b9f6-4633735ecf4d.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">In a potter's studio, skilled hands mold clay into a delicate sculpture. Utilize sweeping arcs to highlight the shaping process, emphasizing the intricate details emerging from the artist's touch.</div>
</div>
</div>
<!-- Grid column -->
<!-- Grid column -->
<div class="col-sm-6 col-md-6 col-xl-4 col-lg-6 p-0 text-center containeroverlay">
<video class="video-fluid w-100 flex" controls autoplay playsinline loop muted>
<source src="video_samples/our_samples_hierarchical/f96fc01f-fd93-4936-a2a5-1cb147342480.mp4" type="video/mp4" />
</video>
<div class="overlay">
<div class="textoverlay">In Macro len style, a photograph of a knight in shining armor holding a basketball</div>
</div>
</div>
<!-- Grid column -->
</div>
<!-- Grid row -->
</div>
<div class="container-md mt-4">
<div class="row pt-1 justify-content-center">
<a class="sm-1 mx-1 btn btn-primary mt-2" href="our_samples_diversity.html" role="button">Back</a>
<a class="sm-1 mx-1 btn btn-primary mt-2" href="gen2_pikalab_floor33.html" role="button">Next</a>
</div>
</div>
<!-- Footer -->
<section class="jumbotron text-center py-7 mt-5 mb-0">
</section>
<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="resources/jquery-3.4.1.slim.min.js"></script>
<script src="resources/popper.min.js"></script>
<script src="resources/bootstrap.min.js"></script>
</body>
</html>