<!DOCTYPE html>
<html lang="en">
<head>
<!-- Meta and Title -->
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Story-Adapter</title>
<!-- Fonts and Icons -->
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<!-- External Stylesheets -->
<link rel="stylesheet" href="styles.css">
<link href="../css/cs.css" rel="stylesheet">
<!-- Internal Styles -->
<style>
body {
background: #fdfcf9 no-repeat fixed top left;
font-family:'DM Mono','Open Sans', sans-serif;
}
pre {
/* Existing styles */
}
code {
/* Existing styles */
}
</style>
</head>
<body>
<header>
<!-- Existing header content -->
<div class="container">
<h1>Story-Adapter: A Training-free Iterative Framework</h1>
<br/>
<br/>
<br/>
<h1> For Long Story Visualization</h1>
<br/>
<br/>
<br/>
<br/>
<br/>
<div class="details">
<a href="https://github.com/jwmao1" target="_blank">Jiawei Mao</a> <sup>* 1,2</sup>,
<a href="https://xk-huang.github.io/" target="_blank">Xiaoke Huang</a> <sup>* 1</sup>,
<a href="https://yunfeixie233.github.io/" target="_blank">Yunfei Xie</a> <sup>1</sup>,
<a href="" target="_blank">Yuanqi Chang</a> <sup>2</sup>,
<a href="https://thefllood.github.io/mudehui.github.io/" target="_blank">Mude Hui</a> <sup>1</sup>,
<a href="https://scholar.google.com/citations?user=JHTNigYAAAAJ&hl=en" target="_blank">Bingjie Xu</a> <sup>3</sup>,
<a href="https://yuyinzhou.github.io/" target="_blank">Yuyin Zhou</a> <sup>1</sup>
</div>
<div class="details">
<sup>1</sup><a href="https://ucsc-vlaa.github.io/" target="_blank">UC Santa Cruz</a>,
<sup>2</sup><a href="https://en.hdu.edu.cn/" target="_blank">Hangzhou Dianzi University</a>,
<sup>3</sup><a href="https://www.singaporetech.edu.sg/" target="_blank">Singapore Institute of Technology</a>
</div>
<div class="details"><sup>*</sup>Equal Contribution</div>
<div class="links">
<a href="https://github.com/jwmao1/story-adapter" target="_blank">
<i class="fab fa-github"></i> GitHub
</a>
<a href="https://arxiv.org/abs/2410.06244" target="_blank">
<i class="fas fa-file-alt"></i> arXiv
</a>
</div>
</div>
</header>
<div class="container main">
<section class="section section-abstract">
<div class="add-div" style="height: 150px;">
<img src="images/logo2.png" height="150"/>
</div>
<h2>Abstract</h2>
<div class="abs">
Story visualization, the task of generating coherent images based on a narrative, has seen significant advancements with the emergence of text-to-image models, particularly diffusion models. However, maintaining semantic consistency, generating high-quality fine-grained interactions, and ensuring computational feasibility remain challenging, especially in long story visualization (i.e., up to 100 frames).
In this work, we propose a training-free and computationally efficient framework, termed Story-Adapter, to enhance generative capability for long stories. Specifically, we propose an iterative paradigm to refine each generated image, leveraging both the text prompt and all generated images from the previous iteration.
Central to our framework is a training-free global reference cross-attention module, which aggregates all generated images from the previous iteration to preserve semantic consistency across the entire story, while minimizing computational costs with global embeddings. This iterative process progressively optimizes image generation by repeatedly incorporating text constraints, resulting in more precise and fine-grained interactions. Extensive experiments validate the superiority of Story-Adapter in improving both semantic consistency and generative capability for fine-grained interactions, particularly in long story scenarios.
</div>
</section>
<section class="section section-other">
<h2>Story-Adapter Architecture</h2>
<div class="abs">
Story-Adapter framework. Illustration of the proposed iterative paradigm, which consists of initialization, iterations in Story-Adapter, and the implementation of Global Reference Cross-Attention (GRCA). Story-Adapter first visualizes each image based only on the story's text prompts and uses all of the results as reference images for the next round. In the iterative paradigm, Story-Adapter inserts GRCA into Stable Diffusion (SD). In the i-th iteration, when visualizing each image, GRCA aggregates the information flow of all reference images during the denoising process through cross-attention. All results from this iteration then serve as reference images that guide the dynamic update of the story visualization in the next iteration.
</div>
<div class="add-div">
<img src="images/images/architecture.jpg" style="width: 100%;"/>
</div>
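<div class="abs">
The iterative paradigm described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: images and prompts are stood in for by plain float vectors, and the names <code>cross_attend</code>, <code>toy_generate</code>, <code>visualize_story</code>, and the <code>mix</code> weight are hypothetical. The real GRCA operates inside the denoising network of the diffusion model; the toy attention below only mirrors the idea of aggregating every reference from the previous round.
</div>
<div class="add-div">
<pre><code>
```python
import math
from typing import Callable, List, Sequence

Vector = List[float]  # toy stand-in for a global image or text embedding


def cross_attend(query: Vector, references: Sequence[Vector]) -> Vector:
    """Toy single-head attention of one query over all reference embeddings."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, ref)) / scale for ref in references]
    peak = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    # Softmax-weighted average of the references (used as both keys and values).
    return [
        sum(w * ref[j] for w, ref in zip(weights, references)) / total
        for j in range(len(query))
    ]


def toy_generate(prompt: Vector, references: Sequence[Vector], mix: float = 0.5) -> Vector:
    """Hypothetical generator: blends the prompt with the attended references."""
    if not references:
        return list(prompt)  # initialization round: text prompt only
    context = cross_attend(prompt, references)
    return [(1.0 - mix) * p + mix * c for p, c in zip(prompt, context)]


def visualize_story(
    prompts: Sequence[Vector],
    generate: Callable[[Vector, Sequence[Vector]], Vector],
    num_iterations: int,
) -> List[Vector]:
    """Run the initialization round, then refine every frame num_iterations times."""
    # Initialization: each frame is produced from its text prompt alone.
    references = [generate(prompt, []) for prompt in prompts]
    for _ in range(num_iterations):
        # Every frame is regenerated from its prompt plus ALL results of the
        # previous iteration; the new results become the next references.
        references = [generate(prompt, references) for prompt in prompts]
    return references
```
</code></pre>
</div>
<div class="abs">
Calling <code>visualize_story(prompts, toy_generate, num_iterations=3)</code> with one vector per story sentence returns one refined vector per frame. Because each round replaces the whole reference set at once, every frame in a round is conditioned on the same references.
</div>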
</section>
<!-- Revised Regular-length Story Visualization section -->
<section class="section section-no-border">
<h2>Regular-length Story Visualization</h2>
<div class="video-grid-6">
<!-- Video 1 -->
<div class="video-item">
<video src="images/video_regular_length/video1.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Pigeon" visualized by our Story-Adapter</div>
</div>
<!-- Video 2 -->
<div class="video-item">
<video src="images/video_regular_length/video2.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Dinosaur and Traveler" visualized by our Story-Adapter</div>
</div>
<!-- Video 3 -->
<div class="video-item">
<video src="images/video_regular_length/video3.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Boy" visualized by our Story-Adapter</div>
</div>
<!-- Video 4 -->
<div class="video-item">
<video src="images/video_regular_length/video4.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Pepper" visualized by our Story-Adapter</div>
</div>
<!-- Video 5 -->
<div class="video-item">
<video src="images/video_regular_length/video5.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Girl" visualized by our Story-Adapter</div>
</div>
<!-- Video 6 -->
<div class="video-item">
<video src="images/video_regular_length/video6.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Animal Rescuer" visualized by our Story-Adapter</div>
</div>
<!-- Video 7 -->
<div class="video-item">
<video src="images/video_regular_length/video7.mp4" autoplay controls muted></video>
<div class="video-title">A story of "City Monkey" visualized by our Story-Adapter</div>
</div>
<!-- Video 8 -->
<div class="video-item">
<video src="images/video_regular_length/video8.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Old Man and Monkey" visualized by our Story-Adapter</div>
</div>
<!-- Video 9 -->
<div class="video-item">
<video src="images/video_regular_length/video9.mp4" autoplay controls muted></video>
<div class="video-title">A story of "The Boy's Journey" visualized by our Story-Adapter</div>
</div>
<!-- Video 10 -->
<div class="video-item">
<video src="images/video_regular_length/video10.mp4" autoplay controls muted></video>
<div class="video-title">A story of "A Day for a Girl" visualized by our Story-Adapter</div>
</div>
<!-- Video 11 -->
<div class="video-item">
<video src="images/video_regular_length/video11.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Rain" visualized by our Story-Adapter</div>
</div>
<!-- Video 12 -->
<div class="video-item">
<video src="images/video_regular_length/video12.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Fruit" visualized by our Story-Adapter</div>
</div>
</div>
</section>
<!-- Revised Long Story Visualization section -->
<section class="section section-no-border">
<h2>Long Story Visualization</h2>
<div class="video-grid-5">
<!-- 50-length video 1 -->
<div class="video-item">
<video src="images/video_long_length/50/video1.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Little Red Riding Hood" visualized by our Story-Adapter</div>
</div>
<!-- 50-length video 2 -->
<div class="video-item">
<video src="images/video_long_length/50/video2.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Emperor and the Nightingale" visualized by our Story-Adapter</div>
</div>
<!-- 50-length video 3 -->
<div class="video-item">
<video src="images/video_long_length/50/video3.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Robinson Crusoe" visualized by our Story-Adapter</div>
</div>
<!-- 50-length video 4 -->
<div class="video-item">
<video src="images/video_long_length/50/video4.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Snowman" visualized by our Story-Adapter</div>
</div>
<!-- 50-length video 5 -->
<div class="video-item">
<video src="images/video_long_length/50/video5.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Loyal Dog" visualized by our Story-Adapter</div>
</div>
<!-- 100-length video 1 -->
<div class="video-item">
<video src="images/video_long_length/100/long_video1.mp4" autoplay controls muted></video>
<div class="video-title">A story of "The Tortoise and the Hare" visualized by our Story-Adapter</div>
</div>
<!-- 100-length video 2 -->
<div class="video-item">
<video src="images/video_long_length/100/long_video2.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Winnie the Pooh" visualized by our Story-Adapter</div>
</div>
<!-- 100-length video 3 -->
<div class="video-item">
<video src="images/video_long_length/100/long_video3.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Pirate" visualized by our Story-Adapter</div>
</div>
<!-- 100-length video 4 -->
<div class="video-item">
<video src="images/video_long_length/100/long_video4.mp4" autoplay controls muted></video>
<div class="video-title">A story of "Lonely Me" visualized by our Story-Adapter</div>
</div>
<!-- 100-length video 5 -->
<div class="video-item">
<video src="images/video_long_length/100/long_video5.mp4" autoplay controls muted></video>
<div class="video-title">A story of "The Prince and the Princess" visualized by our Story-Adapter</div>
</div>
</div>
</section>
<section class="section section-other">
<h2>Qualitative Comparison of Different Methods</h2>
<div class="abs">Qualitative comparison of story visualization methods. AR-LDM and StoryGen generate coherent image sequences, but their quality degrades as stories get longer because autoregressive errors accumulate. StoryDiffusion and Story-Adapter both perform well, though StoryDiffusion struggles with subject consistency and flawed ID images because of its high computational demands. Story-Adapter better meets the requirements for effective story visualization.</div>
<div class="add-div">
<img src="images/images/Qualitative Comparison of StorySalon Story.png" style="width: 100%;"/>
</div>
</section>
<section class="section section-other">
<h2>BibTeX</h2>
<p><b>
If you find our work helpful for your research, please consider citing it 📃
</b></p>
<div class="add-div">
<pre><code>
@misc{mao2024story_adapter,
title={{Story-Adapter: A Training-free Iterative Framework for Long Story Visualization}},
author={Mao, Jiawei and Huang, Xiaoke and Xie, Yunfei and Chang, Yuanqi and Hui, Mude and Xu, Bingjie and Zhou, Yuyin},
journal={arXiv},
volume={abs/2410.06244},
year={2024},
}
</code></pre>
</div>
</section>
</div>
</body>
</html>