Commit

add architecture
francescotaioli committed Dec 2, 2024
1 parent 3d472ad commit 1acd49e
Showing 2 changed files with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions index.html
@@ -318,6 +318,7 @@ <h2 class="title is-3">Abstract</h2>

<section class="section">
<div class="container is-max-desktop is-centered">

<!-- Abstract.
<div class="columns is-centered has-text-centered">
<div class="column is-10">
@@ -366,6 +367,25 @@ <h2 class="title is-3">Abstract</h2>
<div class="columns is-centered has-text-centered">
<div class="column is-is-full">
<h2 class="title is-3">Method</h2>
<img
class="image is-fullwidth"
src="./static/images/arch_1.png"
alt="Teaser"
/>

<br>
<p style="text-align: justify;">
Graphical depiction of <b>AIUTA</b>: the left side shows its interaction cycle with the user, and the right side gives an exploded view of our method.
① The agent receives an initial instruction <i>I</i>: "Find a c = <i>&lt;object category&gt;</i>". ② At each timestep t, a zero-shot policy π (VLFM), comprising a
frozen object detection module, selects the optimal action a<sub>t</sub>. ③ Upon detection, the agent invokes the proposed AIUTA. Specifically,
④ the agent first obtains an initial scene description of the observation O<sub>t</sub> from a VLM. Then, a Self-Questioner module leverages an LLM to
automatically generate attribute-specific questions for the VLM, acquiring more information and refining the scene description with reduced
attribute-level uncertainty, producing S<sub>refined</sub>. ⑤ The Interaction Trigger module then evaluates S<sub>refined</sub> against the “facts” related to
the target to determine whether to terminate the navigation (if the agent believes it has located the target object ⑥) or to pose template-free,
natural-language questions to a human ⑦, updating the “facts” based on the response ⑧.
</p>
<hr>
<h1 class="title is-3">Video</h1>
<div class="publication-video">
<iframe
style="display: block; background-color: white"
@@ -378,6 +398,7 @@ <h2 class="title is-3">Method</h2>
></iframe>
</div>
</div>

</div>
<!--/ Paper video. -->
</div>
Binary file added static/images/arch_1.png
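
Reading note on the caption added to index.html: steps ④ to ⑧ describe a small control flow that runs once the detector fires. The Python below is a minimal sketch of that flow, not the authors' implementation. Every name in it (`aiuta_on_detection`, `describe`, `self_question`, `answer`, `trigger`, `ask_human`) is hypothetical, the VLM, LLM, and human are passed in as plain callables, and joining answers into a single string is an assumed stand-in for how the refined description is actually built.

```python
from typing import Callable, List, Optional, Tuple


def aiuta_on_detection(
    observation: str,
    facts: List[str],
    describe: Callable[[str], str],                      # stand-in for the VLM scene description
    self_question: Callable[[str], List[str]],           # stand-in for the LLM Self-Questioner
    answer: Callable[[str, str], str],                   # stand-in for VLM answers to those questions
    trigger: Callable[[str, List[str]], Optional[str]],  # None means "stop"; otherwise a question
    ask_human: Callable[[str], str],                     # stand-in for the human in the loop
) -> Tuple[bool, List[str], str]:
    """Return (stop_navigation, updated_facts, refined_description)."""
    # Step 4: initial description, then attribute-specific questions to refine it.
    description = describe(observation)
    details = [answer(observation, q) for q in self_question(description)]
    refined = " ".join([description] + details)

    # Step 5: compare the refined description against the known facts.
    question = trigger(refined, facts)
    if question is None:
        # Step 6: the agent believes it has found the target.
        return True, facts, refined

    # Steps 7 and 8: ask the human and record the response as a new fact.
    return False, facts + [ask_human(question)], refined
```

With toy callables, a `trigger` that returns `None` ends navigation and leaves the facts unchanged, while one that returns a question appends the human's reply to the facts and lets navigation continue.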