UTAustin-SwarmLab/Neuro-Symbolic-Video-Search-Temporal-Logic

Neuro Symbolic Video Search with Temporal Logic (NSVS-TL)

Abstract

The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing the semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae that inherently capture memory. Our TL-based reasoning improves the F1 score of complex event identification by 9-15% compared to benchmarks that use GPT-4 for reasoning on state-of-the-art self-driving datasets such as Waymo and NuScenes. The source code is available on GitHub.
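The core idea of decoupling per-frame perception from temporal reasoning can be illustrated with a minimal sketch. Here each frame is assumed to arrive as a dictionary of confidence scores for atomic propositions (hypothetically produced by a vision-language model); the scores are thresholded, and a tiny deterministic automaton checks the temporal pattern F(p1 ∧ F(p2 ∧ …)), i.e. "p1 happens, and at that point or later p2 happens". A frame-level F1 helper shows how retrieved frames could be scored. All names (to_labels, match_sequence, frame_f1), the 0.5 threshold, and the frame-level F1 granularity are assumptions for illustration only; this is not the repository's pipeline, which is likely richer than a hard-threshold automaton.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical perception output: proposition name -> confidence in [0, 1] for one frame.
FrameScores = Dict[str, float]


def to_labels(frames: Sequence[FrameScores], threshold: float = 0.5) -> List[Set[str]]:
    """Turn per-frame confidences into the set of propositions assumed to hold."""
    return [{p for p, score in frame.items() if score >= threshold} for frame in frames]


def match_sequence(labels: Sequence[Set[str]], props: Sequence[str]) -> Optional[List[int]]:
    """Check F(p1 and F(p2 and ... F pn)) on a finite trace with a tiny DFA.

    The automaton state k is the index of the next proposition still awaited.
    Returns the frame index at which each proposition was matched, or None if the
    accepting state (k == len(props)) is never reached.
    """
    k = 0
    hits: List[int] = []
    for i, holds in enumerate(labels):
        # "Eventually" includes the current frame, so several steps may fire at once.
        while k < len(props) and props[k] in holds:
            hits.append(i)
            k += 1
        if k == len(props):
            break  # accepting state reached; later frames cannot change the verdict
    return hits if k == len(props) else None


def frame_f1(predicted: Set[int], truth: Set[int]) -> float:
    """Frame-level F1 = 2PR / (P + R) between retrieved and ground-truth frame indices."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


frames = [
    {"car": 0.9, "pedestrian": 0.1},
    {"car": 0.2, "pedestrian": 0.3},
    {"car": 0.1, "pedestrian": 0.8},
]
labels = to_labels(frames)  # one set per frame: {"car"}, empty, {"pedestrian"}
print(match_sequence(labels, ["car", "pedestrian"]))  # [0, 2]
print(match_sequence(labels, ["pedestrian", "car"]))  # None
print(frame_f1({0, 1, 2}, {0, 2, 3}))
```

Because the automaton only carries the index of the next awaited event, it "remembers" what has already happened across arbitrarily many frames, which is the kind of memory the abstract attributes to state machines and TL formulae.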

Installation Guide

Ensure you have CUDA 12.4 installed and available on your system.
On Linux, you can verify with:

nvcc --version
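If you want to perform the same check from Python (for example in a setup script), a minimal sketch follows. It assumes nvcc is on PATH and that its output contains the "release X.Y" fragment that nvcc normally prints on its "Cuda compilation tools" line; the helper names parse_nvcc_release and installed_cuda_release are hypothetical and not part of the repository.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if nvcc is on PATH; return None if it is missing or unparsable."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found or its version string was not recognized")
    elif release != (12, 4):
        print(f"found CUDA {release[0]}.{release[1]}, but this README asks for 12.4")
    else:
        print("CUDA 12.4 detected")
```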

From the root of the repo, run the following to build all STORM dependencies:

./build_dependency

Next, install uv:

pip install uv

Next, create a virtual environment in .venv:

uv venv .venv
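The uv run commands used later execute inside the project environment, but if you activate .venv manually (source .venv/bin/activate on Linux) and want to confirm which interpreter is active, a standard-library check is enough. This is a generic sketch; the in_virtualenv helper is hypothetical and not something the repository ships.

```python
import sys


def in_virtualenv() -> bool:
    """True when this interpreter runs from a venv such as the .venv created by `uv venv`."""
    # Inside a venv, sys.prefix points at the venv directory, while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


print(f"in venv: {in_virtualenv()} (sys.prefix = {sys.prefix})")
```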

Finally, install the project dependencies declared in pyproject.toml:

uv sync

Running the System

NSVS can be run in two ways: on raw MP4 files with your own natural-language search queries, or on the TLV dataset.

To run it on MP4 files, set the MP4 file paths and the natural-language search query in execute_with_mp4.py, then run:

uv run execute_with_mp4
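Before launching a long run, it can help to sanity-check the values you edited. The sketch below assumes the inputs boil down to a list of .mp4 paths plus one query string; the Mp4SearchConfig class and its field names are hypothetical and do not mirror the actual variables in execute_with_mp4.py.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Mp4SearchConfig:
    """Hypothetical container for the inputs edited in execute_with_mp4.py."""

    video_paths: List[Path]
    query: str

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the inputs look usable."""
        issues: List[str] = []
        if not self.query.strip():
            issues.append("search query is empty")
        if not self.video_paths:
            issues.append("no video paths given")
        for path in self.video_paths:
            if path.suffix.lower() != ".mp4":
                issues.append(f"{path} is not an .mp4 file")
            elif not path.is_file():
                issues.append(f"{path} does not exist")
        return issues


# Example with a hypothetical path and query.
cfg = Mp4SearchConfig(
    video_paths=[Path("videos/example.mp4")],
    query="a car stops and then a pedestrian crosses",
)
for issue in cfg.problems():
    print("config problem:", issue)
```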

To run it on the TLV dataset, first download the dataset from GitHub, then set the dataset path in execute_with_tlv.py and run:

uv run execute_with_tlv

Citation

If you find this repo useful, please cite our paper:

@inproceedings{choi2024towards,
  title={Towards neuro-symbolic video understanding},
  author={Choi, Minkyu and Goel, Harsh and Omama, Mohammad and Yang, Yunhao and Shah, Sahil and Chinchali, Sandeep},
  booktitle={European Conference on Computer Vision},
  pages={220--236},
  year={2024},
  organization={Springer}
}
