
Popular repositories

  1. Tune-A-Video Public

    [ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

    Python, 4.3k stars, 389 forks

  2. Awesome-Video-Diffusion Public

    A curated list of recent diffusion models for video generation, editing, and various other applications.

    4.3k stars, 253 forks

  3. computer_use_ootb Public

    Out-of-the-box (OOTB) GUI Agent for Windows and macOS

    Python, 1.5k stars, 155 forks

  4. Show-o Public

    [ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

    Python, 1.3k stars, 58 forks

  5. ShowUI Public

    [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

    Python, 1.2k stars, 76 forks

  6. Show-1 Public

    [IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

    Python, 1.1k stars, 56 forks

Repositories

Showing 10 of 90 repositories
  • FAR Public

    Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

    Python, 170 stars, MIT license, 5 forks, 0 issues, 0 pull requests, updated Apr 18, 2025
  • Awesome-Video-Diffusion Public

    A curated list of recent diffusion models for video generation, editing, and various other applications.

    4,299 stars, 252 forks, 1 issue, 0 pull requests, updated Apr 18, 2025
  • Awesome-Robotics-Diffusion Public

    A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

    135 stars, 4 forks, 0 issues, 0 pull requests, updated Apr 16, 2025
  • ROICtrl Public

    Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

    Python, 106 stars, 0 forks, 1 issue, 0 pull requests, updated Apr 16, 2025
  • computer_use_ootb Public

    Out-of-the-box (OOTB) GUI Agent for Windows and macOS

    Python, 1,508 stars, Apache-2.0 license, 155 forks, 29 issues, 7 pull requests, updated Apr 15, 2025
  • GUI-Thinker Public

    Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

    Python, 61 stars, 5 forks, 1 issue, 0 pull requests, updated Apr 11, 2025
  • Awesome-MLLM-Hallucination Public

    📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

    651 stars, 24 forks, 1 issue, 0 pull requests, updated Apr 9, 2025
  • Awesome-Unified-Multimodal-Models Public

    📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

    505 stars, 24 forks, 1 issue, 0 pull requests, updated Apr 9, 2025
  • GUI-Narrator Public

    Repository of GUI Action Narrator

    JavaScript, 10 stars, 0 forks, 0 issues, 0 pull requests, updated Apr 8, 2025
  • VideoGUI Public

    [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

    JavaScript, 33 stars, 2 forks, 0 issues, 0 pull requests, updated Apr 7, 2025

People

This organization has no public members.