This repository lists papers authored by Focoos AI.
Title | Venue | Code |
---|---|---|
📜 SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation<br>Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi, Carlo Masone, Giuseppe Averta<br>SAMWISE is a Referring Video Object Segmentation (RVOS) method that overcomes limitations of previous models by enabling streaming processing while retaining context. Built on the Segment-Anything 2 (SAM2) model, it integrates natural language understanding and temporal modeling, achieving state-of-the-art performance with minimal overhead. | CVPR 2025 | |
📜 AI Versus Nature: Navigating the Complex Interplay of Technology and the Environment<br>Barbara Caputo, Antonio Tavera, Fabio Cermelli, Giuseppe Roberto Marseglia<br>The relationship between technology and the environment is complex, particularly with the advent of advanced artificial intelligence tools. While AI technologies consume significant energy and resources, they also offer solutions to environmental challenges. This chapter examines the dual impact of AI: it increases energy consumption and pollution, yet it enables more efficient and sustainable practices. Key applications include optimizing wind turbine placement, photovoltaic production, and energy consumption patterns. The chapter also emphasizes the need for holistic energy assessments and highlights emerging efficient AI systems and advances in public services such as waste and water management. | Springer Nature | - |

Title | Venue | Code |
---|---|---|
📜 PEM: Prototype-based Efficient MaskFormer for Image Segmentation<br>Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano, Francesca Pistilli, Marco Ciccone, Giuseppe Averta, Fabio Cermelli<br>Prototype-based Efficient MaskFormer (PEM) is a transformer-based architecture for image segmentation that improves efficiency without sacrificing performance. It uses prototype-based cross-attention and a multi-scale feature pyramid network to reduce computation. PEM outperforms task-specific models while being more computationally efficient. | CVPR 2024 | 🌐 Project Page |
📜 The Revenge of BiSeNet: Efficient Multi-Task Image Segmentation<br>Gabriele Rosi, Claudia Cuttano, Niccolò Cavagnero, Giuseppe Averta, Fabio Cermelli<br>BiSeNetFormer is a multi-task image segmentation architecture designed for efficiency and accuracy, supporting semantic and panoptic segmentation. It combines two-stream architectures with a transformer-based segmentation head, achieving high inference speeds and competitive accuracy on datasets like Cityscapes and ADE20K. | CVPR 2024 (Workshop) | - |
📜 What does CLIP know about peeling a banana?<br>Claudia Cuttano, Gabriele Rosi, Gabriele Trivigno, Giuseppe Averta<br>AffordanceCLIP leverages pre-trained Vision-Language models like CLIP to improve affordance segmentation for robots, bypassing the need for costly annotations or predefined actions. It achieves competitive zero-shot performance, works with any action prompt, and requires minimal additional training, enabling scalable, flexible models. | CVPR 2024 (Workshop) | - |

Feel free to explore the papers and reach out for collaborations or inquiries!