Muhammad Maaz mmaaz60

I am a final-year Ph.D. student in the Computer Vision Department at MBZUAI, working under the supervision of Dr. Salman Khan and Prof. Fahad Khan.

155 followers · 4 following

@MBZUAI
Abu Dhabi, UAE, San Francisco, USA
https://www.mmaaz60.com
in/mmaaz60
@mmaaz60

mmaaz60/README.md

Hi there 👋

🔭 I’m currently working on multi-modal transformers and multi-task learning
🌱 I’m currently learning to play Table Tennis 🏓
📫 How to reach me: [email protected]

Pinned

facebookresearch/perception_models Public

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1.2k 62
mbzuai-oryx/Video-ChatGPT Public

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1.4k 112
mbzuai-oryx/groundingLMM Public

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 886 48
mbzuai-oryx/VideoGPT-plus Public

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding".

Python 275 19
mbzuai-oryx/LLaVA-pp Public

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python 836 61
EdgeNeXt Public

[CADL'22, ECCVW] Official repository of the paper "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications".

Python 378 43