Chenfei WU (吴晨飞)

Google Scholar | GitHub | LinkedIn | [email protected]

Dr. Chenfei Wu received his Ph.D. from Beijing University of Posts and Telecommunications in 2020 and is currently a senior researcher at Microsoft Research Asia. His research focuses on large-scale pre-training, multimodal understanding, and multimodal generation. His main work includes the NUWA series of multimodal generation models (NUWA, NUWA-LIP, NUWA-Infinity, NUWA-3D, NUWA-XL), a series of multimodal understanding models (KD-VLP, BridgeTower), and multimodal dialogue systems (Visual ChatGPT, TaskMatrix.AI). He has published papers at conferences including CVPR, NeurIPS, ACL, ECCV, AAAI, and ACM MM, with more than 1,000 citations. His open-source projects on GitHub have received more than 30,000 stars.

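Chenfei Wu (吴晨飞) holds a Ph.D. from Beijing University of Posts and Telecommunications and is a senior researcher at Microsoft Research Asia. His research areas are large-model pre-training, multimodal understanding, and multimodal generation. His main work includes the NUWA (女娲) series of multimodal generation models (NUWA, NUWA-Infinity, NUWA-XL, DragNUWA), the Bridge Tower (桥塔) series of multimodal understanding models (KD-VLP, BridgeTower), and multimodal dialogue systems (Visual ChatGPT, TaskMatrix.AI). He has published multiple papers at conferences including CVPR, NeurIPS, ACL, ECCV, AAAI, and ACM MM, with over a thousand citations, and his open-source GitHub projects have received over 30,000 stars.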

Highlight

Talks

Media Report

Publications

Multimodal Generation

  • GODIVA: Generating open-domain videos from natural descriptions.
    Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan.
    arXiv 2021.

  • NÜWA: Visual synthesis pre-training for neural visual world creation.
    Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan.
    ECCV 2022.

  • NUWA-LIP: language-guided image inpainting with defect-free VQGAN.
    Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, Wangmeng Zuo, Nan Duan.
    CVPR 2023.

  • NUWA-Infinity: Autoregressive over autoregressive generation for infinite visual synthesis.
    Jian Liang, Chenfei Wu, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan.
    NeurIPS 2022.

  • NUWA-XL: Diffusion over diffusion for extremely long video generation.
    Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Ming Gong, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan.
    ACL 2023.

  • DragNUWA: Fine-grained control in video generation by integrating text, image, and trajectory.
    Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Ming Gong, Nan Duan.
    arXiv 2023.

  • StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis.
    Zecheng Tang, Chenfei Wu, Zekai Zhang, Minheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan.
    arXiv 2024.

  • LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models.
    Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan.
    ICLR 2024.

  • NUWA-3D: Learning 3D photography videos via self-supervised diffusion on single images.
    Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan.
    IJCAI 2023.

  • HORIZON: A High-Resolution Panorama Synthesis Framework.
    Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma.
    AAAI 2024.

  • Trace Controlled Text to Image Generation.
    Kun Yan, Lei Ji, Chenfei Wu, Jianmin Bao, Ming Zhou, Nan Duan, Shuai Ma.
    ECCV 2022.

  • ORES: Open-vocabulary Responsible Visual Synthesis.
    Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan.
    AAAI 2024.

  • ReCo: Region-controlled text-to-image generation.
    Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang.
    CVPR 2023.

  • DiVAE: Photorealistic images synthesis with denoising diffusion decoder.
    Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan.
    arXiv 2022.

Multimodal Understanding

  • Using Left and Right Brains Together: Towards Vision and Language Planning.
    Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang.
    arXiv 2024.

  • KD-VLP: Improving end-to-end vision-and-language pretraining with object knowledge distillation.
    Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan.
    Findings of NAACL 2022.

  • BridgeTower: Building bridges between encoders in vision-language representation learning.
    Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
    AAAI 2023.

  • ManagerTower: Aggregating the insights of uni-modal experts for vision-language representation learning.
    Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
    ACL 2023.

  • Learning temporal video procedure segmentation from an automatically collected large dataset.
    Lei Ji, Chenfei Wu, Daisy Zhou, Kun Yan, Edward Cui, Xilin Chen, Nan Duan.
    WACV 2022.

  • Deep reason: A strong baseline for real-world visual reasoning.
    Chenfei Wu, Yanzhao Zhou, Gen Li, Nan Duan, Duyu Tang, Xiaojie Wang.
    CVPR VQA Workshop 2019.

  • Object-difference attention: A simple relational attention for visual question answering.
    Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong.
    ACM Multimedia 2018.

  • Chain of reasoning for visual question answering.
    Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong.
    NeurIPS 2018.

  • Differential networks for visual question answering.
    Chenfei Wu, Jinlai Liu, Xiaojie Wang, Ruifan Li.
    AAAI 2019.

  • Sequential visual reasoning for visual question answering.
    Jinlai Liu, Chenfei Wu, Xiaojie Wang, Xuan Dong.
    CCIS 2018.

Multimodal Systems/Evaluations

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.
    Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan.
    arXiv 2023.

  • TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs.
    Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan.
    Intelligent Computing 2024.

  • VL-InterpreT: An interactive visualization tool for interpreting vision-language transformers.
    Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal.
    CVPR 2022.

  • Low-code LLM: Visual programming over LLMs.
    Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan.
    arXiv 2023.

  • Learning to program with natural language.
    Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan.
    arXiv 2023.

  • GEM: A general evaluation benchmark for multimodal tasks.
    Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti.
    Findings of ACL 2021.

  • EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation.
    Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan.
    arXiv 2023.

  • GameEval: Evaluating LLMs on Conversational Games.
    Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan.
    arXiv 2023.
