
<html>
<head>

    <meta http-equiv="Content-Type" content="text/html">
    <meta name="description" content="***">
    <meta name="keywords" content="Kai Yu, Shanghai Jiao Tong University">
    <meta charset="UTF-8">

    <font face="Times New Roman,SimSun"> </font>
    <title> Kai Yu, Shanghai Jiao Tong University</title>

    <style>
        .tag {
            position: relative;
            left: 0;
            color: white; /* tag text color */
            padding: 2px 5px; /* tag padding */
            border-radius: 3px; /* rounded corners */
            font-size: 12px; /* font size */
        }

        .blue-tag {
            background-color: #3498db;
        }

        .red-tag {
            background-color: #e74c3c;
        }

        .green-tag {
            background-color: #2ecc71;
        }

        .yellow-tag {
            background-color: #f1c40f;
        }
    </style>

</head>

<body>
<table>
    <tr>
        <th></th>
        <th></th>
        <th></th>

    <tr>
        <td><img src="kaiyu.jpeg" width=200 alt="a photo"></td>
        <td>&nbsp;&nbsp;&nbsp;</td>

        <td><h1 style="display : inline"> Kai Yu </h1> Ph.D. (Cantab)
            </p>
            <strong>Distinguished Professor</strong><br/>
            <a style="text-decoration:none" href="https://x-lance.sjtu.edu.cn/" target="_blank">Cross-media Language Intelligence (X-LANCE) Lab</a>  (Former SpeechLab)<br/>
            <a style="text-decoration:none" href="https://www.cs.sjtu.edu.cn/index.aspx" target="_blank">Department of Computer Science and Engineering</a><br/>
            <a style="text-decoration:none" href="https://www.sjtu.edu.cn/" target="_blank">Shanghai Jiao Tong University</a><br/>
            <br/>
            Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn<br/>
            Address: Computer Science Department, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China<br/>
            <br/>
            <a style="text-decoration:none" href="./index_zh.html" target="_blank">[中文]</a>|[English]
        </td>
</table>

<hr>

<h3>Biography</h3>
<p>
I am currently a distinguished professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I also lead the Institute of Intelligent Human-Computer Interaction of the Department of Computer Science and the Center for Intelligent Speech and Natural Language Processing of the AI Institute of SJTU.
</p>
<p>
My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002, respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K., in 2006 and then worked there as a senior research associate. I joined SJTU in 2012 and founded <i>SpeechLab</i>. SpeechLab was later expanded and renamed the <i><a style="text-decoration:none" href="https://x-lance.sjtu.edu.cn/" target="_blank">Cross-media Language Intelligence (X-LANCE) Lab</a></i>. I am a senior member of the IEEE and have served as a member of the IEEE Speech and Language Processing Technical Committee (2017-2019) and as an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024). I am currently a board member of the IEEE Signal Processing Society Conferences Board and Membership Board. I am also a distinguished member of the CCF (China Computer Federation), a member of the CCF council, and the director of the Speech, Dialogue and Auditory Processing Technical Committee of the CCF.
</p>
<p>
My research interests lie primarily in conversational AI, including many aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a human-like way, and evolve by interacting with their environment. I have published over 200 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, as general chair for the National Conference on Man-Machine Communication (the largest domestic speech conference in China), and as area chair for speech processing or dialogue systems at Interspeech, ACL, EMNLP, etc.
</p>
<p>
The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the “AI Key Players” list in the Goldman Sachs Equity Research Report on AI in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.
</p>


<hr>

<h3> SJTU X-LANCE Lab </h3>
&nbsp;&nbsp;&nbsp;&nbsp; <font color="DarkRed"><i>We are looking for self-motivated Ph.D., master's, and undergraduate students and postdocs interested in speech and language processing. Please send me your CV if you would like to join
    us. </i></font><br/>


<h4>Research Interests</h4>
<ul>
    <li> <i> Speech and Audio Processing: </i> neural speech signal processing, robust speech and speaker recognition, high-fidelity speech synthesis, audio analysis and auditory cognition, multi-modal speech processing, and universal audio models </li>
    <li> <i> Natural Language Processing: </i> structured language understanding, KBQA and machine reading comprehension, statistical dialogue systems, multi-lingual language processing, foundation language models, and large language model agents </li>
    <li> <i> Multi-modal Interaction: </i> digital avatars, GUI understanding and manipulation, and AGI for science </li>
</ul>


<!--<h4>Students</h4>
<ul>
    <li>Lu Chen (Ph.D., -)</li>
    <li>Ruisheng Cao (Ph.D., 2021.3-)</li>
    <li>Danyang Zhang (Ph.D., 2020.9-)</li>
    <li>Zihan Zhao (Ph.D., 2020.9-)</li>
    <li>Hongshen Xu (Ph.D., 2019.9-)</li>
    </p>
</ul>
-->
<hr>


<h3>Selected Publications <a class="grey" href="https://scholar.google.com/citations?user=APssqUMAAAAJ&hl=zh-CN">[Google Scholar]</a><a class="grey" href="./publication_2023.pdf">[More Papers]</a></h3>

<!-- </td></tr></table> -->

<h4>Speech and Audio Processing</h4>
    <ul>
        <li>
        <p><span class="tag blue-tag">ASR</span> <b>TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer</b><br/>
            Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and <b>Kai Yu</b><br/>

            ICASSP 2024
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag blue-tag">Signal</span> <b>Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking</b><br/>
            Wenbin Jiang and <b>Kai Yu</b><br/>

            IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1758-1770, 2023
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag blue-tag">TTS</span> <b>Text-To-Speech With Latent Diffusion</b><br/>
            Zhijun Liu, Yiwei Guo and <b>Kai Yu</b><br/>

            ICASSP 2023
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag blue-tag">TTS</span> <b>VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature</b><br/>
            Chenpeng Du, Yiwei Guo, Xie Chen and <b> Kai Yu </b> <br/>
            Interspeech 2022
        </li>
        <li>
        <p><span class="tag blue-tag">RAA</span> <b>Towards Duration Robust Weakly Supervised Sound Event Detection</b><br/>
            Heinrich Dinkel, Mengyue Wu and <b> Kai Yu </b> <br/>
            IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 887-900, 2021
        </li>
    </ul>

<h4>Natural Language Processing</h4>
    <ul>
        <li>
        <p><span class="tag red-tag">LLM</span> <b>SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research</b><br/>
            Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen and <b>Kai Yu</b><br/>

            AAAI 2024
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag red-tag">LLM</span> <b>Large Language Models Are Semi-Parametric Reinforcement Learning Agents</b><br/>
            Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and <b>Kai Yu</b><br/>

            NeurIPS 2023
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag red-tag">NLP</span> <b>A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL</b><br/>
            Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and <b>Kai Yu</b> <br/>
            IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023
        </li>
        <li>
        <p><span class="tag red-tag">NLP</span> <b>OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue</b><br/>
            Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu and <b>Kai Yu</b> <br/>
            Transactions of the Association for Computational Linguistics (TACL), vol. 11, pp. 68-84, 2022
        </li>
        <li>
        <p><span class="tag red-tag">NLP</span> <b>LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations</b><br/>
            Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu and <b>Kai Yu</b> <br/>
            ACL 2021
        </li>
    </ul>
<h4>Multi-modal Interaction</h4>
    <ul>
        <li>
        <p><span class="tag green-tag">Avatar</span> <b>DIFFDUB: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder</b><br/>
            Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen and <b>Kai Yu</b><br/>

            ICASSP 2024
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag green-tag">Avatar</span> <b>DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder</b><br/>
            Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, <b>Kai Yu</b>, Sheng Zhao and Jiang Bian<br/>

            ACM-MM 2023
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag green-tag">GUI</span> <b>Towards Multi-modal Conversational Agents on Mobile GUI</b><br/>
            Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and <b>Kai Yu</b><br/>

            EMNLP 2022
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
        <li>
        <p><span class="tag green-tag">GUI</span> <b>TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages</b><br/>
            Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen and <b>Kai Yu</b><br/>

            NAACL 2022
            <!-- <a class="grey" href="https://arxiv.org/abs/2308.13149">[Paper]</a><a class="grey" href="https://bai-scieval.duiopen.com/#/">[Website]</a></p> -->
        </li>
    </ul>

<hr>
<h3>Professional Qualification and Service </h3>

<h4>Institute of Electrical and Electronics Engineers (IEEE)</h4>
    <ul>
        <li> Senior member of IEEE </li>
        <li> Board Member of IEEE Signal Processing Society Conferences Board </li>
        <li> Board Member of IEEE Signal Processing Society Membership Board  </li>
        <li> Member of IEEE Speech and Language Processing Technical Committee (2017-2019) </li>
        <li> Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024) </li>
    </ul>

<h4>China Computer Federation (CCF)</h4>
    <ul>
        <li> Distinguished Member of CCF </li>
        <li> Member of the 13th Council of CCF </li>
        <li> Director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF </li>
        <li> Associate Director of the Corporation Development Forum (Suzhou) of CCF </li>
        <li> Standing Committee Member of the Large Model Forum of CCF </li>
    </ul>

<h4>Chinese Information Processing Society of China (CIPSC)</h4>
    <ul>
        <li> Member of the 9th Council of CIPSC </li>
        <li> Associate Director of the Speech Information Processing Technical Committee of CIPSC </li>
    </ul>

<h4>Industry Service</h4>
    <ul>
        <li> Director of the National AI Open Innovation Platform on Language Computing, Ministry of Science and Technology of China (MOST) </li> 
        <li> Member of the AI Key Technology and Application Evaluation Academic Committee of the Key Laboratory of the Ministry of Industry and Information Technology of China </li>
        <li> Member of the Information System User Interfaces Branch (TC28/SC35) of the National Information Technology Standardization Technical Committee </li>
        <li> Member of the 4th National Computer Science and Technology Terminology Approval Committee </li>
        <li> Director of the Academic and Intellectual Property Working Group of the China Artificial Intelligence Industry Alliance (AIIA) </li>
        <li> Associate Director of the Technical Committee of the Alliance of Intelligent Speech Technology Industry of China </li>
    </ul>

<h4>Other Service</h4>
    <ul>
        <li> Vice President of the Shanghai Overseas Returned Scholar Association (SORSA) </li>
        <li> Chairman of the AI Branch of SORSA </li>
        <li> Member of the Young Scientists Committee of the World Laureates Forum </li> 
    </ul>

<h4>Academic Conference Service</h4>
    <ul>
        <li> <b>ICASSP</b>
            <ul><li> IEEE SLTC Member </li></ul>
        </li>
        <li> <b>Interspeech</b>
            <ul><li> Program Chair, Area Chair (Speech Recognition/Dialogue Systems) </li></ul>
        </li>
        <li> <b>EUSIPCO</b>
            <ul><li> Area Chair (Speech Processing) </li></ul>
        </li>
        <li> <b>ACL</b>
            <ul><li> (Senior) Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems/Spoken Language Technology) </li></ul>
        </li>
        <li> <b>NAACL</b>
            <ul><li> Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems) </li></ul>
        </li>
        <li> <b>EMNLP</b>
            <ul><li> Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems) </li></ul>
        </li>
        <li> <b>NeurIPS</b>
            <ul><li> Area Chair </li></ul>
        </li>
        <li> <b>SigDial</b>
            <ul><li> Program Chair </li></ul>
        </li>
        <li> <b>ICMI</b>
            <ul><li> Program Chair </li></ul>
        </li>
        <li> <b>NCMMSC</b>
            <ul><li> General Chair, Program Chair </li></ul>
        </li>
    </ul>

<h4> Reviewer Service </h4>
    <ul>
        <li> <b> Journal </b>
        <ul>
            <li> IEEE/ACM Transactions on Audio, Speech, and Language Processing </li>
            <li> IEEE Transactions on Pattern Analysis and Machine Intelligence </li>
            <li> IEEE Signal Processing Letters </li>
            <li> IEEE Signal Processing Magazine </li>
            <li> Speech Communication </li>
            <li> Computer Speech and Language </li>
            <li> Journal of Computer Science (Chinese) </li>
            <li> Journal of Automation (Chinese) </li>
        </ul>
        </li>
        <li> <b> Conference </b>
        <ul>
            <li> ICASSP, Interspeech, IEEE ASRU, IEEE SLT, APSIPA, ISCSLP, NCMMSC </li>
            <li> ACL/NAACL/EACL, EMNLP, SigDial </li>
            <li> AAAI, NeurIPS </li>
        </ul>
        </li>
        <li> <b> Proposal and Award </b>
        <ul>
            <li> EPSRC, U.K. </li>
            <li> Science and Engineering Research Council, Agency for Science and Technology Research, Singapore </li>
            <li> Israel Science Foundation (ISF), Israel </li>
            <li> Foundation for Polish Science </li>
            <li> Research Grants Council (RGC) of Hong Kong </li>
            <li> National Natural Science Foundation of China </li>
            <li> Ministry of Science and Technology of China </li>
            <li> Ministry of Industry and Information Technology of China </li>
            <li> Ministry of Education of China </li>
            <li> Chinese Academy of Sciences </li>
        </ul>
        </li>
    </ul>

<hr>
<h3> Awards </h3>

<h4> Best Paper Awards </h4>
    <ul>
        <li> EURASIP Speech Communication Best Paper Award </li>
        <li> International Symposium on Chinese Spoken Language Processing Best Paper Award </li>
        <li> ISCA Computer Speech and Language Best Paper Award </li>
        <li> Interspeech Best Paper Award </li>
        <li> IEEE SLT Best Paper Award </li>
        <li> NCMMSC Best Paper Award </li>
    </ul>

<h4> National and Provincial Awards </h4>
    <ul>
        <li><i> Leading Talents in Scientific and Technological Innovation </i> by Ministry of Science and Technology of China </li>
        <li><i> Excellent Young Researcher Fund </i> by National Natural Science Foundation of China (NSFC) </li>
        <li><i> Chinese Patent Excellence Award </i> by China National Intellectual Property Administration </li>
        <li><i> Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning </i> by Shanghai Municipal Education Commission </li>
    </ul>

<h4> Professional Society Academic Awards </h4>
    <ul>
        <li><i> Bamboo Award </i> by China Computer Federation (CCF) </li>
        <li><i> Distinguished Lecturer of Advanced Disciplines Lectures </i>  by China Computer Federation (CCF) </li>
        <li><i> Second Prize for Scientific and Technological Progress, WuWenJun AI Science and Technology Award </i>  by Chinese Association for Artificial Intelligence (CAAI) </li>
        <li><i> First Prize for Natural Science, WuWenJun AI Science and Technology Award </i> by Chinese Association for Artificial Intelligence (CAAI) </li>
    </ul>

<h4> Other Awards </h4>
    <ul>
        <li><i> Scientific Chinese (2016) Person of the Year </i> by Scientific Chinese Magazine </li>
    </ul>

<hr>
<br/>
<!--<hr>
<h3>Teaching</h3>
<ul>
    <li><a href=""><i>Natural Language Processing</i></a> for CS&AI undergraduates at SJTU, 2021/2022/2023 Fall</li>
    <li><a href=""><i>Knowledge Representation and Reasoning</i></a> for AI undergraduates at SJTU, 2022/2023/2024 Spring</li>
    </p>
</ul>

<hr>
-->

<div class="content footer">
    Last updated on <font color="DarkRed">2024-12-20</font>.
<!--     Visitor number: <a href="https://www.hitwebcounter.com" target="_blank">
    <img src="https://hitwebcounter.com/counter/counter.php?page=7804457&style=0027&nbdigits=8&type=page&initCount=0" title="Free Counter" Alt="web counter" border="0"/></a>
 -->
</div>


</body>
</html>