T1: Artificial Intelligence in Music (AI-Music)

Date and time: November 18, 2022

Organizers and Presenters

Cong Jin
Communication University of China, China


Zechao Li
Nanjing University of Science and Technology, China


Tutorial Summary

As artificial intelligence becomes increasingly prominent as a national strategy, the term AI-Music has appeared in the music field. Artificial intelligence (AI) technology is currently the main focus of algorithmic research in the domestic music field, and there is a trend toward its cross-integration in music creation, music education, and music preservation. Looking back at the development of AI-Music, AI first originated in algorithmic composition, and deep learning composition has now become mainstream. A series of AI technologies, such as voice synthesis, audio-video and image recognition, and interactive technologies, have driven innovations in music creation modes, personalized customization of streaming media, and audio device production.

In addition, there are many areas worth attention in the field of music artificial intelligence, such as meta-learning, self-supervised learning, and knowledge distillation. In order to better predict and help achieve practical applications of AI-Music, we need to extend existing methods and update current technology.

At the same time, AI-Music also faces some challenges. For instance, the beauty of music often comes from balance: not only the balance of rhythm, accompaniment, human voice, and instrument sound, but also the balance of various emotions. Therefore, it may be necessary to give more consideration to balance in research, instead of focusing solely on predictive performance, as most contemporary research does. We hope to integrate musical understanding into artificial intelligence to broaden our research ideas.

Topics to be Covered

The purpose of this workshop is to bring together academic researchers, engineers, and practitioners working in these fields to explore the future development scope and possibilities of AI music. We welcome all original research related to any aspect of artificial intelligence for music of all types.

Intended Audience

Students, researchers, and developers interested in AI-Music, with a background in computer science, AI, art, communication, networking, or computing.

Session Presenters

Xi Shao, Nanjing University of Posts and Telecommunications


Presentation Topic: Automatic Generation of Family Album Based on Multimodality Fusion

Xi Shao received the B.Sc. and M.Sc. degrees in Computer Science from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 1999 and 2002, respectively, and the Ph.D. degree in Computer Science from the School of Computing, National University of Singapore, Singapore, in 2007. He is currently a professor in the School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China. His research interests lie in multidisciplinary research on multimedia systems, specifically including audio/music content analysis, content-based multimedia retrieval, multimedia personalization and recommendation, and cross-media generation. He has published more than 40 journal and conference papers in the area and has served as the principal investigator for two NSF projects.

Jing Wang, Beijing Institute of Technology


Presentation Topic: Binaural Audio Processing with AI techniques for Virtual Reality Applications

Jing Wang received the Ph.D. degree in communication and information systems from Beijing Institute of Technology (BIT), China, in 2007. She works as an Associate Professor and a Doctoral Supervisor at the Research Institute of Communication Technology (RICT), School of Information and Electronics, Beijing Institute of Technology. She has been a Visiting Scholar at the Chinese University of Hong Kong (CUHK) and the Ohio State University (OSU). Her current research covers speech and audio signal processing, multimedia communication, and virtual reality. She has authored or coauthored over 60 papers in SCI/EI-indexed publications and holds more than 10 authorized invention patents. Prof. Wang is currently an IEEE Senior Member, an AES (Audio Engineering Society) Member, a CIC (China Institute of Communications) Senior Member, a CIE (Chinese Institute of Electronics) Senior Member, a CCF (China Computer Federation) Member, a CAAI (Chinese Association for Artificial Intelligence) Member, and an expert member of the Digital Audio and Video Coding Standard Workgroup (AVS) in China. She also actively participates in ITU/MPEG/3GPP international standards organizations and domestic standards groups in the field of multimedia quality evaluation. She is the first drafter of one electronic industry standard and presides over research on the VR audio national standard.
Email: wangjing@bit.edu.cn

Shengchen Li, Xi’an Jiaotong-Liverpool University


Presentation Topic: From the Identification of Human and Machine Composition to the Modelling of Music Melodies

Shengchen Li began his research in auditory intelligence with work on computational musicology at Queen Mary University of London, where he obtained his PhD in 2016. He then served as a lecturer at Beijing University of Posts and Telecommunications, extending his research interests to acoustic signal processing. His teams were ranked among the top entries in the DCASE (Detection and Classification of Acoustic Scenes and Events) data challenges between 2018 and 2021. Shengchen has also worked actively as a committee member of the CCF task force on speech dialogue and auditory processing, and has been awarded several domestic fellowships and an NSFC project. He is currently an assistant professor at Xi'an Jiaotong-Liverpool University, where he serves as the research group leader of machine learning and data analytics in the School of Advanced Technology.

Cong Jin, Communication University of China


Cong Jin is with the School of Information and Communication Engineering, Communication University of China, as an Associate Professor. She received the B.E. and M.Sc. degrees in Communication and Information Systems from Communication University of China in 2010 and 2013, respectively, and the Ph.D. degree in Communication and Information Systems from Communication University of China, Beijing, P.R. China. Her research interests focus on reinforcement learning, music AI, and audio digital twins. She presides over and undertakes youth, general, and key projects of the National Natural Science Foundation of China, the Xiaomi joint fund of the Beijing Natural Science Foundation, and National Key Research and Development Projects. She has published more than 30 academic papers in IEEE, Springer, and other international journals and conferences, including more than 10 in SCI-indexed journals. She has served as an associate editor or reviewer for several leading journals and as session chair or PC member for several major international conferences.