SP Grand Challenges

Background

GC-1: EEG Auditory Attention Decoding (EEG-AAD)

Organized by: Cunhang Fan, Zhao Lv, Jian Zhou, Siqi Cai, Jing Lu, and Jingdong Chen

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=1

Challenge website: https://fchest.github.io/icassp-aad

Title: EEG-AAD 2026: EEG Auditory Attention Decoding Challenge

Short description: Auditory attention decoding (AAD) aims to decode an individual’s focus of attention from neural signals in multi-speaker environments. A central task in AAD is to identify the spatial direction of the attended speaker from electroencephalography (EEG) signals. Despite significant progress, a key limitation of current EEG-AAD studies is their poor generalization to unseen subjects and sessions. In the EEG-AAD challenge, teams will compete to develop generalized models for accurately decoding directional auditory attention. We provide the first multi-modal audio-visual auditory attention dataset simulating real-world audio-visual scenes. The dataset contains approximately 4,400 minutes of EEG signals collected from 40 subjects across two experimental sessions, conducted under both audio-visual and audio-only scenes. Participants are presented with two tasks (a minimal data-split sketch follows the list):

  • Task 1 (cross-subject): Train models on EEG data from a subset of subjects and identify the spatial direction of auditory attention in unseen subjects.
  • Task 2 (cross-session): Train models on EEG data from one session of a subject and identify the spatial direction of auditory attention in the unseen session of the same subject.
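
To make the two evaluation protocols concrete, here is a minimal data-split sketch. It assumes a hypothetical flat layout of trials with per-trial subject and session identifiers; the official dataset format and partitioning released on the challenge website take precedence.

```python
# Minimal split sketch, assuming hypothetical per-trial metadata arrays:
# eeg[n_trials, channels, time], labels[n_trials] (attended direction),
# subject_ids[n_trials], session_ids[n_trials].
import numpy as np

def cross_subject_split(subject_ids, held_out_subjects):
    """Task 1: train on a subset of subjects, test on unseen subjects."""
    test_mask = np.isin(subject_ids, list(held_out_subjects))
    return np.where(~test_mask)[0], np.where(test_mask)[0]

def cross_session_split(subject_ids, session_ids, subject, train_session):
    """Task 2: train on one session of a subject, test on the unseen session."""
    same_subject = subject_ids == subject
    train_mask = same_subject & (session_ids == train_session)
    test_mask = same_subject & (session_ids != train_session)
    return np.where(train_mask)[0], np.where(test_mask)[0]

# Toy metadata standing in for 40 subjects x 2 sessions of trials.
rng = np.random.default_rng(0)
subject_ids = rng.integers(0, 40, size=1000)
session_ids = rng.integers(0, 2, size=1000)
train_idx, test_idx = cross_subject_split(subject_ids, range(32, 40))
```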

GC-2: Automatic Song Aesthetics Evaluation

Organized by: Ting Dang, Haohe Liu, Hao Liu, Hexin Liu, Lei Xie, Huixin Xue, Wei Xue, Guobin Ma, Hao Shi, Yui Sudo, Jixun Yao, Ruibin Yuan, Jingyao Wu, Wenwu Wang

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=2

Challenge website: https://aslp-lab.github.io/Automatic-Song-Aesthetics-Evaluation-Challenge/

Title: Automatic Song Aesthetics Evaluation Challenge

Short description: Recent advances in generative music models have enabled automatic song creation with impressive quality and diversity, powering applications from virtual artists to movie dubbing. However, evaluating the aesthetic quality of generated songs, which spans factors such as emotional expressiveness, musicality, and listener enjoyment, remains a key challenge. Existing metrics often fail to reflect human perception. To address this gap, the Automatic Song Aesthetics Evaluation Challenge invites participants to develop models that predict human ratings of song aesthetics based solely on audio. This competition aims to establish a standardized benchmark for assessing musical aesthetics in song generation, with human-annotated datasets and a focus on listener-centered criteria. By bridging signal processing, affective computing, and machine learning, this challenge seeks to drive progress toward more human-aligned music generation and evaluation.
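
As a rough illustration of the prediction task (not the official baseline), the sketch below regresses a mean human aesthetics rating from simple mel-spectrogram statistics and reports rank correlation with the human scores. File paths, the rating scale, and the choice of Spearman correlation are assumptions.

```python
# Hypothetical baseline: predict a mean human aesthetics rating from
# mel-spectrogram statistics with ridge regression. File names, rating
# scale, and the evaluation metric are assumptions, not challenge specs.
import numpy as np
import librosa
from sklearn.linear_model import Ridge
from scipy.stats import spearmanr

def song_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr, mono=True, duration=30.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    logmel = librosa.power_to_db(mel)
    # Mean and std over time as a crude fixed-length descriptor.
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

def fit_and_score(train_paths, train_ratings, dev_paths, dev_ratings):
    X_tr = np.stack([song_features(p) for p in train_paths])
    X_de = np.stack([song_features(p) for p in dev_paths])
    model = Ridge(alpha=1.0).fit(X_tr, train_ratings)
    pred = model.predict(X_de)
    rho, _ = spearmanr(pred, dev_ratings)  # rank correlation with humans
    return model, rho
```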

GC-3: Predicting Lyric Intelligibility (CADENZA)

Organized by: Gerardo Roa Dabike, Michael Akeroyd, Scott Bannister, Jon Barker, Trevor Cox, Bruno Fazenda, Simone Graetzer, Alinka Greasley, Rebecca Vos, and William Whitmer

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=3

Challenge website: https://cadenzachallenge.org/

Title: ICASSP 2026 Cadenza Challenge: Predicting Lyric Intelligibility

Short description: Our challenge is to develop systems that can predict lyric intelligibility from extracts of popular Western music. The system will take stereo audio as input and estimate the word correct rate a listener achieved in a perceptual test. In speech technology, metrics that automatically evaluate intelligibility have driven improvements in speech enhancement; we want to do the same for music. Recently, foundation models have made blind (non-intrusive) speech intelligibility metrics much more accurate. But speech metrics are unreliable for music: sung language and intonation differ from spoken language, and sung vocals are typically embedded in a musical accompaniment whose characteristics differ from the independent noise backgrounds that speech metrics are designed to account for. The Cadenza project is improving music for people with hearing loss, which can make it harder to pick out lyrics. No expertise in hearing loss is needed to enter our challenge, however.
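
For concreteness, the sketch below shows one plausible way to compute the word correct rate that serves as the prediction target, by counting reference lyric words matched in a listener's typed response; the challenge's official scoring rules may differ in detail.

```python
# Sketch of the target label: word correct rate between the reference lyric
# line and a listener's response, counted as reference words matched
# (multiset intersection). The challenge's exact scoring may differ.
from collections import Counter
import re

def word_correct_rate(reference: str, response: str) -> float:
    norm = lambda s: re.sub(r"[^a-z' ]", " ", s.lower()).split()
    ref, hyp = norm(reference), norm(response)
    if not ref:
        return 0.0
    matched = sum((Counter(ref) & Counter(hyp)).values())
    return matched / len(ref)

print(word_correct_rate("I will always love you", "i will of loved you"))
# -> 0.6 (3 of 5 reference words matched)
```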

GC-4: Speech Analysis for Neurodegenerative Diseases (SAND)

Organized by: Lucia Aruta, Vincenzo Bevilacqua, Nadia Brancati, Ivanoe De Falco, Antonio Di Marino, Raffaele Dubbioso, Maria Frucci, Valentina Virginia Iuzzolino, Daniel Riccio, Giovanna Sannino, Gianmaria Senerchia, Myriam Spisto, and Laura Verde.

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=4

Challenge website: https://www.sand.icar.cnr.it/

Title: Speech Analysis for Neurodegenerative Diseases (SAND)

Short description: This challenge stems from the need to analyse noninvasive, objective, and scalable biomarkers, such as speech signals, for early diagnosis and longitudinal monitoring of patients suffering from neurodegenerative diseases. Diseases such as Amyotrophic Lateral Sclerosis (ALS) present complex diagnostic challenges due to heterogeneous symptom profiles and overlapping clinical features. Current diagnostic tools are largely based on subjective clinical scales and often fail to detect early changes, resulting in delayed intervention and suboptimal care for patients. This underscores the urgent need for noninvasive biomarkers.

We have designed two tasks to address the major challenges faced in the monitoring of neurodegenerative diseases through voice analysis. We warmly welcome researchers from academia and industry to participate and jointly explore reliable solutions for these challenging scenarios. Participants can contribute to one or both tasks proposed here:

  • TASK 1: multi-class classification at time 0 (first assessment)
  • TASK 2: multi-class prediction (latest assessment)

Detailed information and all guidelines are available on the challenge’s official website.
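
As an illustrative (and entirely hypothetical) starting point for the multi-class tasks, the sketch below classifies voice recordings from simple MFCC statistics with a standard scikit-learn pipeline; the official baselines, features, and labels are defined on the challenge website.

```python
# Hypothetical baseline: MFCC summary statistics + logistic regression,
# evaluated with 5-fold cross-validation. Paths and labels are placeholders.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def voice_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def baseline_cv_accuracy(audio_paths, class_labels):
    X = np.stack([voice_features(p) for p in audio_paths])
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X, class_labels, cv=5).mean()
```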

GC-5: Environmental Sound Deepfake Detection (ESDD)

Organized by: Han Yin (School of Electrical Engineering, KAIST, Daejeon, Republic of Korea), Yang Xiao (University of Melbourne, Australia; Fortemedia Singapore, Singapore), Rohan Kumar Das (Fortemedia Singapore, Singapore), Jisheng Bai (Xi’an University of Posts & Telecommunications, Xi’an, China), and Ting Dang (University of Melbourne, Australia)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=5

Challenge website: https://sites.google.com/view/esdd-challenge

Title: Environmental Sound Deepfake Detection Challenge

Short description: Recent advances in audio generation systems have enabled the creation of highly realistic and immersive soundscapes, which are increasingly used in film and virtual reality. However, these audio generators also raise concerns about potential misuse, such as generating deceptive audio content for fake videos and spreading misleading information. Existing datasets for environmental sound deepfake detection (ESDD) are limited in scale and audio types. To address this gap, we have proposed EnvSDD, the first large-scale curated dataset designed for ESDD, consisting of 45.25 hours of real and 316.7 hours of fake sound. Based on EnvSDD, we are launching the Environmental Sound Deepfake Detection Challenge.

To deal with the key challenges encountered in real-life scenarios, we have designed two different tracks: ESDD in Unseen Generators (track 1) and Black-Box Low-Resource ESDD (track 2). Track 1 aims to explore the generalizability to unseen Text-to-Audio (TTA) and Audio-to-Audio (ATA) generators. Track 2 presents a more challenging scenario, simulating real-world deepfake detection under extreme uncertainty and limited data.
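
Deepfake-detection systems of this kind are commonly scored with the equal error rate (EER); whether the ESDD tracks use EER or another metric is not restated here, so treat the sketch below as illustrative only.

```python
# Sketch of equal error rate (EER) for real-vs-fake detection.
# Scores: higher = more likely real (bona fide).
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores),
                             np.zeros_like(spoof_scores)])
    order = np.argsort(scores)
    labels = labels[order]
    # Sweep thresholds: miss rate on bona fide vs. false alarm on spoof.
    fnr = np.cumsum(labels) / labels.sum()                   # bona fide rejected
    fpr = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()   # spoof accepted
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fnr[idx] + fpr[idx]) / 2.0

print(equal_error_rate(np.array([0.9, 0.8, 0.7, 0.4]),
                       np.array([0.1, 0.2, 0.6, 0.3])))  # -> 0.25
```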

GC-6: Face-Voice Association in Multilingual Environments (FAME)

Organized by: Marta Moscati, Ahmed Abdullah, Muhammad Saad Saeed, Shah Nawaz, Rohan Kumar Das, Muhammad Zaigham Zaheer, Junaid Mir, Muhammad Haroon Yousaf, Khalid Malik, Markus Schedl

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=6

Challenge website: https://mavceleb.github.io/dataset/competition.html

Title: Face-voice Association in Multilingual Environments (FAME)

Short description: The face and voice of a person have unique characteristics, and both are widely used as biometric measures for person authentication, either unimodally or multimodally. A strong correlation has been found between a person’s face and voice, which has attracted significant research interest. Although previous works have established an association between faces and voices, none of these approaches investigated the effect of multiple languages on this task. Since half of the world’s population is bilingual and communication increasingly takes place in multilingual settings, it is essential to investigate the effect of language when associating faces with voices. Thus, the goal of the Face-voice Association in Multilingual Environments (FAME) 2026 challenge is to analyze the impact of multiple languages on the face-voice association task. For more information on the challenge, please see the evaluation plan.
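
At its core, face-voice association reduces to scoring cross-modal pairs. The sketch below assumes face and voice embeddings are already available (the embedding networks are the participants' contribution) and matches a voice to candidate faces by cosine similarity; it is illustrative rather than the challenge's official protocol.

```python
# Minimal cross-modal matching sketch: score each face/voice pair by cosine
# similarity between L2-normalised embeddings and pick the best match.
import numpy as np

def l2_normalise(x, axis=-1, eps=1e-9):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_voice_to_faces(voice_emb, face_embs):
    """Return the index of the face most similar to the voice embedding."""
    v = l2_normalise(voice_emb)
    F = l2_normalise(face_embs)
    sims = F @ v          # cosine similarity once both are unit length
    return int(np.argmax(sims)), sims

# Toy example: two candidate faces, one query voice (random stand-ins for
# real embeddings from a multilingual test pair).
rng = np.random.default_rng(1)
best, sims = match_voice_to_faces(rng.normal(size=256),
                                  rng.normal(size=(2, 256)))
```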

GC-7: Reconstructing Hyperspectral Cubes of Everyday Objects from Low-Cost Inputs (Hyper-Object)

Organized by: Pai Chet Ng, Konstantinos N. Plataniotis, Juwei Lu, Gabriel Lee Jun Rong, Malcolm Low, Nikolaos Boulgouris, Thirimachos Bouriai, Seyed Mohammad Sheikholeslami

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=7

Challenge website: https://hyper-object.github.io/

Title: 2026 ICASSP Hyper-Object Challenge – Hyperspectral Reconstruction of Everyday Objects from Low-Cost Inputs

Short description: The Hyper-Object Challenge is the second edition of our ICASSP Grand Challenge series, continuing from the Hyper-Skin Challenge. This second edition broadens the scope from skin imaging to a diverse range of everyday objects, such as fruits, fabrics, books, and tools, captured under realistic acquisition constraints. The challenge focuses on the reconstruction of high-fidelity hyperspectral image cubes spanning 400–1000 nm from cost-effective input modalities, including simulated mosaic images and low-resolution RGB data. It features two tracks: (1) spectral reconstruction from mosaic data, and (2) joint spatial–spectral super-resolution from low-resolution RGB. By addressing the technical and practical challenges of low-cost hyperspectral reconstruction, the Hyper-Object Challenge aims to advance methods that make hyperspectral imaging more affordable and accessible.
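
Reconstruction quality for tasks like this is often reported with spectral angle mapper (SAM) and RMSE; the sketch below implements both for illustration, without claiming these are the challenge's official metrics.

```python
# Sketch of two common hyperspectral reconstruction metrics: spectral angle
# mapper (SAM, in radians) and RMSE. Cubes are shaped [height, width, bands].
import numpy as np

def spectral_angle_mapper(pred, ref, eps=1e-9):
    dot = np.sum(pred * ref, axis=-1)
    denom = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1) + eps
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return float(angles.mean())

def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

# Toy cubes with 31 bands standing in for the 400-1000 nm range.
rng = np.random.default_rng(0)
ref = rng.random((64, 64, 31))
pred = ref + 0.01 * rng.normal(size=ref.shape)
print(spectral_angle_mapper(pred, ref), rmse(pred, ref))
```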

GC-8: Multimodal Learning for 6G Wireless Communications (CONVERGE)

Organized by: Jichao Chen (EURECOM), Filipe B. Teixeira (INESC TEC and FEUP), Francisco M. Ribeiro (INESC TEC and FEUP), Ahmed Alkhateeb (Arizona State University), Luis M. Pessoa (INESC TEC and FEUP), and Dirk Slock (EURECOM)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=8

Challenge website: https://converge-project.eu/converge-icassp-2026-sp-grand-challenge/

Title: CONVERGE Challenge: Multimodal Learning for 6G Wireless Communications

Short description: High-frequency mmWave communication enables ultra-high data rates and low latency but faces considerable challenges due to severe path loss, especially in non-line-of-sight (NLoS) scenarios. Augmenting radios with visual sensing has recently proven effective, as cameras provide rich environmental context that helps predict obstructions and guide proactive network actions. In this CONVERGE Challenge, we invite participants to develop machine learning models that integrate visual and radio data to address key communication tasks in high-frequency wireless systems. The challenge consists of four independent tracks—blockage prediction, UE localization and position prediction, channel prediction, and beam prediction—based on a rich, real-world multimodal dataset collected in a controlled indoor mmWave testbed. This challenge offers an opportunity to benchmark cross-modal learning approaches and promotes interdisciplinary collaboration among the wireless communications, signal processing, computer vision, and AI communities.
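
As a hedged illustration of cross-modal learning for, say, the blockage-prediction track, the sketch below late-fuses a visual feature vector with a radio feature vector and trains a linear classifier; the feature extractors, dimensions, and labels are placeholders rather than the challenge specification.

```python
# Late-fusion sketch: concatenate a visual feature vector (e.g. from a frozen
# image backbone) with a radio feature vector (e.g. past beam power) and
# classify blocked / not blocked. All shapes and data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

def late_fusion_features(visual_feat, radio_feat):
    return np.concatenate([visual_feat, radio_feat], axis=-1)

# Toy data: 200 samples, 128-d visual + 16-d radio features, binary label.
rng = np.random.default_rng(0)
X = late_fusion_features(rng.normal(size=(200, 128)),
                         rng.normal(size=(200, 16)))
y = rng.integers(0, 2, size=200)
clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```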

GC-9: Human-Like Spoken Dialogue Systems (HumDial)

Organized by: Eng Siong Chng, Xuelong Geng, Guangzhi Sun, Hongfei Xue, Lei Xie, Xixin Wu, Shuai Wang, Shuiyuan Wang, Xinsheng Wang, Longshuai Xiao, Zhixian Zhao, Chao Zhang, Zihan Zhang

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=9

Challenge website: https://aslp-lab.github.io/HumDial-Challenge/

Title: ICASSP2026 Human-like Spoken Dialogue Systems Challenge (HumDial Challenge)

Short description: Recent breakthroughs in large foundation models and speech technology have propelled spoken dialogue systems toward more natural and expressive interactions. However, evaluating the true “human-likeness” of these systems remains an open challenge, as existing benchmarks often fall short in capturing emotional intelligence and real-time conversational dynamics. The 2026 HumDial Challenge (Human-like Spoken Dialogue Systems Challenge) addresses this critical gap by introducing two focused tracks: Emotional Intelligence and Full-Duplex Interaction. Participants will tackle rich, multi-turn dialogues that demand nuanced emotional reasoning, dynamic empathy, and real-time coordination. With comprehensive evaluation frameworks and human-annotated real-recording datasets, the challenge aims to establish a new standard for assessing human-like dialogue capabilities, driving the next generation of emotionally aware, fluidly interactive AI agents.

GC-10: Inaugural Music Source Restoration (MSR)

Organized by: Yongyi Zang, Jiarui Hai, Wanying Ge, Helin Wang, Zheqi Dai, Yuki Mitsufuji, Qiuqiang Kong, Mark Plumbley

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=10

Challenge website: https://msrchallenge.com/

Title: The Inaugural Music Source Restoration Challenge

Short description: As music consumption and production evolve in the digital age, the ability to extract and restore individual instrument stems from mixed recordings has become increasingly vital. Traditional music source separation (MSS) systems operate under the limiting assumption that mixtures are simple linear combinations of sources, failing to address the complex signal processing chain of professional audio production. Therefore, we introduce the Music Source Restoration (MSR) Challenge, targeting the recovery of original, unprocessed instrument signals from fully mixed and mastered audio. Unlike conventional separation approaches, MSR requires generative solutions capable of reversing various audio transformations including equalization, compression, reverberation, and transmission degradations. This challenge addresses critical industry needs such as stem-level reproduction for remixing, restoration of degraded historical recordings, and enhancement of live performances affected by venue acoustics. We provide participants with access to extensive open-source datasets and a baseline synthetic mixture generation pipeline, encouraging innovative approaches to data augmentation. The challenge features two evaluation settings: non-blind evaluation using professionally mixed clips with ground-truth stems, and blind evaluation of real-world degraded recordings from historical archives, live performances, FM broadcasts, and lossy streaming. Assessment combines objective metrics with subjective ratings from professional audio engineers across eight target instruments. Our challenge aims to bridge academic research with industry applications, fostering the development of restoration technologies that advance both theoretical understanding and practical implementation in professional audio production.
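
Objective assessment of restored stems often starts from signal-to-distortion ratio (SDR) against the ground-truth unprocessed reference; the sketch below is an illustrative implementation, not a statement of the challenge's official metric set.

```python
# Sketch of signal-to-distortion ratio (SDR) in dB between a restored stem
# and its unprocessed reference; treat as illustrative only.
import numpy as np

def sdr_db(reference, estimate, eps=1e-9):
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise = estimate - reference
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) /
                           (np.sum(noise ** 2) + eps))

# Toy check: a slightly perturbed copy of the reference scores roughly 40 dB.
rng = np.random.default_rng(0)
ref = rng.normal(size=44100)
print(sdr_db(ref, ref + 0.01 * rng.normal(size=ref.shape)))
```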

GC-11: Radar Acoustic Speech Enhancement (RASE)

Organized by: Andy W. H. Khong, Patrick A. Naylor, Zhi-Wei Tan, V. G. Reju, Ritesh C. Tewari, and Ruotong Ding

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=11

Challenge website: https://rase-challenge.github.io/RASE2026-Challenge/

Title: RASE 2026: Radar Acoustic Speech Enhancement

Short description: RASE is an ICASSP 2026 Grand Challenge on the recovery of clear, full-bandwidth speech from noisy, band-limited signals acquired by a frequency-modulated continuous-wave (FMCW) mmWave radar based on the radar vibrometry principle. This speech acquisition approach is particularly effective where microphones suffer from signal degradation due to acoustic noise or distance. A key advantage of FMCW radar vibrometry is range-gating: the radar isolates reflections by distance, selectively capturing speech while suppressing interferers. Unlike microphones, radar can also acquire speech through barriers. The organizers will provide a curated dataset and baseline code, allowing the broader speech community to participate without any prior radar expertise.

The challenge features two difficulty levels: (1) direct sensing from a loudspeaker diaphragm (controlled, strong vibrations), and (2) indirect sensing from a nearby secondary surface (aluminium foil). In both cases, the source is inside a glass-walled room while the radar is deployed outside the enclosure. Participants will receive paired .wav files (radar-derived signals and corresponding clean references) for training and development of neural network models; the test set will be released later. To lower the barrier to entry, raw radar signals will not be shared. Submissions are ranked using well-established enhancement and intelligibility metrics (ESTOI, normalized PESQ, normalized DNSMOS, and MFCC cosine similarity), aggregated across difficulty levels. To promote reproducibility and fairness, participating teams must train their neural network models on the provided data from scratch (no external data or pre-trained models) and submit code and environment details. Top-scoring teams will be invited to present their methods at ICASSP 2026.
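
Of the listed metrics, MFCC cosine similarity is straightforward to sketch; the settings below (sample rate, number of coefficients, frame handling) are assumptions rather than the organizers' exact configuration.

```python
# Sketch of MFCC cosine similarity between the enhanced output and the clean
# reference, averaged over frames. MFCC settings are assumptions.
import numpy as np
import librosa

def mfcc_cosine_similarity(enhanced, clean, sr=16000, n_mfcc=13, eps=1e-9):
    m_enh = librosa.feature.mfcc(y=enhanced, sr=sr, n_mfcc=n_mfcc)
    m_cln = librosa.feature.mfcc(y=clean, sr=sr, n_mfcc=n_mfcc)
    n = min(m_enh.shape[1], m_cln.shape[1])          # trim to common length
    m_enh, m_cln = m_enh[:, :n], m_cln[:, :n]
    num = np.sum(m_enh * m_cln, axis=0)
    den = np.linalg.norm(m_enh, axis=0) * np.linalg.norm(m_cln, axis=0) + eps
    return float(np.mean(num / den))                 # mean over frames
```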

GC-12: x-to-audio alignment (XACLE)

Organized by: Yuki Okamoto, Shinnosuke Takamichi, Keisuke Imoto, Noriyuki Tonami, Ryotaro Nagase, Riki Takizawa, Yusuke Kanamori, and Minoru Kishi

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=12

Challenge website: https://xacle.org/

Title: The first x-to-audio alignment challenge (XACLE Challenge)

Short description: The scope of this challenge is to predict the semantic alignment of a given general audio and text pair. Research on generating general audio, not limited to speech and music, from various inputs such as text and video (x-to-audio generation) is actively being pursued. In a generative model, evaluating alignment between input and output is extremely important. For instance, in the evaluation of text-to-audio generation (TTA), methods have been proposed to evaluate the alignment between audio and text objectively. However, it has been pointed out that these methods often have a low correlation with human subjective evaluations. In this challenge, our goal is to build a model that automatically predicts the semantic alignment between audio and text for TTA evaluation. The aim is to create objective evaluations that correlate highly with human subjective evaluations.
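
The evaluation idea, measuring how well automatic alignment scores track human judgments, can be sketched as a rank-correlation computation; the alignment predictor itself (for example, a CLAP-style audio-text similarity model) is what participants build, and the official protocol may differ in detail.

```python
# Sketch of the evaluation idea: rank correlation between predicted
# audio-text alignment scores and human subjective ratings.
import numpy as np
from scipy.stats import spearmanr, kendalltau

def correlation_with_humans(predicted_scores, human_scores):
    rho, _ = spearmanr(predicted_scores, human_scores)
    tau, _ = kendalltau(predicted_scores, human_scores)
    return {"spearman": rho, "kendall": tau}

# Toy example: a predictor that roughly tracks the human ordering.
human = np.array([4.5, 1.0, 3.2, 2.8, 4.9])
pred = np.array([0.81, 0.20, 0.55, 0.60, 0.90])
print(correlation_with_humans(pred, human))
```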

GC-13: Universality, Robustness, and Generalizability for EnhancemeNT (URGENT)

Organized by: Chenda Li (Shanghai Jiao Tong University, China), Wei Wang (Shanghai Jiao Tong University, China), Wangyou Zhang (Shanghai Jiao Tong University, China), Kohei Saijo (Waseda University, Japan), Robin Scheibler (Google Deepmind, Japan), Samuele Cornell (Carnegie Mellon University, USA), Zhaoheng Ni (Meta, USA), Anurag Kumar (Google Deepmind, USA), Marvin Sach (Technische Universität Braunschweig, Germany), Yihui Fu (Technische Universität Braunschweig, Germany), Tim Fingscheidt (Technische Universität Braunschweig, Germany), Shinji Watanabe (Carnegie Mellon University, USA), Yanmin Qian (Shanghai Jiao Tong University, China)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=13

Challenge website: https://urgent-challenge.github.io/urgent2026/

Title: URGENT challenge (Universal, Robust and Generalizable speech EnhancemeNT)

Short description: The ICASSP 2026 URGENT Challenge, building upon two successful previous editions, returns for a third iteration to advance cutting-edge research in universal speech enhancement. This year, the challenge emphasizes advanced data curation pipelines and introduces substantially more diverse speech conditions, including emotional, accented, and whispered speech, to better promote robustness and generalization across real-world scenarios. Additionally, it features the first-ever large-scale Speech Quality Assessment track, specifically tailored to speech enhancement, which opens new avenues for reliable perceptual evaluation and generalizability.

GC-14: In-the-wild generation and detection of spoofed speech (WildSpoof)

Organized by: Yihan Wu (Renmin University of China), Jee-weon Jung (Apple / CMU), Hye-jin Shim (CMU), Xin Wang (NII Japan), Xin Cheng (Renmin University)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=14

Challenge website: https://wildspoof.github.io/

Title: WildSpoof – In-the-wild generation and detection of spoofed speech

Short description: The WildSpoof Challenge aims to advance the use of in-the-wild data in two speech processing tasks that generate and detect spoofed speech. We invite you to participate in this challenge, which targets two critical and increasingly intertwined tasks (a minimal SASV scoring sketch follows at the end of this description):

  1. Text-to-Speech Generation (TTS)
  2. Spoofing-aware Automatic Speaker Verification (SASV)

The WildSpoof Challenge promotes research that bridges the gap between speech generation and spoofing detection, fostering interdisciplinary innovation towards more robust, realistic, and integrated speech systems. Specifically, we set the following objectives:

  1. Advance the use of in-the-wild data in two closely related but underexplored tasks: TTS and SASV, moving beyond conventional clean and controlled datasets.
  2. Foster interdisciplinary collaboration between spoofing generation (TTS) and detection (SASV) sides, encouraging the development of more integrated, robust, and realistic systems.
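
As a minimal, non-official illustration of the SASV side, the sketch below combines an automatic speaker verification score with a spoofing-countermeasure score, either as a joint accept decision or as a single fused score; the thresholds and weights are placeholders, not the challenge baseline.

```python
# Hedged SASV sketch: accept a trial only if the ASV system says "target
# speaker" AND the countermeasure (CM) says "bona fide"; alternatively fuse
# the two scores into one. All thresholds/weights are illustrative.
def sasv_decision(asv_score, cm_score, asv_thr=0.5, cm_thr=0.5):
    """Joint accept decision from separate ASV and CM scores."""
    return (asv_score >= asv_thr) and (cm_score >= cm_thr)

def sasv_fused_score(asv_score, cm_score, w=0.5):
    """Single fused score (weighted sum) for threshold-free evaluation."""
    return w * asv_score + (1.0 - w) * cm_score

print(sasv_decision(0.8, 0.3), sasv_fused_score(0.8, 0.3))  # False 0.55
```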