SP Grand Challenges

Background

GC-1: EEG Auditory Attention Decoding (EEG-AAD)

Organized by: Cunhang Fan, Zhao Lv, Jian Zhou, Siqi Cai, Jing Lu, and Jingdong Chen

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=1

Challenge website: https://fchest.github.io/icassp-aad

Title: EEG-AAD 2026: EEG Auditory Attention Decoding Challenge

Short description: Auditory attention decoding (AAD) aims to decode an individual’s focus of attention from neural signals in multi-speaker environments. A central AAD task is to identify the spatial direction of the attended speaker from electroencephalography (EEG) signals. Despite significant progress, a key limitation of current EEG-AAD studies is their poor generalization to unseen subjects and sessions. In the EEG-AAD challenge, teams will compete to develop generalized models for accurately decoding directional auditory attention. We provide the first multi-modal audio-visual auditory attention dataset simulating real-world audio-visual scenes. The dataset contains approximately 4,400 minutes of EEG signals collected from 40 subjects during two experimental sessions, conducted under both audio-visual and audio-only scenes. Participants in the challenge are presented with two tasks (a minimal sketch of the corresponding data splits follows the list):

  • Task 1 (cross-subject): Train models on EEG data from a subset of subjects and identify the spatial direction of auditory attention in unseen subjects.
  • Task 2 (cross-session): Train models on EEG data from one session of a subject and identify the spatial direction of auditory attention in the unseen session of the same subject.
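
A minimal sketch of how the two data splits could be organised is shown below, assuming a hypothetical list of trial records carrying a subject ID, a session ID, an EEG window, and a direction label; the official challenge protocol and data format may differ.

    # Hypothetical trial records: (subject, session, eeg_window, direction_label)
    from collections import namedtuple

    Trial = namedtuple("Trial", ["subject", "session", "eeg", "label"])

    def cross_subject_split(trials, held_out_subjects):
        """Task 1: train on a subset of subjects, test on unseen subjects."""
        train = [t for t in trials if t.subject not in held_out_subjects]
        test = [t for t in trials if t.subject in held_out_subjects]
        return train, test

    def cross_session_split(trials, train_session):
        """Task 2: train on one session, test on the unseen session of the same subjects."""
        train = [t for t in trials if t.session == train_session]
        test = [t for t in trials if t.session != train_session]
        return train, test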

GC-2: Automatic Song Aesthetics Evaluation

Organized by: Ting Dang, Haohe Liu, Hao Liu, Hexin Liu, Lei Xie, Huixin Xue, Wei Xue, Guobin Ma, Hao Shi, Yui Sudo, Jixun Yao, Ruibin Yuan, Jingyao Wu, Wenwu Wang

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=2

Challenge website: https://aslp-lab.github.io/Automatic-Song-Aesthetics-Evaluation-Challenge/

Title: Automatic Song Aesthetics Evaluation Challenge

Short description: Recent advances in generative music models have enabled automatic song creation with impressive quality and diversity, powering applications from virtual artists to movie dubbing. However, evaluating the aesthetic quality of generated songs remains a key challenge, since it requires capturing factors such as emotional expressiveness, musicality, and listener enjoyment, and existing metrics often fail to reflect human perception. To address this gap, the Automatic Song Aesthetics Evaluation Challenge invites participants to develop models that predict human ratings of song aesthetics based solely on audio. This competition aims to establish a standardized benchmark for assessing musical aesthetics in song generation, with human-annotated datasets and a focus on listener-centered criteria. By bridging signal processing, affective computing, and machine learning, this challenge seeks to drive progress toward more human-aligned music generation and evaluation.

GC-3: Predicting Lyric Intelligibility (CADENZA)

Organized by:

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=3

Challenge website:

Title:

Short description:

GC-4: Speech Analysis for Neurodegenerative Diseases (SAND)

Organized by: Lucia Aruta, Vincenzo Bevilacqua, Nadia Brancati, Ivanoe De Falco, Antonio Di Marino, Raffaele Dubbioso, Maria Frucci, Valentina Virginia Iuzzolino, Daniel Riccio, Giovanna Sannino, Gianmaria Senerchia, Myriam Spisto, and Laura Verde.

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=4

Challenge website: https://www.sand.icar.cnr.it/

Title: Speech Analysis for Neurodegenerative Diseases (SAND)

Short description: This challenge stems from the need to analyse noninvasive, objective, and scalable biomarkers, such as speech signals, for the early diagnosis and longitudinal monitoring of patients with neurodegenerative diseases. Diseases such as Amyotrophic Lateral Sclerosis (ALS) present complex diagnostic challenges due to heterogeneous symptom profiles and overlapping clinical features. Current diagnostic tools are largely based on subjective clinical scales and often fail to detect early changes, resulting in delayed intervention and suboptimal care for patients. This underscores the urgent need for noninvasive biomarkers.

We have designed two tasks to address the major challenges faced in the monitoring of neurodegenerative diseases through voice analysis. We warmly welcome researchers from academia and industry to participate and jointly explore reliable solutions for these challenging scenarios. Participants can contribute to one or both tasks proposed here:

  • TASK 1: multi-class classification at time 0 (first assessment)
  • TASK 2: multi-class prediction (latest assessment)

Detailed information and all guidelines are available on the challenge’s official website.

GC-5: Environmental Sound Deepfake Detection (ESDD)

Organized by: Han Yin (School of Electrical Engineering, KAIST, Daejeon, Republic of Korea), Yang Xiao (University of Melbourne, Australia; Fortemedia Singapore, Singapore), Rohan Kumar Das (Fortemedia Singapore, Singapore), Jisheng Bai (Xi’an University of Posts & Telecommunications, Xi’an, China), and Ting Dang (University of Melbourne, Australia)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=5

Challenge website: https://sites.google.com/view/esdd-challenge

Title: Environmental Sound Deepfake Detection Challenge

Short description: Recent advances in audio generation systems have enabled the creation of highly realistic and immersive soundscapes, which are increasingly used in film and virtual reality. However, these audio generators also raise concerns about potential misuse, such as generating deceptive audio content for fake videos and spreading misleading information. Existing datasets for environmental sound deepfake detection (ESDD) are limited in scale and audio types. To address this gap, we have proposed EnvSDD, the first large-scale curated dataset designed for ESDD, consisting of 45.25 hours of real and 316.7 hours of fake sound. Based on EnvSDD, we are launching the Environmental Sound Deepfake Detection Challenge.

To deal with the key challenges encountered in real-life scenarios, we have designed two different tracks: ESDD in Unseen Generators (track 1) and Black-Box Low-Resource ESDD (track 2). Track 1 aims to explore the generalizability to unseen Text-to-Audio (TTA) and Audio-to-Audio (ATA) generators. Track 2 presents a more challenging scenario, simulating real-world deepfake detection under extreme uncertainty and limited data.
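
As an illustration of how real-versus-fake detection outputs are often scored, the sketch below computes the equal error rate (EER), a metric commonly used in audio deepfake and spoofing detection; whether the official ESDD tracks use exactly this metric is an assumption, and the detection scores here are hypothetical (higher means "more likely real").

    import numpy as np

    def compute_eer(scores_real, scores_fake):
        """Equal error rate: the point where false-accept and false-reject rates meet."""
        thresholds = np.sort(np.concatenate([scores_real, scores_fake]))
        best_gap, eer = np.inf, 1.0
        for th in thresholds:
            far = np.mean(scores_fake >= th)  # fake clips accepted as real
            frr = np.mean(scores_real < th)   # real clips rejected as fake
            if abs(far - frr) < best_gap:
                best_gap, eer = abs(far - frr), (far + frr) / 2
        return eer

    # Example with random, purely illustrative scores
    rng = np.random.default_rng(0)
    print(compute_eer(rng.normal(1.0, 1.0, 200), rng.normal(-1.0, 1.0, 200)))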

GC-6: Face-Voice Association in Multilingual Environments (FAME)

Organized by:

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=6

Challenge website:

Title:

Short description:

GC-7: Reconstructing Hyperspectral Cubes of Everyday Objects from Low-Cost Inputs (Hyper-Object)

Organized by: Pai Chet Ng, Konstantinos N. Plataniotis, Juwei Lu, Gabriel Lee Jun Rong, Malcolm Low, Nikolaos Boulgouris, Thirimachos Bouriai, Seyed Mohammad Sheikholeslami

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=7

Challenge website: https://hyper-object.github.io/

Title: 2026 ICASSP Hyper-Object Challenge – Hyperspectral Reconstruction of Everyday Objects from Low-Cost Inputs

Short description: The Hyper-Object Challenge is the second edition of our ICASSP Grand Challenge series, continuing from the Hyper-Skin Challenge. This second edition broadens the scope from skin imaging to a diverse range of everyday objects, such as fruits, fabrics, books, and tools, captured under realistic acquisition constraints. The challenge focuses on the reconstruction of high-fidelity hyperspectral image cubes spanning 400–1000 nm from cost-effective input modalities, including simulated mosaic images and low-resolution RGB data. It features two tracks: (1) spectral reconstruction from mosaic data, and (2) joint spatial–spectral super-resolution from low-resolution RGB. By addressing the technical and practical challenges of low-cost hyperspectral reconstruction, the Hyper-Object Challenge aims to advance methods that make hyperspectral imaging more affordable and accessible.
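
As one illustration of how a reconstructed cube could be compared against a reference, the sketch below computes the spectral angle mapper (SAM), a metric widely used in hyperspectral reconstruction work; the official challenge metrics, cube dimensions, and band layout are assumptions here.

    import numpy as np

    def spectral_angle_mapper(reconstructed, reference, eps=1e-8):
        """Mean per-pixel spectral angle (radians) between two (H, W, B) cubes."""
        rec = reconstructed.reshape(-1, reconstructed.shape[-1])
        ref = reference.reshape(-1, reference.shape[-1])
        cos = np.sum(rec * ref, axis=1) / (
            np.linalg.norm(rec, axis=1) * np.linalg.norm(ref, axis=1) + eps
        )
        return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

    # Example with a hypothetical 64x64 cube and 61 bands (e.g. 400-1000 nm in 10 nm steps)
    rng = np.random.default_rng(0)
    ref_cube = rng.random((64, 64, 61))
    rec_cube = ref_cube + 0.01 * rng.random((64, 64, 61))
    print(spectral_angle_mapper(rec_cube, ref_cube))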

GC-8: Multimodal Learning for 6G Wireless Communications (CONVERGE)

Organized by: Jichao Chen (EURECOM), Filipe B. Teixeira (INESC TEC and FEUP), Francisco M. Ribeiro (INESC TEC and FEUP), Ahmed Alkhateeb (Arizona State University), Luis M. Pessoa (INESC TEC and FEUP), and Dirk Slock (EURECOM)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=8

Challenge website: https://converge-project.eu/converge-icassp-2026-sp-grand-challenge/

Title: CONVERGE Challenge: Multimodal Learning for 6G Wireless Communications

Short description: High-frequency mmWave communication enables ultra-high data rates and low latency but faces considerable challenges due to severe path loss, especially in non-line-of-sight (NLoS) scenarios. Augmenting radios with visual sensing has recently proven effective, as cameras provide rich environmental context that helps predict obstructions and guide proactive network actions. In this CONVERGE Challenge, we invite participants to develop machine learning models that integrate visual and radio data to address key communication tasks in high-frequency wireless systems. The challenge consists of four independent tracks—blockage prediction, UE localization and position prediction, channel prediction, and beam prediction—based on a rich, real-world multimodal dataset collected in a controlled indoor mmWave testbed. This challenge offers an opportunity to benchmark cross-modal learning approaches and promotes interdisciplinary collaboration among the wireless communications, signal processing, computer vision, and AI communities.

GC-9: Human-Like Spoken Dialogue Systems (HumDial)

Organized by: Eng Siong Chng, Xuelong Geng, Guangzhi Sun, Hongfei Xue, Lei Xie, Xixin Wu, Shuai Wang, Shuiyuan Wang, Xinsheng Wang, Longshuai Xiao, Zhixian Zhao, Chao Zhang, Zihan Zhang

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=9

Challenge website: https://aslp-lab.github.io/HumDial-Challenge/

Title: ICASSP2026 Human-like Spoken Dialogue Systems Challenge (HumDial Challenge)

Short description: Recent breakthroughs in large foundation models and speech technology have propelled spoken dialogue systems toward more natural and expressive interactions. However, evaluating the true “human-likeness” of these systems remains an open challenge, as existing benchmarks often fall short in capturing emotional intelligence and real-time conversational dynamics. The 2026 HumDial Challenge (Human-like Spoken Dialogue Systems Challenge) addresses this critical gap by introducing two focused tracks: Emotional Intelligence and Full-Duplex Interaction. Participants will tackle rich, multi-turn dialogues that demand nuanced emotional reasoning, dynamic empathy, and real-time coordination. With comprehensive evaluation frameworks and human-annotated real-recording datasets, the challenge aims to establish a new standard for assessing human-like dialogue capabilities, driving the next generation of emotionally aware, fluidly interactive AI agents.

GC-10: Inaugural Music Source Restoration (MSR)

Organized by: Yongyi Zang, Jiarui Hai, Wanying Ge, Helin Wang, Zheqi Dai, Yuki Mitsufuji, Qiuqiang Kong, Mark Plumbley

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=10

Challenge website: https://msrchallenge.com/

Title: The Inaugural Music Source Restoration Challenge

Short description: As music consumption and production evolve in the digital age, the ability to extract and restore individual instrument stems from mixed recordings has become increasingly vital. Traditional music source separation (MSS) systems operate under the limiting assumption that mixtures are simple linear combinations of sources, failing to address the complex signal processing chain of professional audio production. Therefore, we introduce the Music Source Restoration (MSR) Challenge, targeting the recovery of original, unprocessed instrument signals from fully mixed and mastered audio. Unlike conventional separation approaches, MSR requires generative solutions capable of reversing various audio transformations including equalization, compression, reverberation, and transmission degradations. This challenge addresses critical industry needs such as stem-level reproduction for remixing, restoration of degraded historical recordings, and enhancement of live performances affected by venue acoustics. We provide participants with access to extensive open-source datasets and a baseline synthetic mixture generation pipeline, encouraging innovative approaches to data augmentation. The challenge features two evaluation settings: non-blind evaluation using professionally mixed clips with ground-truth stems, and blind evaluation of real-world degraded recordings from historical archives, live performances, FM broadcasts, and lossy streaming. Assessment combines objective metrics with subjective ratings from professional audio engineers across eight target instruments. Our challenge aims to bridge academic research with industry applications, fostering the development of restoration technologies that advance both theoretical understanding and practical implementation in professional audio production.
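
As one illustration of stem-level objective assessment, the sketch below computes the scale-invariant signal-to-distortion ratio (SI-SDR), a metric commonly used to compare an estimated source against its ground-truth stem; the official MSR metrics and evaluation protocol may differ, and the waveforms here are hypothetical.

    import numpy as np

    def si_sdr(estimate, reference, eps=1e-8):
        """Scale-invariant signal-to-distortion ratio (dB) between two mono waveforms."""
        reference = reference - reference.mean()
        estimate = estimate - estimate.mean()
        scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
        target = scale * reference   # part of the estimate explained by the reference
        noise = estimate - target    # residual distortion
        return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

    # Example: a hypothetical restored stem that is a slightly noisy copy of the reference
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(16000)
    print(si_sdr(ref + 0.1 * rng.standard_normal(16000), ref))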

GC-11: Radar Acoustic Speech Enhancement (RASE)

Organized by:

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=11

Challenge website:

Title:

Short description:

GC-12: x-to-audio alignment (XACLE)

Organized by: Yuki Okamoto, Shinnosuke Takamichi, Keisuke Imoto, Noriyuki Tonami, Ryotaro Nagase, Riki Takizawa, Yusuke Kanamori, and Minoru Kishi

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=12

Challenge website: https://xacle.org/

Title: The first x-to-audio alignment challenge (XACLE Challenge)

Short description: The scope of this challenge is to predict the semantic alignment of a given pair of general audio and text. Research on generating general audio, not limited to speech and music, from various inputs such as text and video (x-to-audio generation) is being actively pursued. For a generative model, evaluating the alignment between input and output is extremely important. For instance, in the evaluation of text-to-audio (TTA) generation, methods have been proposed to objectively evaluate the alignment between audio and text. However, it has been pointed out that these methods often correlate poorly with human subjective evaluations. In this challenge, our goal is to build a model that automatically predicts the semantic alignment between audio and text for TTA evaluation, producing objective scores that correlate highly with human subjective evaluations.
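
As a simple illustration of this goal, the sketch below compares hypothetical model-predicted alignment scores with hypothetical mean human ratings using Spearman rank correlation; the official XACLE evaluation protocol, correlation measure, and data format are not specified here and may differ.

    from scipy.stats import spearmanr

    predicted_alignment = [0.82, 0.31, 0.65, 0.12, 0.90]  # model scores per audio-text pair
    human_ratings = [4.5, 2.0, 3.5, 1.5, 4.8]             # mean subjective ratings per pair

    rho, _ = spearmanr(predicted_alignment, human_ratings)
    print(f"Spearman correlation with human judgments: {rho:.3f}")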

GC-13: Universality, Robustness, and Generalizability for EnhancemeNT (URGENT)

Organized by:

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=13

Challenge website:

Title:

Short description:

GC-14: In-the-wild generation and detection of spoofed speech (WildSpoof)

Organized by: Yihan Wu (Renmin University of China), Jee-weon Jung (Apple / CMU), Hye-jin Shim (CMU), Xin Wang (NII Japan), Xin Cheng (Renmin University)

Submission Link: https://cmsworkshops.com/ICASSP2026/Papers/Submission.asp?Type=Challenge&ID=14

Challenge website: https://wildspoof.github.io/

Title: WildSpoof – In-the-wild generation and detection of spoofed speech

Short description: The WildSpoof Challenge aims to advance the use of in-the-wild data in two speech processing tasks concerned with generating and detecting spoofed speech. We invite you to participate in the challenge, which covers two critical and increasingly intertwined tasks:

  1. Text-to-Speech Generation (TTS)
  2. Spoofing-aware Automatic Speaker Verification (SASV)

The WildSpoof Challenge promotes research that bridges the gap between speech generation and spoofing detection, fostering interdisciplinary innovation towards more robust, realistic, and integrated speech systems. Specifically, we set the following objectives:

  1. Advance the use of in-the-wild data in two closely related but underexplored tasks: TTS and SASV, moving beyond conventional clean and controlled datasets.
  2. Foster interdisciplinary collaboration between the spoofing generation (TTS) and detection (SASV) communities, encouraging the development of more integrated, robust, and realistic systems.