2026 IEEE International Conference on Acoustics, Speech, and Signal Processing

4-8 May 2026, Barcelona, Spain

Tutorials

All tutorials will be three hours in duration, excluding a 30-minute break, and will be held on Monday, May 4, prior to the main technical program. Tutorials require separate registration and are not included in the main conference registration.


9:00 am – 12:30 pm

Xie Chen, Kai Yu, Wei-Qiang Zhang, Min Ma, Yu Zhang

T1: Multilingual Speech Recognition and Synthesis

9:00 am – 12:30 pm

Saurabh Sihag and Gonzalo Mateos

T2: Learning with Covariance Matrices: Foundations and Applications to Network Neuroscience

9:00 am – 12:30 pm

Henk Wymeersch and Nuria González-Prelcic

T3: Distributed Integrated Sensing and Communication (DISAC): Foundations, Architectures, and Signal Processing Enablers

9:00 am – 12:30 pm

Elvin Isufi, Samuel Rey, Bishwadeep Das and Geert Leus

T4: Signal Processing on and for Dynamic Graphs

9:00 am – 12:30 pm

Osvaldo Simeone, Bipin Rajendran and Tianyi Chen

T5: Advances in Neuromorphic Computing: Models, Hardware, and Optimization

9:00 am – 12:30 pm

Qu Qing, Yuxin Chen, Yuting Wei and Liyue Shen

T6: Harnessing Low Dimensionality in Diffusion Generative Modeling: From Theory to Practice

9:00 am – 12:30 pm

Ali Tajer, Karthikeyan Shanmugam, Burak Varici and Emre Acartürk

T7: Causal Representation Learning

2:30 pm – 6:30 pm

Bingcong Li, Georgios Giannakis and Yilang Zhang

T8: Low-Rank Adaptation Redux in Large Models

2:30 pm – 6:00 pm

Sharon Gannot and Tal Rosewein

T9: Speech Enhancement in the Hearables Era: From Basics to Advanced Neural Methods

2:30 pm – 6:00 pm

Stanley Chan

T10: Diffusion Models for Imaging and Vision

2:30 pm – 6:00 pm

Matteo Nerini, Bruno Clerckx and Carlo Fischione

T11: Analog Computing for Signal Processing and Communications

2:30 pm – 6:00 pm

Andrea Poltronieri, Xavier Serra, Dmitry Bogdanov, Martín Rocamora and Pablo Alonso-Jiménez

T13: Current Approaches to Computational Analysis of Music Audio Signals

2:30 pm – 6:00 pm

Samuel Pinilla, Kumar Vijay Mishra, Brian Sadler

T14: Invex Optimization: Theory and Applications for Signal/Image Processing and Machine Learning

T1: Multilingual Speech Recognition and Synthesis

This tutorial provides a comprehensive overview of multilingual speech recognition (ASR) and speech synthesis (TTS), focusing on recent advances that enable scalable, language-agnostic speech technologies. It begins by motivating the need for multilingual systems to support global accessibility and linguistic diversity, followed by a review of the evolution from traditional monolingual approaches to modern deep learning and large-scale foundation models.

The tutorial covers key techniques such as self-supervised learning, cross-lingual transfer, and unified modeling frameworks, highlighting how these approaches improve performance in low-resource languages. It also examines the integration of ASR and TTS within joint frameworks, enabling more efficient and robust speech systems.

In addition, participants will be introduced to widely used datasets, benchmarks, and open-source toolkits for multilingual speech processing. The tutorial concludes with a discussion of current challenges—including data scarcity, language imbalance, and ethical considerations—and outlines emerging research directions in unified and multimodal speech technologies.

T2: Learning with Covariance Matrices: Foundations and Applications to Network Neuroscience

This tutorial presents a unified framework for learning with covariance matrices, bridging classical statistical methods and modern deep learning approaches. It begins by revisiting principal component analysis (PCA) and its limitations in handling high-dimensional, dynamic data, motivating the need for more robust and transferable models.

The tutorial introduces coVariance neural networks (VNNs), which interpret covariance matrices as graph-structured data and leverage tools from graph signal processing and graph neural networks. Key concepts include spectral representations, stability, and transferability, enabling reliable learning across heterogeneous settings.

Emerging directions such as sparse, spatio-temporal, and fairness-aware VNNs are also discussed. The framework is illustrated through applications in network neuroscience, highlighting interpretable modeling for tasks such as brain age prediction. Overall, the tutorial provides both theoretical insights and practical tools for analyzing complex multivariate data using covariance-driven learning.
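As a rough illustration of the coVariance filter idea behind VNNs, the sketch below treats the sample covariance matrix as a graph shift operator and applies a polynomial in it to a data vector. It is a minimal sketch and not the presenters' implementation: the function names (`sample_covariance`, `covariance_filter`), the filter taps, and the toy data are all assumptions made for this example.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n samples, m features)."""
    n, m = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(m)]
    return [
        [sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
         for j in range(m)]
        for i in range(m)
    ]

def covariance_filter(C, x, taps):
    """Apply the polynomial filter sum_k taps[k] * C^k to the vector x."""
    out = [0.0] * len(x)
    z = list(x)                      # z holds C^k x, starting at k = 0
    for h in taps:
        out = [o + h * zi for o, zi in zip(out, z)]
        z = matvec(C, z)             # advance to the next power of C
    return out

# Hypothetical toy data: 4 samples, 2 uncorrelated features.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
C = sample_covariance(X)             # diagonal, eigenvalues 2/3 and 8/3
y = covariance_filter(C, [1.0, 1.0], taps=[1.0, 3.0])
print(C)
print(y)
```

Because the filter is a polynomial in the covariance matrix, it shares its eigenvectors with PCA and scales each principal direction by the polynomial evaluated at the corresponding eigenvalue. Here the taps `[1.0, 3.0]` give the response 1 + 3λ for an eigenvalue λ.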

T3: Distributed Integrated Sensing and Communication (DISAC): Foundations, Architectures, and Signal Processing Enablers

This tutorial introduces Distributed Integrated Sensing and Communication (DISAC), a key paradigm for future 6G networks that unifies sensing, communication, and localization across distributed infrastructures. It covers foundational concepts, contrasting centralized and distributed architectures, and highlights enabling technologies such as reconfigurable intelligent surfaces, large-scale MIMO, and cooperative synchronization.

The tutorial explores signal processing methods for network sensing, including monostatic, bistatic, and multistatic configurations, as well as distributed parameter estimation and data fusion. It further examines user equipment localization, cooperative positioning, and tracking within distributed systems, alongside emerging capabilities such as simultaneous localization and mapping (SLAM) using communication signals.

Practical challenges are addressed, including synchronization errors, hardware impairments, and scalability constraints, together with mitigation strategies. Overall, the tutorial provides a comprehensive view of architectures, models, and algorithms that underpin distributed sensing–communication integration in next-generation wireless networks.

T4: Signal Processing on and for Dynamic Graphs

This tutorial provides a comprehensive introduction to signal processing on and for dynamic graphs, addressing the challenges of modeling and learning from data supported on time-evolving network structures. It begins with the fundamentals of graph signal processing (GSP) and graph learning, then extends these concepts to settings where graph topology changes over time.

The tutorial covers key methodologies for learning dynamic graphs, including both offline and online approaches, as well as techniques for processing signals on evolving networks with changing nodes and edges. It emphasizes theoretical foundations, such as graph signal priors and guarantees, alongside practical algorithms.

A central focus is the connection between dynamic graph signal processing and emerging temporal graph machine learning, highlighting how these frameworks can be combined for improved modeling of complex systems. The tutorial concludes with open challenges and future directions in adaptive, scalable, and interpretable learning on dynamic graphs.

T5: Advances in Neuromorphic Computing: Models, Hardware, and Optimization

This tutorial provides a unified overview of neuromorphic computing, focusing on the interplay between models, hardware, and optimization for energy-efficient learning systems. It introduces spiking neural networks as recurrent models with discrete activations and highlights their connections to state-space and transformer-based architectures.

The tutorial examines emerging hardware paradigms, particularly in-memory computing and nanoelectronic devices, which enable low-latency and energy-efficient implementations by integrating computation and memory. It further explores optimization and training methods tailored to neuromorphic systems, emphasizing algorithm–hardware co-design and robustness to hardware non-idealities.

By bridging theoretical models, hardware architectures, and learning algorithms, the tutorial offers a comprehensive perspective on designing next-generation intelligent systems that combine efficiency, scalability, and adaptability in resource-constrained environments.

T6: Harnessing Low Dimensionality in Diffusion Generative Modeling: From Theory to Practice

This tutorial presents a theoretical and practical framework for understanding and improving diffusion generative models through low-dimensional structures. It examines fundamental questions of generalization, efficiency, and controllability, highlighting how diffusion models learn and exploit intrinsic low-dimensional representations of high-dimensional data.

The tutorial develops rigorous insights into sample complexity and generalization behavior, connecting diffusion models to classical high-dimensional statistics. It further introduces convergence theory for diffusion-based samplers, enabling the design of faster and more efficient algorithms that adapt to underlying data structures.

In addition, the tutorial explores methods for controlling generation, with a focus on solving inverse problems in scientific and biomedical imaging. Topics include latent-space modeling, adaptive sampling, and techniques for handling high-dimensional data. Overall, it provides a principled foundation for advancing reliable, efficient, and controllable generative modeling.

T7: Causal Representation Learning

This tutorial introduces causal representation learning (CRL), a framework for uncovering interpretable and mechanistic representations from high-dimensional data. Moving beyond correlation-based approaches, CRL aims to identify latent variables that correspond to meaningful factors of variation and capture their underlying causal relationships.

The tutorial covers foundational concepts, including disentanglement, causal inference, and identifiability, and presents a taxonomy of CRL methods. It emphasizes interventional approaches that leverage controlled perturbations to recover causal structure, as well as temporal methods that exploit time-series data for improved identifiability.

Algorithmic principles for learning causally grounded representations are discussed alongside theoretical guarantees. The tutorial also highlights applications in domains such as robotics, source separation, and genomics, demonstrating how CRL enables generalization, interpretability, and counterfactual reasoning. Overall, it provides a principled framework for learning representations that reflect the true generative mechanisms of data.

T8: Low-Rank Adaptation Redux in Large Models

This tutorial explores low-rank adaptation techniques for efficient training and fine-tuning of large models, with a focus on bridging classical signal processing principles and modern deep learning. It revisits foundational low-rank methods—such as subspace modeling and matrix factorization—and connects them to parameter-efficient fine-tuning approaches, including low-rank adaptation (LoRA).

The tutorial covers architectural design choices for low-rank modeling, including extensions based on singular value decomposition, sparsity, and tensor representations. It also examines optimization challenges arising from nonconvex low-rank formulations, highlighting recent advances in initialization, manifold-based methods, and symmetry-aware optimization.

Applications to large-scale models are discussed, including pre-training, fine-tuning, and multimodal systems. Overall, the tutorial provides a unified perspective on leveraging low-dimensional structure to improve scalability, efficiency, and adaptability in modern AI systems.
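For readers new to LoRA, the following minimal sketch shows the basic mechanism: a frozen weight matrix W is augmented with a trainable low-rank product B A, scaled by alpha / r. The dimensions, the random values, and the helper names (`lora_forward`, `lora_param_count`) are assumptions chosen for illustration; the zero initialization of B follows the commonly used LoRA convention, so the adapted layer starts out identical to the frozen one.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Compute W x + (alpha / r) * B (A x) without forming B A explicitly."""
    r = len(A)                           # adapter rank = number of rows of A
    base = matvec(W, x)                  # frozen pre-trained path
    update = matvec(B, matvec(A, x))     # trainable low-rank path
    return [b + (alpha / r) * u for b, u in zip(base, update)]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters in A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical layer sizes for the example.
random.seed(0)
d_out, d_in, r = 6, 8, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]    # zero init: no change before training
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x)
print(y == matvec(W, x))
print(lora_param_count(d_out, d_in, r), "trainable vs", d_out * d_in, "frozen")
```

The saving grows with layer size: the adapter needs r (d_in + d_out) parameters instead of d_in d_out, which is small whenever r is much smaller than both dimensions.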

T9: Speech Enhancement in the Hearables Era: From Basics to Advanced Neural Methods

This tutorial provides a comprehensive overview of speech enhancement technologies in the emerging era of hearable devices. It traces the evolution from traditional model-based methods to modern data-driven approaches, including deep neural networks for noise reduction, source separation, and target speaker extraction.

The tutorial examines key challenges specific to hearables, such as preserving binaural cues, handling real-time constraints, and balancing the latency–performance trade-off across algorithmic, computational, and communication components. It also reviews hardware considerations, including device form factors, processing capabilities, and communication protocols.

Recent trends are highlighted, including selective attention mechanisms, generative AI techniques such as diffusion models and neural audio codecs, and speech foundation models. The tutorial concludes with discussions on practical deployment challenges, as well as regulatory and ethical considerations in assistive and consumer audio technologies.

T10: Diffusion Models for Imaging and Vision

This tutorial provides a rigorous, first-principles introduction to diffusion models for imaging and vision, focusing on their mathematical foundations and practical implications. It begins with core concepts in probabilistic modeling, including variational autoencoders, evidence lower bounds, and reparameterization techniques, establishing the basis for generative learning.

The tutorial then covers denoising diffusion probabilistic models, detailing forward and reverse processes, training objectives, and sampling mechanisms. It further explores score-based methods, including score matching and Langevin dynamics, and unifies these approaches through stochastic differential equation formulations.

Emphasis is placed on intuitive understanding through derivations, proofs, and illustrative examples, enabling participants to connect theoretical principles with real-world applications in image and video generation. Overall, the tutorial offers a comprehensive and coherent framework for understanding diffusion-based generative models.

T11: Analog Computing for Signal Processing and Communications

This tutorial introduces analog computing as an emerging paradigm for signal processing and wireless communications, addressing the limitations of conventional digital architectures in terms of energy efficiency, latency, and scalability. It explores how computations can be performed directly in the electromagnetic domain, leveraging physical properties of signals for fast and efficient processing.

The tutorial covers analog computing architectures, including microwave-based systems, and demonstrates how core signal processing tasks—such as beamforming and filtering—can be implemented in the analog domain. It also presents over-the-air computation techniques, where functions are computed during wireless transmission by exploiting signal superposition.

Applications in next-generation networks are discussed, including distributed sensing, federated learning, and large-scale communication systems. The tutorial concludes with an analysis of practical challenges, such as noise, synchronization, and hardware constraints, along with emerging research directions.
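The over-the-air computation idea mentioned above can be illustrated with a toy simulation: several devices transmit at the same time, the channel adds their signals, and the receiver reads off an aggregate such as the mean in a single channel use. This is a simplified sketch under strong assumptions that are not taken from the tutorial: real-valued positive channel gains known perfectly at each device, perfect synchronization, and channel-inversion pre-scaling as one possible alignment strategy. The function name `over_the_air_mean` is hypothetical.

```python
import random

def over_the_air_mean(values, gains, noise_std=0.0, rng=random):
    """Estimate the mean of `values` from one superimposed transmission."""
    # Each device pre-scales its value by the inverse of its own channel gain.
    tx = [v / h for v, h in zip(values, gains)]
    # The multiple-access channel sums all faded signals and adds receiver noise.
    received = sum(h * s for h, s in zip(gains, tx)) + rng.gauss(0.0, noise_std)
    # The receiver only needs the number of devices to recover the mean.
    return received / len(values)

# Hypothetical device measurements and channel gains.
values = [1.0, 2.0, 3.0, 6.0]
gains = [0.5, 1.0, 2.0, 4.0]

print(over_the_air_mean(values, gains))                   # noiseless case
print(over_the_air_mean(values, gains, noise_std=0.1,
                        rng=random.Random(1)))            # noisy case
```

The receiver never sees the individual values, only their sum, which is why the same mechanism is often discussed for aggregation in federated learning. Noise and imperfect gain knowledge, ignored or simplified here, are among the practical challenges the abstract lists.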

T13: Current Approaches to Computational Analysis of Music Audio Signals

This tutorial provides a comprehensive overview of computational approaches to music audio analysis within the framework of Music Information Retrieval (MIR). It introduces core tasks such as pitch and melody extraction, rhythm and beat tracking, chord and key recognition, instrument classification, and music tagging, emphasizing the unique challenges of modeling musical structure and perception.

The tutorial examines representation learning techniques, tracing the evolution from handcrafted features to modern deep learning and self-supervised approaches that enable large-scale, transferable audio representations. It also explores multimodal methods that integrate audio, symbolic, and textual data for enhanced music understanding and cross-modal retrieval.

Finally, the tutorial addresses ethical and legal considerations, including data bias, copyright, and reproducibility, highlighting the importance of responsible and transparent practices in large-scale music analysis systems.

T14: Invex Optimization: Theory and Applications for Signal/Image Processing and Machine Learning

This tutorial introduces invex optimization as a powerful framework for solving inverse problems in signal and image processing and machine learning. It addresses the limitations of classical convex methods and the lack of guarantees in non-convex formulations by presenting invexity as a generalization that preserves global optimality while enabling more flexible modeling.

The tutorial covers theoretical foundations of invex functions, including their properties, duality, and relationships to convexity, followed by algorithmic approaches such as gradient-based methods, proximal algorithms, and plug-and-play techniques. It also explores data-driven extensions, including invex neural networks and learned regularizers.

Applications are presented across imaging and machine learning tasks, such as denoising, deconvolution, spectral imaging, and dimensionality reduction. Overall, the tutorial provides a principled approach to combining modeling flexibility with strong theoretical guarantees in optimization-driven signal processing.
