2026 IEEE International Conference on Acoustics, Speech, and Signal Processing
The Signal Processing Cup (SP Cup) student competition, presented by the IEEE Signal Processing Society, gives students the opportunity to work together to solve real-life problems using signal processing methods. After students submit their work, three final teams are selected to present their work and compete for the grand prize at ICASSP!
While only the final three teams will be competing at ICASSP, all are welcome to watch the students present their work during the final competition. Join us and come see the action!
This challenge aims to inspire young minds to tackle practical technical problems. Smartphones are now indispensable, offering features that simplify daily tasks—from entertainment to health tracking. One exciting development is audio zooming, which allows smartphones to focus on specific sounds while reducing background noise. This feature is especially useful in noisy environments like public gatherings, railway stations, and stadiums. While this technology has appeared in high-end models, it remains limited or ineffective in many devices.
When capturing images with a smartphone, one typically focuses on the scene and takes a shot. If specific details in the scene need emphasis, the camera's optical zoom provides a solution. Optical zooming is now also available while shooting video. However, regardless of where the camera is pointed or how far it is zoomed in, audio is still captured from the whole scene. This leads to a mismatch between the captured video and audio, resulting in an unnatural experience. While the camera has an optical field of view, it lacks an auditory field of view. Matching what is heard to what is seen is crucial for enhancing the user experience. This concept, known as “audio-visual zooming,” integrates visual zoom capabilities with enhanced audio capture, enabling synchronized focus on both visual and auditory details (see Figure 1). This technology has the potential to revolutionize applications where precise audiovisual alignment is essential, such as photography, cinematography, and more.
The main challenge is to accurately locate and track audio-visual targets in a dynamic environment using low-power hardware. The goal is to improve intelligent zooming on sounds and visuals of interest, such as a person speaking in a noisy outdoor scene. This problem matters because most current solutions for audio-visual analysis require heavy computation and centralized processing, which makes them unsuitable for real-time use in remote, low-power, or privacy-sensitive situations. Processing at the edge in real time avoids the round trip to a central server and keeps the captured data on the device. Current limitations include weak integration of sound source localization with visual tracking on low-resource devices, unreliable real-time performance, and constrained capabilities under tight compute budgets. Most systems either focus only on video or require cloud processing to achieve acceptable accuracy.
Read the Call for Participation
The audio zooming problem can be viewed as ‘spatial filtering’ in the array signal processing context. Many contemporary spatial filtering methods exist, although most were developed for applications other than audio zooming (some useful references are provided in Section 2.3 of the 2026 SP Cup Official Document). In general, the resolution of spatial filtering techniques such as beamforming is a function of the number of sensors (in this case, microphones). Higher resolution typically requires more microphones; however, due to space constraints, smartphones usually include only two or three microphones. The problem does not have to be addressed solely through beamforming. Since both audio and visual zooming are involved, creative and hybrid approaches are encouraged, whether based on classical signal processing, artificial intelligence (machine learning (ML), deep learning, TinyML), or novel combinations of both. The focus of this challenge is to design a real-time audio-visual zooming system. This includes designing a microphone array configuration, developing processing algorithms, and building a mobile application for Android or iOS. The solution must also include real-time implementation and evaluation similar to the example presented in Figure 2. The system should integrate the following components:
This challenge has three phases.
Full technical details, datasets, evaluation metrics, and all other pertinent information about the competition can be found in the Call for Participation.
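As an informal illustration of the spatial-filtering view described above, the following Python sketch computes the beam pattern of a simple delay-and-sum beamformer and measures how its main lobe narrows as microphones are added. It is not part of the official competition material. The uniform linear array geometry, the 2 cm spacing, the 4 kHz test frequency, the far-field assumption, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(n_mics, spacing_m, freq_hz, angle_deg):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def beam_pattern(n_mics, spacing_m, freq_hz, look_deg, angles_deg):
    """Magnitude response of a delay-and-sum beamformer steered to look_deg."""
    weights = steering_vector(n_mics, spacing_m, freq_hz, look_deg) / n_mics
    # np.vdot conjugates its first argument, giving w^H a(theta).
    return np.array([
        abs(np.vdot(weights, steering_vector(n_mics, spacing_m, freq_hz, a)))
        for a in angles_deg
    ])


def half_power_beamwidth(n_mics, spacing_m, freq_hz, look_deg=0.0):
    """Width in degrees of the contiguous -3 dB region around look_deg."""
    angles = np.linspace(-90.0, 90.0, 1801)  # 0.1 degree grid
    pattern = beam_pattern(n_mics, spacing_m, freq_hz, look_deg, angles)
    threshold = 1.0 / np.sqrt(2.0)
    centre = int(np.argmin(np.abs(angles - look_deg)))
    lo = centre
    while lo > 0 and pattern[lo - 1] >= threshold:
        lo -= 1
    hi = centre
    while hi < len(angles) - 1 and pattern[hi + 1] >= threshold:
        hi += 1
    return float(angles[hi] - angles[lo])


if __name__ == "__main__":
    # Hypothetical 2 cm spacing and a 4 kHz tone, steered to broadside.
    for n in (2, 3, 8):
        width = half_power_beamwidth(n, spacing_m=0.02, freq_hz=4000.0)
        print(f"{n} microphones: main lobe about {width:.1f} degrees wide")
```

With only two or three closely spaced microphones the main lobe stays very wide, which is one reason the challenge encourages hybrid approaches that go beyond plain beamforming.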
MathWorks, Inc. continues to support the IEEE SP Cup. Participating students are encouraged to download the complimentary MathWorks Student Competitions Software for use in the competition.
Instructions on how to apply for the complimentary MATLAB license can be found in the following Dropbox folder:
Dropbox Folder: SP Cup – Complimentary MATLAB License (MathWorks)
Competition Organizers (technical, competition-specific inquiries): Dr. Ashok Chandrasekaran
SPS Staff (Terms & Conditions, Travel Grants, Prizes): Jaqueline Rash, SPS Membership Program and Events Manager
SPS Student Services Committee: Lucas Thomaz, Chair
This competition is sponsored by the IEEE Signal Processing Society and MathWorks.