Signal Processing Cup (SP Cup)

Background

AV Zoom: Real-Time Audio-Visual Zooming on Smartphones

SPS 2026 SP Cup Website | 4-8 May 2026


Sponsored by MathWorks and IEEE SPS


The Signal Processing Cup (SP Cup) student competition, presented by the IEEE Signal Processing Society, gives students the opportunity to work together to solve real-life problems using signal processing methods. After students submit their work, three final teams are selected to present their work and compete for the grand prize at ICASSP!

While only the final three teams will be competing at ICASSP, all are welcome to watch the students present their work during the final competition. Join us and come see the action!

Introduction

This challenge aims to inspire young minds to tackle practical technical problems. Smartphones are now indispensable, offering features that simplify daily tasks—from entertainment to health tracking. One exciting development is audio zooming, which allows smartphones to focus on specific sounds while reducing background noise. This feature is especially useful in noisy environments like public gatherings, railway stations, and stadiums. While this technology has appeared in high-end models, it remains limited or ineffective in many devices.

When capturing images with a smartphone, one typically focuses on the scene and takes a shot. If specific details in the scene need emphasis, optical zooming in the camera provides a solution. Nowadays, optical zooming is also available while shooting videos. However, regardless of where the camera is pointed or how zoomed it is, audio is also captured. This can lead to a mismatch in synchronization between the captured video and audio, resulting in an unnatural experience. While the camera has an optical field of view, it lacks an auditory field of view. Achieving synchronization between what is seen and what is heard is crucial for enhancing user experience. This concept, known as “audio-visual zooming,” integrates visual zoom capabilities with enhanced audio capture, enabling synchronized focus on both visual and auditory details (see Figure 1). This technology has the potential to revolutionize applications where precise audiovisual alignment is essential, such as photography, cinematography, and more.
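One way to make the idea of an "auditory field of view" concrete is to tie it to the camera's own field of view. The sketch below is only an illustration under stated assumptions, not part of the challenge specification: it assumes a simple pinhole-camera model in which zooming by a factor z scales the focal length by z, and it uses a hypothetical 80-degree unzoomed horizontal field of view. The function names are invented for this example.

```python
import math


def zoomed_fov_deg(base_fov_deg: float, zoom: float) -> float:
    """Horizontal field of view after zooming (pinhole-camera assumption)."""
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1")
    half_angle = math.radians(base_fov_deg) / 2.0
    # Scaling the focal length by `zoom` divides tan(half-angle) by `zoom`.
    return math.degrees(2.0 * math.atan(math.tan(half_angle) / zoom))


def in_view(source_azimuth_deg: float, base_fov_deg: float, zoom: float) -> bool:
    """True if a source at this azimuth (0 = optical axis) is inside the frame."""
    return abs(source_azimuth_deg) <= zoomed_fov_deg(base_fov_deg, zoom) / 2.0


if __name__ == "__main__":
    BASE_FOV_DEG = 80.0  # hypothetical unzoomed field of view
    for zoom in (1.0, 2.0, 4.0):
        fov = zoomed_fov_deg(BASE_FOV_DEG, zoom)
        visible = in_view(30.0, BASE_FOV_DEG, zoom)
        print(f"zoom {zoom:.0f}x: FOV {fov:.1f} deg, source at 30 deg visible: {visible}")
```

Under these assumptions, a talker 30 degrees off the optical axis is in frame without zoom but leaves the frame at 2x zoom. An audio-visual zooming system would then be expected to attenuate that talker so that what is heard matches what is seen.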

The main issue is how to accurately locate and track audio-visual targets in a dynamic environment using low-power hardware. The goal is to improve intelligent zooming on sounds and visuals of interest, such as a person speaking in a noisy outdoor scene. This problem matters because most current solutions for audio-visual analysis require heavy computation and centralized processing, which makes them unsuitable for real-time use in remote, low-power, or privacy-sensitive situations. Real-time processing at the edge removes the dependence on a central server and can therefore be more efficient. Current limitations include weak integration of sound source localization with visual tracking on low-resource devices, unreliable real-time performance, and constrained capabilities under tight compute budgets. Most systems either focus only on video or require cloud processing to achieve acceptable accuracy.


Read the Call for Participation

Task Description

The audio zooming problem can be viewed as ‘spatial filtering’ in the array signal processing context. Many contemporary spatial filtering methods exist, although most were developed for applications other than audio zooming (some useful references are provided in section 2.3 of the 2026 SP Cup Official Document). In general, the resolution of spatial filtering techniques such as beamforming is a function of the number of sensors (in this case, microphones). Higher resolution typically requires more microphones; however, due to space constraints, smartphones usually include only two or three microphones. The problem need not be addressed solely through beamforming. Since both audio and visual zooming are involved, creative and hybrid approaches are encouraged, whether based on classical signal processing, artificial intelligence (machine learning (ML), deep learning, TinyML), or novel combinations of both.

The focus of this challenge is to design a real-time audio-visual zooming system. This includes designing a microphone array configuration, developing processing algorithms, and building a mobile application for Android or iOS. The solution must also include real-time implementation and evaluation similar to the example presented in Figure 2. The system should integrate the following components:

  • Real-time audio zooming using microphone arrays to focus on specific sound sources.
  • Visual alignment with the chosen sound source, ensuring that what is heard matches what is seen.
  • All components should be optimized for edge devices (smartphones), with emphasis on low power, low latency, and fully on-device operation.
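The remark in the task description that beamforming resolution depends on the number of microphones can be illustrated with a small numerical sketch. The code below is not the organizers' method or a reference solution. It assumes a far-field source, a uniform linear array, and a plain delay-and-sum beamformer; the 4 cm spacing, 2 kHz frequency, and function names are hypothetical values chosen for illustration. It scans arrival angles and reports the width of the region around the steering direction where the response stays within 3 dB of its peak.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def array_response(n_mics, spacing_m, freq_hz, steer_deg, arrival_deg):
    """Magnitude response of a delay-and-sum beamformer on a uniform linear array."""
    # Phase step between adjacent microphones, relative to the steered direction.
    psi = (2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND) * (
        math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * psi * n) for n in range(n_mics))
    return abs(total) / n_mics  # normalized so the steered direction gives 1.0


def half_power_beamwidth_deg(n_mics, spacing_m, freq_hz, steer_deg=0.0, step_deg=0.1):
    """Angular width around steer_deg where the response stays above -3 dB."""
    threshold = 1.0 / math.sqrt(2.0)

    def edge(direction):
        # Walk away from the steering angle until the response drops below
        # the threshold or the scan leaves the [-90, 90] degree range.
        k = 0
        while True:
            nxt = steer_deg + direction * (k + 1) * step_deg
            below = array_response(n_mics, spacing_m, freq_hz, steer_deg, nxt) < threshold
            if abs(nxt) > 90.0 or below:
                return steer_deg + direction * k * step_deg
            k += 1

    return edge(+1) - edge(-1)


if __name__ == "__main__":
    for n in (2, 3, 8):
        width = half_power_beamwidth_deg(n, spacing_m=0.04, freq_hz=2000.0)
        print(f"{n} microphones: half-power beamwidth of about {width:.1f} degrees")
```

With these assumed values, the two-microphone array never falls 3 dB below its peak anywhere in the front half-plane, while the beam narrows as microphones are added. An eight-element array at this spacing would not fit on a phone, which is one reason approaches beyond classical beamforming are worth considering.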

This challenge has three phases. 

Full technical details, dataset(s), evaluation metrics, and all other pertinent information about the competition are located in the Call for Participation.

Important Dates

  • Challenge Announcement/Registration Starts: 21 October 2025
  • Team Registration Deadline: 10 November 2025 (Registration Link)
  • Phase 1 Team Work Submission Deadline: 11 December 2025
  • Announcement of the Phase 1 Results: 21 December 2025
  • Phase 2 Team Work Submission Deadline: 08 February 2026
  • Announcement of 3 Finalist Teams: 02 March 2026
  • Presentation of final results at ICASSP 2026: 4-8 May 2026

Registration and Important Resources

  • All teams MUST be registered through the official competition registration system before the deadline in order to be considered as a participating team. Teams must meet all eligibility requirements at the time of team registration as well as throughout the competition. 
  • All team members for each team MUST agree to the SPS Student Terms and Conditions and submit a completed agreement form here before the team registration deadline. 
  • Register your team for the 2026 SP Cup before the Team Registration Deadline above, and submit your work before the submission deadlines listed above, at the following link: [Register your team HERE]

Complimentary MATLAB License

MathWorks, Inc. continues to support the IEEE SP Cup. Participating students are encouraged to download the complimentary MathWorks Student Competitions Software for use in the competition.

Instructions on how to apply for the complimentary MATLAB License can be found in the following Dropbox folder:

Dropbox Folder: SP Cup – Complimentary MATLAB License (MathWorks)

Contacts

Competition Organizers (technical, competition-specific inquiries): Dr. Ashok Chandrasekaran

SPS Staff (Terms & Conditions, Travel Grants, Prizes): Jaqueline Rash, SPS Membership Program and Events Manager

SPS Student Services Committee: Lucas Thomaz, Chair

Sponsors

This competition is sponsored by the IEEE Signal Processing Society and MathWorks.