9:00 - 9:05 (PT) Welcome
9:05 - 10:00 (PT) Paper session #1

Laughing Matters: Introducing Audio-Driven Laughing-Face Generation with Diffusion Models - Extended Abstract Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Can CLIP Help Visual Sound Localization? Sooyoung Park, Arda Senocak, Joon Son Chung
Learning Continual Audio-Visual Sound Separation Models Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition - Extended Abstract Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic
Q&A session
Audio-Visual Autism Behavior Recognition with Multimodal Large Language Models Shijian Deng, Erin Kosloski, Siddhi Patel, Zeke A Barnett, Yiyang Nan, Alexander M Kaplan, Sisira Aarukapalli, William Doan, Matthew Wang, Harsh Singh, Pamela Rollins, Yapeng Tian
Dataset distillation for audio-visual datasets Saksham Singh Kushwaha, Siva Sai Nagender Vasireddy, Kai Wang, Yapeng Tian
AVQA-CoT: When CoT Meets Question Answering in Audio-Visual Scenarios Guangyao Li, Henghui Du, Di Hu
Q&A session
10:00 - 10:30 (PT) Posters & Coffee Break
10:30 - 11:15 (PT) Paper session #2

ViSpeR: Multilingual Visual Speech Recognition Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Ankit Singh, Hakim Hacid
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi M. Kalayeh
Q&A session
AVHuMAR: Audio-Visual Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng
AV-Mamba: Cross-Modality Selective State Space Models for Audio-Visual Question Answering Ziru Huang, Jia Li, Wenjie Zhao, Yunhui Guo, Yapeng Tian
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition - Extended Abstract Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic
Q&A session
11:15 - 11:45 (PT) Invited talk
Alexander Richard
11:45 - 1:00 (PT) Lunch
1:00 - 2:00 (PT) Invited papers (Chair: Ziyang Chen)

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman
Q&A session
TIM: A Time Interval Machine for Audio-Visual Action Recognition Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao
Q&A session
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models Sanjoy Chowdhury, Sayan Nag, Joseph K J, Balaji Vasan Srinivasan, Dinesh Manocha
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman
Q&A session

2:00 - 2:30 (PT) Invited talk
Ruohan Gao
2:30 - 3:00 (PT) Invited talk
Shyam Gollakota
3:00 - 3:30 (PT) Coffee Break
3:30 - 4:00 (PT) Invited talk
Hilde Kuehne
4:00 - 4:30 (PT) Invited talk
Samuel Clarke
4:30 - 5:00 (PT) Invited talk
Tengda Han

Presentation instructions

Previous workshops: 2018, 2019, 2020, 2021, 2022, 2023

  • Authors of accepted papers will present a 5-minute talk about their work. You may either present in person or submit a video. For the latter option, please submit the video by June 15th (11:59 PST) as an .mp4 supplementary file on CMT, along with the PDF for your paper.
  • We'll have a paper presentation session from 9am - 11:15am. These will be run as two sub-sessions, with a coffee and poster break in between. Each session will be a mix of in-person and video presentations. Throughout the paper sessions, there will be short Q&A sessions for all of the papers that precede them. We'll also release recordings on our website for offline viewing. We'll post the paper schedule in the coming weeks.
  • You are welcome to present a poster during the lunch and coffee breaks. Unfortunately, we are unable to offer a hybrid option for posters.
  • Please also submit the camera-ready version of your paper via CMT by June 13th (11:59 PST). Papers will be available on our website.
  • Looking forward to seeing you there!


Andrew Owens
University of Michigan

Jiajun Wu

Arsha Nagrani

Triantafyllos Afouras

Ruohan Gao
Meta / University of Maryland

Hang Zhao
Tsinghua University

Ziyang Chen
University of Michigan

William Freeman

Andrew Zisserman

Kristen Grauman
UT Austin / Meta

Antonio Torralba

Jean-Charles Bazin