3D Audio-Visual Segmentation

Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

NeurIPS 2024 Workshop on Audio Imagination

Project page arXiv Dataset

[Figure: teaser]

This repository is the official implementation of "3D Audio-Visual Segmentation". The paper introduces a new research problem, 3D Audio-Visual Segmentation, which extends existing audio-visual segmentation (AVS) to a 3D output space. To support this research, we create the first simulation-based benchmark, 3DAVS-S34-O7, which provides photorealistic 3D scene environments with grounded spatial audio in single-instance and multi-instance settings, across 34 scenes and 7 object categories. We also propose a new approach, EchoSegnet, which combines ready-to-use knowledge from pretrained 2D audio-visual foundation models with a 3D visual scene representation through spatial audio-aware mask alignment and refinement.

Updates

  • Data & Code coming soon!

Method: EchoSegnet

[Figure: teaser]

Citation

If you find our project useful, please cite it using the following BibTeX entry:

@inproceedings{sokolov20243daudiovisualsegmentation,
    title     = {3D Audio-Visual Segmentation},
    author    = {Sokolov, Artem and Bhosale, Swapnil and Zhu, Xiatian},
    booktitle = {Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
    year      = {2024}
}

Contact

For feedback or questions, please contact Artem Sokolov.