PhD student Skyler Wang and Meta AI release speech-to-speech translation AI model

PhD student Skyler Wang and a team of researchers at Meta AI announced and open-sourced a large foundational speech-to-speech translation AI model that supports translation across up to 100 languages.

Many conventional speech-to-speech translation systems are cascades of subsystems that translate in stages, for example automatic speech recognition, then text-to-text translation, then text-to-speech synthesis in a three-stage pipeline. In contrast, SeamlessM4T (Massively Multilingual & Multimodal Machine Translation) is a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. A unified system like this could help expedite the creation of end-to-end speech translation systems that users can carry around on their mobile devices.
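To illustrate what "one model, many tasks" means in practice, here is a minimal sketch that drives a single SeamlessM4T checkpoint for both text-to-text and speech-to-speech translation. It assumes the Hugging Face Transformers port of the model; the checkpoint name ("facebook/hf-seamless-m4t-medium"), the language codes, and the input file ("input.wav") are illustrative assumptions not taken from the announcement above. See the linked blog post, demo, and paper for the official interfaces.

```python
# Sketch: one unified SeamlessM4T model handling two different translation tasks.
# Assumes the Hugging Face Transformers port; checkpoint and file names are placeholders.
import torch
import torchaudio
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Task 1: text-to-text translation (English -> Spanish), no separate MT subsystem.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
text_tokens = model.generate(**text_inputs, tgt_lang="spa", generate_speech=False)
print(processor.decode(text_tokens[0].tolist()[0], skip_special_tokens=True))

# Task 2: speech-to-speech translation with the same model and weights.
# "input.wav" is a placeholder; the model expects 16 kHz mono audio.
waveform, sample_rate = torchaudio.load("input.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).mean(dim=0)
audio_inputs = processor(audios=waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
translated_audio = model.generate(**audio_inputs, tgt_lang="spa")[0].cpu().numpy().squeeze()
torchaudio.save("translated.wav", torch.tensor(translated_audio).unsqueeze(0), 16_000)
```

The point of the sketch is the contrast with a cascaded pipeline: both calls above go through the same weights, switching tasks only via generation arguments rather than handing audio or text off between separate recognition, translation, and synthesis systems.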

More information:

Blog post: https://ai.meta.com/blog/seamless-m4t/

Live demo: https://seamless.metademolab.com/

Paper: https://ai.meta.com/research/publications/seamless-m4t/ or https://arxiv.org/abs/2308.11596

The Verge: https://www.theverge.com/2023/8/22/23840571/meta-multilingual-speech-translation-model-ai

TechCrunch: https://techcrunch.com/2023/08/22/meta-releases-an-ai-model-that-can-transcribe-and-translate-close-to-100-languages/