PhD student Skyler Wang and Meta AI release speech-to-speech translation AI model

PhD student Skyler Wang and a team of researchers at Meta AI announced and open-sourced a large foundational speech-to-speech translation AI model that can translate across up to 100 languages.

Many conventional speech-to-speech translation systems are cascades of subsystems that translate progressively, e.g., automatic speech recognition, then text-to-text translation, then text-to-speech synthesis in a three-stage pipeline. In contrast, SeamlessM4T (Massively Multilingual & Multimodal Machine Translation) is a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. A unified system like this could help expedite the creation of end-to-end speech translation systems that users can carry around on their mobile devices.
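The architectural contrast above can be sketched in a few lines of Python. This is a purely illustrative toy, assuming hypothetical stand-in functions; none of these names come from the real SeamlessM4T API. It shows how a cascaded system chains three separate stages (where errors can compound at each hand-off), while a unified model maps source speech to target speech in one step.

```python
# Illustrative sketch only: these functions are hypothetical stand-ins,
# not the actual SeamlessM4T interface.

def asr(audio: str) -> str:
    """Stage 1: automatic speech recognition (stand-in)."""
    return f"text({audio})"

def mt(text: str, tgt_lang: str) -> str:
    """Stage 2: text-to-text machine translation (stand-in)."""
    return f"{tgt_lang}:{text}"

def tts(text: str) -> str:
    """Stage 3: text-to-speech synthesis (stand-in)."""
    return f"audio({text})"

def cascaded_s2st(audio: str, tgt_lang: str) -> str:
    # Three separately trained stages chained together;
    # each hand-off is a point where errors can compound.
    return tts(mt(asr(audio), tgt_lang))

def unified_s2st(audio: str, tgt_lang: str) -> str:
    # A single model maps source speech directly to target speech,
    # with no intermediate hand-offs between subsystems.
    return f"audio({tgt_lang}:text({audio}))"

print(cascaded_s2st("hello.wav", "fr"))  # → audio(fr:text(hello.wav))
print(unified_s2st("hello.wav", "fr"))   # → audio(fr:text(hello.wav))
```

Both toy paths produce the same output here, but in practice the unified design avoids propagating transcription and translation errors through separate subsystems.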

More information:

Blog post:

Live demo:

Paper:

The Verge:

TechCrunch: