PhD student Skyler Wang and a team of researchers at Meta AI have announced and open-sourced a large foundation model for speech-to-speech translation that can translate across up to 100 languages.
In contrast to many conventional speech-to-speech translation systems, which cascade multiple subsystems that translate in stages (e.g., automatic speech recognition, then text-to-text translation, then text-to-speech synthesis in a three-stage pipeline), SeamlessM4T (Massively Multilingual & Multimodal Machine Translation) is a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. A unified system like this could help speed up the development of end-to-end speech translation that users can carry around on their mobile devices.
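To make the "one model, many tasks" idea concrete, here is a minimal sketch of what calling such a unified model might look like, assuming the Hugging Face transformers port of SeamlessM4T rather than anything shown in the post itself; the class name, checkpoint identifier, language codes, and input file below are assumptions for illustration only. The point is that a single model object handles several of the tasks listed above, instead of chaining separate recognition, translation, and synthesis systems.

import torchaudio
from transformers import AutoProcessor, SeamlessM4TModel

# Assumed checkpoint name from the Hugging Face port; not taken from the post.
processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Text-to-speech translation (T2ST): English text in, French speech out.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
speech_waveform = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

# Speech-to-text translation (S2TT): 16 kHz audio in, French text out.
# "example.wav" is a hypothetical input file.
audio, sr = torchaudio.load("example.wav")
audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16_000)
audio_inputs = processor(audios=audio, return_tensors="pt")
tokens = model.generate(**audio_inputs, tgt_lang="fra", generate_speech=False)
translated_text = processor.decode(tokens[0].tolist()[0], skip_special_tokens=True)

The same model and processor pair can also be asked for text-to-text translation or transcription, which is the contrast with the three-stage cascade described above.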
More information:
Blog post: https://ai.meta.com/
Live demo: https://seamless.
Paper: https://ai.meta.com/
The Verge: https://www.theverge.
TechCrunch: https://techcrunch.