ElevenLabs introduces AI real time dubbing in 20 languages


By Matthew Griffin Intelligence and the Senses 14th October 2023

WHY THIS MATTERS IN BRIEF

Being able to translate what people are saying, along with their accents and emotions, in real time into other languages is revolutionary for the industry.


ElevenLabs, a year-old voice cloning and voice synthesis startup founded by former Google and Palantir employees, today announced the launch of AI Dubbing, a dedicated product that can translate any speech, including long-form content, into more than 20 different languages.

Available to all platform users, the offering comes as a new way to dub audio and video content and can transform an area that has largely been manual for years.

More importantly, it can break language barriers for smaller content creators who don’t have the resources to hire manual translators to convert their content and take it global.

“We have tested and iterated this feature in collaboration with hundreds of content creators to dub their content and make it more accessible to wider audiences,” said Mati Staniszewski, CEO and co-founder of ElevenLabs. “We see huge potential for independent creatives – such as those creating video content and podcasts – all the way through to film and TV studios.”

ElevenLabs claims the feature can deliver high-quality translated audio in minutes – depending on the length of the content – while retaining the original voice of the speaker, complete with their emotions and intonation.

However, in this age of Artificial Intelligence (AI), when almost every enterprise is looking at language models to drive efficiencies, ElevenLabs is not the only company exploring speech-to-speech translation.

While AI-driven translation involves multiple layers of work, starting from noise removal to speech translation, users at the front end don’t have to go through any of those steps. They just have to select the AI Dubbing tool on ElevenLabs, create a new project, select the source and target languages and upload the file of the content.

Once the content is uploaded, the tool automatically detects the number of speakers and gets to work with a progress bar appearing on the screen. This is just like any other conversion tool on the internet. After completion, the file can be downloaded and used.

Behind the scenes, the tool works by tapping ElevenLabs’ proprietary method to remove background noise, differentiating music and noise from actual dialogue from speakers. It recognizes which speakers speak when, keeping their voices distinct, and transcribes what they say in their original language using a speech-to-text model. Then, this text is translated, adapted (so lengths match) and voiced in the target language to produce the desired speech while retaining the speaker’s original voice characteristics.

Finally, the translated speech is synced back with the music and background noise originally removed from the file, preparing the dubbed output for use. ElevenLabs claims this work is the culmination of its research on voice cloning, text and audio processing and multilingual speech synthesis.
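The translate-and-voice stage described above can be pictured as a loop over speaker segments. What follows is only a rough sketch under stated assumptions, not ElevenLabs' implementation: the `Segment` type, `dub_segments` and the toy `translate` and `synthesize` stand-ins are all hypothetical names invented for illustration, and the noise removal and final remix steps are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """One stretch of dialogue (hypothetical structure, not an ElevenLabs type)."""
    speaker: str   # label from the "who speaks when" step
    start: float   # seconds from the start of the file
    end: float     # seconds from the start of the file
    text: str      # transcript in the source language


def dub_segments(
    segments: List[Segment],
    translate: Callable[[str], str],
    synthesize: Callable[[str, str, float], bytes],
) -> List[Dict]:
    """Translate each segment and voice it within its original time slot."""
    dubbed = []
    for seg in segments:
        translated = translate(seg.text)
        duration = seg.end - seg.start  # the slot the new speech has to fit
        audio = synthesize(translated, seg.speaker, duration)
        dubbed.append(
            {"speaker": seg.speaker, "start": seg.start,
             "text": translated, "audio": audio}
        )
    return dubbed


# Toy stand-ins so the sketch runs; a real system would call actual models here.
def toy_translate(text: str) -> str:
    lookup = {"hello": "hola", "thank you": "gracias"}
    return lookup.get(text.lower(), text)


def toy_synthesize(text: str, speaker: str, duration: float) -> bytes:
    return f"{speaker}:{text}:{duration:.1f}s".encode("utf-8")


segments = [
    Segment("speaker_1", 0.0, 1.5, "Hello"),
    Segment("speaker_2", 1.5, 3.0, "Thank you"),
]
result = dub_segments(segments, toy_translate, toy_synthesize)
```

Keeping the speaker label and time slot attached to every segment is what would let a final step lay the new speech back over the original music and background noise.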

For producing the final speech from translated text, the company taps its latest Multilingual v2 model. It currently supports more than 20 languages, including Hindi, Portuguese, Spanish, Japanese, Ukrainian, Polish and Arabic, giving users a wide range of options to globalize their content.

Prior to this end-to-end interface, ElevenLabs offered separate tools for voice cloning and text-to-speech synthesis. Anyone who wanted to translate their audio content, such as a podcast, into a different language first had to create a clone of their voice on the platform, then transcribe and translate the audio separately. Only then could they feed the translated text and their cloned voice into the text-to-speech model to produce the audio. Not to mention, this only worked for speech without any major background music or noise.

Staniszewski confirmed that the new dubbing feature will be available to all users of the platform, but will have some character limits, as has been the case with text-to-speech generation. Around one minute of AI Dubbing would typically equate to 3,000 characters, he said.
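For a rough sense of what that means for a creator's quota, the arithmetic can be sketched in a few lines. The 3,000 characters per minute figure is the one quoted above and should be treated as approximate; the function names and the sample quota are hypothetical and not part of any ElevenLabs API.

```python
CHARS_PER_MINUTE = 3_000  # approximate figure quoted by Staniszewski


def estimated_characters(duration_seconds: float) -> int:
    """Estimate the characters a dub of this length would use up."""
    return round(duration_seconds / 60 * CHARS_PER_MINUTE)


def fits_quota(duration_seconds: float, remaining_characters: int) -> bool:
    """Check whether a file of this length fits in the remaining quota."""
    return estimated_characters(duration_seconds) <= remaining_characters


# A 10-minute podcast episode against a hypothetical 25,000-character balance.
episode_cost = estimated_characters(10 * 60)
episode_fits = fits_quota(10 * 60, remaining_characters=25_000)
```

On these assumptions a 10-minute episode would use about 30,000 characters, so it would not fit in a 25,000-character balance.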

While ElevenLabs is making headlines with back-to-back developments, it isn’t the only one exploring AI-based voicing. A few weeks back, Microsoft-backed OpenAI made ChatGPT multimodal with the ability to have conversations in response to voice prompts, like Amazon’s Alexa product.

Here too the company is using speech-to-text and text-to-speech models to convert audio, but the technology is not available to all.

OpenAI said it is using it with select partners to prevent misuse of the capabilities. One of these is Spotify, which is using it to help its podcasters translate their content into different languages while retaining their own voice.

For his part, Staniszewski said ElevenLabs’ AI Dubbing tool differentiates by translating video or audio of any length, containing any number of speakers, while preserving their voice and emotions across up to 20 languages and delivering the highest quality results.

Other players are also active in the AI-powered voice and speech synthesis space, including MURF.AI, Play.ht and WellSaid Labs.

Just recently, Meta also launched SeamlessM4T, an open-source multilingual foundational model that can understand nearly 100 languages from speech or text and generate translations into either or both in real time.

According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is estimated to touch nearly $5 billion in 2032, with a CAGR of slightly above 15.40%.
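As a quick sanity check on those figures, the compound annual growth rate can be recomputed from the two endpoints. This is a back-of-the-envelope sketch only: it assumes a 10-year span from 2022 to 2032 and uses the rounded $1.2 billion and $5 billion values from the text, so it will not reproduce Market US's number exactly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction, e.g. 0.15 for 15%."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding at a fixed annual rate."""
    return start_value * (1 + rate) ** years


# Rounded endpoints from the article, in billions of dollars.
implied_rate = cagr(1.2, 5.0, 10)

# Going the other way: compound $1.2 billion at the quoted 15.4% for 10 years.
projected_2032 = project(1.2, 0.154, 10)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Projected 2032 market: ${projected_2032:.2f} billion")
```

Both directions land close to the quoted numbers: the implied rate comes out a little over 15%, and compounding at 15.4% gives roughly $5 billion by 2032.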

Matthew Griffin / About Author

Matthew Griffin, multi-award winning Futurist and named Futurist of the Year 2024, has been described as a "Walking encyclopaedia of the future" by NASA and a futurist polymath. One of the world's most renowned futurists and strategic foresight experts, Matthew is the 15 times author of the blockbuster "Codex of the Future" series, and is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working across the next 50 years, XPotential University, the world's first free futures and foresight university, and the World Futures Forum, which works with the United Nations to solve the world's greatest challenges. Matthew is an in-demand international keynote, acclaimed university lecturer and mentor, and host of the hit Fanatical Futurist podcast.

A rare talent, Matthew previously helped build and run several multi-billion dollar business units for Atos, Dell-EMC, and IBM, and his ability to identify, track, and explain the impacts of hundreds of emerging technologies and trends on global business, culture, and society has earned him a powerful reputation and a roster of clients that includes royal households, world leaders, G7, G20, and G77+ governments, and many of the world's most respected brands including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi Group, Coca Cola, Dentons, Deloitte, Disney, Dow, EY, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, T-Mobile, UBS, VISA, and many others. He was also the only futurist invited to talk at the UN COP28 held in Dubai alongside world leaders.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthew's mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.