
Sonantic: Meet the start-up changing the games industry with human-like AI voices



Rockstar announced its western-inspired epic Red Dead Redemption 2 back in 2016 but the game didn’t make it onto consoles until two years later, in November 2018. All in all, the game took an intense seven years to make. 

Imagine being a developer like Rockstar, deep in the crunch period before a game’s release, and realising you need an extra line or two of dialogue. You have to call in an actor, work around their schedule, and get them to record the lines you need before putting them into the game. With voice AI, that process can be cut down to minutes, and that’s exactly what London start-up Sonantic wants to achieve.

Sonantic, which is announcing today that it has raised £2 million in funding from the likes of EQT Ventures, is the brainchild of Zeena Qureshi (CEO) and John Flynn (CTO), who met during an Entrepreneur First programme in 2018. The start-up works with actors to create AI voice models, which can then be leased to games studios to voice characters in a game.


In a world dominated by the likes of Google Assistant and Amazon Alexa, voice tech may not sound like anything new. But what is different about Sonantic’s work is that it is expressive and features realistic elements such as breathing so it doesn’t sound like a robot is speaking.

“The average AI assistant sounds more realistic but the speech is very consistent 100 per cent throughout, which makes it quite robotic. If you listen to my voice I have speech patterns, it goes up and down and nothing is consistent all the way through,” explains Qureshi. “We pick up all those little nuances like breath, how fast and slow, and everything from a whisper to a shout to create expression.” 

The strength of Sonantic’s tech comes from the backgrounds of Qureshi and Flynn. Qureshi worked at various tech start-ups over the years, including Bulb, whilst on the side tutoring children who were non-verbal or had autism, implementing speech and language therapy. Flynn, on the other hand, was a speech researcher who spent time working in the film industry on post-production dialogue, on films such as The Dark Knight and Bohemian Rhapsody.

“On Batman, I saw Christian Bale perform the same scene five or six different ways with subtle nuances. I was always really interested in how you might capture that performance algorithmically and it wasn’t until deep learning speech came along in the past five years that I thought tech has now matured to maybe do something like that,” he explains. 

Despite Flynn’s roots in the film industry, the founders decided to target games studios with Sonantic’s tech instead. Qureshi says film studios can be quite tech-averse, unlike their gaming counterparts, and development lengths in gaming mean that tech like this can have an impact. “With Sonantic’s tool, we can take what takes months to do that process, and sometimes years, down to minutes. They can literally type in a line of dialogue and generate it and put it into a game and see how it feels,” she says.

Elijah Wood is a well-known actor who has put his voice to games, including The Legend of Spyro: Dawn of the Dragon (Jamie McCarthy / Staff / Getty Images)

But voice tech isn’t going to replace voice actors in gaming. For one, performance is big business for the industry: this year’s BAFTA Games Awards in April feature two performer categories, one for performer in a leading role and one for performer in a supporting role. Sonantic works with voice actors to record their voices and then builds its models on top, something Qureshi describes as “augmenting” their work. “[Actors] could have their digital version do an audition or do a role somewhere else in the world. We think that’s pretty cool.”

At the moment, Sonantic is working with around 30 actors. They receive payment when they come on board and record a voice, as well as when their voice is used by a studio – more than 10 big-name gaming studios are on board, though Sonantic is unable to reveal any names. 

Yet with AI tech such as this, what about the fear that someone’s voice could be deepfaked using Sonantic’s tool? “We don’t open up the tech to other people, we train the voices,” says Flynn. “In that way, we keep it a traditional entertainment voice model where people are used to assigning their voice to a product or film. We like to keep that on our side because if you open it out, it can be very dangerous.” 

Sonantic also vets the studios they work with, discussing their work and pipeline before giving them access to the voices. And whilst the founders hope their tech won’t be used to deepfake something US House Speaker Nancy Pelosi says, it could be used to extend the working lives of actors through voice preservation. 

“One of the areas we’ve been looking at is we’ve had actors come to us who might be getting older or potentially losing their voice – we’ve been exploring how we could preserve a voice and look at it from that point of view,” says Flynn.

“We’re definitely open to seeing how this evolves because this is a new area of technology and a new way of working,” adds Qureshi. “Tech moves a lot faster than legal and ethics so we’re trying to make sure we understand our actors, see how they feel, and explore the capabilities in the right way.”

The new funding will go towards growing the team and ensuring Sonantic can bring new actors into the model, as well as improving the product. “There’s so many different styles and capabilities of human voice. Our main north star is trying to do everything a human voice can do for performance,” says Qureshi. 

sonantic.io 


