Interview: How Synthesis became a go-to studio for voice acting and localisation in games

As we enter the next generation of gaming, production values continue to rise, and player expectations rise with them. Still, we shouldn’t forget the huge strides gaming made during the PS4 and Xbox One era. As technology has improved, one of the things we’ve come to expect from modern games is high-quality voice performances, bringing the medium ever closer to Hollywood blockbusters and big-budget TV.

One of the leading voice-recording and localisation studios in the world is Synthesis, and we were able to catch up with two of the company’s co-founders, Director of Operations Finn Seliger and Head of Translation Jan Werkmeister, to talk about what goes into getting a perfect performance before it finds its way into your games.


TSA: Can you talk us through how Synthesis came to be?

Finn & Jan: The three of us who founded Synthesis [Adrian Koch, Finn Seliger and Jan Werkmeister] met at university, where we studied media technology. We were interested in audio and shared the same dream: a high-end recording studio of our own. After finishing our studies, we fulfilled this dream by planning and building a recording studio in the basement of a former U-boat engine factory. The concept formed the basis of our thesis, and once the studio was built, we started our own company, Periscope Studio, in 2007. A fellow student of ours founded a development studio (Daedalic Entertainment) at more or less the same time, and he inspired us to venture into the games industry.

As a service provider, we concentrated on music production and sound design, only to learn that there was high demand for voice recordings, while the market for music and sound design was already crowded. We initially kept producing music and sound design but, quite naturally, the volume of voice recordings grew. Our first voice recordings were the original German recordings for Daedalic Entertainment’s first games. The scripts were extensive and there were a lot of files to handle, and we did almost everything manually back then, so we learned a lot in those early days. In 2009, we were contacted by Synthesis, an international company that had been around since the mid-1990s and already had many AAA localisation clients and projects under its belt.

At that time, we produced our first real localisations, with English as the source language and German as the target. The collaboration went smoothly and we became their go-to studio for German localisation. In late 2010, we felt ready to handle our largest production to date, which went by the code name “big fella”. This time we had to deal with something new: we would also have to handle the translation ourselves. Although it was a major challenge, we agreed and successfully localised a sheer monster of a game, the long-awaited RPG The Elder Scrolls V: Skyrim. Reviews for Skyrim were very positive, and we were asked whether we would like to start a joint venture with Synthesis as their local German office, ‘Synthesis Germany’. We agreed, and the company was founded in early 2012. With its founding came the need to build up a proper internal translation department, which was fully established in 2013. You could say that our transformation from a small audio provider to a full-service loc provider was complete by then.

TSA: How can the use of technology improve an actor’s performance in games?

Finn & Jan: Technology is a key factor in audio production. Today’s professional audio recording solutions are advanced and give us more creative freedom while taking care of the technical necessities. For example, many digital audio workstations integrate ADR (automated dialogue replacement) functionality, which helps us a lot when it comes to matching lip movements. But recording technology is just one side of the coin.

Most of the work goes into preparing for the actual recording; organising assets and structuring the material is a great challenge. Because of the interactivity and enormous size of some game worlds, we have plenty of information that needs to be prepared properly. Building our own solutions is essential for a productive workflow, because the less time we spend on the technical side of things, the more we can concentrate on actor performance.

Quality performances are only possible with focused direction, and overwhelming the voice actor is counterproductive. We need to filter out anything superfluous and provide only the information relevant to the recording session at hand. Over the years, this has led us to optimise every part of the audio production. We have relevant character information, pictures, dialogue and audio pronunciation guides directly at our fingertips to help the actor do a great job.

TSA: One of your specialities is localisation – as games become ever more expansive, and are accessible across the world, how has that affected the way the studio works?

Finn & Jan: The biggest impact on our work has been the increased size and complexity of modern video game localisation projects. The work itself, in theory, doesn’t change, whether you do a few lines with one actor or several thousand lines with 100 actors.

What changes is the organisational side of production. Processes and workflows must be tailored to handling huge amounts of data, both text and audio. We sometimes see professional loc platforms and database-driven systems struggle to cope with larger RPG/MMORPG content. That’s why we always recommend a robust localisation CMS such as XLOC. Also, typical production time frames aren’t extended in proportion as the content gets bigger.

Usually developers are under pressure to finish their game within a certain timeframe, and the publisher needs to ship on time to fully capitalise on their investment. Since localisation comes last, our time frame often gets ‘squished’: production takes longer, but the release date is still fixed.

TSA: You’ve worked on some of the biggest series of the last decade. What level of pressure does that bring to your work?

Finn & Jan: I believe the two main challenges in localisation are working to a tight deadline and the typical ‘chaos’ that arises towards the end of production. Video game release dates are usually fixed and localisation takes place in the very last stages of development, which is something you need to be prepared for. To handle a huge loc production under deadline pressure, you need to be razor-sharp once the first loc batch arrives.

You also need to manage the ‘chaos’ that arises when devs work on various last-minute updates to the game, be they gameplay improvements, adjustments to storytelling or bug fixes. As a loc provider, it’s our job to support the client by handling all of these updates in a timely manner, without overlooking any consequences that individual changes might have for consistency.

It’s easier to handle text translation updates than audio ones, so our clients do their best to lock the audio at a certain point before release. However, important scenes might still need to be re-edited and could require last-minute pickup recordings. Again, this is one of the moments where a seasoned loc provider can shine. Getting a VIP actor into the studio to record three lines and delivering the final cutscene audio stem right before gold master is almost an art form. It’s one of the best things we can do to support our client’s successful release.

TSA: What are the challenges to getting a good performance out of a voice actor?

Finn & Jan: The biggest challenge is to inspire and capture high-quality voice acting that makes the game a great experience for players, especially as so many factors can influence an actor’s performance. In my experience, creative freedom is the basis of any good performance. Even though we are ‘just’ localising, we need to strike a healthy balance between staying close to the original version and giving the localised version room to shine.

This principle applies to all aspects of the production. Every actor interprets situations differently and reacts differently. Only if we allow for this freedom will we achieve magical storytelling moments full of life. That said, good actor performances start with good casting! We match the game characters with the talents of our actors. The more naturally an actor reacts, in tune with their character, the more convincing their performance will be in the game.

TSA: When you’re working with hugely famous actors like Christopher Lee or Alec Baldwin, do you find they have a different approach?

Finn & Jan: The process is always the same. Working with great actors is very fulfilling – for the audience as well as the voice director! When working with an especially experienced or talented actor, you want to get the most out of their recording session with proper preparation.

TSA: For The Elder Scrolls, you matched up these names with local dubbing talent. How did that work?

Finn & Jan: It is true that popular actors have a set German counterpart who serves as their voice in the local market. Whether in movies or TV series, these pairings seldom change. As Germans, we associate that ‘German’ voice with the original actor as if it were their own.

In video games, however, you never see the actor’s real face, so this rule is less strict. We can pair famous voices with game characters as we like, and make use of the effect this has on players.

That said, when casting the voice of a game character that was designed with a specific actor as a reference, we might approach that actor’s German counterpart. Sir Cadwell, for example, is portrayed so brilliantly by John Cleese in The Elder Scrolls Online that we absolutely wanted the ‘German John Cleese’ to do the job.

TSA: Where do you think the next step is going to be in voice performance technology?

Finn & Jan: Audio technology has progressed massively in recent times, and a few technologies are on the verge of becoming relevant to video game localisation. They have the potential to disrupt the current business, so it makes sense for us to adapt and integrate them into our work as early as possible.

The most intriguing ones are text-to-speech and ‘voice skins’ (or voice modulation). Text-to-speech is already used for content that doesn’t need any emotional quality, such as e-learning, tutorials and the like. Current text-to-speech systems aren’t suited to anything beyond that, especially not to narrative-heavy video game content, but that might change once they offer a way to tune the voice dynamically, mimicking how the human voice changes when it expresses emotion.

‘Voice skins’, meanwhile, is a technology that alters a voice to make it sound like somebody else’s. It’s already on the market, at least at a level of quality suitable for in-game player chat. If the quality of voice skins continues to advance, it could provide a whole new approach to recording characters for a game. The process would work like this: you use a small group of talented actors (“dummy actors”) to record all the lines of a game and get raw voice data. This data includes all the acting flourishes and emotional nuances for each character. Then, by applying voice skins, you replace each voice so that the small ‘dummy’ cast is extended to the full range of characters that will appear in-game.

Since the number of voice actors employed on a project has a significant impact on budget, voice skin technology could be a cost-saver for the original English recordings as well as for any localised recordings of the game. Also, imagine the flexibility. Last-minute changes to character dialogue could be completed quickly by having a small pool of dummy actors record them and applying the corresponding voice skin, instead of having to arrange a sudden recording session with some super famous Hollywood actor!

There are pros and cons to using voice skins, but disruptive technology like this is something we professionals have to watch. It’s better to embrace change and innovation early than be left behind.

TSA: What does the arrival of the new console generation mean for the studio?

Finn & Jan: Theoretically, new recording setups could be introduced to take advantage of the enhanced surround sound capabilities (including for voices) of the new consoles. For example, you might add a room mic to the regular direct mic that is currently used. However, I suspect the most prominent effect on the localisation industry will be a surge in demand driven by the new games being produced for next gen, as there has been with every new console generation. We will likely see even more sophisticated rendered scenes in need of high-class voice acting, and games will grow in size as they become more complex, meaning more to translate and more to record.


Thanks to Finn & Jan for chatting with us. Keep an eye (or maybe an ear) out for Synthesis’ work in games like Doom Eternal, Monster Hunter: World, Dishonored and beyond. Synthesis is part of Keywords Studios, a technical and creative services provider to the video games industry, with 50+ operational studios across 21 countries and four continents.

Written by
TSA's Reviews Editor - a hoarder of headsets who regularly argues that the Sega Saturn was the best console ever released.