Text-to-speech audio

Updated on March 1, 2023

In this VR Builder tutorial, you will learn how to use the Microsoft text-to-speech (TTS) synthesizer in VR Builder to create audio instructions simply by typing text. Audio instructions are a powerful way to guide your users in virtual reality. Please note that if you are using an operating system other than Windows, you will need a different text-to-speech synthesizer.


  1. Adding text-to-speech audio in Unity
  2. Changing the language settings of the text-to-speech audio
  3. Changing the voice gender of text-to-speech audio
  4. Managing text-to-speech files
  5. Next Steps

Adding text-to-speech audio in Unity

For this tutorial, you don't require any additional objects since we only want to demonstrate the synthesized text audio in VR Builder. This Unity scene has been configured using VR Builder's Setup Wizard. In case you need further assistance, please refer to the tutorial on how to set up VR Builder.

We created a simple, one-step process and added a Play TextToSpeech Audio behavior to it. To do the same, go to the Step Inspector > Behaviors > Add Behavior > Guidance > Play TextToSpeech Audio.


In the text field, you can write any text you want your users to hear in VR. In this example, we wrote in German: Hallo. Ich bin ein Berliner. It's the famous quote from John F. Kennedy, which is jokingly translated as "Hello. I am a jelly donut".

The written text will be synthesized by Microsoft's SAPI and played in VR. This example is ready to go. Give it a try by clicking preview to hear it!

Changing the language settings of the text-to-speech audio

The default language of your text-to-speech audio is English. This is why you have just heard the voice reading the German text with an English accent. If you want your text-to-speech audio to be spoken in German with a German accent, you can set this via Tools > VR Builder > Settings.

In the VR Builder settings pop-up window, select Language > Application Language. You can change the language by editing the Application Language field. In my case, I added De for Deutsch (i.e. German).

Changing the voice gender of text-to-speech audio

In our current example, the gender of the voice is set to male. However, we have only heard female voices. This is because I don't have a SAPI-compatible male TTS voice on my operating system, so it is replaced with a female voice. If you want to add male voices to your Unity VR application, look for Settings > Change text-to-speech settings on Windows.

From the Voices sub-tab you can select your preferred voice from those already available on your computer. The Manage voices sub-tab lets you add additional voices to your operating system.

Please note that not all of these voices are SAPI-compatible. To check which voices are actually SAPI-compatible, go to Control Panel > Speech Recognition > Advanced Speech Options > Text to Speech.

In our example, I currently have three SAPI-compatible voices available. These are

  • Hazel (female, UK English)
  • Hedda (female, German) and
  • Zira (female, US English).

In a nutshell, if you require additional languages or voices, look for another SAPI-compatible voice and add it to the synthesizer in the text-to-speech settings.

Managing text-to-speech files

In this tutorial, we use the Microsoft SAPI text-to-speech synthesizer. The big advantage of using Microsoft SAPI is that you don't need an internet connection for synthesis. The disadvantage is that you need the Windows operating system to run your VR application. If you are using an operating system other than Windows, or want to run your VR Unity application on one, this will not work. If you need another provider, you can extend the text-to-speech feature and include your preferred provider.
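
To give a rough idea of what "including your preferred provider" means, here is a small sketch of the general pattern: every provider offers the same synthesize operation, and the application picks one by name. VR Builder itself is written in C#, so this Python snippet is only an illustration of the idea and not VR Builder's actual API. The names SpeechProvider, SilentProvider, register_provider and synthesize_with are hypothetical and invented for this example.

```python
from typing import Dict, Protocol


class SpeechProvider(Protocol):
    """Hypothetical provider contract: turn text into encoded audio bytes."""

    def synthesize(self, text: str, language: str) -> bytes: ...


class SilentProvider:
    """Stand-in provider that returns an empty payload instead of real audio."""

    def synthesize(self, text: str, language: str) -> bytes:
        return b""


# Registry of available providers, keyed by a name chosen at registration time.
PROVIDERS: Dict[str, SpeechProvider] = {}


def register_provider(name: str, provider: SpeechProvider) -> None:
    """Make a provider selectable under the given name."""
    PROVIDERS[name] = provider


def synthesize_with(name: str, text: str, language: str) -> bytes:
    """Look up the provider registered under `name` and delegate synthesis to it."""
    if name not in PROVIDERS:
        raise KeyError(f"No text-to-speech provider registered as {name!r}")
    return PROVIDERS[name].synthesize(text, language)
```

A real provider would replace SilentProvider with a class that calls the speech service of your choice. The rest of the application only depends on the shared synthesize operation, so swapping providers does not affect it.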

On build, all text-to-speech files are generated. To generate them without building, go to Tools > VR Builder > Settings > Language and click on Generate all TTS files.

After that, the text-to-speech synthesis files are created. The TTS audio files can be found under Project > Assets > StreamingAssets.

When you launch the application again, the software first checks whether the required TTS audio files already exist. A new TTS audio file is created only if this is not the case. For example, if I now run the application with German as the application language, a German synthesis file will be created.
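
The check described above is a classic "get or create" cache. The following Python sketch illustrates the logic only; it is not VR Builder's actual code, which is C#. The folder layout, the .wav extension, and the names tts_path, get_or_create_tts and the synthesize callback are assumptions made for this example.

```python
from pathlib import Path
from typing import Callable


def tts_path(root: Path, language: str, identifier: str) -> Path:
    """Hypothetical layout: <root>/TextToSpeech/<language>/<identifier>.wav"""
    return root / "TextToSpeech" / language / f"{identifier}.wav"


def get_or_create_tts(
    root: Path,
    language: str,
    identifier: str,
    text: str,
    synthesize: Callable[[str, str], bytes],
) -> Path:
    """Return the cached audio file, synthesizing it only if it is missing."""
    path = tts_path(root, language, identifier)
    if not path.exists():
        # No cached file for this language yet: create it once and reuse it later.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text, language))
    return path
```

Calling get_or_create_tts twice with the same arguments runs the synthesizer only once. Calling it with a different language creates a second file with the same identifier in a different language folder.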

You can view the newly synthesized TTS audio with two different tools. First, you can find your synthesized TTS audio files in the StreamingAssets folder in Unity.

Second, you can view them in the Explorer. To locate them, go to Project > Assets > StreamingAssets. Then right-click StreamingAssets and choose Show in Explorer.

In the Explorer, select StreamingAssets. A text-to-speech folder appears. Here you can find the German text-to-speech file generated by Microsoft SAPI, named with a unique identifier.

If the application is running with English as the application language, the TTS audio files are generated in English.

Note that the identifier is the same because both versions refer to the same behavior in the same step of the VR application in Unity.

To quickly refresh the synthesized TTS audio files in the language of your choice, you can flush the generated files and click on Generate again in the Settings menu.
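
Conceptually, flushing just means deleting the cached audio files so that the next generation run has to create them again. Here is a minimal Python sketch of that idea. It is not what the Settings menu runs internally, and the function name flush_tts_files and the .wav extension are assumptions made for this example.

```python
from pathlib import Path


def flush_tts_files(tts_dir: Path, suffix: str = ".wav") -> int:
    """Delete every cached audio file below tts_dir and return how many were removed."""
    if not tts_dir.is_dir():
        return 0
    removed = 0
    # Collect the matches first so that we do not delete while still walking the tree.
    for audio_file in list(tts_dir.rglob(f"*{suffix}")):
        if audio_file.is_file():
            audio_file.unlink()
            removed += 1
    return removed
```

After a flush, no cached file exists any more, so the next generation run recreates every file in the currently selected language.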

Next steps

In this VR Builder tutorial, you learned how to guide your users intuitively by using VR Builder's TTS engine and customizing the synthesized text-to-speech audio files to your needs. However, audio instructions are just one way to guide your users more intuitively in VR. What if you want your users to place virtual objects in specific positions? Check out our tutorial on creating snap zones in Unity!
