July 27, 2022
In this VR Builder tutorial, you will learn how to use the Microsoft text-to-speech (TTS) synthesizer in VR Builder to create audio instructions simply by typing text. Audio instructions are a powerful way to guide your users in virtual reality. Please note that the Microsoft synthesizer requires Windows; if you are using another operating system, you need a different text-to-speech synthesizer. Section 4 covers how you can still benefit from Microsoft text-to-speech synthesized audio files when running your application on Android-based standalone devices.
For this tutorial, you don't require any additional objects since we only want to demonstrate the synthesized text audio in VR Builder. This Unity scene has been configured using VR Builder's Setup Wizard. In case you need further assistance, please refer to the tutorial on how to set up VR Builder.
We created a simple, one-step process and added a Play TextToSpeech Audio behavior to it. To do so, go to the Step Inspector > Behaviors > Add Behavior > Guidance > Play TextToSpeech Audio.
In the text field you can write any text you want your users to hear in VR. In this example, we wrote in German: Hallo. Ich bin ein Berliner. It includes the famous quote from John F. Kennedy and translates to Hello. I am a Berliner.
The written text will be synthesized at runtime by Microsoft's SAPI and played in VR. The example is now ready to go. Give it a try!
By default, the text-to-speech language is English. This is why you have just heard the voice reading the German text with an English accent. If you want your text-to-speech audio to be spoken in German with a German accent, you can set this via Tools > VR Builder > Settings.
In the VR Builder settings pop-up window, select Language > Application Language. You can change the language by editing the Application Language field. In my case, I added De for Deutsch (i.e. German).
In our current example, the gender of the voice is set to male. However, we have only heard female voices. This is because I don't have a SAPI-compatible male TTS voice on my operating system, so a female voice is used instead. If you want to add male voices to your Unity VR application, look for Settings > Change text-to-speech settings on Windows.
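The fallback described above can be pictured with a small Python sketch. This is only an illustration of the idea, not VR Builder's or SAPI's actual selection logic: the Voice class, the pick_voice function, and the example voice names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Voice:
    """Hypothetical description of an installed TTS voice."""
    name: str
    language: str  # two-letter code such as "en" or "de"
    gender: str    # "male" or "female"


def pick_voice(voices: List[Voice], language: str, gender: str) -> Optional[Voice]:
    """Prefer an exact language and gender match, then any voice in that
    language, then any installed voice. Returns None if nothing is installed."""
    language = language.lower()
    same_language = [v for v in voices if v.language == language]
    for voice in same_language:
        if voice.gender == gender:
            return voice
    if same_language:
        return same_language[0]
    return voices[0] if voices else None


# Assumed setup: only female voices are installed, as in the tutorial.
installed = [
    Voice("Example EN Female", "en", "female"),
    Voice("Example DE Female", "de", "female"),
]

chosen = pick_voice(installed, "De", "male")
print(chosen.name)  # prints "Example DE Female"
```

Because no male German voice exists in the assumed list, the request for a male voice ends up with the female German one, which matches what we heard in the example.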
From the Voices sub-tab you can select your preferred voice from those already available on your computer. The Manage voices sub-tab lets you add additional voices to your operating system.
Please note that not all of these voices are SAPI-compatible. To check which voices are actually SAPI-compatible, go to Control Panel > Speech Recognition > Advanced Speech Options > Text to Speech.
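If you prefer to check the SAPI voices from a script rather than the Control Panel, the following Python sketch is one possible way. It assumes the third-party pywin32 package is installed and uses the SAPI.SpVoice COM object; the helper name list_sapi_voices is our own. On systems without Windows or pywin32 it simply returns an empty list.

```python
import sys
from typing import List


def list_sapi_voices() -> List[str]:
    """Return the descriptions of installed SAPI voices,
    or an empty list when SAPI cannot be reached."""
    if sys.platform != "win32":
        return []  # SAPI only exists on Windows
    try:
        import win32com.client  # part of the third-party pywin32 package
    except ImportError:
        return []
    speaker = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = speaker.GetVoices()
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]


for description in list_sapi_voices():
    print(description)
```

The printed list should correspond to the voices shown in the Text to Speech tab, so it can serve as a quick cross-check after installing a new voice.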
In our example, I currently have three SAPI-compatible voices available.
In a nutshell, if you need additional languages or voices, look for another SAPI-compatible voice and add it to the synthesizer in the text-to-speech settings.
In this tutorial we use the Microsoft SAPI text-to-speech synthesizer. The big advantage of Microsoft SAPI is that you don't need an internet connection for synthesis. The disadvantage is that your VR application needs the Windows operating system for synthesis to work. If you are using an operating system other than Windows, or want to run your Unity VR application on Android devices later, runtime synthesis will not work.
The good news is that there is a solution for standalone devices. If you develop your VR content on Windows, you can enable caching for the synthesized files. The cached files can still be used on other devices with different operating systems. All you need to do is go to Tools > VR Builder > Settings > Language and check the Save Audio Files To Stream box.
The TTS audio files are created while the application runs, so run the complete application at least once to have all of them generated. Afterwards, the TTS audio files can be found under Project > Assets > StreamingAssets.
When you launch the application again, the software first checks whether the required TTS audio files already exist. A new TTS audio file is created only if they do not. For example, if I now run the application with German as the application language, a German synthesis file will be created.
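The check-then-create behavior described above follows a common caching pattern, sketched here in Python. This is not VR Builder's code: the function get_or_create_audio, the folder layout (one subfolder per language), the .wav extension, and the stand-in synthesizer are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_or_create_audio(cache_dir: Path, language: str, identifier: str,
                        text: str,
                        synthesize: Callable[[str, str], bytes]) -> Path:
    """Return the cached audio file, synthesizing it only if it is missing."""
    target = cache_dir / language / f"{identifier}.wav"
    if target.exists():
        return target  # cache hit: no synthesis needed
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(synthesize(text, language))
    return target


calls = []


def fake_synthesize(text: str, language: str) -> bytes:
    """Stand-in for a real synthesizer; records each call."""
    calls.append((text, language))
    return b"placeholder audio"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    text = "Hallo. Ich bin ein Berliner."
    first = get_or_create_audio(cache, "de", "step-1", text, fake_synthesize)
    second = get_or_create_audio(cache, "de", "step-1", text, fake_synthesize)
    print(first == second, len(calls))  # prints "True 1"
```

The second call finds the existing file and skips synthesis, which is why files cached on Windows can be reused on a device that has no synthesizer.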
You can view the newly synthesized TTS audio with two different tools. First, you can find your synthesized TTS audio files in the StreamingAssets folder in Unity.
Second, you can view them in the Explorer. To locate them, go to Project > Assets > StreamingAssets. Then right-click StreamingAssets and choose Show in Explorer.
In the Explorer, select StreamingAssets. A text-to-speech folder appears. Here you can find the Microsoft SAPI text-to-speech file in German with a unique identifier.
If the application is running with English as the application language, the TTS audio files are generated in English.
Note that the identifier is the same because both versions refer to the same behavior in the same step of the VR application in Unity. To quickly generate the synthesized TTS audio files in the language of your choice, just run the application in this language again.
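To see at a glance which languages already have a cached file for a given behavior, a short script can scan the text-to-speech folder. The Python sketch below assumes a hypothetical layout with one subfolder per language and files named after the identifier; the actual folder structure and file names produced by VR Builder may differ, so adjust the pattern to what you see in the Explorer.

```python
import tempfile
from pathlib import Path
from typing import List


def languages_cached_for(root: Path, identifier: str) -> List[str]:
    """List the language subfolders that contain a file for this identifier.
    Assumes the hypothetical layout <root>/<language>/<identifier>.<ext>."""
    return sorted({p.parent.name for p in root.glob(f"*/{identifier}.*")})


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a small example tree: one behavior cached in two languages,
    # and a second behavior cached in German only.
    for language, identifier in [("de", "behavior_a"), ("en", "behavior_a"),
                                 ("de", "behavior_b")]:
        folder = root / language
        folder.mkdir(exist_ok=True)
        (folder / f"{identifier}.wav").write_bytes(b"placeholder audio")

    print(languages_cached_for(root, "behavior_a"))  # prints "['de', 'en']"
    print(languages_cached_for(root, "behavior_b"))  # prints "['de']"
```

A language missing from the result is a hint that the application has not yet been run in that language.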
In this VR Builder tutorial, you learned how to guide your users intuitively by using VR Builder's TTS engine and customizing the synthesized text-to-speech audio files to your needs. However, audio instructions are just one way to guide your users more intuitively in VR. What if you want your users to place virtual objects in specific positions? Check out our tutorial on creating snap zones in Unity!