Convert text into lifelike speech using Amazon Polly

Amazon Polly is a service that turns text into lifelike speech, so you can create applications that talk and build entirely new categories of speech-enabled products. Polly’s Text-to-Speech (TTS) service uses deep learning to synthesize natural-sounding human speech. With dozens of voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.

In this lab, you will use Amazon Polly to convert text into lifelike speech.

Create project

  1. Create a new .NET Core console application project.

  1. Add the AWSSDK.Polly NuGet package to the project.

  1. Add the following using directives to the top of Program.cs:

using System;
using System.IO;
using System.Threading.Tasks;

using Amazon.Polly;
using Amazon.Polly.Model;

  1. Replace the Main method in Program.cs with the following async version:

static async Task Main(string[] args)
{
    if (args.Length != 3)
    {
        Console.WriteLine("Please provide text file, language code, and voice id.");

        return;
    }

    var fileName = args[0];
    var langCode = args[1];
    var voiceId = args[2];

    await ConvertTextToAudio(fileName, langCode, voiceId);
}

  1. Add a ConvertTextToAudio method. It creates an AmazonPollyClient, builds a SynthesizeSpeechRequest from the input text, sends the request to Amazon Polly, and saves the returned audio in MP3 format.

Note that when you initialize the AWS SDK’s AmazonPollyClient, you need to pass the RegionEndpoint of the region in which you are running the labs. The code below initializes AmazonPollyClient in the EUWest1 region. The ConvertTextToAudio method should look like this:

static async Task ConvertTextToAudio(string fileName, string targetLanguageCode, string voiceId)
{
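    // Read the whole input file; the text is sent to Polly in a single request.
    var text = File.ReadAllText(fileName);

    // Map the voice name from the command line (e.g. "Brian") to the SDK's VoiceId type.
    var voice = VoiceId.FindValue(voiceId);

    using (var pollyClient = new AmazonPollyClient(Amazon.RegionEndpoint.EUWest1))
    {
        var speechRequest = new SynthesizeSpeechRequest
        {
            LanguageCode = targetLanguageCode,
            Text = text,
            OutputFormat = OutputFormat.Mp3,
            VoiceId = voice
        };

        var speechResponse = await pollyClient.SynthesizeSpeechAsync(speechRequest);

        // The MP3 is written next to the input file, e.g. book-review-01.txt-en-us.mp3.
        string outputFileName = $"{fileName}-{targetLanguageCode}.mp3";

        // Dispose both streams even if the copy fails, and copy without blocking the thread.
        using (var audio = speechResponse.AudioStream)
        using (var output = File.Open(outputFileName, FileMode.Create))
        {
            await audio.CopyToAsync(output);
        }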
    }
}
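
The method above lets any failure surface as an unhandled exception. As a minimal sketch, assuming you would rather print a short message, the call could be routed through a small wrapper. The PollyErrors class, the RunSafelyAsync name, and the message texts are hypothetical and not part of the lab; AmazonPollyException is the SDK's base exception type for errors returned by the Polly service.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

using Amazon.Polly;

static class PollyErrors
{
    // Hypothetical helper (not part of the lab): runs an async action and
    // reports common failures instead of ending with a stack trace.
    // Returns true when the action completed without one of these errors.
    public static async Task<bool> RunSafelyAsync(Func<Task> action)
    {
        try
        {
            await action();
            return true;
        }
        catch (FileNotFoundException ex)
        {
            // Thrown by File.ReadAllText when the input path does not exist.
            Console.WriteLine($"Input file not found: {ex.FileName}");
        }
        catch (AmazonPollyException ex)
        {
            // Errors returned by the Polly service, such as a rejected request.
            Console.WriteLine($"Polly request failed ({ex.ErrorCode}): {ex.Message}");
        }

        return false;
    }
}
```

In Main, the call would then read `await PollyErrors.RunSafelyAsync(() => ConvertTextToAudio(fileName, langCode, voiceId));`. Other exceptions, such as missing credentials, are deliberately left unhandled in this sketch.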

Run application

Download the book reviews using the following links and save them locally:

book-review-01.txt

book-review-01_french.txt

Now you can build the application and run it, passing the path to a sample text file, a language code, and a voice ID:

Polly.exe C:\projects\book-review-01.txt en-us Brian
Polly.exe C:\projects\book-review-01_french.txt fr-CA Chantal

Check the folder that contains the original text files. You should see the following new files:

book-review-01.txt-en-us.mp3
book-review-01_french.txt-fr-CA.mp3 

Open the files and listen to the synthesized MP3 audio in English and French.
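
To try other languages, it helps to know which voice IDs Polly offers for a given language code. The following is a rough sketch, assuming the same EUWest1 region as the lab code; the VoiceLister class and ListVoicesAsync method are hypothetical names, while DescribeVoicesAsync is the SDK call that returns the available voices. It reads only the first page of results and ignores the NextToken used for pagination.

```csharp
using System;
using System.Threading.Tasks;

using Amazon.Polly;
using Amazon.Polly.Model;

static class VoiceLister
{
    // Hypothetical helper (not part of the lab): prints the voices that
    // Polly reports for one language code, e.g. "fr-CA".
    public static async Task ListVoicesAsync(string languageCode)
    {
        // Same region as the lab code; change it if you run the labs elsewhere.
        using (var pollyClient = new AmazonPollyClient(Amazon.RegionEndpoint.EUWest1))
        {
            var request = new DescribeVoicesRequest { LanguageCode = languageCode };
            var response = await pollyClient.DescribeVoicesAsync(request);

            // Each entry carries the voice ID that the lab passes as its third argument.
            foreach (var voice in response.Voices)
            {
                Console.WriteLine($"{voice.Id} ({voice.Gender}, {voice.LanguageName})");
            }
        }
    }
}
```

Calling `await VoiceLister.ListVoicesAsync("fr-CA");` from Main would list the voice IDs you can pass as the third command-line argument for that language.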