Automatic speech recognition using Amazon Transcribe

Amazon Transcribe makes it easy for developers to add speech-to-text capability to their applications. Audio data is virtually impossible for computers to search and analyze directly, so recorded speech needs to be converted to text before it can be used in applications. Amazon Transcribe uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately. It can be used to transcribe customer service calls, to automate closed captioning and subtitling, and to generate metadata for media assets to create a fully searchable archive.

In this lab, you will use Amazon Transcribe to convert speech to text.

Create project

  1. Create a new .NET Core console application project.

  1. Add the following NuGet packages to the project:
  • AWSSDK.TranscribeService
  • AWSSDK.S3

  1. Add the following using directives to Program.cs:
using System;
using System.IO;
using System.Net;

using System.Threading.Tasks;

using Amazon.S3;
using Amazon.S3.Model;

using Amazon.TranscribeService;
using Amazon.TranscribeService.Model;
  1. Replace the Main method in Program.cs with the following async version, and add the BucketUri constant and the _bucketName field:
private const string BucketUri = "https://s3.eu-west-1.amazonaws.com/{0}/{1}";
private static string _bucketName;

static async Task Main(string[] args)
{
    if (args.Length != 2)
    {
        Console.WriteLine("Please provide audio file and language code.");

        return;
    }

    var filename = args[0];
    var langCode = args[1];

    _bucketName = "assets-" + Guid.NewGuid().ToString();

    await UploadInputFileToS3(filename);

    await TranscribeInputFile(filename, langCode);

    Console.WriteLine("The process is complete");
}
  1. Add the method UploadInputFileToS3, which creates a new S3 bucket and uploads the input .mp3 file into it.

Please note that when you initialize the AWS SDK's AmazonS3Client, you need to pass the RegionEndpoint of the region in which you are running the labs. The code below initializes AmazonS3Client in the EUWest1 region.

static async Task UploadInputFileToS3(string fileName)
{
    var objectName = Path.GetFileName(fileName);

    using (var s3Client = new AmazonS3Client(Amazon.RegionEndpoint.EUWest1))
    {
        var putBucketRequest = new PutBucketRequest()
        {
            BucketName = _bucketName
        };

        var putBucketResponse = await s3Client.PutBucketAsync(putBucketRequest);

        if (putBucketResponse.HttpStatusCode != HttpStatusCode.OK)
        {
            Console.WriteLine("Couldn't create the S3 bucket!");

            // Without a bucket the upload below cannot succeed, so stop here.
            return;
        }

        var putObjectRequest = new PutObjectRequest
        {
            BucketName = _bucketName,
            Key = objectName,
            ContentType = "audio/mpeg",
            FilePath = fileName
        };

        await s3Client.PutObjectAsync(putObjectRequest);
    }
}
  1. Add the method TranscribeInputFile, which creates an instance of AmazonTranscribeServiceClient and starts a transcription job that reads the uploaded audio file from the newly created S3 bucket.

Please note that when you initialize the AWS SDK's AmazonTranscribeServiceClient, you need to pass the RegionEndpoint of the region in which you are running the labs. The code below initializes AmazonTranscribeServiceClient in the EUWest1 region.

static async Task TranscribeInputFile(string fileName, string targetLanguageCode)
{
    var objectName = Path.GetFileName(fileName);

    using (var transcribeClient = new AmazonTranscribeServiceClient(Amazon.RegionEndpoint.EUWest1))
    {
        var media = new Media()
        {
            MediaFileUri = string.Format(BucketUri, _bucketName, objectName)
        };

        var transcriptionJobRequest = new StartTranscriptionJobRequest()
        {
            LanguageCode = targetLanguageCode,
            Media = media,
            MediaFormat = MediaFormat.Mp3,
            TranscriptionJobName = string.Format("transcribe-job-{0}", _bucketName),
            OutputBucketName = _bucketName
        };

        var transcriptionJobResponse = await transcribeClient.StartTranscriptionJobAsync(transcriptionJobRequest);

        if (transcriptionJobResponse.HttpStatusCode != HttpStatusCode.OK)
        {
            Console.WriteLine("Couldn't create transcription job");

            // Return early so the success message below is not printed after a failure.
            return;
        }
    }

    Console.WriteLine("The transcription job request has been created successfully.");
}
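
The MediaFileUri passed to the job is built from the BucketUri format string. As a minimal standalone sketch (the bucket name below is a made-up placeholder, not a value the lab generates), this is how the two placeholders are filled in. Amazon Transcribe also documents the `s3://bucket/key` form for the media location, which is shown alongside for comparison.

```csharp
using System;

static class MediaUriExample
{
    // Same format string as in the lab code: {0} is the bucket, {1} is the object key.
    private const string BucketUri = "https://s3.eu-west-1.amazonaws.com/{0}/{1}";

    static void Main()
    {
        // Hypothetical values for illustration only.
        var bucketName = "assets-example";
        var objectName = "book-review-01.txt-en-us.mp3";

        // Path-style HTTPS URI, as used by TranscribeInputFile.
        Console.WriteLine(string.Format(BucketUri, bucketName, objectName));

        // Equivalent S3 URI form.
        Console.WriteLine($"s3://{bucketName}/{objectName}");
    }
}
```

If you run the labs in a different region, the host name in BucketUri has to match that region as well.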

Run application

You will use the output file from the previous lab, book-review-01.txt-en-us.mp3, as the input file for this lab.

If you have not completed the previous lab, download the file below and save it locally:

book-review-01.txt-en-us.mp3

Now you can build the application and run it by passing the path to the sample audio file and the language code:

Transcribe.exe c:\projects\book-review-01.txt-en-us.mp3 en-us

You will see the following output:

The transcription job request has been created successfully.
The process is complete

Open the AWS Console and navigate to Amazon Transcribe. Make sure you can see the job; it may take a few minutes before the job status shows as complete.

Transcribe job
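
Instead of watching the console, the job status can also be checked from code. The following is a minimal sketch, not part of the lab application: the class and method names are hypothetical, and the 10-second polling interval is an arbitrary assumption. It uses the SDK's GetTranscriptionJobAsync call and assumes the same EUWest1 region as the rest of the lab.

```csharp
using System;
using System.Threading.Tasks;

using Amazon;
using Amazon.TranscribeService;
using Amazon.TranscribeService.Model;

static class TranscriptionJobWaiter
{
    // Hypothetical helper: polls the job until it is either COMPLETED or FAILED.
    public static async Task<TranscriptionJob> WaitForJobAsync(string jobName)
    {
        using (var client = new AmazonTranscribeServiceClient(RegionEndpoint.EUWest1))
        {
            while (true)
            {
                var response = await client.GetTranscriptionJobAsync(
                    new GetTranscriptionJobRequest { TranscriptionJobName = jobName });

                var job = response.TranscriptionJob;

                if (job.TranscriptionJobStatus == TranscriptionJobStatus.COMPLETED ||
                    job.TranscriptionJobStatus == TranscriptionJobStatus.FAILED)
                {
                    return job;
                }

                Console.WriteLine("Job status: " + job.TranscriptionJobStatus);

                // Assumed interval; adjust to taste.
                await Task.Delay(TimeSpan.FromSeconds(10));
            }
        }
    }
}
```

A caller could await `TranscriptionJobWaiter.WaitForJobAsync("transcribe-job-" + _bucketName)` after starting the job and inspect the returned job's FailureReason if the status is FAILED.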

Now navigate to S3 and go to the bucket that was used to hold the input file. This bucket also holds the output file from the Amazon Transcribe job that we created:

Transcribe S3

Click on the transcribe-job-XXXXXXXX.json file and download it to your machine. You should see something similar to this:

Transcribe output
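
To pull just the recognized text out of the downloaded file, a short sketch along these lines may help. It assumes the output follows the documented layout in which the full text sits under `results.transcripts[0].transcript`, and it assumes System.Text.Json is available (it ships with .NET Core 3.0 and later). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.Json;

static class TranscriptReader
{
    // Hypothetical helper: returns the full transcript text from a Transcribe output document.
    public static string ExtractTranscript(string json)
    {
        using (var document = JsonDocument.Parse(json))
        {
            return document.RootElement
                .GetProperty("results")
                .GetProperty("transcripts")[0]
                .GetProperty("transcript")
                .GetString();
        }
    }

    // Convenience overload that reads the downloaded JSON file from disk.
    public static string ExtractTranscriptFromFile(string path)
    {
        return ExtractTranscript(File.ReadAllText(path));
    }
}
```

For example, `Console.WriteLine(TranscriptReader.ExtractTranscriptFromFile(path))`, where `path` points at the downloaded JSON file, would print the transcribed text. The `results.items` array in the same file carries per-word timing and confidence data if you need more detail.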