
Leveraging the AWS AI Stack

Asad Chagtai 9 October 2018

Artificial Intelligence (AI) is a term we are hearing more and more these days. Whether you believe it will ‘take our jobs’ or not (I personally believe it will change our jobs!), it will surely play an ever-growing role in the future of IT and computing.

Although people have been talking about AI since at least the 1950s, it is still in its relative infancy. Looking as far back as the computer ‘HAL 9000’ from 2001: A Space Odyssey (1968), it is taking many decades to make such intelligent technology a reality. Amazing technical advancements have been made since then, and demand has grown as more use cases have emerged.



This post is designed to give a general overview of some of the lesser-known AI services offered by Amazon Web Services (AWS). Their best-known AI service is Lex, which is built on the same technology that powers Alexa and has many potential applications for web and voice communications. However, we will focus on: 

  • Amazon Transcribe
  • Amazon Comprehend
  • Amazon Translate


Amazon Transcribe

At a very high level, Amazon Transcribe takes in an audio file (WAV, MP3, MP4 or FLAC) and outputs a transcript in JSON (JavaScript Object Notation), a widely recognised format that programs can easily parse. The JSON output includes a confidence score (between 0 and 1) for each word it ‘heard’. At the time of writing, only English and Spanish are supported for analysis, although other languages can be detected. In the Transcribe console within the AWS web interface, you can see just the text, as below:


Amazon Transcribe example 1


Naturally, there are some inaccuracies in certain parts of the transcription, particularly with names (e.g. ‘for us’ is meant to be my colleague ‘Faraz’), although Transcribe did get the essence of the conversation right. To improve accuracy, Transcribe gives us the option to upload custom vocabulary files (CSV or TXT) that define terms such as the names of places, people etc.
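
To give a feel for how the per-word confidence scores might be used, here is a minimal Python sketch that flags the words Transcribe was least sure about. The sample data is hand-written in the general shape of a Transcribe result (the values are made up), and the function name and the 0.8 threshold are my own assumptions rather than anything defined by AWS.

```python
import json


def low_confidence_words(transcribe_output, threshold=0.8):
    """Return (word, confidence) pairs whose confidence is below the threshold."""
    flagged = []
    for item in transcribe_output["results"]["items"]:
        # Punctuation items carry no meaningful confidence, so skip them.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        # Confidence is stored as a string in the JSON output.
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append((best["content"], confidence))
    return flagged


# Hypothetical sample, not taken from a real job.
sample = json.loads("""
{
  "results": {
    "transcripts": [{"transcript": "Thanks for us."}],
    "items": [
      {"type": "pronunciation",
       "alternatives": [{"confidence": "0.99", "content": "Thanks"}]},
      {"type": "pronunciation",
       "alternatives": [{"confidence": "0.41", "content": "for"}]},
      {"type": "pronunciation",
       "alternatives": [{"confidence": "0.38", "content": "us"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": "."}]}
    ]
  }
}
""")

for word, confidence in low_confidence_words(sample):
    print(f"{word}: {confidence}")
```

Words flagged this way, such as misheard names, are natural candidates for a custom vocabulary file.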

In the context of a contact centre, if you enable ‘Channel identification’ when creating the Transcribe job, you can separate what the ‘caller’ (Channel 0) and ‘agent’ (Channel 1) said, as per below:


Amazon Transcribe example 2


Amazon Transcribe example 3
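
If you want the same caller/agent split outside the console, the text can be rebuilt from the JSON output. The sketch below assumes the output contains a `channel_labels` section with one entry per channel (labelled `ch_0`, `ch_1`), which is my understanding of the layout when channel identification is enabled. The sample conversation and the function name are invented for illustration.

```python
def text_by_channel(transcribe_output):
    """Rebuild one text string per audio channel from a Transcribe result."""
    texts = {}
    channels = transcribe_output["results"]["channel_labels"]["channels"]
    for channel in channels:
        words = []
        for item in channel["items"]:
            content = item["alternatives"][0]["content"]
            if item.get("type") == "punctuation" and words:
                # Attach punctuation to the preceding word.
                words[-1] += content
            else:
                words.append(content)
        texts[channel["channel_label"]] = " ".join(words)
    return texts


def _word(content, kind="pronunciation"):
    # Helper for building the hypothetical sample below.
    return {"type": kind, "alternatives": [{"confidence": "0.9", "content": content}]}


# Hypothetical two-channel result: ch_0 is the caller, ch_1 is the agent.
sample = {
    "results": {
        "channel_labels": {
            "number_of_channels": 2,
            "channels": [
                {"channel_label": "ch_0",
                 "items": [_word("My"), _word("order"), _word("was"),
                           _word("late"), _word(".", "punctuation")]},
                {"channel_label": "ch_1",
                 "items": [_word("I"), _word("am"), _word("sorry"),
                           _word(".", "punctuation")]},
            ],
        }
    }
}

for label, text in text_by_channel(sample).items():
    print(label, "->", text)
```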


Amazon Comprehend

Amazon Comprehend can give some meaningful insights once you have your transcript. When producing that transcript, Transcribe can distinguish up to 10 different voices using its ‘speaker identification’ feature, although accuracy is significantly better if you specify the number of speakers yourself.

In contact centres, the interaction between the caller and the agent is split between two audio channels, namely left and right. Transcribe separates these channels using the ‘channel identification’ feature. Comprehend has a feature called ‘sentiment analysis’ that can be carried out for both parties, which consists of a score based upon four aspects: 

  • Positive
  • Negative
  • Mixed
  • Neutral


These scores add up to 1, so each one can be read as a proportion of the whole. The example above is of a dissatisfied customer calling to complain about receiving an order late, a lack of communication and the goods being damaged. The scores for the customer are as follows:



    {
        "SentimentScore": {
            "Negative": 0.681151807308197,
            "Positive": 0.06493230164051056,
            "Mixed": 0.12935879826545715,
            "Neutral": 0.12455703318119049
        },
        "Sentiment": "NEGATIVE"
    }



Here it gives a dominant sentiment, which is clearly NEGATIVE in this case. Now, the agent is trying to find out what happened as well as reassure the customer:



    {
        "Sentiment": "NEUTRAL",
        "SentimentScore": {
            "Positive": 0.06552381813526154,
            "Negative": 0.027074486017227173,
            "Neutral": 0.8760638236999512,
            "Mixed": 0.03133794292807579
        }
    }



The overwhelming NEUTRAL sentiment makes sense as the agent is trying to be impartial, with very little POSITIVE or NEGATIVE sentiment in there. A good listener should not offer any opinion or sentiment either way.
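
As a quick sanity check on the two results above, this small Python sketch confirms that each set of scores sums to roughly 1 and that the reported sentiment matches the highest score. The scores are copied from the examples above; the helper name is my own. In a real program the dictionaries would come from Comprehend's `detect_sentiment` call (for example via boto3) rather than being typed in by hand.

```python
import math

customer = {
    "Negative": 0.681151807308197,
    "Positive": 0.06493230164051056,
    "Mixed": 0.12935879826545715,
    "Neutral": 0.12455703318119049,
}
agent = {
    "Positive": 0.06552381813526154,
    "Negative": 0.027074486017227173,
    "Neutral": 0.8760638236999512,
    "Mixed": 0.03133794292807579,
}


def dominant_sentiment(scores):
    """Return the label with the highest score, upper-cased like 'Sentiment'."""
    return max(scores, key=scores.get).upper()


for name, scores in (("customer", customer), ("agent", agent)):
    total = sum(scores.values())
    # Allow for small floating-point rounding in the reported scores.
    print(name, dominant_sentiment(scores), math.isclose(total, 1.0, abs_tol=1e-6))
```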


These scores can be saved as JSON files and stored using Amazon’s S3 service (think ‘object storage in the cloud’). We will look at how to further manipulate and analyse the data in the Use Cases section below.
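
Storing a result in S3 could look something like the sketch below. It takes the S3 client as a parameter, so in practice you would pass in `boto3.client("s3")`; the bucket name and key in the usage comment are placeholders I have made up.

```python
import json


def save_scores(s3_client, bucket, key, result):
    """Serialise a sentiment result to JSON and upload it as a single S3 object."""
    body = json.dumps(result, indent=2).encode("utf-8")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
    )
    return len(body)


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#
#   import boto3
#   save_scores(boto3.client("s3"), "example-bucket",
#               "calls/0001/customer.json", {"Sentiment": "NEGATIVE"})
```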


Amazon Translate

Currently, Amazon Translate can translate between English and the following 12 languages, and Amazon expects another six to be added by the end of 2018: 

  • Arabic
  • Chinese (Simplified)
  • Chinese (Traditional)
  • Czech
  • French
  • German
  • Italian
  • Japanese
  • Portuguese
  • Russian
  • Spanish
  • Turkish


Translate can automatically detect the source language, so you only need to specify the target language. It works on text, whether from files or real-time streams, so audio or video streams first need to be converted to text, for example WebVTT files (WebVTT is a W3C standard for displaying timed text alongside HTML5 media). Each line is translated along with its time stamp in the stream, which enables tracking of who is saying what and when.
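
To illustrate the idea, here is a rough Python sketch that translates the spoken lines of a WebVTT file while leaving the header and time stamps untouched. It is deliberately simplified: it assumes the file has no cue identifiers, NOTE or STYLE blocks. The translation itself goes through the `translate_text` call with the source language set to `auto`, using a client you pass in (for example `boto3.client("translate")`); the function names are my own.

```python
def translate_vtt(vtt_text, translate_line):
    """Translate cue text in a simple WebVTT document, keeping timing lines as-is."""
    output = []
    for line in vtt_text.splitlines():
        stripped = line.strip()
        is_structural = (
            not stripped                       # blank separator between cues
            or stripped.startswith("WEBVTT")   # file header
            or "-->" in stripped               # cue timing line
        )
        output.append(line if is_structural else translate_line(line))
    return "\n".join(output)


def make_translator(translate_client, target_language):
    """Build a line translator that lets Amazon Translate detect the source language."""
    def translate_line(text):
        response = translate_client.translate_text(
            Text=text,
            SourceLanguageCode="auto",
            TargetLanguageCode=target_language,
        )
        return response["TranslatedText"]
    return translate_line


# Example usage (requires boto3 and AWS credentials):
#
#   import boto3
#   to_french = make_translator(boto3.client("translate"), "fr")
#   print(translate_vtt(open("call.vtt").read(), to_french))
```

Calling the service once per line keeps the sketch simple; a production version would probably batch lines to reduce the number of requests.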


You can invoke Translate in a variety of ways: the web console interface, the Command Line Interface (CLI) or one of the available SDKs, which would involve writing code in JavaScript, Python, Ruby etc.

The web console is OK for a one-off single job. For a one-off batch of jobs, you could use a Unix/Linux shell script that incorporates CLI commands. However, for a truly automated system, it is best to use a Lambda function, as we will discuss next. 


Use Cases

A scenario that springs to mind in the contact centre world is one where call recordings stored in Amazon S3 are picked up and put through Transcribe to create a text-based transcript. This could be achieved by running a JavaScript or Python program on AWS Lambda, a service that runs code without you having to provision or manage servers. If you wish, your program could extract the whole conversation with or without the channel separation.
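
A Lambda function for this scenario might look roughly like the sketch below, assuming the function is triggered by an S3 ‘object created’ event whenever a recording lands in the bucket. The way the job name is derived from the file name, the fixed `en-US` language code and the use of the file extension as the media format are all my own simplifications.

```python
import re
from urllib.parse import unquote_plus


def job_request_from_s3_event(event):
    """Build start_transcription_job arguments from an S3 event notification."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])
    # Job names may only contain letters, digits, '.', '_' and '-',
    # and must be unique within the account.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "MediaFormat": key.rsplit(".", 1)[-1].lower(),
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        # Keep caller and agent separate, as described earlier.
        "Settings": {"ChannelIdentification": True},
    }


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each uploaded recording."""
    import boto3  # bundled with the AWS Lambda Python runtime

    request = job_request_from_s3_event(event)
    boto3.client("transcribe").start_transcription_job(**request)
    return {"job": request["TranscriptionJobName"]}
```

Transcription jobs run asynchronously, so a second step (for example another Lambda) would be needed to pick up the finished transcript and pass it on to Comprehend.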

The same code can then invoke Comprehend to get an idea of customer satisfaction as well as agent performance based on sentiment analysis. The data can be written to a database of your choice and then further manipulated using an analytics tool.

Another example is web chat for multinational companies. Both customer and agent can be using their respective native languages and Translate can make the conversation bilingual. Once again, Comprehend can be employed to gauge customer satisfaction and agent performance by looking at sentiment analysis scores.
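
For the web chat case, where each party's language is usually known up front, the relay step could be as small as the sketch below. The function name and the language codes in the usage comment are illustrative assumptions; the client would be `boto3.client("translate")` in practice.

```python
def relay_message(translate_client, text, sender_language, recipient_language):
    """Translate a chat message into the recipient's language when needed."""
    if sender_language == recipient_language:
        # Nothing to translate, so avoid an unnecessary service call.
        return text
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=sender_language,
        TargetLanguageCode=recipient_language,
    )
    return response["TranslatedText"]


# Example usage (requires boto3 and AWS credentials):
#
#   import boto3
#   client = boto3.client("translate")
#   relay_message(client, "¿Dónde está mi pedido?", "es", "en")
```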
