Amazon Polly
Developer Guide

Visemes and Amazon Polly

A viseme represents the position of the face and mouth when saying a word. It is the visual equivalent of a phoneme, which is the basic acoustic unit from which a word is formed. Visemes are the basic visual building blocks of speech.

Each language has a set of viseme that correspond to their specific phonemes.. In a language, each phoneme has a corresponding viseme that represents the shape that the mouth makes when forming the sound. However, not all visemes can be mapped to a particular phoneme because numerous phonemes appear the same when spoken, even though they sound different. For example, in English, the words "pet" and "bet" are acoustically different. However, when observed visually (without sound), they look exactly the same.

The following chart shows a partial list of International Phonetic Alphabet (IPA) phonemes and Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols as well as their corresponding visemes for US English voices.

For the complete table and tables for all available languages, see Phoneme and Viseme Tables for Supported Languages.

IPA

X-SAMPA

Description

Example

Viseme

Consonants

b

b

Voiced bilabial plosive

bed

p

d

d

Voiced alveolar plosive

dig

t

d͡ʒ

dZ

Voiced postalveolar affricate

jump

S

ð

D

Voiced dental fricative

then

T

f

f

Voiceless labiodental fricative

five

f

g

g

Voiced velar plosive

game

k

h

h

Voiceless glottal fricative

house

k

...

...

...

...

...