Visemes and Amazon Polly

A viseme represents the position of the face and mouth when saying a word. It is the visual equivalent of a phoneme, which is the basic acoustic unit from which a word is formed. Visemes are the basic visual building blocks of speech.

Each language has a set of visemes that correspond to its specific phonemes. Each phoneme has a corresponding viseme that represents the shape the mouth makes when forming the sound. However, the mapping is not one-to-one. Many phonemes look the same when spoken even though they sound different, so a single viseme often corresponds to several phonemes. For example, in English, the words "pet" and "bet" are acoustically different. However, when observed visually (without sound), they look exactly the same.
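The many-to-one relationship can be made concrete by inverting a phoneme-to-viseme map. The following is a minimal Python sketch, assuming only the consonant rows from the partial US English table in this section, keyed by their X-SAMPA symbols. The names `PHONEME_TO_VISEME` and `group_by_viseme` are illustrative and are not part of any Amazon Polly API. In that table, /g/ and /h/ both map to viseme `k`.

```python
from collections import defaultdict

# Illustrative subset: X-SAMPA consonant -> viseme, copied from the
# partial US English table in this section (not the full Polly table).
PHONEME_TO_VISEME = {"b": "p", "d": "t", "dZ": "S", "D": "T", "f": "f", "g": "k", "h": "k"}


def group_by_viseme(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a phoneme->viseme map to list the phonemes that share each mouth shape."""
    groups: dict[str, list[str]] = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return dict(groups)


groups = group_by_viseme(PHONEME_TO_VISEME)
print(groups["k"])  # ['g', 'h']
```

Any viseme whose list has more than one entry marks phonemes that an observer cannot distinguish by watching the mouth alone.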

The following table lists a subset of International Phonetic Alphabet (IPA) phonemes, their Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for US English voices.

For the complete table and tables for all available languages, see Languages in Amazon Polly.

IPA	X-SAMPA	Description	Example	Viseme
Consonants
b	b	Voiced bilabial plosive	bed	p
d	d	Voiced alveolar plosive	dig	t
d͡ʒ	dZ	Voiced postalveolar affricate	jump	S
ð	D	Voiced dental fricative	then	T
f	f	Voiceless labiodental fricative	five	f
g	g	Voiced velar plosive	game	k
h	h	Voiceless glottal fricative	house	k
...	...	...	...	...
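Some X-SAMPA symbols span more than one character (for example, `dZ`), so converting an X-SAMPA string to visemes requires splitting it into symbols first. The following is a minimal Python sketch, assuming greedy longest-match tokenization and only the partial consonant rows shown above. The names `XSAMPA_TO_VISEME`, `tokenize`, and `to_visemes` are hypothetical helpers for illustration, not Amazon Polly functions.

```python
# Hypothetical lookup built only from the partial US English table above.
XSAMPA_TO_VISEME = {
    "b": "p",
    "d": "t",
    "dZ": "S",
    "D": "T",
    "f": "f",
    "g": "k",
    "h": "k",
}


def tokenize(xsampa: str) -> list[str]:
    """Split an X-SAMPA string into known symbols, preferring the longest match."""
    # Try longer symbols first so "dZ" is not read as "d" followed by "Z".
    symbols = sorted(XSAMPA_TO_VISEME, key=len, reverse=True)
    tokens: list[str] = []
    i = 0
    while i < len(xsampa):
        for sym in symbols:
            if xsampa.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # Symbol is outside this partial table (for example, any vowel).
            raise ValueError(f"unknown symbol at position {i}: {xsampa[i:]!r}")
    return tokens


def to_visemes(xsampa: str) -> list[str]:
    """Map each X-SAMPA symbol to its viseme label."""
    return [XSAMPA_TO_VISEME[token] for token in tokenize(xsampa)]


print(to_visemes("dZ"))    # ['S']
print(to_visemes("bdgh"))  # ['p', 't', 'k', 'k']
```

Because this sketch covers only the consonants listed above, a full word such as "jump" raises `ValueError`. A complete mapping would come from the per-language tables referenced in this section.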
