Pronunciation | Andres Castano


The pronunciation of Japanese is very regular; for the most part, Japanese words sound as they are written in hiragana and katakana. Altogether, there are 110 different native syllable sounds in Japanese, a walk in the park compared to the thousands of possible syllables in English.

Vowel sounds

In Japanese, the order of the vowels is ‘a, i, u, e, o’; they sound pure and sharp, similar to the vowels in Spanish, except for the ‘u’, which is sharper (pronounced with more rounded lips) in Spanish than in Japanese.

the ‘a’ (hir. あ, kat. ア) sounds like the ‘a’ in ‘axe’

anata – あなた
formal ‘you’

atama – あたま
head

sakana – さかな
fish
the ‘i’ (hir. い, kat. イ) sounds like the ‘i’ in ‘ink’

migi – みぎ
right (direction)

kimi – きみ
casual ‘you’

nichi – にち
day
the ‘u’ (hir. う, kat. ウ) sounds like the ‘o’ in ‘who’, or the ‘u’ in the name ‘Uma’

uta – うた
song

umi – うみ
sea

kuruma – くるま
car
the ‘e’ (hir. え, kat. エ) sounds like the ‘e’ in ‘pen’, or ‘elf’

kesa – けさ
this morning

eki – えき
train station

ebi – えび
shrimp
the ‘o’ (hir. お, kat. オ) sounds like the ‘o’ in ‘ox’

kodomo – こども
child

tokoro – ところ
place

otoko no ko – おとこのこ
boy

Doubling vowels

There are no diphthongs in Japanese, so each vowel is pronounced as part of a separate syllable, e.g., ‘tooi’ (とおい, ‘far’) has three syllables and is pronounced in three beats: ‘to-o-i’.
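The beat counting described above is regular enough to sketch in code. The following minimal Python example is only an illustration: the function name `split_beats` is hypothetical, and it assumes that the only kana sharing a beat with the preceding kana are the small ゃ, ゅ, ょ, ぁ, ぃ, ぅ, ぇ, ぉ, ゎ (and their katakana forms); every other character, including ん and the katakana dash ー, is counted as a beat of its own.

```python
# Small kana that merge with the preceding kana into one beat (assumption:
# this set covers the combinations used on this page). The small っ/ッ is
# left out on purpose because it takes a beat of its own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_beats(kana: str) -> list[str]:
    """Split a kana string into beats (hypothetical helper)."""
    beats: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and beats:
            beats[-1] += ch  # e.g. し + ゅ -> しゅ
        else:
            beats.append(ch)  # any other kana, including ん and ー
    return beats


print("-".join(split_beats("とおい")))    # と-お-い
print("-".join(split_beats("しゅう")))    # しゅ-う
print("-".join(split_beats("ニュース")))  # ニュ-ー-ス
print("-".join(split_beats("さんにん")))  # さ-ん-に-ん
```

Counting the result with `len(split_beats("とおい"))` gives 3, in line with the three beats of ‘to-o-i’.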

in hiragana, doubling the vowel doubles its length:

English | romaji | kana | sounds…
your mother | o-kaa-san | おかあさん | o-ka-a-sa-n
your brother | o-nii-san | おにいさん | o-ni-i-sa-n
week | shuu | しゅう | shu-u
your sister | o-nee-san | おねえさん | o-ne-e-sa-n
ice | koori | こおり | ko-o-ri
in hiragana, an ‘i’ after an ‘e’ sound repeats the ‘e’ sound:

English | romaji | kana | sounds…
the English language | eigo | えいご | e-e-go
movie | eiga | えいが | e-e-ga
teacher | sensei | せんせい | se-n-se-e

A few exceptions are:

English | romaji | kana | sounds…
ray (fish) | ei | えい | e-i
sigh | tame-iki | ためいき (ため息) | ta-me-i-ki

‘Tame-iki’ (ため息) is a compound of two words that we pronounce separately: ‘tame’ (ため, ‘to collect’) and ‘iki’ (いき, ‘breath’).
in hiragana, a ‘u’ after an ‘o’ sound repeats the ‘o’ sound:

English | romaji | kana | sounds…
good morning | ohayou | おはよう | o-ha-yo-o
very | doumo | どうも | do-o-mo
thanks | arigatou | ありがとう | a-ri-ga-to-o

A few exceptions are:

English | romaji | kana | sounds…
to think | omou | おもう | o-mo-u
to get lost | mayou | まよう | ma-yo-u
in katakana, a ‘ー’ (dash) repeats the previous vowel:

English | romaji | kana | sounds…
ramen | raamen | ラーメン | ra-a-me-n
beer | biiru | ビール | bi-i-ru
news | nyuusu | ニュース | nyu-u-su
cake | keeki | ケーキ | ke-e-ki
cola | koora | コーラ | ko-o-ra
coffee | koohii | コーヒー | ko-o-hi-i

Vowel special cases

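For the most part, every vowel is pronounced. However, it has become the norm to whisper or drop the ‘u’ and the ‘i’ in some cases, typically when they sit between two voiceless consonants (such as k, s, sh, t, ts, ch) or at the end of a word right after one; this is called devoicing: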

sometimes the ‘u’ (う) sound is faint or omitted, especially in ‘ku’, ‘tsu’ and ‘su’:

English | romaji | kana | sounds…
taxi | takushii | タクシー | ta-k-shi-i
your wife | okusan | おくさん | o-k-sa-n
many | takusan | たくさん | ta-k-sa-n
moon | tsuki | つき | ts-ki
desk | tsukue | つくえ | ts-ku-e
to hold | motsu | もつ | mo-ts
a little | sukoshi | すこし | s-ko-shi
am, is, are | desu | です | de-s
formal verb form | masu | ます | ma-s
west; waist; waste | uesuto | ウエスト | u-e-s-to
sometimes the ‘i’ (い) sound is faint or omitted, especially in ‘shi’ (し) and ‘chi’ (ち):

English | romaji | kana | sounds…
we | watashitachi | わたしたち | wa-ta-sh-ta-ch
tomorrow | ashita | あした | a-sh-ta
why | doushite | どうして | do-o-sh-te

Another example is the disappearance of the い from the えい combination that forms when a ‘te’-form verb, i.e., a verb ending in て, って or んで, is followed by いる/います or any of its conjugations; e.g., -ている becomes -てる, -っています becomes -ってます, -んでいた becomes -んでた, etc. The following vanishing acts of い are courtesy of the manga ふらいんぐうぃっち (Flying Witch):
…しっている ⇒ …しってる (I know …)
…そらとんでいる ⇒ …そらとんでる (flying …)
…とどいています ⇒ …とどいてます (reported …)
…みている ⇒ …みてる (watching)
みていた ⇒ みてた (I saw)

Consonant sounds

Most Japanese sounds approximate an English sound. Here are a few unusual ones.

the ‘r’ is a light tap of the tongue, like the Spanish ‘r’ in ‘cara’ or ‘toro’, not like the English ‘r’ in ‘ram’ or ‘car’:

English | romaji | kana
color | iro | いろ
noon | hiru | ひる
six | roku | ろく
fu (hir. ふ, kat. フ) sounds like a mix of ‘fu’ and ‘hu’, like the English word ‘who’ spoken by just blowing air, without changing the shape of the mouth:

English | romaji | kana | sounds…
boat | fune | ふね | ‘who’-ne
futon | futon | ふとん | ‘who’-to-n
bath | furo | ふろ | ‘who’-ro
the ‘n’ (hir. ん, kat. ン) is a separate syllable, so it takes an additional ‘beat’ to pronounce it:

English | romaji | kana | sounds…
teacher | sensei | せんせい | se-n-se-e
three people | sannin | さんにん | sa-n-ni-n
bookstore | honya | ほんや | ho-n-ya
the ‘tsu’ sound (hir. つ, kat. ツ) did not occur at the start of English words, but we now find it in some words borrowed from Japanese:

English | meaning | kana | sounds…
tsunami | tidal wave | つなみ | tsu-na-mi
ju-jutsu | martial art | じゅじゅつ | ju-ju-tsu
shiatsu | acupressure | しあつ | shi-a-tsu
when speaking casually, some ‘m’ and ‘n’ sounds disappear:

English | Japanese | casual | sounds…
father | o-to-o-sa-n | o-to-o-sa | おとおさ
mother | o-ka-a-sa-n | o-ka-a-sa | おかあさ
excuse me | su-mi-ma-se-n | su-i-ma-se-n | すいません

Consonant special cases

ha (は) is always pronounced ‘wa’ when used as a particle
he (へ) is always pronounced ‘e’ when used as a particle
wo (を) is often pronounced ‘o’ when used as a particle
We might think that ‘kingyo’ is pronounced ‘king-yo’, or ‘atsui’ is ‘at-sui’, but the sounds ‘ing’ and ‘at’, as well as many others, don’t exist in Japanese:

English | romaji | kana | sounds…
goldfish | kingyo | きんぎょ | ki-n-gyo
hot | atsui | あつい | a-tsu-i
the ‘n’ (ん) before a ‘b’, ‘m’, or ‘p’ sounds like an ‘m’, so in these cases such ん is often romanized as ‘m’ rather than ‘n’ (as in traditional Hepburn); this is an example of euphony, i.e., making a sound both pleasing to the ear and easier to pronounce:

English | romaji | kana | sounds…
dragonfly | tonbo | とんぼ | to-m-bo
stroll | sanpo | さんぽ | sa-m-po
3 flat things | sanmai | さんまい | sa-m-ma-i

Here are some examples of this special case:
なんば (nanba) sounds like ‘namba’ (src: JPRail)
かんばら (kanbara) sounds like ‘kambara’
てんま (tenma) sounds like ‘temma’
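Because the rule depends only on the consonant that follows ん, it can be sketched as a single substitution. The Python example below is a minimal illustration, assuming the input is written in modified Hepburn romaji (as in the ‘romaji’ column of the table above), where an ‘n’ directly followed by ‘b’, ‘m’ or ‘p’ can only be ん; the function name `n_to_m` is hypothetical.

```python
import re


def n_to_m(romaji: str) -> str:
    """Rewrite a syllabic n (ん) as 'm' before b, m or p (hypothetical helper).

    Assumes modified Hepburn input, where an 'n' followed by one of
    these consonants cannot start a na/ni/nu/ne/no syllable.
    """
    # (?=[bmp]) is a lookahead: it checks the next letter without consuming it.
    return re.sub(r"n(?=[bmp])", "m", romaji)


for word in ["tonbo", "sanpo", "sanmai", "nanba", "kanbara", "tenma", "sannin"]:
    print(word, "->", n_to_m(word))
```

This turns ‘tonbo’ into ‘tombo’ and ‘tenma’ into ‘temma’, while ‘sannin’ stays unchanged because none of its ‘n’s comes before b, m or p.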

Pitch accent

Many Japanese words truly have no predefined pitch accent; e.g., the word ‘ichi’ (one) is normally pronounced flat, but depending on the context or the dialect it might be pronounced with a higher pitch on either ‘i’ or ‘chi’. However, some words do have a specific pitch [wikipedia]. For example:

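romaji | English | kana | kanji | pitch (Tokyo)
kami (sama) | god, deity, spirit | かみ | 神 | high-low
kami | hair | かみ | 髪 | low-high
ame | rain | あめ | 雨 | high-low
ame | hard candy | あめ | 飴 | low-high
hashi | chopsticks | はし | 箸 | high-low
hashi | bridge | はし | 橋 | low-high
kaki | oyster | かき | 牡蠣 | high-low
kaki | persimmon | かき | 柿 | low-high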

Kana have no marks that indicate pitch, and kanji give no clue either; thus, there is no alternative but to listen to a native speaker and memorize the pitch, if any. Still, there are a few hints that can help in certain cases.

Compound words

In English, when we put together two or more words to form a compound word, the compound word preserves the pitches of its component words, e.g.:

belly + button → belly-button
carry + over → carry-over

Although these compound words are now single words, we still pronounce each of their components with its original pitch, as if we were pronouncing two different words. Japanese does the same, i.e., the components of compound words are pronounced as if they were individual words:

kami (God) + sama (lord) → kami-sama (God)
ashi (foot) + kubi (neck) → ashi-kubi (ankle)
mizu (water) + umi (sea) → mizu-umi (lake)

If the component words happen to be one syllable long, we might end up with what appear to be different pronunciations of the same word, when in reality all we are doing is stressing one of the component words. In English, take the word ‘twenty-five’: we could stress ‘twenty’ or ‘five’ to draw attention to that component of the word, or pronounce them flat. This is harder to see in Japanese, where compound words can be so short that we tend to think of them as single words (e.g., ‘gohan’) instead of multiple words (e.g., ‘go-han’):

English | 1st syllable | 2nd syllable | compound word
meal | go (honorific) | han (cooked rice) | go-han
tonight | kon (this) | ban (evening) | kon-ban
weather | ten (sky) | ki (atmosphere) | ten-ki
telephone | den (electric) | wa (talk) | den-wa

However, Japanese takes this a bit further. If we have a single word that is being modified, say, with a suffix, both the word and the suffix keep their pitches:

I drink | nomi + masu → nomi-masu
I don’t drink | nomi + masen → nomi-masen
I want to drink | nomi + tai → nomi-tai
I don’t want to drink | nomi + taku + nai → nomi-taku-nai

Hence, we tend to get the pronunciation right when we treat the components of a word as separate words (e.g., nomi-masu), each with its own pitch (if any), instead of treating the word as a single unit (e.g., nomimasu) and trying to single out a particular syllable.

Dialects

Finally, native speakers from different regions of Japan might pronounce words in different ways. For example, the Japanese spoken in Tokyo, which is considered the ‘standard’ Japanese, tends to stress the first syllable, while the Kansai dialect (e.g., Kyoto, Osaka) tends to stress the last one:

region | thanks
Tokyo | arigatou
Kansai region | arigatou

The differences between dialects go way beyond pitch, though. A Kansai-dialect speaker would pronounce ‘arigatou’ differently from a Tokyoite but, actually, he or she is more likely to give thanks with the local dialect word, 大きに (ookini). Even regions that share a dialect speak in different ways; e.g., we could say that the Kansai dialect covers Osaka, Hyogo, and Kyoto, but there are marked differences among their speech. Dialects like those of Hokkaido, Okinawa, and many others have their own idiosyncrasies.