How we found it
When we were building AltSpeak, we evaluated every major TTS provider. ElevenLabs was the obvious choice. OpenAI TTS was in the mix. Then we dug into Google Cloud and found Chirp3-HD, a voice family most people scroll right past.
It is not marketed. There is no product page, and Google never ran a launch campaign for it. The voices appeared in the Google Cloud Text-to-Speech API docs one day, next to the older Standard and Neural2 families, and the tech press kept scrolling. We did not. If you are not specifically looking for it, you will miss it entirely.
Scrolling past it is a mistake. Chirp3-HD covers 59 languages with native pronunciation, and on Google Cloud TTS it runs around $16 per million characters on the Neural tier, a fraction of what a per-seat voice subscription costs at volume.
What makes it different
Chirp3-HD is one of Google's newest TTS voice families. It outputs audio at 24 kHz, which is studio-quality territory. The voices sound natural without the over-polished smoothness that makes some AI speech feel off.
Chirp3-HD covers 59 languages with native pronunciation, and that count undersells it. The detail that matters: these voices are natively trained, not translated. One voice speaks English, Japanese, and Arabic, and in each one it sounds like a native speaker rather than a single voice reading a foreign script phonetically.
Most multilingual TTS bolts one voice model onto different languages, and you get pronunciation that is technically correct but clearly accented. Native training fixes that at the source.
How it stacks up against ElevenLabs
ElevenLabs runs a proprietary model they have refined for years. Their voice cloning is the strongest commercial option around, and the output is excellent for English and European languages.
Chirp3-HD has a different strength. It handles Japanese, Arabic, Korean, and Hindi with native-quality output that ElevenLabs does not match as consistently. If you work in Asian or Middle Eastern languages, you hear the gap immediately.
For English, both are excellent. ElevenLabs wins on voice cloning. Chirp3-HD wins on multilingual quality and on cost at scale, where Google Cloud TTS runs around $16 per million characters on its Neural tier versus ElevenLabs Creator at $22 a month for a fixed character cap.
Why nobody is talking about it
Google does not build consumer products around this engine. You cannot go to a website, type text, and hear it. It is API-only, accessed through Google Cloud, so you need a developer account and a billing setup before you can even evaluate it.
ElevenLabs has a free tier with a web interface. You hear their voices in 30 seconds. Google requires Cloud Console, billing setup, and API calls. That friction keeps most people from discovering it.
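To make that friction concrete, here is a minimal sketch of the request body that the Cloud Text-to-Speech REST method text:synthesize expects. It only builds and prints the JSON and does not call the network. The helper name build_synthesize_request is hypothetical, and the voice name en-US-Chirp3-HD-Charon is an assumed example, so list the voices available to your own project before relying on it.

```python
import json

# Public REST endpoint of the Cloud Text-to-Speech API.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US",
                             voice_name="en-US-Chirp3-HD-Charon"):
    """Return the JSON body for a text:synthesize call.

    build_synthesize_request is a hypothetical helper, and the default
    voice_name is an assumed example rather than a guaranteed voice.
    """
    if not text:
        raise ValueError("text must not be empty")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        # LINEAR16 requests uncompressed 16-bit PCM audio.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


body = build_synthesize_request("Hello from Chirp 3 HD.")
print(json.dumps(body, indent=2))
```

Sending this body still requires a Cloud project with billing enabled and an OAuth access token in the Authorization header. The response carries the audio as a base64 string in its audioContent field.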
AltSpeak removes that friction. We wrap Chirp3-HD in a web app so you can type text and hear 59 languages of native-quality output without touching a single API call.
The tech blogs have not picked it up either, because there was no launch campaign. The voices just appeared in the API docs one day. That is very Google.
Why we built AltSpeak on it
When we tested Chirp3-HD against everything else, the output quality landed at the top of the pile. Google's infrastructure gives us reliability we can lean on. The 59-language native coverage let us build something that actually works for non-English users instead of faking it with translated voices.
We pair it with Inworld TTS-2 for character and gaming work, where expressive performance and crosslingual switching matter more than pure naturalness. Together they push AltSpeak past 200 voices and 100 languages. For most professional TTS jobs, Chirp3-HD is the backbone.
AltSpeak gives you 200+ studio-quality voices across 100+ languages, up to 50,000 characters per generation, with paid plans from $5 to $63 a month and a free tier of 10,000 credits that needs no card. It is the fastest way to hear what Chirp3-HD sounds like. You can try it yourself.
Chirp 3 HD bills per character through the Google Cloud Text-to-Speech API, in the high-fidelity HD voice band at roughly $16 per 1M characters (checked 2026-06-15). Google's pricing page renders in JavaScript and does not scrape cleanly, so confirm the exact HD-tier figure on cloud.google.com/text-to-speech/pricing before you quote it. Google's free tier covers 1M HD characters a month. Standard non-HD voices run about $4 per 1M characters.
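The per-character arithmetic can be sketched in a few lines. This is a rough estimate only: the $16 rate and the 1M free characters are the approximate figures quoted above, not confirmed prices, and the function name monthly_cost_usd is hypothetical.

```python
def monthly_cost_usd(characters, rate_per_million=16.0,
                     free_characters=1_000_000):
    """Estimate one month's bill under per-character pricing.

    The defaults are the approximate figures quoted in this article,
    so check the live pricing page before relying on them.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    # Only characters above the free allowance are billed.
    billable = max(0, characters - free_characters)
    return billable * rate_per_million / 1_000_000


for chars in (500_000, 1_000_000, 5_000_000):
    print(f"{chars:>10,} chars -> ${monthly_cost_usd(chars):.2f}")
```

At these assumed rates, 5M characters in a month comes to $64.00, and anything up to 1M characters costs nothing.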
For English, both sound excellent. ElevenLabs leads on voice cloning, which is its headline feature. Chirp 3 HD leads on native-trained multilingual audio (Japanese, Arabic, Korean, Hindi) and on cost at scale, since it bills per character with no monthly floor. ElevenLabs Creator's standing price is $22 a month (the $11 you may see is a first-month promo only, checked 2026-06-15). Pick by the job, not the brand.
Chirp 3 HD is API-only, so Google ships no web text box of its own. The fastest no-code route is a hosted studio built on it. AltSpeak runs Google Chirp 3 HD alongside Inworld TTS-2 in the browser, with a free tier of 10,000 credits and no card needed, so you can hear the voices in under a minute at torpenhow.ai/altspeak.
Chirp 3 HD covers 59 languages, and the voices are natively trained rather than translated. One voice can speak English, Japanese, and Arabic and sound like a native in each. That native training is the technical reason it beats most multilingual TTS on non-English audio, where the usual approach points one English voice at every script.
No. Chirp 3 HD and Gemini-TTS are separate voice families in the same Google Cloud Text-to-Speech API. Google added Gemini-TTS in 2026 for steerable, prompt-driven delivery, and per Google's docs Chirp 3 HD voices keep working with minimal changes. Chirp 3 HD is the studio-quality multilingual workhorse; Gemini-TTS is the newer, more directable sibling sitting next to it in the docs.
Going direct through the Google Cloud API, at roughly $16 per 1M HD characters (checked 2026-06-15), is cheapest if you can build the integration yourself and set up a Cloud project with billing. If you want a UI plus extra engines and commercial rights baked in, AltSpeak Creator is $11 a month for 100,000 credits, half the price of ElevenLabs Creator at $22 a month.