The Best Speech-To-Text For Cypriot Greek — Tested On Real Calls

If a voice agent can’t understand what a caller is saying, it can’t help them. That may seem obvious, but it’s exactly the problem most businesses using voice AI in Cyprus are running into. Most speech-to-text systems say they support Greek, and for Standard Greek that may be true. Cypriot Greek, however, has different vocabulary, different sounds, and almost no representation in the data those systems were trained on. Aseto.ai wanted to know just how big that gap is, so it tested six systems, including the biggest names in commercial speech recognition, on 1,027 real Cypriot phone calls.

Why is Cypriot Greek So Difficult For Speech-To-Text?

Cypriot Greek isn’t just Standard Greek with a different accent. It differs in phonology, morphology, and everyday vocabulary, and it is barely represented in the training data behind even the world’s biggest speech recognition systems. When a general-purpose model hears Cypriot Greek in a voice AI scenario, it is effectively working with a dialect it has never properly learned. So it defaults to whatever Standard Greek word sounds closest, and that is where things go wrong.

Phone calls make it harder still. Telephony narrows the audio bandwidth and adds codec noise, and calls capture the spontaneous way people actually speak: natural pauses, overlapping words, and background noise. These are already tough conditions for any model, but combined with a dialect the system has barely seen, they produce transcripts that lose the most important words in the call, such as local place names, medical symptoms, and dialect verb forms. These are the words a voice agent depends on to do anything useful.

How Was This Benchmark Set Up?

Aseto evaluated six speech-to-text systems on 1,027 real phone-call samples in Cypriot Greek: 68 minutes of audio from more than 100 distinct speakers, drawn from customer support and IVR interactions. These were production-like calls, reflecting the actual conditions a Cypriot Greek voice AI agent operates in every day.

The systems tested were Aseto’s own in-house model; three major commercial cloud services (Google Chirp 3, ElevenLabs Scribe v2, and Microsoft Azure Speech); and two widely used open models (faster-whisper large-v3 and NVIDIA Canary-1b-v2).

Reference transcripts were produced manually by a native Cypriot Greek speaker and reviewed in a second pass for accuracy. Every system was then scored against these references on Word Error Rate (WER) and Character Error Rate (CER), using an identical text-normalization pipeline so that no system gained an advantage from formatting differences.
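The article does not publish the benchmark’s normalization rules or scoring code. As a rough illustration of what scoring every system through one shared normalizer involves, here is a minimal Python sketch under assumed rules: Unicode NFC, lowercasing, punctuation replaced by spaces, and collapsed whitespace. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words; CER is the same calculation over characters. The function names and normalization choices are illustrative assumptions, not Aseto’s actual pipeline.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFC, lowercase, punctuation -> space, collapsed whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w matches Greek letters for str patterns
    return " ".join(text.split())


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference token r
                curr[j - 1] + 1,         # insertion of hypothesis token h
                prev[j - 1] + (r != h),  # substitution, or match when equal
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word errors / total reference words over (reference, hypothesis) pairs."""
    errors = words = 0
    for ref, hyp in pairs:
        ref_words = normalize(ref).split()
        errors += edit_distance(ref_words, normalize(hyp).split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Total character errors / total reference characters (spaces included)."""
    errors = chars = 0
    for ref, hyp in pairs:
        ref_chars = list(normalize(ref))
        errors += edit_distance(ref_chars, list(normalize(hyp)))
        chars += len(ref_chars)
    if chars == 0:
        raise ValueError("references contain no characters")
    return errors / chars


# The medical phrase discussed later in this article, against one system's output.
print(corpus_wer([("πονώ το πόδι μου", "πώς το αποδίδω")]))  # 0.75
```

This sketch pools errors across all utterances before dividing, rather than averaging per-utterance rates; which aggregation the benchmark used is not stated. Because every system passes through the same `normalize` function, differences in casing or punctuation cannot count as errors for one system and not another.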

What Did The Results Actually Show?

Aseto.ai had the lowest WER of all six systems, at 23.9%. The closest competitor, Google Chirp 3, came in at 29.1%, which makes Aseto.ai’s result an 18% relative reduction in word errors. Every other system sat above 36% WER, and the two weakest, Azure Speech at 47.8% and NVIDIA Canary-1b-v2 at 50.4%, had error rates roughly double Aseto’s. Character Error Rate told the same story, with Aseto.ai lowest at 13.6% against 16.1% for Google Chirp 3. This consistency across both metrics indicates that the lead isn’t a quirk of how errors were counted.
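The relative figures quoted in the results follow from simple arithmetic on the reported WER and CER values. The short Python sketch below reproduces them; its only inputs are the percentages given in this article, and the dictionary and function names are illustrative.

```python
# WER and CER percentages as quoted in this article.
wer_pct = {
    "Aseto.ai": 23.9,
    "Google Chirp 3": 29.1,
    "Microsoft Azure Speech": 47.8,
    "NVIDIA Canary-1b-v2": 50.4,
}
cer_pct = {"Aseto.ai": 13.6, "Google Chirp 3": 16.1}


def relative_reduction(baseline: float, improved: float) -> float:
    """Share of the baseline's errors that the improved system avoids."""
    return (baseline - improved) / baseline


aseto = wer_pct["Aseto.ai"]
print(f"WER reduction vs Chirp 3: {relative_reduction(wer_pct['Google Chirp 3'], aseto):.1%}")  # 17.9%
print(f"CER reduction vs Chirp 3: {relative_reduction(cer_pct['Google Chirp 3'], cer_pct['Aseto.ai']):.1%}")  # 15.5%

# Ratio of the two weakest systems' WER to Aseto.ai's WER.
for name in ("Microsoft Azure Speech", "NVIDIA Canary-1b-v2"):
    print(f"{name}: {wer_pct[name] / aseto:.2f}x")  # 2.00x, then 2.11x

# Gap between the best (Chirp 3) and worst (Canary) general-purpose systems.
gap = wer_pct["NVIDIA Canary-1b-v2"] - wer_pct["Google Chirp 3"]
print(f"Best-to-worst general-purpose gap: {gap:.1f} percentage points")  # 21.3
```

The relative reduction (17.9%, rounded to 18% in the text) is a different quantity from the absolute gap of 5.2 percentage points between Aseto.ai and Google Chirp 3; both describe the same two numbers.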

In the first example, every system except Aseto.ai mis-recognises Αλεθρικό, a village in Cyprus. The commercial systems substitute a common Standard Greek word (ηλεκτρικό, “electrical”, or αλιευτικό, “fishing”), while the open-weight models produce non-words (αλευρικό, αλεύριχο). Several systems also mishandle the repeated word at the start of the utterance. This is a named-entity failure: the village is precisely the information an IVR agent needs to route or answer the call, and only Aseto.ai recovers it. A general-purpose system has little or no exposure to small Cypriot place names, so it defaults to whatever common word sounds closest, and the routing signal is lost.

In the second example, Aseto.ai drops one short function word but preserves the entire clinical content, πονώ το πόδι μου (“my foot hurts”). Every other system degrades the part that matters most: ElevenLabs produces πώς το αποδίδω (“how I attribute it”), Google produces ποδήλατο (“bicycle”), and the two open-weight models break down further still, with Canary producing a largely unintelligible string. In each case the symptom, which is the actionable information in the call, is lost. The caller here spoke with a heavy Cypriot accent over a phone line, and that is what the competing systems failed on. For a Cypriot Greek voice agent, this is not an edge case but the normal operating condition of every call. For a voice agent routing a medical appointment, it is the difference between a usable transcript and a misrouted call.
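WER weighs every word equally, so in a longer utterance a single substituted place name or symptom may move the score only slightly while destroying the purpose of the call. One way to surface this, sketched here as an assumption rather than a metric the benchmark reports, is to check whether hand-picked critical terms survive as whole words in a transcript. The helper names are hypothetical; the example strings are the words quoted in the two examples above.

```python
import re
import unicodedata


def words(text: str) -> set[str]:
    """NFC-normalize, lowercase, and return the set of whole words (punctuation dropped)."""
    return set(re.findall(r"\w+", unicodedata.normalize("NFC", text).lower()))


def critical_terms_recovered(terms: list[str], hypothesis: str) -> dict[str, bool]:
    """Hypothetical check: does each critical term appear as a whole word in the hypothesis?"""
    hyp_words = words(hypothesis)
    return {term: unicodedata.normalize("NFC", term).lower() in hyp_words for term in terms}


# Medical example: the symptom word has to survive for the call to be handled correctly.
print(critical_terms_recovered(["πόδι"], "πώς το αποδίδω"))    # {'πόδι': False}
print(critical_terms_recovered(["πόδι"], "πονώ το πόδι μου"))  # {'πόδι': True}

# Place-name example: each substitution quoted above loses the village name.
for hyp in ("ηλεκτρικό", "αλιευτικό", "αλευρικό", "αλεύριχο"):
    print(hyp, critical_terms_recovered(["Αλεθρικό"], hyp))
```

Matching whole words rather than substrings matters here: with accents stripped, ποδι would be a substring of αποδιδω, and a naive substring test would wrongly credit the ElevenLabs output with recovering the symptom.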

Across both examples, the pattern is consistent with the explanation above: the systems are not failing on audio quality but on Cypriot-specific language. Local place names and dialect word forms are replaced with Standard Greek vocabulary the caller did not use. In each case the substituted word carries the information the voice agent depends on, so the error is not cosmetic but operational. Read the full in-depth study here.

So What’s the Takeaway?

Speech recognition for Cypriot Greek is far from a solved problem across the industry. More than 20 percentage points separate the best and worst general-purpose systems (29.1% and 50.4% WER), and none of them matches a model built specifically for the dialect. Closing that gap requires data and modelling built around the dialect itself, not just Greek. For any voice agent operating over the phone in Cyprus, transcription accuracy isn’t a secondary concern: it feeds intent detection, routing, and response generation, and ultimately shapes the caller’s experience.

Aseto.ai runs its model on its own infrastructure. That means faster responses, costs you can actually predict, and your call data stays where it should. If you’re looking for voice and text solutions that actually work for Cypriot Greek speakers, visit aseto.ai and get in touch.
