Lost in Translation: Why AI Struggles with South Slavic Languages (and What We Can Do About It)

Ciklopea 2 weeks ago 5 min.

As large language models (LLMs) like GPT redefine how we translate and communicate across borders, one persistent issue remains: AI still struggles with South Slavic languages — especially Croatian, Serbian, Bosnian, and Slovenian. 

Despite being spoken by millions, these languages are often misrepresented or blended together in AI-generated content. The results are frequently inaccurate, unidiomatic, or just unintentionally hilarious. 

Take this example: 

“Potrebujesz jednoduchsi sposob, ako získat zaplatené?”
A confusing mix of Polish, Slovak, and Czech — and completely off for native speakers. 

Or this one: 

“vybudovali dôveru a привиедли viac”
A Slovak sentence that unexpectedly jumps into Russian Cyrillic mid-thought. 

But it’s not just a Central European problem. South Slavic speakers regularly see their languages treated as interchangeable or corrupted: 

“Dragi korisnik, vaš račun je zatvoren zbog техничких razloga.”
(Croatian + Serbian Cyrillic in a single sentence. Not ideal.) 

South Slavic ≠ One Language 

To a language model, these languages may seem similar. But to native speakers — and especially in formal or professional contexts — mixing Bosnian, Croatian, Serbian, and Slovenian is not acceptable. Each language has its own norms, registers, idioms, and even writing systems. 

Yet AI often fails to make this distinction, leading to: 

  • Inconsistent use of scripts (Cyrillic vs. Latin)
  • Cross-contamination of vocabulary and grammar
  • Tone and register mismatches that hurt readability and trust
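The first symptom on this list, script mixing, is also the easiest one to catch automatically. Below is a minimal sketch of such a check, not a production tool: it assumes that the first word of a character's Unicode name identifies its script, and the function names (`scripts_in_word`, `find_script_mixing`, `is_mixed_script`) are hypothetical helpers invented for this illustration.

```python
import unicodedata


def scripts_in_word(word):
    """Return the set of scripts (e.g. 'LATIN', 'CYRILLIC') used by the letters in word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Assumption: Unicode names begin with the script name,
            # e.g. "CYRILLIC SMALL LETTER TE" or "LATIN SMALL LETTER S WITH CARON".
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def find_script_mixing(text):
    """Group the words of text by the script(s) their letters belong to."""
    by_script = {}
    for raw in text.split():
        # Strip common punctuation so it does not stick to the reported words.
        word = raw.strip(".,;:!?\"'“”()")
        for script in scripts_in_word(word):
            by_script.setdefault(script, []).append(word)
    return by_script


def is_mixed_script(text):
    """True if the text contains letters from more than one script."""
    return len(find_script_mixing(text)) > 1


sample = "Dragi korisnik, vaš račun je zatvoren zbog техничких razloga."
if is_mixed_script(sample):
    # Report only the words that are not in Latin script.
    for script, words in find_script_mixing(sample).items():
        if script != "LATIN":
            print(script, words)
```

A check like this only flags the script symptom. It cannot tell Croatian from Serbian written in Latin script, and it will also flag text that mixes scripts on purpose, such as a quotation, so it works best as a cheap first filter ahead of human review.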

Why Is This Happening? 

  1. Low-Resource Languages

Serbian, Croatian, Bosnian, and Slovenian are considered low-resource in the AI space. That means there’s less high-quality training data available — especially from verified, domain-specific, and native sources. 

  2. Language Proximity Confusion

AI models rely on statistical patterns, not true linguistic understanding. Because Slavic languages share many of their structures, a model can treat those similarities as evidence of a single language and blend the languages together, ignoring the differences that matter to native speakers.

  3. Reinforcement Bias

A popular story from the AI community suggests that GPT “stopped speaking Croatian” because Croatian users downvoted too many outputs, unintentionally signaling the model to stop trying. Whether or not the story is accurate, the kind of feedback loop it describes is especially harmful to smaller language communities.


Why This Matters
 

Bad translations aren’t just an inconvenience — they: 

  • Undermine professional communication
  • Harm brand credibility
  • Alienate local audiences
  • Introduce misinformation or legal risk in regulated industries

In multilingual regions like the Western Balkans, accuracy isn’t optional — it’s essential. 
