Science or Fiction: Machine Translation Explained
Once, when I was a kid, a friend and I were passing a car wash with a big sign that read “Machine washing and polishing.” My friend asked, all amazed, “Wow, they have machines to wash the cars?!” The guy who worked there overheard him and replied, mildly disappointed, “Do I look like a machine to you?” We did not expect that, but he was, indeed, still a human being. The same goes for machine translation (MT).
Many people believe that computers do all the work nowadays, and translators are often asked whether they are afraid of losing their jobs. The answer could be the same as the one the car wash worker gave. Even though we’re nearing the end of 2017, machine translation tools are still not advanced enough to replace humans. We designed them to make our jobs easier and to make us more efficient. The tools are here to help us, not vice versa.
When it comes to machine translation, there is still a lot of confusion, particularly among people coming from other industries. Naturally, they have a whole bunch of questions that need answering, such as:
What does MT actually mean?
How does it work?
What do MT and CAT stand for?
Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. At the most basic level, it works by simply replacing each word in one language with a word in another. As you can imagine, that alone cannot produce an adequate translation, because the art of translating is much more complex. Every language has its own rules and usage, which is why it is difficult to make machine translation tools do better than they’re doing now.
That’s why human intervention is unavoidable. We’re still the ones doing most of the work, and we make all the necessary corrections to ensure that the translation output is clear, correct and ready to use.
When we talk about machine translation, there are several types worth mentioning:
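To make the word-for-word principle concrete, here is a minimal Python sketch. The tiny English-to-Spanish dictionary and the function name are our own illustrative assumptions, not part of any real MT tool. It shows how a translation can get every single word right and still get the sentence wrong.

```python
# Toy English-to-Spanish dictionary; the entries are illustrative assumptions.
WORD_DICTIONARY = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "is": "es",
    "new": "nuevo",
}


def translate_word_for_word(sentence: str) -> str:
    """Replace each source word with its dictionary entry, keeping the word order."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(WORD_DICTIONARY.get(word, word) for word in words)


print(translate_word_for_word("The red car is new"))
# prints "el rojo coche es nuevo"
```

A human translator would write “el coche rojo es nuevo”, because Spanish usually places the adjective after the noun. Each word was converted correctly, yet the result still needs a human correction.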
Rule-Based Machine Translation (RBMT)
This type of machine translation requires detailed information about the structure of the source and target languages. Morphological and syntactic rules, together with semantic analysis of both languages, form the framework of rule-based machine translation. The process involves linking the structures of input and output sentences using a parser, a generator and a transfer lexicon. The problem with this method is that everything needs to be defined explicitly, which can be time-consuming. If we want to speed up the whole process, this is hardly the method we would choose.
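As a rough illustration of the parser, transfer and generator steps, here is a toy Python sketch. The lexicon, the single reordering rule and all function names are hypothetical and hugely simplified; real rule-based systems rely on thousands of hand-written rules.

```python
# Transfer lexicon: source word -> (part of speech, target word).
# A toy English-to-Spanish lexicon; all entries are illustrative assumptions.
TRANSFER_LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
    "is": ("VERB", "es"),
    "new": ("ADJ", "nuevo"),
}


def parse(sentence):
    """Tag each source word with its part of speech (a stand-in for a real parser)."""
    tagged = []
    for word in sentence.lower().split():
        if word not in TRANSFER_LEXICON:
            # Everything has to be defined explicitly, otherwise the system fails.
            raise KeyError(f"no rule for word: {word}")
        part_of_speech, _ = TRANSFER_LEXICON[word]
        tagged.append((word, part_of_speech))
    return tagged


def transfer(tagged):
    """Apply one syntactic rule: English ADJ NOUN becomes Spanish NOUN ADJ."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair we just reordered
        else:
            i += 1
    return result


def generate(tagged):
    """Produce the target sentence by looking up each word in the lexicon."""
    return " ".join(TRANSFER_LEXICON[word][1] for word, _ in tagged)


def translate_rule_based(sentence):
    return generate(transfer(parse(sentence)))


print(translate_rule_based("The red car is new"))
# prints "el coche rojo es nuevo"
```

The word order is now right, but only because we wrote a rule for it. Any word or construction we did not define raises an error, which is exactly why this approach is so time-consuming.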
Statistical Machine Translation (SMT)
This is a paradigm of machine translation in which the translation output is generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The idea is to gather as many previously translated documents as possible in one place, so that the tools can detect patterns in the work of professional human translators and make guesses based on those findings. Google Translate was probably the most popular machine translation service using this method, but it has recently switched to neural MT models.
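To give a feel for how parameters can be derived from a bilingual corpus, here is a small Python sketch in the spirit of the classic IBM Model 1 word-alignment model. The three-sentence English-German corpus and the function names are assumptions made for illustration; this is not how Google Translate or any production system was actually built.

```python
from collections import defaultdict

# Tiny parallel corpus (English source, German target); purely illustrative.
PARALLEL_CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_word_translation_model(corpus, iterations=10):
    """Estimate P(target word | source word) with expectation-maximization."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    # Start with no knowledge: every target word is equally likely.
    prob = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src_words, tgt_words in pairs:
            for tgt in tgt_words:
                # Split the credit for this target word among the source words
                # in proportion to the current probability estimates.
                norm = sum(prob[(tgt, src)] for src in src_words)
                for src in src_words:
                    share = prob[(tgt, src)] / norm
                    count[(tgt, src)] += share
                    total[src] += share
        # Re-estimate the probabilities from the collected fractional counts.
        prob = defaultdict(
            float, {(tgt, src): c / total[src] for (tgt, src), c in count.items()}
        )
    return prob


def most_likely_translation(prob, source_word):
    """Pick the target word with the highest estimated probability."""
    candidates = {tgt: p for (tgt, src), p in prob.items() if src == source_word}
    return max(candidates, key=candidates.get)


model = train_word_translation_model(PARALLEL_CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", most_likely_translation(model, word))
```

Nobody told the program that “book” means “buch”. It guessed that from the patterns in the translated sentences, which is the whole idea behind the statistical approach. With more translated text the guesses usually improve, but they remain guesses.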
Example-Based Machine Translation (EBMT)
This method also relies on a corpus of previously translated documents. When we enter a sentence we want to translate, sentences that contain similar sub-sentential components are selected from the corpus. Those sentences are then used to translate the sub-sentential components of the original sentence into the target language. You can already see that this simply screams for additional human intervention.
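The retrieve-and-recombine idea can be sketched in a few lines of Python. The example corpus, the fragment table and the function name below are hypothetical, and the similarity measure (counting shared words) is a deliberately crude assumption.

```python
# Previously translated sentences (English, German); illustrative only.
EXAMPLE_CORPUS = [
    ("he buys a book", "er kauft ein buch"),
    ("she reads a newspaper", "sie liest eine zeitung"),
]

# Known sub-sentential fragments and their translations; illustrative only.
FRAGMENTS = {
    "a book": "ein buch",
    "a newspaper": "eine zeitung",
    "a notebook": "ein notizbuch",
}


def translate_by_example(sentence):
    """Find the closest stored example and swap in the fragment that differs."""
    words = set(sentence.split())
    # Step 1: retrieve the example sharing the most words with the input.
    source, target = max(
        EXAMPLE_CORPUS, key=lambda pair: len(words & set(pair[0].split()))
    )
    if source == sentence:
        return target
    # Step 2: look for one fragment swap that turns the example into the input.
    # Plain substring matching is used here, which is fragile beyond toy data.
    for old_src, old_tgt in FRAGMENTS.items():
        if old_src not in source:
            continue
        for new_src, new_tgt in FRAGMENTS.items():
            if source.replace(old_src, new_src) == sentence:
                return target.replace(old_tgt, new_tgt)
    # No usable example found: a human translator has to step in.
    return None


print(translate_by_example("he buys a notebook"))
# prints "er kauft ein notizbuch"
print(translate_by_example("they sell cars"))
# prints "None"
```

The second call shows the weak spot: as soon as the input does not resemble anything in the corpus, the method has nothing to offer.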
Hybrid Machine Translation (HMT)
Just as its name suggests, some of the previously mentioned techniques have a hand in this one. Hybrid machine translation ties together rule-based and statistical machine translation: translations are first performed by a rule-based engine, and a statistical step then attempts to adjust and/or correct the output of the rules engine.
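Here is a toy Python sketch of that two-stage pipeline. As assumptions for illustration, the “rules engine” is reduced to a simple lexicon lookup, and the statistical step scores candidate word orders with bigram counts from a tiny Spanish corpus. All names and data are hypothetical.

```python
from collections import Counter

# Stage 1 resource: a toy English-to-Spanish lexicon (illustrative only).
RULE_LEXICON = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "new": "nuevo"}

# Stage 2 resource: a tiny target-language corpus (illustrative only).
TARGET_CORPUS = [
    "el coche rojo es viejo",
    "el libro rojo es grande",
    "el coche es nuevo",
]


def rule_based_pass(sentence):
    """First pass: look up every word, keeping the source word order."""
    return [RULE_LEXICON.get(word, word) for word in sentence.lower().split()]


def bigram_counts(corpus):
    """Count adjacent word pairs in the target-language corpus."""
    counts = Counter()
    for line in corpus:
        words = line.split()
        counts.update(zip(words, words[1:]))
    return counts


def fluency(words, counts):
    """Crude fluency score: how often the sentence's word pairs were seen before."""
    return sum(counts[pair] for pair in zip(words, words[1:]))


def statistical_post_edit(words, counts):
    """Second pass: try swapping neighbouring words and keep the best-scoring order."""
    candidates = [words]
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        candidates.append(swapped)
    return max(candidates, key=lambda candidate: fluency(candidate, counts))


def translate_hybrid(sentence):
    counts = bigram_counts(TARGET_CORPUS)
    return " ".join(statistical_post_edit(rule_based_pass(sentence), counts))


print(" ".join(rule_based_pass("The red car is new")))
# prints "el rojo coche es nuevo"
print(translate_hybrid("The red car is new"))
# prints "el coche rojo es nuevo"
```

The first pass produces the awkward word order, and the statistical pass repairs it because “el coche” and “coche rojo” show up in the Spanish corpus while “el rojo” and “rojo coche” do not.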
Neural Machine Translation (NMT)
Obviously, it has something to do with those “neural networks” we’re always hearing about, but we’re not going to torture you with unnecessary details. Not today, at least.
This type of machine translation is based on deep learning (a branch of artificial intelligence), and it has made rapid progress in recent years. As we mentioned above, Google has announced that its translation services now use this technology, abandoning the previously used statistical approach.
Stay tuned for an in-depth overview of Computer-Assisted Translation (CAT) coming very soon.