Think Human Translators Will Be Replaced By Machines? Not So Fast!

In line with the previous piece about corporate narratives discouraging cultural exploration and language learning, there is a corollary that I hear even more often, and sadly some people whom I respect very deeply still believe it:

Namely, the idea that translation, along with many other jobs, will be replaced entirely by machines (again, there is a lot of misinformation here that I’m going to get into momentarily).

My father went so far as to say that my translation job wouldn’t be around in a few years’ time.

I don’t blame him; he’s just been misinformed by op-eds and journalists who seek to further an agenda of continued income inequality rather than actually looking at how extremely faulty machine translation is. After all, fewer people believing that learning languages is lucrative means that fewer people learn languages, right? And money is the sole value of any human being, right?

I am grateful for machine translation, but I see it as a glorified dictionary.

But right now even the most advanced machine translation in the world has hurdles that it not only hasn’t gotten over, but that haven’t even been ADDRESSED.

I will mention this: if machine translation does end up reaching perfection, it will almost certainly be with very politically powerful languages very similar to English first. (The “Duolingo Five” of Spanish, French, Italian, German and Portuguese would be first in line. Other Germanic Languages, with the possible exceptions of Icelandic and Faroese, would be next.)

If the craft “dies” in part, it will be in this sector first (given that it is the “front line”). Even then, I deem it doubtful (although machine translation reaching perfection from English -> Italian is a thousand times more likely than it reaching perfection from English -> Vietnamese). But with most languages in the world, translators need not have the slightest fear of their jobs being replaced by machines.

That is because the less politically powerful a language is and the further it is from English, the more flaws show up in machine translation.

Let’s hop in:

 

  • Cultural References

 

Take a look at lyricstranslate.com (on which using machine translation is absolutely and completely forbidden). You’ll notice that a significant number of the song texts come with asterisks, usually ones explaining cultural phenomena that would be familiar to a Russian- or a Finnish-speaker but not to a speaker of the target language. Rap music throughout the world relies on many layers of meaning so heavily that human translators need to rely on notes. Machine translation doesn’t even DO notes or asterisks.

Also, there’s the case in which names of places or people may be familiar to people who speak one language but not those who speak another. I remember in Stockholm’s Medieval Museum that the English translation rendered the Swedish word “Åbo” (a city known in English and most other languages by its Finnish name “Turku”) as “Turku, a city in southern Finland” (obviously the fluent readers of Scandinavian Languages needed no such clarification).

And then there are the references to religious texts, well-known literature, Internet memes and beyond. In Hebrew and in Modern Greek, references to or quotes from ancient texts are common (especially in the political sphere), but machine translation doesn’t pick up on them!

When I put hip-hop song lyrics or a political speech into Google Translate and start to see a significant number of asterisks and footnotes, then I’ll believe that machine translation is on the verge of taking over. Until then, this is a hole that hasn’t been addressed and anyone who works in translation of cultural texts is aware of it.

 

  • Gendered Speech

In Spanish, adjectives referring to yourself are different depending on your gender. In Hebrew and Arabic, you use different present-tense verb forms depending on your gender as well. In languages like Vietnamese, Burmese, and Japanese different forms of “I” and “you” contain gendered information and plenty of other coded information besides.

What happens with machine translation instead is that there are sexist implications (e.g. when translating from languages with a gender-neutral “he/she” pronoun, such as the Turkic or Finno-Ugric Languages, it is more likely to assume that doctors are male and secretaries are female).

Machine Translation doesn’t have a gender-meter at all (e.g. a way to pick whether “I” am a man, woman or other), so why would I trust it to take jobs away from human translators again?
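To make the “gender-meter” idea concrete, here is a minimal Python sketch of what such a setting might look like. Everything in it is hypothetical: the function name, the `speaker_gender` parameter and the two-entry phrase table are invented for illustration and are not part of Google Translate or any real translation tool. It only shows that English “I am tired” has two valid Spanish renderings until the speaker’s gender is known.

```python
# Hypothetical sketch: none of these names come from a real translation API.
SPANISH_TIRED = {
    "masculine": "Estoy cansado.",
    "feminine": "Estoy cansada.",
}


def translate_i_am_tired(speaker_gender=None):
    """Return every Spanish rendering of "I am tired" that fits the speaker."""
    if speaker_gender is None:
        # English "I am tired" carries no gender, so both forms stay possible.
        return sorted(SPANISH_TIRED.values())
    return [SPANISH_TIRED[speaker_gender]]


print(translate_i_am_tired())            # no gender given: both candidates
print(translate_i_am_tired("feminine"))  # gender given: one candidate
```

A human translator resolves this from context or by asking; a tool without such a setting has to guess.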

On that topic, there’s also an issue with…

 

  • Formality (Pronouns)

 

Ah, yes, the pronouns that you use towards kids or the other pronouns you use towards emperors and monks. Welcome to East Asia!

A language like Japanese or Khmer has many pronouns and modes of address depending on where you stand relative to the person or crowd to whom you are speaking.

Use the wrong one and interesting things can happen.

I just went on Google Translate and, as I expected, they boiled down these systems into a pinhead. (Although to their credit, there is a set of “safe” pronouns that can more readily be used, especially as a foreign speaker [students are usually taught one of these to “stick to”, especially if they look non-Asian]).

If I expect a machine to take away a human job, it has to do at least as well. And it seems to have an active knowledge of pronouns in languages like these the way a first-year student would, not like a professional translator with deep knowledge of the language.

A “formality meter” for machine translation would help. And it would also be useful for…

 

  • Formality (Verb Forms)

 

In Finnish the verb “to be” will conjugate differently if you want to speak colloquially (puhekieli). In addition to that, pronouns will also change significantly (and will become shorter). There was this one time I encountered a student who had read Finnish grammar books at length and had a great knowledge of the formal language but NONE of the informal language that’s regularly used in Finnish-Language vlogging and popular music.

Sometimes it goes well beyond the verbs. Samoan and Fijian have different modes of speaking as well (and usually one is used for foreigners and one for insiders). There’s Samoan in Google Translate (and Samoan has an exclusive and inclusive “we” and Google Translate does as well with that as you would expect). I’m not studying Samoan at the moment, nor have I even begun, but let me know if you have any knowledge of Samoan and if it manages to straddle the various forms of the language in a way that would be useful for an outsider. I’ll be waiting…

 

  • Difficult Transliterations

 

One Hebrew word without vowels can be vowelized in many different ways and with different meanings. Burmese transliteration is not user-friendly in the slightest. Persian and Urdu don’t even have it.

If I expect a machine to take my job, I expect it to render one alphabet to another. Without issues.
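As a rough illustration of the Hebrew problem, here is a small Python sketch. The lookup table and the function name are made up for this example, and the list of readings is not exhaustive. It shows how the same three consonants, ספר, can be read as several different words once the vowels are supplied.

```python
# Illustrative only: a tiny hand-written table, not a real Hebrew lexicon.
READINGS = {
    "ספר": [
        ("sefer", "book"),
        ("sapar", "barber"),
        ("sfar", "frontier"),
        ("safar", "he counted"),
        ("siper", "he told"),
    ],
}


def possible_readings(unvowelized):
    """Return the (transliteration, meaning) pairs known for an unvowelized word."""
    return READINGS.get(unvowelized, [])


for reading, meaning in possible_readings("ספר"):
    print(f"{reading}: {meaning}")
```

Picking the right entry takes the surrounding sentence, which is exactly the part a transliteration tool has to get right.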

 

  • Translation Databases Rely on User Input

 

This obviously favors the politically powerful languages, especially those from Europe. Google Translate’s machine learning relies on input from the translator community. I’ve seen even extremely strange phrases approved by the community in a language like Spanish. While I’ve seen approved phrases in languages like Yiddish or Lao, they’re sparse (and even for the most basic words or small essential phrases).

In order for machine translation to be good, you need lots of people putting phrases into the machine. The people who are putting phrases into the machine are those with access to computers, not ones who make $2 a day.

In San Francisco, speakers of many languages from throughout Asia are in demand as interpreters. A lot of these languages come from poor regions that can’t send a bunch of people to Silicon Valley to submit phrases to Google Translate.

What’s more, there’s the issue of government support (e.g. Wales put its governmental bilingual documents into Google Translate, resulting in Welsh being better off with machine translation than Irish. The Nordic Countries want to preserve their languages and have been investing in everything technological to keep them safe. Authoritarian regimes might not have the time or the energy to promote their languages on a global scale. Then again, you also get authoritarian regimes like Vietnam with huge communities of expatriates that make tech support of the language readily available in a way that would make thousands of languages throughout the world jealous).

 

  • Developing World Languages Are Not as Developed in Machine Translation

 

Solomon Islands Pijin would probably be easier to manage in machine translation than Spanish, but it hasn’t even been touched (as far as I know). A lot of languages are behind, and these are languages spoken in poor rural areas in which translators and interpreters are necessary. (My parents worked in refugee camps in Sudan, and you have NO IDEA how sought after interpreters of Tigre were! So much so that charlatans became “improvisational interpreters”. You can guess how long that lasted.)

Yes, English may be the official language of a lot of countries in Africa and in the Pacific (not to mention India), but huge swathes of people living there have a weak command of English or, sometimes, no command at all.

The Peace Corps in particular equips its volunteers with tons of resources for learning languages. Missionaries have similar programs as well. Suffice it to say that these organizations are doing work with languages (spanning all continents) on a very deep level where machine translation hasn’t even VENTURED!

 

  • A Good Deal of Languages Haven’t Been Touched with Machine Translation At All

 

And some of this may also be in part due to the fact that some of them have no written format, or no standardized written format (e.g. Jamaican Patois).

 

  • Text-To-Speech Underdeveloped in Most Languages

 

I’m fairly impressed by Thai’s Text-to-Speech functionality in Google Translate, not to mention those of the various European Languages that have them (did you know that if you put an English text into Dutch Google Translate and have it read out loud, it will read you English with a Dutch accent? No, really!)

 

And then you have Irish, which has three different modes of pronunciation in addition to a hodge-podge “standard” that is mostly taught in schools and in apps. There is text-to-speech Irish out there, developed at Trinity College Dublin. It comes in multiple “flavors” depending on whether you want Connacht, Ulster or Munster Irish. While that technology exists, it hasn’t been integrated into Google Translate, in part, I think, because customization options are scary for ordinary users (although more of them may come in the future; I can’t say I know because I’m not on the development team).

 

For Lao, Persian, and a lot of Indian regional languages (among many others), text-to-speech hasn’t even been tried. In order to fully replace interpreters, machine translation NEEDS that and needs it PERFECTLY. (And here I am stuck with a Google Translate that routinely struggles with Hebrew vowelization…)

 

  • Parts of Speech Commonly Omitted in Comparison to Other Languages

 

Some languages, like Burmese or Japanese, often form sentences without any variety of pronoun in the most natural way of speech. Instead of saying “I understand” in Burmese, you would literally say “ear go-around present-tense-marker” (no “I”, although you could add a version of “I” and it would still make sense). In context, I could use that EXACT same phrase as the ear going around to indicate “you understand” “we understand” “the person behind the counter understands”.

In English, except in the very informal registers (“got it!”), we usually need to include a pronoun. But if machine translation is to be good enough to use in sworn interviews and in legal proceedings, it should be able to manage when to use pronouns and when not to. Even in a language like Spanish, adding “yo” (I) versus omitting it is another delicate game to play, as is the case with most languages in which person-information is coded into the verb (yo soy – I am, but soy on its own also means “I am”).

Now take a language like Rapa Nui (“Easter Island Language”). Conjunctions usually aren’t used (their “but” comes from Spanish as a loan word! [pero]). Now let’s say a machine has to translate from Rapa Nui into English: how will the “and”s and “but”s be rendered in a way that is natural to an English speaker?

 

Maybe the future will prove me wrong and machine translation will be used in courts instead of human beings. But I’ll come closer to believing it when these ten points are done away with SQUARELY. Until then, I’ll be very skeptical and assure the translators of the world that they are safe in their profession.

 

 

4 Reasons You Should Learn a Provincial Language from India

“I Speak English, Hindi and *pause* … a couple of Indian Languages”

If you have met someone from India and the topic of languages comes up, you may hear a sentence like this.

As the proud owner of an India Phrasebook, I am happy to say that I usually follow up with the question “which ones?”

So Many Languages, So Small a Book. And My Time Budget is even smaller.

And then I remember the one time I met someone from West Bengal at a video game design mixer. I asked him if Bengali was similar to Assamese (one of India’s languages that actually sounds like it is from Southeast Asia despite the fact that it is Indo-European). Stunned, he asked me three times how on earth a Jewish boy from Connecticut would have any knowledge of Indian local cultures at all.

“You’re like one of three white people in the world who knows what Assamese IS!”

It was very far from the only time. There was also the one time I correctly identified someone as a Malayalam speaker (I just guessed). After a minute of jaw-dropped silence, I was told: “Oh. My. God. ARE YOU PSYCHIC?!!?”

Just knowing the names of the local Indian Languages sets you apart. I’m probably the only member of my extended family that can name more than five Indian Languages.

As for Indian Languages I’ve studied…well…some Tamil…not very much at all…some Gujarati…not too much…and some Oriya…even less than either of those two.

The one that I am focusing my effort on (as far as Memrise.com is concerned) is Gujarati, for the time being. I still haven’t had a conversation in it (although I’ve used a few sentences with native speakers!), but given that today is Gujarat Day and Maharashtra Day (actually the same day, marking when the “Bombay” state was divided into two pieces, and celebrated in both provinces as their provincial day), I’m going to write this piece.

 

  1. India is a Fusion of Many, MANY Peoples and Recognizing that Will Earn Favor and Smiles. The Best Way to Recognize it is to Learn an Indian Regional Language.

 

Hindi and English do function as languages that tie most of the country together, but each area of India comes with a regional flavor (and many other sub-regional flavors) that many outside of that area of the world overlook.

I still remember the times when I needed someone to explain to me what “Tamil” or “Marathi” was. In high school, I thought that Hindi functioned in India the way that English did in the United States. I had no clue how deeply important and widely used the regional languages were (and continue to be).

As of the time of writing, I don’t even list Gujarati or Tamil as languages that I know. At all. Given that my list is a bit large at the moment (both in the languages-learned and the languages-to-be-learned departments), I feel the pressure to abandon them.

Luckily I’ve stopped caring so much about pressure of any sort, although I’m not actively learning either. (I’m just picking up pieces on apps)

Anyhow, building connections with Indian Languages!

The various little things that I have said have been construed as demonstrations of the fact that I recognize that India is a collection of many, MANY cultures, and that I am very amused by some of them and I want to learn more about them!

In the case of talking to Native Speakers of these languages, it gets them to open up about what life in their province is like, what there is to see, what sort of fun words there are in the language, as well as endless praising of your skills, even if they are the most basic.

 

They tend to be used to people not even knowing that these local cultures exist! And then you come along!

I am very grateful to my Indian friends and acquaintances for their help!

 

  2. The Indo-Aryan Languages, as well as the Dravidian Languages, are similar to each other, sometimes even mutually intelligible!

 

In some areas of Europe (Scandinavia and the Balkans come to mind), languages became discrete entities based on national borders. Denmark and Sweden decided to alter their linguistic orthographies to become very much not like the other one.

 

The entire thing with the Balkan Languages is not something I feel too qualified to talk about at the moment, but feel free to treat yourself to a Google Search about Bosnian, Croatian and Serbian. Or Bulgarian and Macedonian.

Have Fun.

Tee Hee.

 

In India, a lot of languages, despite being discrete, actually blended and share similar characteristics as a result of Sanskrit influence. In nearly the whole North of India, similar words for “Thank You” are used, all based on the Sanskrit “Dhanyavaadaha”. Greetings function similarly, and words from liturgical languages (Sanskrit and Arabic) play their role as well.

Often it is common for Indians to learn another regional language when they head to another province of the country. (One person told me “I bet you could learn Kannada in a week with my help”). In the case of Kannada, its closest relatives are the other main Dravidian Languages of Telugu, Malayalam, and Tamil (These four are the primary languages of the South of India, distinct from their Indo-European compatriots). Learning any one will get you very close to learning any of the other three to fluency.

The Indo-Aryan Languages in the North, some of which are very similar to each other (like Hindi and Urdu being, as one of my Pakistani students put it, like Swedish and Norwegian) and others less so (Oriya and Gujarati are from opposite ends of the country but still have some similarities) can also be “collected” with similar ease, much like the Romance Languages.

There is the writing issue, which is more of an issue with some languages than others, but interestingly some character sets are close to each other or even identical. (Kannada’s script is also used for Konkani in Goa).

No wonder there is such an internal polyglot culture in India! And it is one that you can contribute to!

 

  3. Regional Media and Culture is more Accessible than ever, and will continue to endow privileges to L2 Learners!

 

India is a tech giant. Just look for apps to learn Indian Languages on the Google Play Store (or on iOS). A lot of these apps have fantastic audio, very good phrase selections, and versions aimed at adult learners as well as at kids!

And that’s just the beginning.

Go into ANY YouTube search or any library in a major city. Look for the film section. Look for films in Indian Languages. I often find films not only in Hindi but also in every single Indian language I’ve mentioned in this article (although I don’t think I’ve seen Konkani so far).

India is home to the world’s largest film industry! Yes, Hindi and English dominate a lot of it, but that’s not the whole story!

All throughout India, film culture plays an extraordinary role, and coming to know its various regional aspects and flavors will make you think about what role regionalism and regional cultures could play in our increasingly global world, if only more of us were more adventurous!

Your Indian friends will be more than happy to give you recommendations!

Speaking of which…

  4. Native Speakers will be Super Helpful!

I haven’t received a single word of discouragement the way I have with some other languages, least of all from native speakers!

Sometimes I cringe whenever I think of the time that I was in a library in Sweden and was told “why bother learning Swedish if we all speak English anyhow?” (Answers: too many to list, but at the time it was “the letters written by my deceased family members were not going to translate themselves, one, and two…I’m surrounded by books I can’t read yet!”)

India is the world’s largest English-speaking nation, but despite that (or perhaps because of it) the Indians to whom I have spoken speak fondly about their regional cultures, and are actively thrilled at the possibility of you engaging with them!

Coming from a place with many, MANY regional languages, a lot of Indians are keenly aware of the struggle of learning another language! What we need in the struggle is more encouragement! And with a choice like an Indian language, you’ll encounter plenty of it!

Hawaii Pidgin isn’t an Indian Language. Just letting you know that.

A Happy Gujarat Day / Maharashtra Day to all! I hope that one day I will be able to write more articles on Indian Languages! But first I actually have to … ummm … learn them better!