Machine Translation: Current Challenges

Philipp Koehn

3 November 2022

WMT 2016

[Figure: scatter plot of human score against BLEU (axis from 18 to 36) for WMT 2016 systems, with neural MT and statistical MT systems marked separately. Labeled systems: uedin-nmt, metamind, uedin-syntax, NYU-UMONTREAL, ONLINE-B, PROMT-RULE-BASED, KIT-LIMSI, CAMBRIDGE, KIT, ONLINE-A, JHU-SYNTAX, JHU-PBMT, UEDIN-PBMT, ONLINE-F, ONLINE-G.]

(in 2017 barely any statistical machine translation submissions)

2017: Google: "Near Human Quality"

[Figure: bar chart of translation quality on a scale up to 6 (perfect translation), comparing human, neural (GNMT), and phrase-based (PBMT) translation for English > Spanish, English > French, English > Chinese, Spanish > English, French > English, and Chinese > English.]

2018: More Hype

Microsoft Research Achieves Human Parity For Chinese English Translation
Written by Sue Gee, Wednesday, 21 March 2018
Researchers in Microsoft's labs in Beijing and in Redmond and Washington have developed an AI machine translation system that can translate with the same accuracy as a human from Chinese to English.

SDL Cracks Russian to English Neural Machine Translation
Global Enterprises to Capitalize on Near Perfect Russian to English Machine Translation as SDL Sets New Industry Standard
"90% of the system's output labelled as perfect by professional Russian-English translators"

Just Better Fluency?
[Figure: two bar charts comparing ONLINE-B and UEDIN-NMT on CS→EN, DE→EN, RO→EN, and RU→EN (scale 60 to 100). Adequacy: +1%. Fluency: +13%.]

(from: Sennrich and Haddow, 2017)

lack of training data

Amount of Training Data

[Figure: plot over Corpus Size (English Words).]

English-Spanish systems trained on 0.4 million to 385.7 million words

Translation Examples

Source: A Republican strategy to counter the re-election of Obama

Fraction of training data   Output
1/1024                      Un órgano de coordinación para el anuncio de libre determinación
1/512                       Lista de una estrategia para luchar contra la elección de hojas de Ohio
1/256                       Explosion realiza una estrategia divisiva de luchar contra las elecciones de autor
1/128                       Una estrategia republicana para la eliminación de la reelección de Obama
1/64                        Estrategia siria para contrarrestar la reelección del Obama .
1/32                        Una estrategia republicana para contrarrestar la reelección de Obama

domain mismatch

Domain Mismatch

Rows: system (training data); columns: test domain; each cell: NMT / SMT.

System      Law           Medical       IT            Koran         Subtitles
All Data    30.5 / 32.8   45.1 / 42.2   35.3 / 44.7   17.9 / 17.9   26.4 / 20.8
Law         31.1 / 34.4   12.1 / 18.2    3.5 /  6.9    1.3 /  2.2    2.8 /  6.0
Medical      3.9 / 10.2   39.4 / 43.5    2.0 /  8.5    0.6 /  2.0    1.4 /  5.8
IT           1.9 /  3.7    6.5 /  5.3   42.1 / 39.8    1.8 /  1.6    3.9 /  4.7
Koran        0.4 /  1.8    0.0 /  2.1    0.0 /  2.3   15.9 / 18.8    1.0 /  5.5
Subtitles    7.0 /  9.9    9.3 / 17.8    9.2 / 13.6    9.0 /  8.4   25.9 / 22.1

Translation Examples

Source: Schaue um dich herum.
Ref.: Look around you.

All
  NMT: Look around you.
  SMT: Look around you.
Law
  NMT: Sughum gravecorn.
  SMT: In order to implement dich Schaue .
Medical
  NMT: EMEA / MB / 049 / 01-EN-Final Work progamme for 2002
  SMT: Schaue by dich around .
IT
  NMT: Switches to paused.
  SMT: To Schaue by itself .
Koran
  NMT: Take heed of your own souls.
  SMT: And you see.
Subtitles
  NMT: Look around you.
  SMT: Look around you .

rare words

Rare Words

• More frequent in training → more likely to get right in test
• Let's measure this
• One problem
  - frequency measured for input words
  - translation correctness measured for output words

Translation Accuracy for Input Words

• Generate word alignment between input and output words
• Look up count of input word in training
• Link to output word via word alignment
• Check if it is also in the reference translation!
• A lot of tedious special cases
  - one-to-many alignment, only some output words in reference
  - input word not aligned to any target word
  - many-to-one alignment
  - output word occurs multiple times in output or reference sentence

Count vs. Accuracy

word alignment

Word Alignment

[Figure: alignment matrix for the German sentence "die Beziehungen zwischen Obama und Netanjahu sind seit Jahren angespannt" against its English counterpart, with a score in each aligned cell.]

Word Alignment?
[Figure: alignment matrix for the English sentence "the relationship between Obama and Netanyahu has been stretched for years" against its counterpart in the other language, with a score in each aligned cell.]

beam search

[Figure: plot over Beam Size, with ticks at 1, 2, 4, 8, 12, 20, 30, 50, 100, 200, 500, and 1,000.]

noisy data

Noise in Training Data

• Crawled parallel data from the web (very noisy)

              SMT           NMT
WMT17         24.0          27.2
+ Paracrawl   25.2 (+1.2)   17.3 (-9.9)

(German-English, 90m words each of WMT17 and Crawl data)

Raw crawl data   5%            10%           20%           50%           100%
NMT              27.4 (+0.2)   26.6 (-0.6)   24.7 (-2.5)   20.9 (-6.3)   17.3 (-9.9)
SMT              24.2 (+0.2)   24.2 (+0.2)   24.4 (+0.4)   24.8 (+0.8)   25.2 (+1.2)

• Corpus cleaning methods [Xu and Koehn, EMNLP 2017] give improvements

Types of Noise

• Misaligned sentences
• Disfluent language (from MT, bad translations)
• Wrong language data (e.g., French in German-English corpus)
• Untranslated sentences
• Short segments (e.g., dictionaries)
• Mismatched domain

Mismatched Sentences

• Artificially created by randomly shuffling sentence order
• Added to existing parallel corpus in different amounts

       5%            10%           20%           50%           100%
NMT    …             …             …             26.1          25.3
SMT    24.0 (-0.0)   24.0 (-0.0)   23.9 (-0.1)   23.9 (-0.1)   23.4 (-0.6)

• Bigger impact on NMT (green, left) than SMT (blue, right)

Misordered Words

• Artificially created by randomly shuffling words in each sentence

              5%            10%           20%           50%           100%
Source, NMT   …             …             …             26.6 (-0.6)   25.5
Source, SMT   24.0 (-0.0)   23.6 (-0.4)   23.9 (-0.1)   23.6 (-0.4)   23.7
Target, NMT   …             …             …             26.7 (-0.5)   26.1 (-1.1)
Target, SMT   24.0 (-0.0)   24.0 (-0.0)   23.4 (-0.6)   23.2 (-0.8)   22.9 (-1.1)

• Similar impact on NMT and SMT, worse for source reshuffle

Untranslated Sentences

              5%            10%            20%           50%           100%
Source, NMT   17.6 (-9.8)   11.2 (-16.0)   5.6 (-21.6)   3.2 (-24.0)   3.2 (-24.0)
Source, SMT   23.8 (-0.2)   23.9 (-0.1)    23.8 (-0.2)   23.4 (-0.6)   21.1 (-2.9)
Target, NMT   27.2 (-0.0)   27.0 (-0.2)    26.7 (-0.5)   26.8 (-0.4)   26.9 (-0.3)
Target, SMT   …             …              …             …             …

Wrong Language

                 5%            10%           20%           50%           100%
fr source, NMT   26.9 (-0.3)   26.8 (-0.4)   26.8 (-0.4)   26.8 (-0.4)   26.8 (-0.4)
fr source, SMT   24.0 (-0.0)   23.9 (-0.1)   23.9 (-0.1)   23.9 (-0.1)   23.8 (-0.2)
fr target, NMT   26.7 (-0.5)   26.6 (-0.6)   26.7 (-0.5)   26.2 (-1.0)   25.0 (-2.2)
fr target, SMT   24.0 (-0.0)   23.9 (-0.1)   23.8 (-0.2)   23.5 (-0.5)   23.4

• Surprisingly robust, maybe due to domain mismatch of French data

Short Sentences

                 5%            10%           20%           50%
1-2 words, NMT   27.1 (-0.1)   26.5 (-0.7)   26.7 (-0.5)   …
1-2 words, SMT   24.1 (+0.1)   23.9 (-0.1)   23.8 (-0.2)   …
1-5 words, NMT   27.8 (+0.6)   27.6 (+0.4)   …             26.6 (-0.6)
1-5 words, SMT   24.2 (+0.2)   24.5 (+0.5)   24.5 (+0.5)   24.2 (+0.2)

• No harm done

control over output

Specifying Decoding Constraints

• Overriding the decisions of the decoder
• Why?
  ⇒ translations have to follow strict terminology
  ⇒ rule-based translation of dates, quantities, etc.

XML Schema

The router is a model Psy X500 Pro .

• The XML tags specify to the decoder that
  - the word router to be translated as Router
  - "The router is" to be translated before the rest
  - brand name Psy X500 Pro to be translated as a unit

Formal Constraints

• Subtitles
  - translation has to fit into space on screen (may have to be shortened)
  - input and output broken up into lines
• Speech translation
  - input often not well-formed
  - real time translation: start while sentence is spoken
  - subtitles: have to be readable in limited time
  - dubbing: sync up with video of speaker's mouth movement!
• Poetry
  - meter
  - rhyme

catastrophic errors

Catastrophic Errors

News | Science and Technology
Facebook apologises for rude mistranslation of Xi Jinping's name
Company blames technical glitch that 'caused incorrect translations' of Chinese leader's name from Burmese to English.

Facebook's auto translation AI fail leads to a nightmare for a Palestinian man
The AI feature had "Good morning" in Arabic wrongly translated as "attack them" in Hebrew.
By Gianluca Mezzofiore on October 24, 2017

Industry News • By Marion Marking On 3 Aug 2020
Thai Mistranslation Shows Risk of Auto-Translating Social Media Content
After a machine translation of a post from English into Thai about the King's birthday proved offensive to the Thai monarchy, Facebook Thailand said it was deactivating auto-translate on Facebook and Instagram, revamping machine translation (MT) quality, and offering the Thai people its "profound apology."

What are Catastrophic Errors?

• Generation of profanity
  - first step: maintain list of offensive words for each language
  - only eliminate these words if the input did not include such words
  - but: offensive language is not limited to specific words
• Generation of violent / inciting content
• Opposite meaning
• Mistranslation of names
⇒ All this is hard to detect

robustness

Robustness to User Generated Content

English: daily content of #scaramouche from genshin impact mute #mouchecc for no cc tweets! not leak free http://dailymouche.carrd.co

German: täglicher Inhalt von #scaramouche von genshin impact stumm #mouchecc für keine CC-Tweets!
nicht auslaufsicher http://dailymouche.carrd.co

Challenges

• Jargon and acronyms
• Misspellings (sometimes intended for effect)
• Mangled grammar
• Special symbols (emojis, etc.)
• Hashtags, URLs, ...
• Use of dialectal languages
• Use of non-standard writing systems (e.g., Latin script due to lack of keyboard)

Some Methods

• Special handling of non-words like emojis, hashtags, URLs
• Creating synthetic noisy training data
• Adversarial training
• Resources
  - Machine translation of noisy text data set (MTNT)
  - WMT 2020 Shared Task on Machine Translation Robustness

bias

Gender Bias

The doctor asked the nurse to help her in the procedure
El doctor le pidio a la enfermera que le ayudara con el procedimiento

Gender Bias

[Screenshot: Google Translate, English to Spanish]
the doctor said: take the pill.
La doctora dijo: toma la píldora. (feminine)
El doctor dijo: toma la píldora.
(masculine)

Robustness to Style

"You Sound Just Like Your Father" Commercial Machine Translation Systems Include Stylistic Biases
Dirk Hovy, Federico Bianchi, Tommaso Fornaciari
Bocconi University, Via Sarfatti 25, 20136 Milan, Italy
{dirk.hovy, f.bianchi, fornaciari.tommaso}@unibocconi.it

Dialect Bias

• Models often trained only on standard languages (British, American)
• Work less well on other dialects
• Bigger problem for automatic speech recognition

Evaluate Across Language Varieties

BLEU score on standard language is not enough
Also need test sets for each language variety
Headline quality
Acceptable degradation on important language varieties

document-level translation

Document-Level Translation

The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
• Machine translation translates one sentence at a time
• But: surrounding context may help
  - translation of pronouns may require co-reference
  - ambiguous words may be informed by broader context
  - consistent translation of repeated words

Conditioning on Broader Context

The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
Der Laden verkauft einen schönen Tisch.

Full Document Translation

• Hierarchical attention
  - compute which previous sentences matter most
  - compute which words in these sentences matter most

Conditioning on Broader Context

The shop is selling a nice table. Jane is quite taken by it. The table would match the chairs in her living room.
Der Laden verkauft einen schönen Tisch. Er gefällt Jane sehr. ...

• Concatenate all sentences together
  - document = very long sentence
  - special treatment for sentence boundaries
  - requires scaling of neural decoding implementation

questions?
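
Below is a minimal sketch of the concatenation idea from the document-level translation slides: the document is joined into one long input with an explicit boundary marker, translated once, and split back into sentences. The `<sep>` token, the `translate` callable, and the `toy_translate` stand-in are all assumptions made for illustration; the slides do not name a specific marker or system.

```python
from typing import Callable, List

SEP = "<sep>"  # hypothetical sentence-boundary token, not taken from the slides


def translate_document(
    sentences: List[str],
    translate: Callable[[str], str],
    sep: str = SEP,
) -> List[str]:
    """Translate a document as one long input, then split it back into sentences."""
    # Join all sentences into a single sequence, marking each boundary explicitly.
    joined = f" {sep} ".join(sentences)
    output = translate(joined)
    # Recover sentence-level output by splitting on the boundary token.
    parts = [part.strip() for part in output.split(sep)]
    if len(parts) != len(sentences):
        # Boundaries were dropped or invented: fall back to sentence-by-sentence translation.
        return [translate(s) for s in sentences]
    return parts


def toy_translate(text: str) -> str:
    """Stand-in for a real MT system: upper-cases words, keeps the separator intact."""
    return " ".join(tok if tok == SEP else tok.upper() for tok in text.split())


doc = [
    "The shop is selling a nice table.",
    "Jane is quite taken by it.",
    "The table would match the chairs in her living room.",
]
result = translate_document(doc, toy_translate)
```

The fallback branch reflects the "special treatment for sentence boundaries" bullet in the simplest possible way: if the output cannot be split into the same number of sentences as the input, the sketch gives up on document context rather than return misaligned output. A real system would more likely handle boundaries inside the model or decoder.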