Machine Translation: Decoding
Philipp Koehn
14 September 2023

Decoding

• We have a mathematical model for translation: p(e|f)
• Task of decoding: find the translation e_best with the highest probability

  e_best = argmax_e p(e|f)

• Two types of error:
  – the most probable translation is bad → fix the model
  – the search does not find the most probable translation → fix the search
• Decoding is evaluated by search error, not by translation quality (although the two are often correlated)

translation process

Translation Process

• Task: translate this sentence from German into English

  er geht ja nicht nach hause

• Pick a phrase in the input and translate it:

  er → he

• It is allowed to pick words out of sequence (reordering), and phrases may have multiple words (many-to-many translation):

  ja nicht → does not

• Continue picking phrases and translating until the whole input is covered:

  er geht ja nicht nach hause → he does not go home

Computing Translation Probability

• Probabilistic model for phrase-based translation:

  e_best = argmax_e [ ∏_{i=1}^{I} φ(f̄_i | ē_i) d(start_i − end_{i−1} − 1) ] p_LM(e)

• The score is computed incrementally for each partial hypothesis
• Components:
  – Phrase translation: picking phrase f̄_i to be translated as phrase ē_i → look up score φ(f̄_i | ē_i) in the phrase translation table
  – Reordering: the previous phrase ended at position end_{i−1}, the current phrase starts at start_i → compute d(start_i − end_{i−1} − 1)
  – Language model: for an n-gram model, keep track of the last n − 1 words → compute score p_LM(w_i | w_{i−(n−1)}, ..., w_{i−1}) for each added word w_i

decoding process

Translation Options

[Figure: table of translation options for each input phrase, e.g. er → he / it / , it / he is; geht → goes / go / is / are; ja → yes / is, of course; nicht → not / do not / does not / is not; nach → after / to / according to / in; nach hause → home / under house / return home; hause → house / home / chamber / at home; plus multi-word options such as geht ja nicht → does not]

• Many translation options to choose from:
  – in the Europarl phrase table: 2727 matching phrase pairs for this sentence
  – after pruning to the top 20 per phrase, 202 translation options remain
• The machine translation decoder does not know the right answer:
  – picking the right translation options
  – arranging them in the right order
  → a search problem, solved by heuristic beam search

Decoding: Precompute Translation Options

• Consult the phrase translation table for all input phrases
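Before turning to the decoding process itself, the incremental scoring above can be made concrete. The sketch below scores a partial hypothesis as the sum of phrase translation, distortion, and language model costs; the phrase table, the distortion weight, and the flat per-word "language model" are toy stand-ins, not values from any real system:

```python
# Toy stand-in models (log-probabilities); a real decoder would consult
# the phrase translation table and an n-gram language model instead.
PHRASE_TABLE = {
    ("er",): {"he": -0.1, "it": -0.4},
    ("geht", "ja", "nicht"): {"does not go": -0.7},
    ("nach", "hause"): {"home": -0.2},
}

def distortion(start, prev_end, alpha=-0.5):
    # d(start_i - end_{i-1} - 1): penalize jumps away from monotone order
    return alpha * abs(start - prev_end - 1)

def lm_score(words):
    # Stand-in unigram "language model": flat cost per output word
    return -0.3 * len(words)

def hypothesis_score(steps):
    """Score a hypothesis given as (source phrase, source start, output) steps."""
    total, prev_end = 0.0, -1
    for src, start, out in steps:
        total += PHRASE_TABLE[src][out]        # phrase translation phi
        total += distortion(start, prev_end)   # reordering d
        total += lm_score(out.split())         # language model p_LM
        prev_end = start + len(src) - 1        # end position of this phrase
    return total
```

For the monotone derivation er / geht ja nicht / nach hause → he / does not go / home, every distortion term is zero, so the score is just the sum of phrase and language model costs; translating the phrases out of order makes the same derivation strictly worse.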
Decoding: Start with Initial Hypothesis

• Initial hypothesis: no input words covered, no output produced

Decoding: Hypothesis Expansion

• Pick any translation option and create a new hypothesis (e.g. are)
• Create hypotheses for all other translation options (e.g. it, he)
• Also create hypotheses from the created partial hypotheses (e.g. he → he goes, he does not; he does not → he does not go; ...)

Decoding: Find Best Path

• Backtrack from the highest scoring complete hypothesis

dynamic programming

Computational Complexity

• The process suggested above creates an exponential number of hypotheses
• Machine translation decoding is NP-complete
• Reduction of the search space:
  – recombination (risk-free)
  – pruning (risky)

Recombination

• Two hypothesis paths lead to two matching hypotheses:
  – the same foreign words translated
  – the same English words in the output (e.g. both produce "it is")
• The worse hypothesis is dropped
• More generally, two hypothesis paths lead to hypotheses that are indistinguishable in subsequent search:
  – the same foreign words translated
  – the same last two English words in the output (assuming a trigram language model)
  – the same last foreign word translated
  (e.g. "it does not" and "he does not" agree on everything the search still looks at)
• Again, the worse hypothesis is dropped
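Recombination can be implemented by hashing hypotheses on exactly the state that matters for future search. A minimal sketch, with hypotheses as plain dicts whose field names are hypothetical:

```python
def recombine(hypotheses, lm_order=3):
    """Keep only the best hypothesis per recombination state.

    Two hypotheses are indistinguishable in subsequent search if they match
    in (a) which source words are covered, (b) the last n-1 output words
    (for an n-gram language model), and (c) the end position of the last
    translated source phrase (for distance-based reordering).
    """
    best = {}
    for h in hypotheses:
        state = (frozenset(h["coverage"]),
                 tuple(h["output"][-(lm_order - 1):]),
                 h["last_end"])
        if state not in best or h["score"] > best[state]["score"]:
            best[state] = h          # the worse hypothesis is dropped
    return list(best.values())
```

In the "it does not" / "he does not" example above, both hypotheses share the coverage, the last two output words ("does", "not"), and the last end position, so only the better-scoring one survives.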
Restrictions on Recombination

• Translation model: phrase translations are independent of each other → no restriction on hypothesis recombination
• Language model: the last n − 1 words are used as history in an n-gram language model → recombined hypotheses must match in their last n − 1 words
• Reordering model: the distance-based reordering model uses the end position of the previous input phrase → recombined hypotheses must have the same end position
• Other feature functions may introduce additional restrictions

pruning

Pruning

• Recombination reduces the search space, but not enough (we still have an NP-complete problem on our hands)
• Pruning: remove bad hypotheses early
  – put comparable hypotheses into stacks (hypotheses that have translated the same number of input words)
  – limit the number of hypotheses in each stack

Stacks

[Figure: hypothesis stacks, ordered by number of input words translated: none, one, two, three, ...]

• Hypothesis expansion in a stack decoder:
  – a translation option is applied to a hypothesis
  – the new hypothesis is dropped into a stack further down

Stack Decoding Algorithm

1: place empty hypothesis into stack 0
2: for all stacks 0 ... n − 1 do
3:   for all hypotheses in stack do
4:     for all translation options do
5:       if applicable then
6:         create new hypothesis
7:         place in stack
8:         recombine with existing hypothesis if possible
9:         prune stack if too big
10:      end if
11:    end for
12:  end for
13: end for

Pruning

• Pruning strategies:
  – histogram pruning: keep at most k hypotheses in each stack
  – threshold pruning: keep only hypotheses with score at least α × best score (α < 1)
• Computational time complexity of decoding with histogram pruning:

  O(max stack size × translation options × sentence length)

• The number of translation options is linear in sentence length, hence:

  O(max stack size × sentence length²)

• Quadratic complexity

Reordering Limits

• Limit reordering to a maximum reordering distance
• Typical reordering distance: 5–8 words
  – depending on the language pair
  – a larger reordering limit hurts translation quality
• Reduces complexity to linear:

  O(max stack size × sentence length)

• Speed / quality trade-off by setting the maximum stack size

future cost estimation

Translating the Easy Part First?

• Input: the tourism initiative addresses this for the first time
• Two partial hypotheses, each covering three input words:
  – the tourism initiative → die touristische initiative (the → die: tm -0.19, lm -0.4, d 0, total -0.65; cumulative total -5.88)
  – the first time → das erste mal (tm -0.56, lm -2.81, d -0.74, total -4.11)
• Both hypotheses translate three words, but the worse hypothesis has the better score

Estimating Future Cost

• Future cost estimate: how expensive is the translation of the rest of the sentence?
• Optimistic: choose the cheapest translation options
• Cost for each translation option:
  – translation model: cost known
  – language model: output words known, but not their context → estimate without context
  – reordering model: unknown, ignored for future cost estimation

Cost Estimates from Translation Options

[Figure: cost of the cheapest translation option for each input span (log-probabilities), e.g. the: -1.0, tourism: -2.0, initiative: -1.5, addresses: -2.4, this: -1.4, for: -1.0, the: -1.0, first: -1.9, time: -1.6]

Cost Estimates for all Spans

• Compute cost estimates for all contiguous spans by combining the cheapest options first:

  future cost estimate for n words (starting at word)
  word          1     2     3     4     5     6     7     8      9
  the         -1.0  -3.0  -4.5  -6.9  -8.3  -9.3  -9.6  -10.6  -10.6
  tourism     -2.0  -3.5  -5.9  -7.3  -8.3  -8.6  -9.6   -9.6
  initiative  -1.5  -3.9  -5.3  -6.3  -6.6  -7.6  -7.6
  addresses   -2.4  -3.8  -4.8  -5.1  -6.1  -6.1
  this        -1.4  -2.4  -2.7  -3.7  -3.7
  for         -1.0  -1.3  -2.3  -2.3
  the         -1.0  -2.2  -2.3
  first       -1.9  -2.4
  time        -1.6

• Function words are cheaper (the: -1.0) than content words (tourism: -2.0)
• Common phrases are cheaper (for the first time: -2.3) than unusual ones (tourism initiative addresses: -5.9)
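A table of span estimates like the one above can be filled by a simple dynamic program: the estimate for a span is either its cheapest single translation option or the best way of splitting it into two already-estimated sub-spans. A sketch (costs are log-probabilities, so "cheapest" means highest; the input format is an assumption for illustration):

```python
def future_cost_table(option_costs, n):
    """Future cost estimates for all contiguous spans of an n-word input.

    option_costs[(i, j)] is the cost of the cheapest translation option
    covering words i..j-1 (a log-probability); spans without any matching
    option fall back to the best split into two sub-spans.
    """
    est = {}
    for length in range(1, n + 1):           # shortest spans first
        for i in range(n - length + 1):
            j = i + length
            best = option_costs.get((i, j), float("-inf"))
            for k in range(i + 1, j):        # try every split point
                best = max(best, est[i, k] + est[k, j])
            est[i, j] = best
    return est
```

This mirrors the pattern in the table above: a common phrase with its own cheap option (for the first time: -2.3) beats any split of the span into smaller pieces.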
Combining Score and Future Cost

[Figure: three partial hypotheses with their scores and future cost estimates: die touristische initiative (tm -1.21, lm -4.67, d 0), das erste mal (tm -0.56, lm -2.81, d -0.74), für diese zeit (tm -0.82, lm -2.98, d -1.06)]

• Hypothesis score and future cost estimate are combined for pruning:
  – the left hypothesis starts with the hard part: the tourism initiative
    score -5.88, future cost -6.1 → total cost -11.98
  – the middle hypothesis starts with the easiest part: the first time
    score -4.11, future cost -9.3 → total cost -13.41
  – the right hypothesis picks easy parts: this for ... time
    score -4.86, future cost -9.1 → total cost -13.96

Summary

• Translation process: produce output left to right
• Translation options
• Decoding by hypothesis expansion
• Reducing the search space:
  – recombination
  – pruning (requires a future cost estimate)
• Other decoding algorithms
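Putting the pieces together, the stack decoding algorithm above can be sketched end to end. This is a toy illustration, not the decoder of any real system: recombination and future cost estimation are omitted for brevity, the phrase table and scores are hypothetical, and `alpha` weights a distance-based reordering penalty:

```python
def stack_decode(src_len, options, stack_size=10, alpha=-0.5):
    """Stack decoding sketch: stacks[i] holds hypotheses covering i words.

    `options` maps half-open source spans (start, end) to lists of
    (output phrase, score) pairs -- a toy stand-in for the phrase table.
    Histogram pruning keeps at most `stack_size` hypotheses per stack.
    """
    # hypothesis: (score, covered source positions, output, last end position)
    stacks = [[] for _ in range(src_len + 1)]
    stacks[0].append((0.0, frozenset(), (), -1))
    for n in range(src_len):                     # for all stacks 0 .. n-1
        for score, coverage, output, last_end in stacks[n]:
            for (start, end), opts in options.items():
                span = frozenset(range(start, end))
                if span & coverage:              # overlap: not applicable
                    continue
                d = alpha * abs(start - last_end - 1)   # reordering cost
                for phrase, opt_score in opts:   # create new hypotheses
                    stacks[n + len(span)].append(
                        (score + opt_score + d, coverage | span,
                         output + (phrase,), end - 1))
        for stack in stacks:                     # histogram pruning
            stack.sort(key=lambda h: h[0], reverse=True)
            del stack[stack_size:]
    best = max(stacks[src_len], key=lambda h: h[0])
    return " ".join(best[2]), best[0]
```

Because the output words are stored with each hypothesis, the final step simply reads off the best complete hypothesis instead of backtracking through pointers; out-of-order derivations pay the reordering penalty and lose to the monotone path here.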