Machine Translation
PV061
Pavel Rychlý
NLP Centre, FI MU
20 Sep 2023

BPE
  Subword Neural Machine Translation
  https://github.com/rsennrich/subword-nmt
  pip install subword-nmt

SentencePiece
  https://github.com/google/sentencepiece
  pip install sentencepiece
  python wrapper: https://github.com/google/sentencepiece/blob/master/python/README

Shared vocabulary
  encoder-decoder
  each part separate word embeddings
  decoder: separate input/output embeddings

Fairseq
  https://github.com/facebookresearch/fairseq
  MT example: https://github.com/facebookresearch/fairseq/tree/main/examples/trans
  checkpoint is a single file

HuggingFace
  https://huggingface.co/
  pip install transformers
  https://github.com/huggingface/transformers/tree/main/examples/pyto
  pretrained models
  datasets: export HF_DATASETS_CACHE=/big-disk/datasets
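
BPE sketch
  The BPE slide only points to subword-nmt. As a rough illustration of the idea behind it, the sketch below learns merge operations from a toy word-frequency table and applies them to a new word. It is a simplified, standard-library-only version and not the code of subword-nmt: the function names (get_pair_counts, merge_pair, learn_bpe, apply_bpe), the toy corpus, the "</w>" end-of-word marker and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` by one concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Return the list of learned merges, most frequent pair first."""
    # Each word starts as a tuple of characters plus an end-of-word marker
    # ("</w>" is an assumption of this sketch).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Ties are broken by the pair itself so the result is deterministic
        # (an assumption of this sketch, real tools may order ties differently).
        best = max(pairs, key=lambda p: (pairs[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


# Toy corpus: word -> frequency (made up for illustration).
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_freqs, num_merges=5)
print(merges)
print(apply_bpe("lowest", merges))
```

  The word "lowest" is not in the toy corpus, but it can still be segmented into subword units learned from "low", "newest" and "widest". This is the property that lets an NMT system work with a fixed-size vocabulary on open-vocabulary text.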