J 2017

Large aligned treebanks for syntax-based machine translation

KOTZÉ, Gideon, Vincent VANDEGHINSTE, Scott MARTENS and Jörg TIEDEMANN

Basic information

Original title

Large aligned treebanks for syntax-based machine translation

Authors

KOTZÉ, Gideon, Vincent VANDEGHINSTE, Scott MARTENS and Jörg TIEDEMANN

Edition

LANGUAGE RESOURCES AND EVALUATION, NETHERLANDS, SPRINGER, 2017, 1574-020X

Other information

Type of result

Article in a specialist periodical

Confidentiality

Not subject to a state or trade secret

Impact factor

Impact factor: 0.656

Keywords in English

parallel treebank, parallel corpus, machine translation, syntax-based machine translation, constituent alignment, tree alignment, resource development

Flags

International impact, Reviewed
Changed: 1 November 2022, 12:11, Gideon Kotzé, PhD

Abstract

In the original

We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the non-terminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present non-terminal alignment evaluation scores for a variety of tree alignment approaches. Finally, based on the parallel treebanks created by these approaches, we evaluate the MT system itself and compare the scores with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.