Increasing Coverage of Translation Memories with Linguistically
Motivated Segment Combination Methods

MEDVEĎ, Marek, Vít BAISA and Aleš HORÁK

Basic information

Original title

Increasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods

Authors

MEDVEĎ, Marek (703 Slovakia, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Aleš HORÁK (203 Czech Republic, guarantor, belonging to the institution)

Edition

Bulgaria, Proceedings of The Workshop on Natural Language Processing for Translation Memories (NLP4TM), pp. 31-35, 5 pp., 2015

Publisher

INCOMA Ltd. Shoumen

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Bulgaria

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Links

The workshop on Natural Language Processing for Translation Memories

RIV identification code

RIV/00216224:14330/15:00081035

Organization unit

Faculty of Informatics

ISBN

978-954-452-032-8

Keywords in English

translation memories; DGT; MemoQ; Moses; segment; CAT

Changed: 5 January 2016 15:14, RNDr. Marek Medveď, Ph.D.

Abstract

In the original language

Translation memories (TMs) used in computer-aided translation (CAT) systems are the highest-quality source of parallel texts, since they consist of segment translation pairs approved by professional human translators. Their obvious weakness is their limited size and coverage of new document segments compared with other parallel data. In this paper, we describe several methods for expanding translation memories using linguistically motivated segment-combining approaches that concentrate on preserving high translation quality. The methods were evaluated on a medium-size real-world translation memory and documents provided by a Czech translation company, as well as on the large, publicly available DGT translation memory published by the European Commission. The benefit of the TM expansion methods was evaluated with the pre-translation analysis of the widely used MemoQ CAT system, and the METEOR metric was used to measure the quality of fully expanded new translation segments.

Related projects

GA15-13277S, research and development project

Name: Hyperintensional logic for natural language analysis

Investor: Czech Science Foundation, Hyperintensional logic for natural language analysis

LM2010013, research and development project

Name: LINDAT-CLARIN: Institute for analysis, processing and distribution of linguistic data (Acronym: LINDAT-Clarin)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - building and operating the Czech node of the pan-European research infrastructure

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages

Cite

MEDVEĎ, Marek, Vít BAISA and Aleš HORÁK. Increasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods. In Constantin Orasan and Rohit Gupta. Proceedings of The Workshop on Natural Language Processing for Translation Memories (NLP4TM). Bulgaria: INCOMA Ltd. Shoumen, 2015, pp. 31-35. ISBN 978-954-452-032-8.

@inproceedings{1311833,
   author = {Medveď, Marek and Baisa, Vít and Horák, Aleš},
   address = {Bulgaria},
   booktitle = {Proceedings of The Workshop on Natural Language Processing for Translation Memories (NLP4TM)},
   editor = {Constantin Orasan and Rohit Gupta},
   keywords = {translation memories; DGT; MemoQ; Moses; segment; CAT},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Bulgaria},
   isbn = {978-954-452-032-8},
   pages = {31-35},
   publisher = {INCOMA Ltd. Shoumen},
   title = {Increasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods},
   url = {http://rgcl.wlv.ac.uk/events/NLP4TM/3_Paper.pdf},
   year = {2015}
}

TY  - JOUR
ID  - 1311833
AU  - Medveď, Marek
AU  - Baisa, Vít
AU  - Horák, Aleš
PY  - 2015
TI  - Increasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods
PB  - INCOMA Ltd. Shoumen
CY  - Bulgaria
SN  - 9789544520328
KW  - translation memories
KW  - DGT
KW  - MemoQ
KW  - Moses
KW  - segment
KW  - CAT
UR  - http://rgcl.wlv.ac.uk/events/NLP4TM/3_Paper.pdf
L2  - http://rgcl.wlv.ac.uk/events/NLP4TM/3_Paper.pdf
N2  - Translation memories (TMs) used in computer-aided translation (CAT) systems are the highest-quality source of parallel texts, since they consist of segment translation pairs approved by professional human translators. Their obvious weakness is their limited size and coverage of new document segments compared with other parallel data. In this paper, we describe several methods for expanding translation memories using linguistically motivated segment-combining approaches that concentrate on preserving high translation quality. The methods were evaluated on a medium-size real-world translation memory and documents provided by a Czech translation company, as well as on the large, publicly available DGT translation memory published by the European Commission. The benefit of the TM expansion methods was evaluated with the pre-translation analysis of the widely used MemoQ CAT system, and the METEOR metric was used to measure the quality of fully expanded new translation segments.
ER  -

MEDVEĎ, Marek, Vít BAISA and Aleš HORÁK. Increasing Coverage of Translation Memories with Linguistically Motivated Segment Combination Methods. In Constantin Orasan and Rohit Gupta. \textit{Proceedings of The Workshop on Natural Language Processing for Translation Memories (NLP4TM)}. Bulgaria: INCOMA Ltd. Shoumen, 2015, pp.~31-35. ISBN~978-954-452-032-8.
