Corpus-based Disambiguation for Machine Translation

D 2011

BAISA, Vít

Basic information

Original title

Corpus-based Disambiguation for Machine Translation

Authors

BAISA, Vít

Edition

First edition. Brno, Recent Advances in Slavonic Natural Language Processing, pp. 81-87, 7 pp., 2011

Publisher

Tribun EU

Other information

Language

English

Type of result

Proceedings paper

Field

10201 Computer sciences, information science, bioinformatics

Publisher's country

Czech Republic

Confidentiality

Not subject to state or trade secrecy

Publication form

printed version "print"

Marked for transfer to RIV

Yes

RIV code

RIV/00216224:14330/11:00054035

Organizational unit

Faculty of Informatics

ISBN

978-80-263-0077-9

Keywords in English

word sense disambiguation; machine translation; word sketch; collocations

Flags

Peer-reviewed

Changed: 8 June 2021 09:15, Mgr. et Mgr. Vít Baisa, Ph.D.

Abstract

In the original

This paper addresses the problem of choosing a proper translation for polysemous words. We describe an original method for partial word sense disambiguation of such words using word sketches extracted from large-scale corpora and a simple English-Czech dictionary. Each word is translated from English to Czech, and the word sketch of the English word is compared with the word sketches of all its corresponding Czech equivalents. These comparisons serve to select a proper translation of the word: given a context containing one of the collocates from the English word sketch, the resulting data can be used directly in machine translation of the English word, and at the same time they can be regarded as a partial disambiguation of that word. Moreover, the results may be used to cluster word sketches according to the distinct meanings of their headwords.
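
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy word sketches, the toy dictionary, the names `EN_SKETCHES`, `CS_SKETCHES`, `EN_CS`, `translation_table` and `choose_translation`, and the overlap-count scoring are all assumptions made for illustration only.

```python
# Toy word sketches (hypothetical data): headword -> relation -> collocates.
EN_SKETCHES = {
    "bank": {
        "modifier": {"river", "central"},
        "object_of": {"rob"},
    },
}
CS_SKETCHES = {
    "banka": {"modifier": {"centrální"}, "object_of": {"vyloupit"}},
    "břeh": {"modifier": {"říční"}},
}
# Toy English-Czech dictionary (hypothetical data).
EN_CS = {
    "bank": ["banka", "břeh"],
    "river": ["řeka", "říční"],
    "central": ["centrální", "ústřední"],
    "rob": ["vyloupit", "okrást"],
}


def translation_table(word):
    """Map each (relation, English collocate) of `word` to the Czech
    equivalent whose word sketch shares the most translated collocates.
    The overlap count is an assumed scoring, chosen for simplicity."""
    table = {}
    for relation, collocates in EN_SKETCHES[word].items():
        for collocate in collocates:
            translated = set(EN_CS.get(collocate, []))
            scores = {}
            for candidate in EN_CS[word]:
                cs_collocates = CS_SKETCHES.get(candidate, {}).get(relation, set())
                scores[candidate] = len(translated & cs_collocates)
            best = max(scores, key=scores.get)
            if scores[best] > 0:  # skip collocates with no supporting evidence
                table[(relation, collocate)] = best
    return table


def choose_translation(context_tokens, table):
    """Return the Czech equivalent linked to the first known collocate found
    in the context, or None. The relation is ignored here for brevity."""
    for (_relation, collocate), czech in table.items():
        if collocate in context_tokens:
            return czech
    return None


table = translation_table("bank")
print(choose_translation(["the", "river", "bank"], table))
```

With this toy data the collocate "river" selects "břeh", while "central" and "rob" select "banka". Grouping the collocates by the selected equivalent gives a rough picture of the sketch clustering mentioned in the last sentence of the abstract.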

Related projects

LC536, R&D project

Name: Centrum komputační lingvistiky (Centre for Computational Linguistics)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Centrum komputační lingvistiky

248307, internal MU code

Name: Pattern Recognition-based Statistically Enhanced MT (Acronym: PRESEMT)

Investor: European Union, Pattern Recognition-based Statistically Enhanced MT, Cooperation
