D 2016

Between Comparable and Parallel: English-Czech Corpus from Wikipedia

ŠTROMAJEROVÁ, Adéla, Vít BAISA and Marek BLAHUŠ

Basic information

Original name

Between Comparable and Parallel: English-Czech Corpus from Wikipedia

Authors

ŠTROMAJEROVÁ, Adéla (203 Czech Republic, guarantor, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Marek BLAHUŠ (203 Czech Republic, belonging to the institution)

Edition

Brno, RASLAN 2016 Recent Advances in Slavonic Natural Language Processing, p. 3-8, 6 pp. 2016

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/16:00091974

Organization unit

Faculty of Informatics

ISBN

978-80-263-1095-2

UT WoS

000466886400001

Keywords (in Czech)

paralelní korpus; srovnatelný korpus; Wikipedie

Keywords in English

parallel corpora; comparable corpora; Wikipedia

Tags

International impact, Reviewed
Changed: 27/5/2021 09:10, Mgr. et Mgr. Vít Baisa, Ph.D.

Abstract

In the original

We describe the process of creating a parallel corpus from the Czech and English Wikipedias using language-independent methods. The corpus consists of Czech and English Wikipedia articles, the Czech ones being translations of the English ones; it is aligned at the sentence level and is accessible in the Sketch Engine corpus manager.

Links

LM2015071, research and development project
Name: Jazyková výzkumná infrastruktura v České republice (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0863/2015, internal MU code
Name: Čeština v jednotě synchronie a diachronie - 2016
Investor: Masaryk University, Category A