BAISA, Vít, Jan MICHELFEIT, Marek MEDVEĎ and Miloš JAKUBÍČEK. European Union Language Resources in Sketch Engine. Online. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association (ELRA), 2016, p. 2799-2803. ISBN 978-2-9517408-9-1.
Basic information
Original name European Union Language Resources in Sketch Engine
Authors BAISA, Vít (203 Czech Republic, belonging to the institution), Jan MICHELFEIT (203 Czech Republic, belonging to the institution), Marek MEDVEĎ (703 Slovakia, belonging to the institution) and Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution).
Edition Portorož, Slovenia, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), p. 2799-2803, 5 pp. 2016.
Publisher European Language Resources Association (ELRA)
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Slovenia
Confidentiality degree Not subject to state or trade secrecy
Publication form electronic version available online
RIV identification code RIV/00216224:14330/16:00087949
Organization unit Faculty of Informatics
ISBN 978-2-9517408-9-1
Keywords in English JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance
Tags firank_B
Tags International impact, Reviewed
Changed by RNDr. Marek Medveď, Ph.D., university ID (UČO) 359226. Changed: 3/1/2017 11:12.
Abstract
Several parallel corpora built from European Union language resources are presented here. They were processed with state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is introduced: the EUR-Lex corpus, one of the largest parallel corpora available at the moment. It contains 840 million English tokens, and its largest language pair (English-French) comprises more than 25 million aligned segments (paragraphs).
Links
GA15-13277S, research and development project. Name: Hyperintensional Logic for Natural Language Analysis (Hyperintensionální logika pro analýzu přirozeného jazyka)
Investor: Czech Science Foundation
LM2015071, research and development project. Name: Language Research Infrastructure in the Czech Republic (Jazyková výzkumná infrastruktura v České republice) (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0945/2015, internal MU code. Name: Large-Scale Computing Systems: Models, Applications and Verification V (Rozsáhlé výpočetní systémy: modely, aplikace a verifikace V.)
Investor: Masaryk University, Category A
7F14047, research and development project. Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR