European Union Language Resources in Sketch Engine

D 2016


BAISA, Vít, Jan MICHELFEIT, Marek MEDVEĎ and Miloš JAKUBÍČEK

Basic information

Original title

European Union Language Resources in Sketch Engine

Authors

BAISA, Vít (203 Czech Republic, home institution), Jan MICHELFEIT (203 Czech Republic, home institution), Marek MEDVEĎ (703 Slovakia, home institution) and Miloš JAKUBÍČEK (203 Czech Republic, home institution)

Publication

Portorož, Slovenia, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 2799-2803, 5 pp., 2016

Publisher

European Language Resources Association (ELRA)

Other information

Language

English

Result type

Paper in proceedings

Field

10201 Computer sciences, information science, bioinformatics

Publisher's country

Slovenia

Confidentiality

not subject to state or trade secrecy

Publication form

electronic version "online"

Links

URL

http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf

RIV code

RIV/00216224:14330/16:00087949

Organizational unit

Faculty of Informatics

ISBN

978-2-9517408-9-1

Keywords in English

JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance

Tags

firank_B

Flags

International impact, Peer-reviewed

Last modified: 3 January 2017 11:12, RNDr. Marek Medveď, Ph.D.

Abstract

In the original language

Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is introduced: the EUR-Lex corpus, one of the largest parallel corpora currently available, containing 840 million tokens of English; its largest language pair (English-French) has more than 25 million aligned segments (paragraphs).

Related projects

GA15-13277S, R&D project

Title: Hyperintensional logic for natural language analysis

Funder: Czech Science Foundation, Hyperintensional logic for natural language analysis

LM2015071, R&D project

Title: Language research infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)

Funder: Ministry of Education, Youth and Sports of the Czech Republic, Project LINDAT-Clarin - Construction and operation of the Czech node of the pan-European research infrastructure

MUNI/A/0945/2015, internal MU code

Title: Large-scale computing systems: models, applications and verification V.

Funder: Masaryk University, Large-scale computing systems: models, applications and verification V., until 2020_Category A - Specific research - Student research projects

7F14047, R&D project

Title: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Funder: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages

Cite

BAISA, Vít, Jan MICHELFEIT, Marek MEDVEĎ and Miloš JAKUBÍČEK. European Union Language Resources in Sketch Engine. Online. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association (ELRA), 2016, pp. 2799-2803. ISBN 978-2-9517408-9-1.

@inproceedings{1346032,
   author = {Baisa, Vít and Michelfeit, Jan and Medveď, Marek and Jakubíček, Miloš},
   address = {Portorož, Slovenia},
   booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
   editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
   keywords = {JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance},
   howpublished = {electronic version "online"},
   language = {eng},
   location = {Portorož, Slovenia},
   isbn = {978-2-9517408-9-1},
   pages = {2799--2803},
   publisher = {European Language Resources Association (ELRA)},
   title = {European Union Language Resources in Sketch Engine},
   url = {http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf},
   year = {2016}
}

TY  - CPAPER
ID  - 1346032
AU  - Baisa, Vít
AU  - Michelfeit, Jan
AU  - Medveď, Marek
AU  - Jakubíček, Miloš
PY  - 2016
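TI  - European Union Language Resources in Sketch Engine
T2  - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
SP  - 2799
EP  - 2803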
PB  - European Language Resources Association (ELRA)
CY  - Portorož, Slovenia
SN  - 9782951740891
KW  - JRC-Acquis
KW  - DCEP
KW  - DGT-TM
KW  - Europarl
KW  - EUR-Lex
KW  - Sketch Engine
KW  - parallel corpus
KW  - word sketch
KW  - parallel concordance
UR  - http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf
N2  - Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is introduced: the EUR-Lex corpus, one of the largest parallel corpora currently available, containing 840 million tokens of English; its largest language pair (English-French) has more than 25 million aligned segments (paragraphs).
ER  -

BAISA, Vít, Jan MICHELFEIT, Marek MEDVEĎ and Miloš JAKUBÍČEK. European Union Language Resources in Sketch Engine. Online. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis. \textit{Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)}. Portorož, Slovenia: European Language Resources Association (ELRA), 2016, pp.~2799--2803. ISBN~978-2-9517408-9-1.
