Finding Definitions in Large Corpora with Sketch Engine

KOVÁŘ, Vojtěch, Monika MOČIARIKOVÁ and Pavel RYCHLÝ. Finding Definitions in Large Corpora with Sketch Engine. In Nicoletta Calzolari (Conference Chair) et al. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association (ELRA), 2016, pp. 391-394. ISBN 978-2-9517408-9-1.


Basic information
Original title	Finding Definitions in Large Corpora with Sketch Engine
Authors	KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution), Monika MOČIARIKOVÁ (703 Slovakia, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution).
Edition	Portorož, Slovenia, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 391-394, 4 pp. 2016.
Publisher	European Language Resources Association (ELRA)

Other information
Original language	English
Type of outcome	Paper in proceedings
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	France
Confidentiality	not subject to a state or trade secret
Publication form	storage medium (CD, DVD, flash disk)
RIV code	RIV/00216224:14330/16:00088334
Organizational unit	Faculty of Informatics
ISBN	978-2-9517408-9-1
Keywords in English	Sketch Engine; definition; definitions; CQL; corpora
Tags	firank_B
Changed by	doc. Mgr. Pavel Rychlý, Ph.D., učo 3692. Changed: 20 December 2016, 13:55.

Abstract

The paper describes automatic definition finding implemented within the leading corpus query and management tool, Sketch Engine. The implementation exploits complex pattern-matching queries in the corpus query language (CQL) and the indexing mechanism of word sketches to find and store definition candidates throughout the corpus. The approach is evaluated on Czech and English corpora, showing that the results are usable in practice: the precision of the tool ranges between 30 and 75 percent (depending on the major corpus text types), and we were able to extract nearly 2 million definition candidates from an English corpus of 1.4 billion words. The feature is embedded into the interface as a concordance filter, so that users can search for definitions for any corpus query, including very specific multi-word queries. The results also indicate that ordinary texts (unlike explanatory texts) contain a rather low number of definitions, which is perhaps the most important problem with automatic definition finding in general.
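
As a rough illustration of the pattern-matching idea described in the abstract, the sketch below scans a part-of-speech-tagged sentence for one simple "X is a Y" shape. The pattern, the Penn Treebank style tags, and the names `Token`, `match_at`, `find_definition_candidates` and `SAMPLE` are all assumptions made for this example; they are not the queries or code used in the paper, and Sketch Engine evaluates CQL against its own corpus indexes rather than in Python.

```python
from typing import List, Optional, Tuple

# A token is (word form, lemma, POS tag). Penn Treebank style tags are assumed.
Token = Tuple[str, str, str]

# Hypothetical CQL-like pattern emulated below (illustrative only):
#   [tag="NN.*"] [lemma="be"] [tag="DT"] [tag="JJ.*"]* [tag="NN.*"]


def match_at(tokens: List[Token], start: int) -> Optional[Tuple[str, str]]:
    """Try to match the pattern at position `start`.

    Returns (defined term, head noun of the definition) or None.
    """
    i = start
    # Defined term: a noun.
    if i >= len(tokens) or not tokens[i][2].startswith("NN"):
        return None
    term = tokens[i][0]
    i += 1
    # Copula: any form of "be".
    if i >= len(tokens) or tokens[i][1] != "be":
        return None
    i += 1
    # Determiner introducing the defining noun phrase.
    if i >= len(tokens) or tokens[i][2] != "DT":
        return None
    i += 1
    # Zero or more adjectives.
    while i < len(tokens) and tokens[i][2].startswith("JJ"):
        i += 1
    # Head noun of the defining phrase.
    if i >= len(tokens) or not tokens[i][2].startswith("NN"):
        return None
    return term, tokens[i][0]


def find_definition_candidates(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Collect every match of the pattern in one tagged sentence."""
    candidates = []
    for start in range(len(tokens)):
        match = match_at(tokens, start)
        if match is not None:
            candidates.append(match)
    return candidates


# Hand-tagged sample sentence: "A corpus is a large collection of texts"
SAMPLE: List[Token] = [
    ("A", "a", "DT"),
    ("corpus", "corpus", "NN"),
    ("is", "be", "VBZ"),
    ("a", "a", "DT"),
    ("large", "large", "JJ"),
    ("collection", "collection", "NN"),
    ("of", "of", "IN"),
    ("texts", "text", "NNS"),
]
```

Calling `find_definition_candidates(SAMPLE)` yields the single candidate pair `("corpus", "collection")`. A single surface pattern like this one will also match many non-definitions, which is consistent with the moderate precision figures reported in the abstract.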

Links
GA15-13277S, research and development project	Name: Hyperintensional logic for natural language analysis
GA15-13277S, research and development project	Investor: Czech Science Foundation, Hyperintensional logic for natural language analysis
7F14047, research and development project	Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
7F14047, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages
