Longest-commonest Match

KILGARRIFF, Adam, Vít BAISA, Miloš JAKUBÍČEK and Pavel RYCHLÝ. Longest-commonest Match. Online. In Kosem, I., Jakubíček, M., Kallas, J., Krek, S. Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. Ljubljana: Trojina, Institute for Applied Slovene Studies, 2015. pp. 397-404. ISBN 978-961-93594-3-3. [cited 2024-04-24]


Basic information
Original title	Longest-commonest Match
Authors	KILGARRIFF, Adam (826 United Kingdom of Great Britain and Northern Ireland), Vít BAISA (203 Czech Republic, guarantor, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)
Edition	Ljubljana, Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. pp. 397-404, 8 pp. 2015.
Publisher	Trojina, Institute for Applied Slovene Studies

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Slovenia
Confidentiality degree	is not subject to a state or trade secret
Publication form	electronic version available online
WWW	URL
RIV identification code	RIV/00216224:14330/15:00080952
Organization unit	Faculty of Informatics
ISBN	978-961-93594-3-3
Keywords in English	multiword expression; collocation; word sketch; Sketch Engine
Tags	International impact, Reviewed
Changed by	Changed by: Mgr. et Mgr. Vít Baisa, Ph.D., učo 139654. Changed: 6 January 2016 11:35.

Abstract

Finding two-word collocations is a well-studied task within natural language processing. The result of this task for a given headword is usually a list of collocations sorted by a salience score. In the corpus manager Sketch Engine, these pairs are extracted from data using word sketch grammar relation rules and the log-dice statistic, resulting in a sorted list of triples. The longest–commonest match is a straightforward extension of these two-word collocations into multiword expressions. The resulting expressions are also very useful for representing the most common realisation of the collocational pair and for facilitating the interpretation of the raw triple, because for such a triple it is sometimes not clear what texts it comes from. We present here the algorithm behind the longest–commonest match together with a simple evaluation. The longest–commonest match is already implemented in Sketch Engine.
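
As a rough illustration of the two ideas named in the abstract, the sketch below scores collocates with the commonly cited log-dice (logDice) formula, 14 + log2(2·f_xy / (f_x + f_y)), and then picks a "longest-commonest" expression from a handful of toy corpus hits. This is not the algorithm from the paper: the function names, the `min_share` threshold, the tie-breaking rule and the example data are all assumptions made for this sketch.

```python
import math
from collections import Counter


def log_dice(f_xy, f_x, f_y):
    """Commonly cited logDice score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(headword_freq, candidates):
    """Sort (relation, collocate, f_xy, f_y) candidates by logDice, best first.

    Returns (relation, collocate, score) tuples; the input layout is a
    hypothetical one chosen for this sketch.
    """
    scored = [
        (relation, collocate, log_dice(f_xy, headword_freq, f_y))
        for relation, collocate, f_xy, f_y in candidates
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


def longest_commonest(hits, pair, min_share=0.5):
    """Simplified stand-in for a longest-commonest match.

    hits: token lists, each one corpus hit containing both words of `pair`.
    Returns the longest contiguous token sequence that contains both words
    and occurs in at least `min_share` of the hits (an assumed threshold),
    or None if no sequence is frequent enough.
    """
    counts = Counter()
    for tokens in hits:
        spans = set()
        for start in range(len(tokens)):
            for end in range(start + 1, len(tokens) + 1):
                span = tuple(tokens[start:end])
                if pair[0] in span and pair[1] in span:
                    spans.add(span)
        counts.update(spans)  # each span is counted once per hit
    threshold = min_share * len(hits)
    frequent = [(len(s), c, s) for s, c in counts.items() if c >= threshold]
    if not frequent:
        return None
    # Prefer longer spans, then more frequent ones.
    return " ".join(max(frequent)[2])


ranking = rank_collocates(
    1000,
    [("object", "window", 50, 4000), ("object", "law", 200, 600)],
)
toy_hits = [
    ["he", "broke", "the", "law", "again"],
    ["they", "broke", "the", "law"],
    ["she", "broke", "the", "law", "twice"],
    ["broke", "a", "law"],
]
expression = longest_commonest(toy_hits, ("broke", "law"))
```

With this toy data the higher-scoring collocate is `law`, and the expression selected for the pair is `broke the law`, which is easier to interpret than the bare pair of words. A real implementation would work on corpus positions and lemmas rather than short token lists.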

Links
GA15-13277S, research and development project	Name: Hyperintensional logic for natural language analysis
GA15-13277S, research and development project	Investor: Czech Science Foundation, Hyperintensional logic for natural language analysis
LM2010013, research and development project	Name: LINDAT-CLARIN: Institute for analysis, processing and distribution of linguistic data (Acronym: LINDAT-Clarin)
LM2010013, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - Building and operation of a Czech node of a pan-European research infrastructure
7F14047, research and development project	Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
7F14047, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages
