Detailed Information on Publication Record
2015
Longest-commonest Match
KILGARRIFF, Adam, Vít BAISA, Miloš JAKUBÍČEK and Pavel RYCHLÝ
Basic information
Original name
Longest-commonest Match
Authors
KILGARRIFF, Adam (826 United Kingdom of Great Britain and Northern Ireland), Vít BAISA (203 Czech Republic, guarantor, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)
Edition
Ljubljana, Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. p. 397-404, 8 pp. 2015
Publisher
Trojina, Institute for Applied Slovene Studies
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Slovenia
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
References:
RIV identification code
RIV/00216224:14330/15:00080952
Organization unit
Faculty of Informatics
ISBN
978-961-93594-3-3
Keywords in English
multiword expression; collocation; word sketch; Sketch Engine
Tags
International impact, Reviewed
Changed: 6/1/2016 11:35, Mgr. et Mgr. Vít Baisa, Ph.D.
Abstract
In the original
Finding two-word collocations is a well-studied task within natural language processing. The result of this task for a given headword is usually a list of collocations sorted by a salience score. In the Sketch Engine corpus manager, these pairs are extracted from data using word sketch grammar relation rules and the log-dice statistic, resulting in a sorted list of triples. The longest–commonest match is a straightforward extension of these two-word collocations into multiword expressions. The resulting expressions are also very useful for representing the most common realisation of the collocational pair and for facilitating the interpretation of the raw triple, because it is sometimes unclear what texts such a triple comes from. We present the algorithm behind the longest–commonest match together with a simple evaluation. The longest–commonest match is already implemented in Sketch Engine.
Links
GA15-13277S, research and development project
LM2010013, research and development project
7F14047, research and development project