Detailed Information on Publication Record
2014
Extrinsic Corpus Evaluation with a Collocation Dictionary Task
KILGARRIFF, Adam, Pavel RYCHLÝ, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Vít BAISA et al.
Basic information
Original name
Extrinsic Corpus Evaluation with a Collocation Dictionary Task
Authors
KILGARRIFF, Adam (826 United Kingdom of Great Britain and Northern Ireland), Pavel RYCHLÝ (203 Czech Republic, guarantor, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Lucia KOCINCOVÁ (703 Slovakia, belonging to the institution)
Edition
Reykjavik, Iceland, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), p. 1-8, 8 pp. 2014
Publisher
European Language Resources Association (ELRA)
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
not subject to state or trade secrecy
Publication form
electronic version available online
References:
RIV identification code
RIV/00216224:14330/14:00073227
Organization unit
Faculty of Informatics
ISBN
978-2-9517408-8-4
UT WoS
000355611002024
Keywords in English
corpus; evaluation; collocation
Tags
Changed: 20/7/2018 14:43, Mgr. Michal Petr
Abstract
In the original
The NLP researcher or application-builder often wonders "what corpus should I use, or should I build one of my own? If I build one of my own, how will I know if I have done a good job?" Currently there is very little help available for them. They are in need of a framework for evaluating corpora. We develop such a framework, in relation to corpora which aim for good coverage of 'general language'. The task we set is automatic creation of a publication-quality collocations dictionary. For a sample of 100 headwords of Czech and 100 of English, we identify a gold standard dataset of (ideally) all the collocations that should appear for these headwords in such a dictionary. The datasets are being made available alongside this paper. We then use them to determine precision and recall for a range of corpora, with a range of parameters.
Links
LM2010013, research and development project
VF20102014003, research and development project