JAKUBÍČEK, Miloš, Emma ROMANI, Pavel RYCHLÝ and Ondřej HERMAN. Development of HAMOD: a High Agreement Multi-lingual Outlier Detection dataset. In Horák, Rychlý, Rambousek. Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, p. 177-183. ISBN 978-80-263-1670-1.
Basic information
Original name Development of HAMOD: a High Agreement Multi-lingual Outlier Detection dataset
Authors JAKUBÍČEK, Miloš (203 Czech Republic, guarantor, belonging to the institution), Emma ROMANI (380 Italy, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Ondřej HERMAN (203 Czech Republic, belonging to the institution).
Edition Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2021), p. 177-183, 7 pp. 2021.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW Workshop homepage; Full text PDF
RIV identification code RIV/00216224:14330/21:00123255
Organization unit Faculty of Informatics
ISBN 978-80-263-1670-1
ISSN 2336-4289
Keywords in English HAMOD; Distributional thesaurus; Outlier detection; Word embeddings; Sketch Engine
Changed by: RNDr. Miloš Jakubíček, Ph.D., učo 172962. Changed: 22/10/2023 01:48.
Abstract
In this paper we describe the further development of a High Agreement Multi-lingual Outlier Detection dataset (HAMOD), which is used to evaluate automatic distributional thesauri. We briefly introduce the task and the methodological motivation for developing such a dataset. We then present the current status of the dataset and its related tools, as well as the results measured on the dataset so far, in terms of both agreement rates and thesaurus evaluation. Finally, we discuss future developments of HAMOD.
Links
LM2018101, research and development project. Name: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Digital Research Infrastructure for Language Technologies, Arts and Humanities; Acronym: LINDAT/CLARIAH-CZ)
Investor: Ministry of Education, Youth and Sports of the CR