D 2021

Development of HAMOD: a High Agreement Multi-lingual Outlier Detection dataset

JAKUBÍČEK, Miloš, Emma ROMANI, Pavel RYCHLÝ and Ondřej HERMAN

Basic information

Original name

Development of HAMOD: a High Agreement Multi-lingual Outlier Detection dataset

Authors

JAKUBÍČEK, Miloš (203 Czech Republic, guarantor, belonging to the institution), Emma ROMANI (380 Italy, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Ondřej HERMAN (203 Czech Republic, belonging to the institution)

Edition

Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2021), p. 177-183, 7 pp. 2021

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10200 1.2 Computer and information sciences

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/21:00123255

Organization unit

Faculty of Informatics

ISBN

978-80-263-1670-1

ISSN

Keywords in English

HAMOD; Distributional thesaurus; Outlier detection; Word embeddings; Sketch Engine
Changed: 15/5/2024 10:24, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

In this paper we describe further development of the High Agreement Multi-lingual Outlier Detection dataset (HAMOD), which is used to evaluate automatic distributional thesauri. We briefly introduce the task and the methodological motivation for developing such a dataset, then present the current status of the dataset and related tools, as well as the results measured on the dataset so far (both in terms of agreement rates and thesauri evaluation). Finally, we discuss future developments of HAMOD.

Links

LM2018101, research and development project
Name: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Acronym: LINDAT/CLARIAH-CZ)
Investor: Ministry of Education, Youth and Sports of the CR