D 2020

Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK

Basic information

Original name

Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

Authors

MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 23-34, 12 pp. 2020

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/20:00114687

Organization unit

Faculty of Informatics

ISBN

978-80-263-1600-8

ISSN

UT WoS

000655471300003

Keywords in English

question answering; dataset management; machine learning; optimization

Tags

International impact
Changed: 16/5/2022 15:07, Mgr. Michal Petr

Abstract

In the original

Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.

Links

GA18-23891S, research and development project
Name: Hyperintensionální usuzování nad texty přirozeného jazyka
Investor: Czech Science Foundation