Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 23-34. ISBN 978-80-263-1600-8.

Basic information
Original name	Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
Authors	MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution).
Edition	Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 23-34, 12 pp. 2020.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	PDF in the proceedings; Workshop homepage
RIV identification code	RIV/00216224:14330/20:00114687
Organization unit	Faculty of Informatics
ISBN	978-80-263-1600-8
ISSN	2336-4289
UT WoS	000655471300003
Keywords in English	question answering; dataset management; machine learning; optimization
Tags	dataset management, machine learning, Optimization, question answering
Tags	International impact
Changed by	Mgr. Michal Petr, učo 65024. Changed: 16/5/2022 15:07.

Abstract

Question answering strategies now rely almost exclusively on deep neural network computations. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as input to the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was used to provide fast responses over the 26 GiB database of input data. We also describe a global hyperparameter optimization process that runs thousands of controlled evaluation experiments to reach a near-optimal setup of the learning process.

Links
GA18-23891S, research and development project	Name: Hyperintensionální usuzování nad texty přirozeného jazyka (Hyperintensional Reasoning over Natural Language Texts)
GA18-23891S, research and development project	Investor: Czech Science Foundation
