MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 23-34. ISBN 978-80-263-1600-8.
Basic information
Original name Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
Authors MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution).
Edition Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 23-34, 12 pp. 2020.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW PDF in the proceedings; Workshop home page
RIV identification code RIV/00216224:14330/20:00114687
Organization unit Faculty of Informatics
ISBN 978-80-263-1600-8
ISSN 2336-4289
UT WoS 000655471300003
Keywords in English question answering; dataset management; machine learning; optimization
Tags dataset management, machine learning, Optimization, question answering
Tags International impact
Changed by Mgr. Michal Petr, učo 65024. Changed: 16/5/2022 15:07.
Abstract
Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.
Links
GA18-23891S, research and development project
Name: Hyperintensionální usuzování nad texty přirozeného jazyka (Hyperintensional Reasoning over Natural Language Texts)
Investor: Czech Science Foundation