Detailed Information on Publication Record
2020
Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK
Basic information
Original name
Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
Authors
MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)
Edition
Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 23-34, 12 pp. 2020
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
References:
RIV identification code
RIV/00216224:14330/20:00114687
Organization unit
Faculty of Informatics
ISBN
978-80-263-1600-8
ISSN
UT WoS
000655471300003
Keywords in English
question answering; dataset management; machine learning; optimization
Tags
International impact
Changed: 16/5/2022 15:07, Mgr. Michal Petr
Abstract
In the original
Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.
Links
GA18-23891S, research and development project