Efficient Management and Optimization of Very Large Machine
Learning Dataset for Question Answering

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK

Basic information

Original name

Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

Authors

MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 23-34, 12 pp. 2020

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

PDF in the proceedings; Workshop homepage

RIV identification code

RIV/00216224:14330/20:00114687

Organization unit

Faculty of Informatics

ISBN

978-80-263-1600-8

ISSN

UT WoS

000655471300003

Keywords in English

question answering; dataset management; machine learning; optimization

Abstract

In the original

Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.

Links

GA18-23891S, research and development project

Name: Hyperintensional Reasoning over Natural Language Texts (Hyperintensionální usuzování nad texty přirozeného jazyka)

Investor: Czech Science Foundation

Cite

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 23-34. ISBN 978-80-263-1600-8.

@inproceedings{1729476,
   author = {Medveď, Marek and Sabol, Radoslav and Horák, Aleš},
   address = {Brno},
   booktitle = {Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020},
   editor = {Aleš Horák},
   keywords = {question answering; dataset management; machine learning; optimization},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   isbn = {978-80-263-1600-8},
   pages = {23-34},
   publisher = {Tribun EU},
   title = {Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering},
   url = {https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21},
   year = {2020}
}

TY  - CONF
ID  - 1729476
AU  - Medveď, Marek
AU  - Sabol, Radoslav
AU  - Horák, Aleš
PY  - 2020
TI  - Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
PB  - Tribun EU
CY  - Brno
SN  - 9788026316008
KW  - question answering
KW  - dataset management
KW  - machine learning
KW  - optimization
UR  - https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21
N2  - Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.
ER  -

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. \textit{Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020}. Brno: Tribun EU, 2020, p.~23-34. ISBN~978-80-263-1600-8.
