Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

D 2020

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK

Basic information

Original title

Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering

Authors

MEDVEĎ, Marek (703 Slovakia, guarantor, belonging to the institution), Radoslav SABOL (703 Slovakia, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, pp. 23-34, 12 pp., 2020

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Links

PDF in the proceedings, Workshop home page

RIV code

RIV/00216224:14330/20:00114687

Organization unit

Faculty of Informatics

ISBN

978-80-263-1600-8

ISSN

UT WoS

000655471300003

Keywords in English

question answering; dataset management; machine learning; optimization

Tags

dataset management, machine learning, Optimization, question answering

Flags

International impact

Changed: 16 May 2022 15:07, Mgr. Michal Petr

Abstract

In the original

Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.

Related projects

GA18-23891S, research and development project

Name: Hyperintensional Reasoning over Natural Language Texts (Hyperintensionální usuzování nad texty přirozeného jazyka)

Investor: Czech Science Foundation, Hyperintensional Reasoning over Natural Language Texts

Cite

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, pp. 23-34. ISBN 978-80-263-1600-8.

@inproceedings{1729476,
   author = {Medveď, Marek and Sabol, Radoslav and Horák, Aleš},
   address = {Brno},
   booktitle = {Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020},
   editor = {Aleš Horák},
   keywords = {question answering; dataset management; machine learning; optimization},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   isbn = {978-80-263-1600-8},
   pages = {23-34},
   publisher = {Tribun EU},
   title = {Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering},
   url = {https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21},
   year = {2020}
}

TY  - JOUR
ID  - 1729476
AU  - Medveď, Marek
AU  - Sabol, Radoslav
AU  - Horák, Aleš
PY  - 2020
TI  - Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
PB  - Tribun EU
CY  - Brno
SN  - 9788026316008
KW  - question answering
KW  - dataset management
KW  - machine learning
KW  - optimization
UR  - https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21
N2  - Question answering strategies lean almost exclusively on deep neural network computations nowadays. Managing a large set of input data (questions, answers, full documents, metadata) in several forms suitable as the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13 thousand question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was utilized to offer fast responses based on the 26 GiB database of input data. A global hyperparameter optimization process with controlled running of thousands of evaluation experiments to reach a near-optimum setup of the learning process is also explicated.
ER  -

MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. \textit{Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020}. Brno: Tribun EU, 2020, pp.~23--34. ISBN~978-80-263-1600-8.
