An Architecture for Scientific Document Retrieval Using Textual
and Math Entailment Modules

D 2014

PAKRAY, Partha and Petr SOJKA

Basic information

Original title

An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules

Authors

PAKRAY, Partha (356 India, belonging to the institution) and Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014, pp. 107-117, 11 pp., 2014

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

References

article preprint, DOI

RIV identification code

RIV/00216224:14330/14:00077458

Organization unit

Faculty of Informatics

ISSN

DOI

http://dx.doi.org/10.13140/2.1.4036.2561

UT WoS

000374560500014

Keywords (translated from Czech)

language representation; meaning selection; selection of a meaning word; word meaning selection; representation discretization; representation of meaning; empirical linguistics

Keywords in English

natural language representation; priming; lexical priming; semantic priming; data discretization; language modelling; representation of meaning; personal mental lexicon; empirical linguistics

Tags

International impact

Changed: 11 January 2017, 09:50, doc. RNDr. Petr Sojka, Ph.D.

Abstract

In the original language

We present an architecture for scientific document retrieval. An existing system for textual and math-aware retrieval, Math Indexer and Searcher (MIaS), is designed to be extended by modules for textual and math-aware entailment. The goal is to increase the quality of retrieval (precision and recall) by handling natural language variations in expressing semantically the same content in texts and/or formulae. Entailment modules are designed to use several ordered layers of processing on the lexical, syntactic and semantic levels, using natural language processing tools adapted for handling tree structures such as mathematical formulae. If these tools cannot decide on the entailment, generic knowledge databases are used, deploying distributional semantics methods and tools. It is shown that the sole use of distributional semantics for semantic textual entailment decisions at the sentence level is surprisingly good. Finally, plans for further research to deploy the results in digital mathematical libraries are outlined.
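
The layered decision flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the MIaS API: the function names (`lexical_layer`, `distributional_fallback`, `entails`), the 0.5 threshold, and the use of bag-of-words cosine similarity as a stand-in for real distributional semantics vectors are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Optional, Sequence

# Hypothetical layer type: returns True/False when the layer can decide,
# or None to defer to the next layer.
Layer = Callable[[str, str], Optional[bool]]


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a deliberately naive assumption)."""
    return text.lower().split()


def lexical_layer(text: str, hypothesis: str) -> Optional[bool]:
    """Decide 'entails' only if every hypothesis token occurs in the text."""
    t, h = set(tokens(text)), set(tokens(hypothesis))
    return True if h and h <= t else None


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def distributional_fallback(text: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Stand-in for a distributional semantics model: bag-of-words cosine.

    A real system would compare vectors learned from a large corpus.
    """
    return cosine(Counter(tokens(text)), Counter(tokens(hypothesis))) >= threshold


def entails(text: str, hypothesis: str,
            layers: Sequence[Layer] = (lexical_layer,)) -> bool:
    """Run the ordered layers; fall back when none of them can decide."""
    for layer in layers:
        decision = layer(text, hypothesis)
        if decision is not None:
            return decision
    return distributional_fallback(text, hypothesis)
```

In this sketch, syntactic and semantic layers would be further callables appended to `layers` in order; only the lexical layer is shown, so any pair it cannot decide goes straight to the similarity fallback.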

Related projects

LG13010, research and development project

Name: Czech Republic membership in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Czech Republic membership in the European Research Consortium for Informatics and Mathematics

250503, internal MU code

Name: The European Digital Mathematics Library (Acronym: EuDML)

Investor: European Union, The European Digital Mathematics Library

Cite

PAKRAY, Partha and Petr SOJKA. An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014. Brno: Tribun EU, 2014, pp. 107-117. ISSN 2336-4289. Available from: https://dx.doi.org/10.13140/2.1.4036.2561.

@inproceedings{1210448,
   author = {Pakray, Partha and Sojka, Petr},
   address = {Brno},
   booktitle = {Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014},
   doi = {10.13140/2.1.4036.2561},
   keywords = {natural language representation; priming; lexical priming; semantic priming; data discretization; language modelling; representation of meaning; personal mental lexicon; empirical linguistics},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   pages = {107--117},
   publisher = {Tribun EU},
   title = {An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules},
   url = {http://www.fi.muni.cz/usr/sojka/papers/pakray-sojka-raslan2014.pdf},
   year = {2014}
}

TY  - CONF
ID  - 1210448
AU  - Pakray, Partha
AU  - Sojka, Petr
PY  - 2014
TI  - An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules
PB  - Tribun EU
CY  - Brno
KW  - natural language representation
KW  - priming
KW  - lexical priming
KW  - semantic priming
KW  - data discretization
KW  - language modelling
KW  - representation of meaning
KW  - personal mental lexicon
KW  - empirical linguistics
UR  - http://www.fi.muni.cz/usr/sojka/papers/pakray-sojka-raslan2014.pdf
L2  - https://dx.doi.org/10.13140/2.1.4036.2561
N2  - We present an architecture for scientific document retrieval. An existing system for textual and math-aware retrieval, Math Indexer and Searcher (MIaS), is designed to be extended by modules for textual and math-aware entailment. The goal is to increase the quality of retrieval (precision and recall) by handling natural language variations in expressing semantically the same content in texts and/or formulae. Entailment modules are designed to use several ordered layers of processing on the lexical, syntactic and semantic levels, using natural language processing tools adapted for handling tree structures such as mathematical formulae. If these tools cannot decide on the entailment, generic knowledge databases are used, deploying distributional semantics methods and tools. It is shown that the sole use of distributional semantics for semantic textual entailment decisions at the sentence level is surprisingly good. Finally, plans for further research to deploy the results in digital mathematical libraries are outlined.
ER  -

PAKRAY, Partha and Petr SOJKA. An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules. In \textit{Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014}. Brno: Tribun EU, 2014, pp.~107--117. ISSN~2336-4289. Available from: https://dx.doi.org/10.13140/2.1.4036.2561.
