Informační systém MU
PAKRAY, Partha and Petr SOJKA. An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014. Brno: Tribun EU. p. 107-117. ISSN 2336-4289. doi:10.13140/2.1.4036.2561. 2014.
Basic information
Original name An Architecture for Scientific Document Retrieval Using Textual and Math Entailment Modules
Authors PAKRAY, Partha (356 India, belonging to the institution) and Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution).
Edition Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2014, p. 107-117, 11 pp. 2014.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW article preprint DOI
RIV identification code RIV/00216224:14330/14:00077458
Organization unit Faculty of Informatics
ISSN 2336-4289
Doi http://dx.doi.org/10.13140/2.1.4036.2561
UT WoS 000374560500014
Keywords (in Czech) reprezentace jazyka; výběr významu; výběr významového slova; výběr významu slova; diskretizace reprezentace; reprezentace významu; empirická lingvistika
Keywords in English natural language representation; priming; lexical priming; semantic priming; data discretization; language modelling; representation of meaning; personal mental lexicon; empirical linguistics
Tags International impact
Changed by doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 11/1/2017 09:50.
Abstract
We present an architecture for scientific document retrieval. An existing system for textual and math-aware retrieval, Math Indexer and Searcher (MIaS), is designed to be extended with modules for textual and math-aware entailment. The goal is to increase the quality of retrieval (precision and recall) by handling natural language variations in expressing the same meaning in texts and/or formulae. Entailment modules are designed to use several ordered layers of processing on the lexical, syntactic and semantic levels, using natural language processing tools adapted for handling tree structures such as mathematical formulae. If these tools are not able to decide on the entailment, generic knowledge databases are used, deploying distributional semantics methods and tools. It is shown that the sole use of distributional semantics for semantic textual entailment decisions at the sentence level is surprisingly good. Finally, further research plans to deploy the results in digital mathematical libraries are outlined.
Links
LG13010, research and development project. Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
250503, internal MU code. Name: The European Digital Mathematics Library (Acronym: EuDML)
Investor: European Union