D 2011

Building Corpora of Technical Texts : Approaches and Tools

SOJKA, Petr, Martin LÍŠKA and Michal RŮŽIČKA

Basic information

Original name

Building Corpora of Technical Texts : Approaches and Tools

Name in Czech

Budování korpusů technických textů : přístupy a nástroje

Authors

SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution), Martin LÍŠKA (703 Slovakia, belonging to the institution) and Michal RŮŽIČKA (203 Czech Republic, belonging to the institution)

Edition

First edition. Brno, Fifth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2011, p. 71–82, 11 pp. 2011

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/11:00053999

Organization unit

Faculty of Informatics

ISBN

978-80-263-0077-9

Keywords (in Czech)

reprezentace matematických formulí; matematické korpusy; podobnost; MREC; DML-CZ; EuDML

Keywords in English

language of mathematics; mathematics of language; math representation; m-term; similarity; DML-CZ; EuDML

Tags

International impact, Reviewed
Changed: 21/1/2013 00:51, doc. RNDr. Petr Sojka, Ph.D.

Abstract

In the original language

Building corpora of technical texts in the Science, Technology, Engineering, and Mathematics (STEM) domain has specific needs, especially the handling of mathematical formulae. In particular, there is no widely accepted format for representing and handling math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity computation, and clustering of a mathematical corpus. We provide an overview of our toolset, summarize our experiments to date, and propose further research directions and approaches.

Links

LC536, research and development project
Name: Centrum komputační lingvistiky (Center for Computational Linguistics)
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
250503, internal MU code
Name: The European Digital Mathematics Library (Acronym: EuDML)
Investor: European Union