SOJKA, Petr, Martin LÍŠKA and Michal RŮŽIČKA. Building Corpora of Technical Texts : Approaches and Tools. In Aleš Horák, Pavel Rychlý. Fifth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2011. 1st ed. Brno: Tribun EU, 2011, p. 71--82, 11 pp. ISBN 978-80-263-0077-9.
Basic information
Original name Building Corpora of Technical Texts : Approaches and Tools
Name in Czech Budování korpusů technických textů : přístupy a nástroje
Authors SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution), Martin LÍŠKA (703 Slovakia, belonging to the institution) and Michal RŮŽIČKA (203 Czech Republic, belonging to the institution).
Edition 1st ed. Brno, Fifth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2011, p. 71--82, 11 pp. 2011.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/11:00053999
Organization unit Faculty of Informatics
ISBN 978-80-263-0077-9
Keywords (in Czech) reprezentace matematických formulí; matematické korpusy; podobnost; MREC; DML-CZ; EuDML
Keywords in English language of mathematics; mathematics of language; math representation; m-term; similarity; DML-CZ; EuDML
Tags International impact, Reviewed
Changed by doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 21/1/2013 00:51.
Abstract
Building corpora of technical texts in the Science, Technology, Engineering, and Mathematics (STEM) domain has specific needs, especially the handling of mathematical formulae. In particular, there is no widely accepted format to represent and handle math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity, and clustering of a mathematical corpus. We provide an overview of our toolset, summarize our experiments to date, and propose further research directions and approaches.
Links
LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
250503, internal MU code
Name: The European Digital Mathematics Library (Acronym: EuDML)
Investor: European Union