Information System of Masaryk University 

Building Corpora of Technical Texts : Approaches and Tools


SOJKA, Petr, Martin LÍŠKA and Michal RŮŽIČKA. Building Corpora of Technical Texts : Approaches and Tools. In Aleš Horák, Pavel Rychlý (eds.). Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011. 1st ed. Brno: Tribun EU, 2011. p. 71-82, 11 pp. ISBN 978-80-263-0077-9.
Basic information
Original name Building Corpora of Technical Texts : Approaches and Tools
Name in Czech Budování korpusů technických textů : přístupy a nástroje
Authors SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution), Martin LÍŠKA (703 Slovakia, belonging to the institution) and Michal RŮŽIČKA (203 Czech Republic, belonging to the institution).
Edition 1st ed. Brno, Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011, p. 71-82, 11 pp. 2011.
Publisher Tribun EU
Other information
Original language English
Type of outcome article in proceedings
Field of Study Informatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/11:00053999
Organization unit Faculty of Informatics
ISBN 978-80-263-0077-9
Keywords (in Czech) reprezentace matematických formulí;matematické korpusy;podobnost;MREC;DML-CZ;EuDML
Keywords in English language of mathematics;mathematics of language;math representation;m-term;similarity;DML-CZ;EuDML
Tags International impact, Reviewed
Changed by doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 21. 1. 2013 00:51.
Abstract
Building corpora of technical texts in the Science, Technology, Engineering, and Mathematics (STEM) domain has specific needs, especially the handling of mathematical formulae. In particular, there is no widely accepted format for representing and handling math. We present an approach based on multiple representations of mathematical formulae that has been used for math retrieval, similarity computation, and clustering of a mathematical corpus. We provide an overview of our toolset, summarize our experiments to date, and propose further research directions and approaches.
Links
LC536, research and development project
Name: Centrum komputační lingvistiky (Center of Computational Linguistics)
Investor: Ministry of Education, Youth and Sports of the CR, Basic Research Center
250503, internal MU code
Name: The European Digital Mathematics Library (Acronym: EuDML)
Investor: European Union, Competitiveness and Innovation Framework Programme