2010

State of the Art of Augmenting Metadata Techniques and Technology: Deliverable 7.1 of project EuDML

SOJKA, Petr; Josef BAKER; Alan SEXTON and Volker SORGE

Basic information

Original name

State of the Art of Augmenting Metadata Techniques and Technology: Deliverable 7.1 of project EuDML

Authors

SOJKA, Petr (203 Czech Republic, guarantor, belonging to the institution); Josef BAKER (826 United Kingdom of Great Britain and Northern Ireland); Alan SEXTON (826 United Kingdom of Great Britain and Northern Ireland) and Volker SORGE (826 United Kingdom of Great Britain and Northern Ireland)

Edition

1.2 as of 2nd November 2010. 40 pp. Deliverable D7.1, 2010

Publisher

EU CIP-ICT-PSP project 250503 EuDML: The European Digital Mathematics Library

Other information

Language

English

Type of outcome

Special-purpose publication

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

References:

RIV identification code

RIV/00216224:14330/10:00062158

Organization unit

Faculty of Informatics

Keywords in English

The European Digital Mathematics Library; EuDML; DML-CZ; digitisation workflow; metadata enhancements; Digital mathematics library; DML; scanning; MathML; math retrieval; metadata; metadata editor; FineReader; OCRopus; Tesseract

Tags

International impact, Reviewed
Changed: 5/12/2012 16:25, Mgr. Lucia Kocincová

Abstract

In the original

This deliverable identifies the main issues and challenges in metadata augmentation techniques and technologies suitable for corpora of mathematical scientific documents. For most subtasks, tools were identified that cover the basic functionality a digital library of the EuDML type is expected to need, as in other projects such as PubMed Central or Portico. Generic standard techniques for metadata enhancement and normalization are applicable there. The deliverable also reviews and identifies expertise and tools from some project partners (MU, CMD, ICM, FIZ, IU, and IMI-BAS). The main unresolved challenges are OCR of mathematics and reliable, robust conversion between different math formats (TeX and MathML), in order to normalize to one primary metadata format (NLM Archiving DTD Suite) and so enable services such as math indexing and search. In the follow-up deliverable D7.2, tools and techniques will be chosen for use in the EuDML core engine (combining YADDA and REPOX), or as a loosely coupled set of enhancement tools in a linked data fashion.

Links

250503, internal MU code
Name: The European Digital Mathematics Library (Acronym: EuDML)
Investor: European Union
