STARÝ, Michal, Zuzana NEVĚŘILOVÁ and Jakub VALČÍK. Multilingual Recognition of Temporal Expressions. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 67-78. ISBN 978-80-263-1600-8.
Basic information
Original name Multilingual Recognition of Temporal Expressions
Authors STARÝ, Michal (203 Czech Republic, guarantor, belonging to the institution), Zuzana NEVĚŘILOVÁ (203 Czech Republic, belonging to the institution) and Jakub VALČÍK (203 Czech Republic).
Edition Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 67-78, 12 pp. 2020.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW Proceedings homepage; PDF in the proceedings
RIV identification code RIV/00216224:14330/20:00117840
Organization unit Faculty of Informatics
ISBN 978-80-263-1600-8
ISSN 2336-4289
UT WoS 000655471300007
Keywords in English temporal expressions; multilingual; date recognition
Tags date recognition, multilingual, temporal expressions
Tags International impact
Changed by Mgr. Michal Petr, učo 65024. Changed: 16/5/2022 15:10.
Abstract
The paper presents a multilingual approach to temporal expression recognition (TER) that uses existing tools and combines them. We observe that rule-based methods perform well on documents containing well-formed temporal expressions in a narrower domain (e.g., news), while data-driven methods are more stable on less standard language and on texts across domains. By combining the two approaches, we achieved an F1 of 0.73 under strict evaluation and 0.9 under relaxed evaluation on one English dataset. Although these results do not reach the state of the art for English, the same method outperformed the state-of-the-art results in a multilingual setting, not only in recall but also in F1. We see this as a strong indication that combining rule-based systems with data-driven models such as BERT is a valid way to improve overall TER performance, especially for languages other than English. Further observations indicate that in the domain of office documents, the combined method recognizes both general temporal expressions and domain-specific ones (e.g., those used in financial documents).
Links
LM2018101, research and development project. Name: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Digital Research Infrastructure for Language Technologies, Arts and Humanities; Acronym: LINDAT/CLARIAH-CZ)
Investor: Ministry of Education, Youth and Sports of the CR