JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. Current Challenges in Web Corpus Building. Online. In Adrien Barbaresi, Felix Bildhauer, Roland Schäfer and Egon Stemle. Proceedings of the 12th Web as Corpus Workshop. Marseille, France: European Language Resources Association, 2020, p. 1-4. ISBN 979-10-95546-68-9.
Basic information
Original name Current Challenges in Web Corpus Building
Authors JAKUBÍČEK, Miloš (203 Czech Republic, guarantor, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, belonging to the institution), Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, belonging to the institution).
Edition Marseille, France, Proceedings of the 12th Web as Corpus Workshop, p. 1-4, 4 pp. 2020.
Publisher European Language Resources Association
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10200 1.2 Computer and information sciences
Country of publisher France
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
WWW Article in proceedings
RIV identification code RIV/00216224:14330/20:00114153
Organization unit Faculty of Informatics
ISBN 979-10-95546-68-9
Keywords in English Web corpora; corpus building
Tags International impact, Reviewed
Changed by RNDr. Vít Suchomel, Ph.D., učo 139723. Changed: 28/5/2020 13:06.
Abstract
In this paper we discuss some of the current challenges in web corpus building that we have faced in recent years while expanding the corpora in Sketch Engine. The purpose of the paper is to provide an overview and to open a discussion on possible solutions, rather than to offer readers ready-made solutions. For each issue we try to assess its severity and briefly discuss possible mitigation options.
Links
GA18-23891S, research and development project. Name: Hyperintensionální usuzování nad texty přirozeného jazyka (Hyperintensional Reasoning over Natural Language Texts)
Investor: Czech Science Foundation
LM2018101, research and development project. Name: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Digital Research Infrastructure for Language Technologies, Arts and Humanities; Acronym: LINDAT/CLARIAH-CZ)
Investor: Ministry of Education, Youth and Sports of the CR