Corpus Annotation Tool

RYCHLÝ, Pavel. Corpus Annotation Tool. 2017.


Basic information
Original name	Corpus Annotation Tool
Authors	RYCHLÝ, Pavel (203 Czech Republic, guarantor, belonging to the institution).
Edition	2017.

Other information
Original language	English
Type of outcome	Software
Field of Study	60200 6.2 Languages and Literature
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
WWW	URL
RIV identification code	RIV/00216224:14330/17:00096859
Organization unit	Faculty of Informatics
Keywords in English	text corpora; corpus annotation; part of speech tagging
Technical parameters	The tool was used to annotate texts in 6 languages by 16 annotators in total. Czech and Norwegian corpora were annotated mainly for evaluation purposes, since several PoS-annotated corpora (including ones using the UD tag set) already exist for both languages. The annotation of 4 Ethiopian languages (Amharic, Oromo, Somali, Tigrinya) was used to build the respective PoS taggers, which are part of the HaBiT system.
Changed by	doc. Mgr. Pavel Rychlý, Ph.D., učo 3692. Changed: 1/6/2017 13:53.

Abstract

The main goal of this work is an annotation tool for the easy and fast production of small annotated corpora for languages without linguistic resources. The tool is optimised for the following priorities:

Simple tool for instant usage: The client part of the tool is a web application that works in any browser. It requires few resources of any kind (memory, internet connection, screen size) and can be used even on mobile devices (tablets, phones).

Using a small standard tag set: Where possible, we have used the Universal Dependencies (UD) tag set (version 2). It is well documented and used for many different languages. The main description of the tags is included directly in the tool, with links to the UD site.

Fast navigation: Users can navigate with the mouse or the keyboard; only one or two mouse clicks or keystrokes are needed to annotate one token. All sentences are pre-annotated by a built-in adaptive tagger, so users only have to correct the incorrect annotations.

Clean texts: Users can (and are instructed to) decline to annotate a sentence if the sentence is not clear.
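The abstract does not describe how the built-in adaptive tagger works. As a rough illustration of the pre-annotate-and-correct workflow only, the Python sketch below assumes a simple most-frequent-tag-per-word-form tagger that is updated after every confirmed sentence, with tags restricted to the 17 UD v2 universal PoS (UPOS) tags. The tagger model, the default tag for unseen words, and the names `AdaptiveTagger` and `annotate` are hypothetical and are not taken from the tool itself.

```python
from collections import Counter, defaultdict

# The 17 universal part-of-speech tags (UPOS) of Universal Dependencies v2.
UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


class AdaptiveTagger:
    """Hypothetical most-frequent-tag tagger that learns from each confirmed sentence."""

    def __init__(self, default_tag="NOUN"):
        self.counts = defaultdict(Counter)  # lowercased word form -> tag counts
        self.default_tag = default_tag      # guess for unseen forms (assumption)

    def predict(self, tokens):
        """Pre-annotate a sentence: most frequent tag per form, else the default."""
        tags = []
        for tok in tokens:
            seen = self.counts.get(tok.lower())  # .get() does not create entries
            tags.append(seen.most_common(1)[0][0] if seen else self.default_tag)
        return tags

    def update(self, tokens, tags):
        """Learn from a sentence after the annotator has confirmed or corrected it."""
        for tok, tag in zip(tokens, tags):
            self.counts[tok.lower()][tag] += 1


def annotate(tagger, tokens, corrections):
    """Simulate one sentence: pre-annotate, apply the annotator's fixes, then learn.

    `corrections` maps token index -> correct UPOS tag; every other
    pre-annotated tag is accepted unchanged.
    """
    tags = tagger.predict(tokens)
    for i, tag in corrections.items():
        if tag not in UPOS:
            raise ValueError(f"unknown UPOS tag: {tag}")
        tags[i] = tag
    tagger.update(tokens, tags)
    return tags


if __name__ == "__main__":
    tagger = AdaptiveTagger()
    # First sentence: nothing is known yet, so three of four tags need fixing.
    print(annotate(tagger, ["The", "dog", "barks", "."],
                   {0: "DET", 2: "VERB", 3: "PUNCT"}))
    # ['DET', 'NOUN', 'VERB', 'PUNCT']
    # Second sentence: the pre-annotation is already fully correct.
    print(tagger.predict(["The", "cat", "barks", "."]))
    # ['DET', 'NOUN', 'VERB', 'PUNCT']
```

In this sketch, each correction is reused for later sentences, so the annotator's effort shifts toward word forms not seen before. A practical adaptive tagger would likely also exploit context and word-internal features (neighbouring tokens, suffixes) rather than word forms alone.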

Links
7F14047, research and development project	Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
7F14047, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
