Detailed Information on Publication Record
2016
Annotated Amharic Corpora
RYCHLÝ, Pavel and Vít SUCHOMEL
Basic information
Original name
Annotated Amharic Corpora
Authors
RYCHLÝ, Pavel (203 Czech Republic, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, guarantor, belonging to the institution)
Edition
Switzerland, Text, Speech, and Dialogue 19th International Conference, TSD 2016 Brno, Czech Republic, September 12–16, 2016 Proceedings, p. 295-302, 8 pp. 2016
Publisher
Springer International Publishing
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
60200 6.2 Languages and Literature
Country of publisher
Switzerland
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Impact factor
Impact factor: 0.402 in 2005
RIV identification code
RIV/00216224:14330/16:00088120
Organization unit
Faculty of Informatics
ISBN
978-3-319-45509-9
ISSN
UT WoS
000389707400034
Keywords in English
Amharic; text corpus; web corpus; under-resourced language; corpus annotation; morphological tagger
Tags
International impact, Reviewed
Changed: 1/11/2017 11:02, RNDr. Vít Suchomel, Ph.D.
Abstract
In the original
Amharic is one of under-resourced languages. The paper presents two text corpora. The first one is a substantially cleaned version of existing morphologically annotated WIC Corpus (210,000 words). The second one is the largest Amharic text corpus (17 million words). It was created from Web pages automatically crawled in 2013, 2015 and 2016. It is part-of-speech annotated by a tagger trained and evaluated on the WIC Corpus.
Links
GA15-13277S, research and development project
7F14047, research and development project