Other formats:
BibTeX
LaTeX
RIS
@misc{1381970,
  author = {Suchomel, Vít and Rychlý, Pavel},
  keywords = {text corpora; Ethiopian languages},
  language = {eng},
  institution = {Masarykova univerzita},
  organization = {Masarykova univerzita},
  title = {Set of Ethiopian Web Corpora},
  url = {http://habit-project.eu/wiki/SetOfEthiopianWebCorpora},
  year = {2016}
}
TY  -
ID  - 1381970
AU  - Suchomel, Vít
AU  - Rychlý, Pavel
PY  - 2016
TI  - Set of Ethiopian Web Corpora
KW  - text corpora
KW  - Ethiopian languages
UR  - http://habit-project.eu/wiki/SetOfEthiopianWebCorpora
N2  - A set of 5 corpora for 4 Ethiopian languages: Amharic, Oromo, Somali and Tigrinya. The Amharic WIC corpus is a reprocessed existing corpus with part of speech annotation. The released version contains cleaning (especially numeric expressions) and unification of two versions with different scripts (Geez and SERA transliteration). The web corpora were built using automatic tools from Internet texts. They contain from 2.5 million words (Tigrinya) to 80 million words (Somali).
ER  -
SUCHOMEL, Vít and Pavel RYCHLÝ. \textit{Set of Ethiopian Web Corpora}. Online. 2016, [cited 2024-04-24]