@inproceedings{1589940,
  author = {Nevěřilová, Zuzana and Stará, Marie},
  address = {Brno},
  booktitle = {Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019},
  editor = {Horák, Aleš and Rychlý, Pavel and Rambousek, Adam},
  keywords = {Czech Tagger; Multi-word Expressions; Pretrained Word Embeddings},
  howpublished = {print},
  language = {eng},
  location = {Brno},
  isbn = {978-80-263-1517-9},
  pages = {23--32},
  publisher = {Tribun EU},
  title = {Neural Tagger for {Czech} Language: Capturing Linguistic Phenomena in Web Corpora},
  url = {https://nlp.fi.muni.cz/raslan/2019/paper10-neverilova.pdf},
  year = {2019}
}
TY  - CONF
ID  - 1589940
AU  - Nevěřilová, Zuzana
AU  - Stará, Marie
PY  - 2019
TI  - Neural Tagger for Czech Language: Capturing Linguistic Phenomena in Web Corpora
T2  - Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019
A2  - Horák, Aleš
A2  - Rychlý, Pavel
A2  - Rambousek, Adam
PB  - Tribun EU
CY  - Brno
SN  - 9788026315179
SP  - 23
EP  - 32
KW  - Czech Tagger
KW  - Multi-word Expressions
KW  - Pretrained Word Embeddings
UR  - https://nlp.fi.muni.cz/raslan/2019/paper10-neverilova.pdf
L2  - https://nlp.fi.muni.cz/raslan/2019/paper10-neverilova.pdf
N2  - We propose a new tagger for the Czech language and particularly for the tagset used for annotation of corpora of the TenTen family. The tagger is based on neural networks with pretrained word embeddings. We selected the newest Czech Web corpus of the TenTen family as training data, but we removed sentences with phenomena that were often annotated incorrectly. We let the tagger learn the annotation of these phenomena on its own. We also experimented with the recognition of multi-word expressions, since this information can support correct tagging. We evaluated the tagger on 6,950 sentences (84,023 tokens) from the csTenTen17 corpus and achieved 75.25% accuracy when compared by tags. When compared by attributes, we achieved 91.62% accuracy; the accuracy of POS tag prediction is 96.5%.
ER  - 
NEVĚŘILOVÁ, Zuzana and Marie STARÁ. Neural Tagger for Czech Language: Capturing Linguistic Phenomena in Web Corpora. In Aleš Horák, Pavel Rychlý and Adam Rambousek (eds.). \textit{Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019}. Brno: Tribun EU, 2019, p.~23--32. ISBN~978-80-263-1517-9.