Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1663744,
  author = {Suchomel, Vít},
  address = {Vancouver, Canada},
  booktitle = {Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1},
  doi = {10.1007/978-3-030-63128-4_55},
  editor = {Kohei Arai and Supriya Kapoor and Rahul Bhatia},
  keywords = {Corpus annotation; Inter-annotator agreement; Text genre; Web corpora},
  howpublished = {print version},
  language = {eng},
  location = {Vancouver, Canada},
  isbn = {978-3-030-63127-7},
  pages = {738--754},
  publisher = {Springer Nature Switzerland AG},
  title = {Genre Annotation of Web Corpora: Scheme and Issues},
  url = {https://link.springer.com/book/10.1007/978-3-030-63128-4},
  year = {2021}
}
TY  - JOUR
ID  - 1663744
AU  - Suchomel, Vít
PY  - 2021
TI  - Genre Annotation of Web Corpora: Scheme and Issues
PB  - Springer Nature Switzerland AG
CY  - Vancouver, Canada
SN  - 9783030631277
KW  - Corpus annotation
KW  - Inter-annotator agreement
KW  - Text genre
KW  - Web corpora
UR  - https://link.springer.com/book/10.1007/978-3-030-63128-4
L2  - https://link.springer.com/book/10.1007/978-3-030-63128-4
N2  - Unlike traditional corpora made from printed media in the past decades, sources of web corpora are not categorised and described well, thus making it difficult to control the content of the corpus. This paper presents an attempt to classify genres in a large English web corpus through supervised learning. A set of genres suitable for web corpora users is defined based on a research of related work. A genre annotation scheme with active learning rounds is introduced. A collection of web pages representing various genres that was created for this task and a scheme of consequent human annotation of the data set is described. Measuring the inter-annotator agreement revealed that either the problem may not be well defined, or that our expectations concerning the precision and recall of the classifier cannot be met. Eventually, the project was postponed at that point. Possible solutions of the issue are discussed at the end of the paper.
ER  -
SUCHOMEL, Vít. Genre Annotation of Web Corpora: Scheme and Issues. In Kohei Arai, Supriya Kapoor, Rahul Bhatia. \textit{Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1}. Vancouver, Canada: Springer Nature Switzerland AG, 2021, pp.~738--754. ISBN~978-3-030-63127-7. Available from: https://dx.doi.org/10.1007/978-3-030-63128-4\_{}55.