Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1181590,
  author = {Kilgarriff, Adam and Jakubíček, Miloš and Kovář, Vojtěch and Rychlý, Pavel and Suchomel, Vít},
  address = {Gothenburg, Sweden},
  booktitle = {Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics},
  keywords = {terminology; terms; corpora; sketch engine},
  howpublished = {electronic version "online"},
  language = {eng},
  location = {Gothenburg, Sweden},
  isbn = {978-1-937284-75-6},
  pages = {53--56},
  publisher = {The Association for Computational Linguistics},
  title = {Finding Terms in Corpora for Many Languages with the Sketch Engine},
  url = {http://aclweb.org/anthology/E/E14/E14-2014.pdf},
  year = {2014}
}
TY  - JOUR
ID  - 1181590
AU  - Kilgarriff, Adam
AU  - Jakubíček, Miloš
AU  - Kovář, Vojtěch
AU  - Rychlý, Pavel
AU  - Suchomel, Vít
PY  - 2014
TI  - Finding Terms in Corpora for Many Languages with the Sketch Engine
PB  - The Association for Computational Linguistics
CY  - Gothenburg, Sweden
SN  - 9781937284756
KW  - terminology
KW  - terms
KW  - corpora
KW  - sketch engine
UR  - http://aclweb.org/anthology/E/E14/E14-2014.pdf
L2  - http://aclweb.org/anthology/E/E14/E14-2014.pdf
N2  - Term candidates for a domain, in a language, can be found by: taking a corpus for the domain, and a reference corpus for the language; identifying the grammatical shape of a term in the language; tokenising, lemmatising and POS-tagging both corpora; identifying (and counting) the items in each corpus which match the grammatical shape; and, for each item in the domain corpus, comparing its frequency with its frequency in the reference corpus. Then, the items with the highest frequency in the domain corpus in comparison to the reference corpus will be the top term candidates. None of the steps above are unusual or innovative for NLP (see, e.g., Aker et al., 2013; Gojun et al., 2012). However, it is far from trivial to implement them all, for numerous languages, in an environment that makes it easy for non-programmers to find the terms in a domain. This is what we have done in the Sketch Engine (Kilgarriff et al., 2004), and will demonstrate. In this abstract we describe how we addressed each of the stages above.
ER  - 
KILGARRIFF, Adam, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. Finding Terms in Corpora for Many Languages with the Sketch Engine. Online. In \textit{Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics}. Gothenburg, Sweden: The Association for Computational Linguistics, 2014, pp.~53--56. ISBN~978-1-937284-75-6.