Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1120433,
  author = {Kilgarriff, Adam and Suchomel, Vít},
  booktitle = {Proceedings of the 8th Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013},
  editor = {Evert, Stefan and Stemle, Egon and Rayson, Paul},
  howpublished = {electronic version "online"},
  pages = {46--52},
  title = {Web Spam},
  url = {http://sigwac.org.uk/raw-attachment/wiki/WAC8/wac8-proceedings.pdf},
  year = {2013}
}
TY  - JOUR
ID  - 1120433
AU  - Kilgarriff, Adam
AU  - Suchomel, Vít
PY  - 2013
TI  - Web Spam
UR  - http://sigwac.org.uk/raw-attachment/wiki/WAC8/wac8-proceedings.pdf
N2  - Web spam is getting worse. The biggest difference between our 2008 and 2012 corpora, both crawled in the same way, is web spam. In this paper we talk about what it is, with examples and a discussion of the overlap with ‘legitimate’ marketing material, and present some ideas about how we might identify it automatically in order to filter it out of our web corpora. We also present some linguistic observations that could prove useful for spam identification, and some data relating to changes we have observed between 2008 and 2012.
ER  - 
KILGARRIFF, Adam and Vít SUCHOMEL. Web Spam. Online. In Stefan Evert, Egon Stemle, Paul Rayson (eds.). \textit{Proceedings of the 8th Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013}. 2013, pp.~46--52.