Other formats:
BibTeX
LaTeX
RIS
@inproceedings{800674,
  author = {Materna, Jiří},
  address = {Brno},
  booktitle = {Recent Advances in Slavonic Natural Language Processing},
  keywords = {automatic classification; machine learning; thesaurus},
  howpublished = {elektronická verze "online"},
  language = {eng},
  location = {Brno},
  isbn = {978-80-210-4741-9},
  publisher = {Faculty of Informatics, Masaryk University},
  title = {Automatic Web Page Classification},
  url = {https://nlp.fi.muni.cz/raslan/2008/papers/6.pdf},
  year = {2008}
}
TY  - JOUR
ID  - 800674
AU  - Materna, Jiří
PY  - 2008
TI  - Automatic Web Page Classification
PB  - Faculty of Informatics, Masaryk University
CY  - Brno
SN  - 9788021047419
KW  - automatic classification
KW  - machine learning
KW  - thesaurus
UR  - https://nlp.fi.muni.cz/raslan/2008/papers/6.pdf
N2  - Aim of this paper is to describe a method of automatic web page classification to semantic domains and its evaluation. The classification method exploits machine learning algorithms and several morphological as well as semantical text processing tools. In contrast to general text document classification, in the web document classification, there are often problems with short web pages. In this paper we proposed two approaches to eliminate the lack of information. In the first one we consider a wider context of a web page. That means we analyze web pages referenced from the investigated page. The second approach is based on sophisticated term clustering by their similar grammatical context. This is done using statistic corpora tool the Sketch Engine.
ER  -
MATERNA, Jiří. Automatic Web Page Classification. Online. In \textit{Recent Advances in Slavonic Natural Language Processing}. Brno: Faculty of Informatics, Masaryk University, 2008, 10 pp. ISBN~978-80-210-4741-9.