Other formats:
BibTeX
LaTeX
RIS
@inproceedings{493330,
  author    = {Hroza, Jiří and Žižka, Jan and Bourek, Aleš},
  address   = {Germany},
  booktitle = {Computational linguistics and Intelligent Text Processing},
  keywords  = {machine learning; text categorization; text filtration; text similarity},
  language  = {eng},
  location  = {Germany},
  isbn      = {3-540-21006-7},
  pages     = {511-520},
  publisher = {Springer-Verlag Berlin Heidelberg},
  title     = {Filtering Very Similar Text Documents: A Case Study},
  year      = {2004}
}
TY  - JOUR
ID  - 493330
AU  - Hroza, Jiří
AU  - Žižka, Jan
AU  - Bourek, Aleš
PY  - 2004
TI  - Filtering Very Similar Text Documents: A Case Study
PB  - Springer-Verlag Berlin Heidelberg
CY  - Germany
SN  - 3540210067
KW  - machine learning
KW  - text categorization
KW  - text filtration
KW  - text similarity
N2  - This paper describes problems with the classification and filtration of similar relevant and irrelevant real medical documents from one very specific domain, obtained from Internet resources. Besides being similar, the documents are often unbalanced: there is a lack of irrelevant documents for training. A definition of similarity is suggested. For the classification, six algorithms are tested from the document similarity point of view. The best results are provided by the back propagation-based neural network and by the radial basis function-based support vector machine.
ER  -
HROZA, Jiří, Jan ŽIŽKA and Aleš BOUREK. Filtering Very Similar Text Documents: A Case Study. In \textit{Computational linguistics and Intelligent Text Processing}. Germany: Springer-Verlag Berlin Heidelberg, 2004, pp.~511--520. ISBN~3-540-21006-7.