D 2014

Rapid prototyping of a web categorization tool

NAVRÁTIL, Jaromír and Lubomír POPELÍNSKÝ

Basic information

Original name

Rapid prototyping of a web categorization tool

Authors

NAVRÁTIL, Jaromír (203 Czech Republic, guarantor, belonging to the institution) and Lubomír POPELÍNSKÝ (203 Czech Republic, belonging to the institution)

Edition

NY, USA, IDEAS '14 Proceedings of the 18th International Database Engineering & Applications Symposium, p. 294-297, 4 pp. 2014

Publisher

ACM New York

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

RIV identification code

RIV/00216224:14330/14:00076180

Organization unit

Faculty of Informatics

ISBN

978-1-4503-2627-8

UT WoS

000471152000036

Keywords in English

web mining; categorization of web pages; machine learning; landmarking

Changed: 5/3/2018 20:31, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

This paper introduces a new method for fast prototyping of a web page categorization tool based on Random Forests. The contribution is three-fold. First, we describe a fast feature extraction method. Second, we introduce a system that enables a user to perform experiments manually and visualize the results via a visual analytics module. Third, we address how to perform experiments efficiently; the approach is partially inspired by landmarking, which allows the number of experiments to be limited. This method has been used for building a new commercial system for web categorization that significantly outperforms the system already in use.
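
The abstract does not say how landmarking is used to limit the number of experiments, so the following is only an illustrative sketch of one possible reading, not the authors' procedure: each candidate setting is first scored cheaply on a small random sample, and only the best few receive the expensive full evaluation. The function names, the toy one-threshold classifier (standing in for a real learner such as a Random Forest), and the synthetic data are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

# Toy example type: (feature value, class label). Hypothetical data, not web pages.
Example = Tuple[float, int]


def stump_accuracy(threshold: float, data: Sequence[Example]) -> float:
    """Accuracy of a one-threshold classifier that predicts 1 when x >= threshold."""
    hits = sum(1 for x, y in data if (x >= threshold) == (y == 1))
    return hits / len(data)


def select_by_landmark(
    candidates: Sequence[float],
    data: Sequence[Example],
    sample_size: int,
    keep: int,
    rng: random.Random,
) -> List[float]:
    """Score every candidate on a small random sample and keep the best `keep`."""
    sample = rng.sample(list(data), min(sample_size, len(data)))
    ranked = sorted(candidates, key=lambda t: stump_accuracy(t, sample), reverse=True)
    return ranked[:keep]


rng = random.Random(0)
# Synthetic data (assumption): the label is 1 exactly when the feature is at least 0.5.
data = [(x, int(x >= 0.5)) for x in (rng.random() for _ in range(2000))]
candidates = [i / 10 for i in range(11)]  # 11 candidate settings to try

# Cheap pass: 11 evaluations on 100 examples each.
survivors = select_by_landmark(candidates, data, sample_size=100, keep=3, rng=rng)

# Expensive pass: only the 3 survivors are evaluated on the full data.
full_scores = {t: stump_accuracy(t, data) for t in survivors}
best = max(full_scores, key=full_scores.get)
print(survivors, best)
```

In this sketch the full data set is evaluated 3 times instead of 11. The trade-off is that a setting that happens to look poor on the small sample is discarded without ever being evaluated in full.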