D 2006

The selection of electronic text documents supported by only positive examples

ŽIŽKA, Jan; Jiří HROZA; Bruno POULIQUEN; Camelia IGNAT; Ralf STEINBERGER et al.

Basic information

Original title

The selection of electronic text documents supported by only positive examples

Title in Czech

Selekce elektronických textových dokumentů podporovaná pouze pozitivními příklady

Authors

ŽIŽKA, Jan; Jiří HROZA; Bruno POULIQUEN; Camelia IGNAT and Ralf STEINBERGER

Edition

Besancon, France, JADT'06, pp. 1001-1010, 10 pp. 2006

Publisher

Presses Universitaires de Franche-Comte

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

France

Confidentiality

is not subject to a state or trade secret

Marked to be transferred to RIV

No

Organization unit

Faculty of Science

ISBN

2-84867130-0

Keywords in English

machine learning; text categorization; relevance ranking; k-nearest neighbors; support vector machines; positive examples; feature selection; feature optimization; domain-independence

Tags

International impact, Reviewed
Changed: 4. 1. 2007 17:37, doc. Ing. Jan Žižka, CSc.

Abstract

In the original language

The European Commission has a freely accessible news monitoring system called the Europe Media Monitor NewsBrief (http://press.jrc.it/), which is available for all twenty official languages of the European Union, plus several additional languages. Among other things, NewsBrief categorizes articles through routing procedures and automatically alerts users interested in a wide variety of subject domains. To improve the multilingual categorization and relevance ranking functionality for some complex interest profiles, for which only positive examples are currently available, we implemented a modified k-NN (k-nearest neighbors) algorithm and empirically determined the parameters and parameter settings that produce good results for rather different subject areas (news on the EU-Constitution, on Iraq, and on Terrorism). Experiments on this real-life data yielded very satisfying results: a precision of over 90% for a recall of up to 70%. These results were then compared to others achieved with one-class SVM and with SVM that was trained on both positive and artificially generated negative example sets. Efforts are currently underway to incorporate this new functionality within NewsBrief and to make it available to the users.
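
As a rough illustration of relevance ranking from positive examples only, the sketch below scores a document by its mean cosine similarity to its k most similar positive examples. This is a generic k-NN-style scheme, not the modified algorithm from the paper, whose details are not given in this record. The whitespace tokenizer, the raw term-frequency weighting, the function names, and the sample texts are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(doc, positives, k=3):
    """Mean similarity of `doc` to its k most similar positive examples."""
    doc_vec = vectorize(doc)
    sims = sorted((cosine(doc_vec, vectorize(p)) for p in positives), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank(docs, positives, k=3):
    """Order candidate documents from most to least relevant."""
    return sorted(docs, key=lambda d: relevance(d, positives, k), reverse=True)


# Hypothetical positive examples for one interest profile.
positives = [
    "eu constitution treaty vote",
    "constitution referendum france vote",
    "treaty ratification eu parliament",
]
candidates = ["football match result", "eu constitution referendum vote"]
ranked = rank(candidates, positives, k=2)
```

In such a scheme, a document would be routed to the profile when its score exceeds a threshold; both k and the threshold would have to be tuned empirically, for instance to trade precision against recall.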
