Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples

D 2005


HROZA, Jiří and Jan ŽIŽKA

Basic information

Original title

Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples

Title in Czech

Výběr zajímavých článků pomocí jejich podobnosti k relevantním příkladům (Selecting interesting articles using their similarity to relevant examples)

Authors

HROZA, Jiří (203 Czech Republic, guarantor) and Jan ŽIŽKA (203 Czech Republic)

Published in

Germany, Computational Linguistics and Intelligent Text Processing, pp. 608-611, 4 pages, 2005

Publisher

Springer-Verlag Berlin Heidelberg

Other information

Language

English

Result type

Paper in conference proceedings

Field

10201 Computer sciences, information science, bioinformatics

Publisher's country

Germany

Confidentiality

not subject to state or trade secrecy

RIV code

RIV/00216224:14330/05:00013632

Organizational unit

Faculty of Informatics

ISBN

3-540-24523-5

UT WoS

000228725100065

Keywords in English

machine learning; text categorization; text filtration; text similarity; k-NN; ranking

Tags

k-NN, machine learning, ranking, text categorization, text filtration, text similarity

Last changed: 1 March 2005 13:35, RNDr. Jiří Hroza

Abstract


Original

The task of automatically searching for interesting text documents frequently suffers from a severe imbalance between the documents representing positive and negative examples, or from one class missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine how similar new documents are to a representative collection of positive examples only. In terms of the precision-recall trade-off, a user can decide in advance how many articles, and how similar to the positive examples, should be released through the filter.
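The idea of ranking new documents against positive examples only can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words term frequencies, cosine similarity, and scoring a document by the mean of its k highest similarities to the positive collection are all assumptions chosen for illustration, and every function name below (`tf_vector`, `knn_positive_score`, `rank_documents`, `release`) is hypothetical. The `min_score` and `top_n` parameters stand in for the user's advance decision about how similar, and how many, articles pass the filter.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens; an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_positive_score(doc, positives, k=3):
    """Hypothetical score: mean similarity to the k most similar positive examples."""
    sims = sorted((cosine(doc, p) for p in positives), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_documents(new_docs, positive_docs, k=3):
    """Return (score, text) pairs sorted from most to least similar to the positives."""
    positives = [tf_vector(t) for t in positive_docs]
    scored = [(knn_positive_score(tf_vector(t), positives, k), t) for t in new_docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def release(ranked, min_score=None, top_n=None):
    """Let documents through by a similarity threshold and/or a maximum count."""
    passed = [pair for pair in ranked if min_score is None or pair[0] >= min_score]
    return passed if top_n is None else passed[:top_n]


# Usage with made-up example texts (no negative examples are needed).
positives = [
    "neural networks for text categorization",
    "k-NN text categorization with cosine similarity",
    "text filtering using machine learning",
]
incoming = [
    "cosine similarity for text categorization",
    "tomato soup recipe",
    "machine learning for spam filtering",
]
ranked = rank_documents(incoming, positives, k=2)
for score, text in release(ranked, top_n=2):
    print(f"{score:.3f}  {text}")
```

Raising `min_score` or lowering `top_n` releases fewer but more similar articles, which tends to favor precision over recall; loosening them does the opposite.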

Czech (English translation)

Automated searching for interesting articles often suffers from an imbalance between the classes representing positive and negative examples, or from a class that is missing entirely. This paper proposes an approach based on the k-NN algorithm, modified to rank unknown documents using positive examples only. From the point of view of precision and recall, the user can decide how many interesting articles the algorithm should let through.

Related projects

MSM 143300003, research plan

Title: Human-computer interaction, dialogue systems and assistive technologies

Funder: Ministry of Education, Youth and Sports of the Czech Republic, Human-computer interaction, dialogue systems and assistive technologies
