Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification

D 2012

RYGL, Jan and Aleš HORÁK

Basic information

Original title

Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification

Authors

RYGL, Jan and Aleš HORÁK

Edition

Istanbul (Turkey), Proceedings of the Eight International Conference on Language Resources and Evaluation, unpaginated, 4 pp., 2012

Publisher

European Language Resources Association

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

60200 6.2 Languages and Literature

Country of publisher

Turkey

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

Links

URL

Marked to be transferred to RIV

Yes

RIV identification code

RIV/00216224:14330/12:00060279

Organization unit

Faculty of Informatics

ISBN

978-2-9517408-7-7

UT WoS

000323927700117

Keywords in English

authorship identification; machine learning; similarity ranking

Tags

International impact, Reviewed

Changed: 4 July 2014 16:34, RNDr. Jan Rygl

Abstract

In the original

In the authorship identification task, examples of short writings of N authors and an anonymous document written by one of these N authors are given. The task is to determine the authorship of the anonymous text. Practically all approaches solved this problem with machine learning methods. The input attributes for the machine learning process are usually formed by stylistic or grammatical properties of individual documents or a defined similarity between a document and an author. In this paper, we present the results of an experiment to extend the machine learning attributes by ranking the similarity between a document and an author: we transform the similarity between an unknown document and one of the N authors to the order in which the author is the most similar to the document in the set of N authors. The comparison of similarity probability and similarity ranking was made using the Support Vector Machines algorithm. The results show that machine learning methods perform slightly better with attributes based on the ranking of similarity than with previously used similarity between an author and a document.
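
The transformation described in the abstract, from a similarity score between a document and each candidate author to that author's position in the similarity ordering, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `similarity_to_rank`, the example scores, and the tie-breaking rule (alphabetical by author name) are all assumptions made here.

```python
from typing import Dict


def similarity_to_rank(similarities: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate author to a 1-based rank (1 = most similar).

    `similarities` holds one score per candidate author for a single
    unknown document; higher scores mean greater similarity.
    """
    # Sort by descending similarity; ties are broken alphabetically by
    # author name (an assumption, the abstract does not discuss ties).
    ordered = sorted(similarities, key=lambda a: (-similarities[a], a))
    return {author: rank for rank, author in enumerate(ordered, start=1)}


# Hypothetical similarity scores of one anonymous document to N = 3 authors.
scores = {"author_a": 0.62, "author_b": 0.91, "author_c": 0.15}
ranks = similarity_to_rank(scores)
print(ranks)
```

The resulting ranks, instead of (or alongside) the raw scores, would then serve as input attributes for a classifier such as a Support Vector Machine.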

Related projects

VF20102014003, research and development project

Name: Analýza přirozeného jazyka v prostředí internetu (Natural Language Analysis in the Internet Environment; acronym: APJI)

Investor: Ministry of the Interior of the Czech Republic, Analýza přirozeného jazyka v prostředí internetu

Cite

RYGL, Jan and Aleš HORÁK. Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification. In Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis. Proceedings of the Eight International Conference on Language Resources and Evaluation. Istanbul (Turkey): European Language Resources Association, 2012, unpaginated, 4 pp. ISBN 978-2-9517408-7-7.

@inproceedings{985956,
   author = {Rygl, Jan and Horák, Aleš},
   address = {Istanbul (Turkey)},
   booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation},
   editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
   keywords = {authorship identification; machine learning; similarity ranking},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Istanbul (Turkey)},
   isbn = {978-2-9517408-7-7},
   pages = {unpaginated},
   publisher = {European Language Resources Association},
   title = {Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification},
   url = {http://www.lrec-conf.org/proceedings/lrec2012/summaries/618.html},
   year = {2012}
}

TY  - CONF
ID  - 985956
AU  - Rygl, Jan
AU  - Horák, Aleš
PY  - 2012
TI  - Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification
PB  - European Language Resources Association
CY  - Istanbul (Turkey)
SN  - 9782951740877
KW  - authorship identification
KW  - machine learning
KW  - similarity ranking
UR  - http://www.lrec-conf.org/proceedings/lrec2012/summaries/618.html
N2  - In the authorship identification task, examples of short writings of N authors and an anonymous document written by one of these N authors are given. The task is to determine the authorship of the anonymous text. Practically all approaches solved this problem with machine learning methods. The input attributes for the machine learning process are usually formed by stylistic or grammatical properties of individual documents or a defined similarity between a document and an author. In this paper, we present the results of an experiment to extend the machine learning attributes by ranking the similarity between a document and an author: we transform the similarity between an unknown document and one of the N authors to the order in which the author is the most similar to the document in the set of N authors. The comparison of similarity probability and similarity ranking was made using the Support Vector Machines algorithm. The results show that machine learning methods perform slightly better with attributes based on the ranking of similarity than with previously used similarity between an author and a document.
ER  -

RYGL, Jan and Aleš HORÁK. Similarity Ranking as Attribute for Machine Learning Approach to Authorship Identification. In Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis. \textit{Proceedings of the Eight International Conference on Language Resources and Evaluation}. Istanbul (Turkey): European Language Resources Association, 2012, unpaginated, 4~pp. ISBN~978-2-9517408-7-7.
