RYGL, Jan. Automatic Adaptation of Author's Stylometric Features to Document Types. In Petr Sojka; Aleš Horák; Ivan Kopeček; Karel Pala. Text, Speech, and Dialogue - 17th International Conference. 8655th ed. Switzerland: Springer International Publishing, 2014, p. 53-61. ISBN 978-3-319-10815-5. Available from: https://dx.doi.org/10.1007/978-3-319-10816-2_7.
Basic information
Original name Automatic Adaptation of Author's Stylometric Features to Document Types
Name in Czech Automatická adaptace stylometrických rysů autora podle typu dokumentů
Authors RYGL, Jan (203 Czech Republic, guarantor, belonging to the institution).
Edition 8655th ed. Switzerland, Text, Speech, and Dialogue - 17th International Conference, p. 53-61, 9 pp. 2014.
Publisher Springer International Publishing
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Switzerland
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/14:00073237
Organization unit Faculty of Informatics
ISBN 978-3-319-10815-5
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-319-10816-2_7
Keywords (in Czech) verifikace autorství; výběr atributů; strojové učení; stylom; stylometrické rysy
Keywords in English authorship verification; feature selection; machine learning; stylome; stylometric features
Tags firank_B
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 1/4/2015 10:35.
Abstract
Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: books, blogs, discussions, comments and tweets. A method for the automatic selection of authors’ stylometric features using double-layer machine learning is proposed and evaluated. Experiments are conducted on ten disjoint train and test sets, and a method for efficiently training a large number of machine learning models is introduced (163,700 models were trained).
Links
VF20102014003, research and development project
Name: Analýza přirozeného jazyka v prostředí internetu (Acronym: APJI)
Investor: Ministry of the Interior of the CR