D 2011

A Framework for Authorship Identification in the Internet Environment

RYGL, Jan and Aleš HORÁK

Basic information

Original name

A Framework for Authorship Identification in the Internet Environment

Authors

RYGL, Jan (203 Czech Republic, belonging to the institution) and Aleš HORÁK (203 Czech Republic, guarantor, belonging to the institution)

Edition

1st ed. Brno (Czech Republic), Proceedings of Fifth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2011, p. 117-124, 8 pp. 2011

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

20200 2.2 Electrical engineering, Electronic engineering, Information engineering

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/11:00054037

Organization unit

Faculty of Informatics

ISBN

978-80-263-0077-9

Keywords (in Czech)

určování autorství;podobnost autorství

Keywords in English

authorship identification;authorship similarity

Tags

International impact
Changed: 26/5/2021 18:06, RNDr. Jan Rygl

Abstract

In the original

Misuse of anonymous online communication for illegal purposes has become a major concern. In this paper, we present a framework named ART (Authorship Recognition Tool) that is designed to minimize manual procedures and maximize the efficiency of authorship identification based on the content of Internet electronic documents. The framework covers the phases of document retrieval and database document management. ART provides implementations of an efficient authorship identification algorithm and an authorship similarity algorithm, including the possibility of obtaining extra data for learning and tests. The framework also determines whether different author identities are interlinked. Authorship is analysed by machine learning and natural language processing methods. Technical information such as the IP address is considered only as an optional attribute for the machine learning, because it can easily be forged, or devalued if the author communicates from public places or through proxy servers.

Links

LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
VF20102014003, research and development project
Name: Analýza přirozeného jazyka v prostředí internetu (Acronym: APJI)
Investor: Ministry of the Interior of the CR