Style & Identity Recognition

RYGL, Jan. Style & Identity Recognition. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Ninth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2015, pp. 3-10. ISBN 978-80-263-0974-1.


Basic information
Original title	Style & Identity Recognition
Authors	RYGL, Jan (203 Czech Republic, guarantor, belonging to the institution).
Edition	Brno, Ninth Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 3-10, 8 pp. 2015.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	paper
RIV identification code	RIV/00216224:14330/15:00085163
Organization unit	Faculty of Informatics
ISBN	978-80-263-0974-1
ISSN	2336-4289
Keywords in English	stylometry; authorship recognition; machine learning; open-source
Changed by	Changed by: RNDr. Jan Rygl, učo 208072. Changed: 26. 5. 2021 18:07.

Abstract

Knowledge of the author’s identity and style can be used in the fight against forged and anonymous documents and illegal actions on the Internet. Nowadays, there are many systems dedicated to solving stylometric tasks, but they are predominantly designed only for a specific task, they are used exclusively by their owners, or they do not natively support any Slavic languages. Therefore, we present a new open-source modular system, Style & Identity Recognition (SIR). The system is designed to support any stylometric task with minimal effort (or even by default) by combining dynamic stylometric feature selection with prediction driven by input data labels. The system is free for non-commercial applications and easy to use, so it can be helpful for people dealing with threatening e-mails or SMS messages, for protecting children's forums against pedophiles, and for other tasks. Being customizable and freely accessible, it can also be used as a baseline for other systems solving stylometric tasks. The system combines machine learning techniques and natural language processing tools. It is written in Python and depends on other open-source Python libraries.

Links
LM2010013, research and development project	Name: LINDAT-CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data (Acronym: LINDAT-Clarin)
LM2010013, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - Building and operating the Czech node of the pan-European research infrastructure
