SUCHOMEL, Šimon and Michal BRANDEJS. Determining Window Size from Plagiarism Corpus for Stylometric Features. In Mothe, Josiane and Savoy, Jacques and Kamps, Jaap and Pinel-Sauvagnat, Karen and Jones, Gareth J.F. and SanJuan, Eric and Cappellato, Linda and Ferro, Nicola. Experimental IR Meets Multilinguality, Multimodality, and Interaction. Toulouse, France: Springer International Publishing, 2015. p. 293-299. ISBN 978-3-319-24026-8. doi:10.1007/978-3-319-24027-5_31.
Basic information
Original name Determining Window Size from Plagiarism Corpus for Stylometric Features
Authors SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, guarantor, belonging to the institution).
Edition Toulouse, France, Experimental IR Meets Multilinguality, Multimodality, and Interaction, p. 293-299, 7 pp. 2015.
Publisher Springer International Publishing
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher France
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW Springer Link
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/15:00084706
Organization unit Faculty of Informatics
ISBN 978-3-319-24026-8
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-319-24027-5_31
UT WoS 000364677800034
Keywords in English plagiarism; average word frequency class; stylometry; text classification; intrinsic plagiarism
Tags firank_B, intrinsic plagiarism, Plagiarism, plagiarism detection, stylometric features, stylometry
Tags International impact, Reviewed
Changed by RNDr. Šimon Suchomel, Ph.D., učo 98949. Changed: 16/11/2015 11:33.
Abstract
The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature to determine an optimal size of the sliding window. The experiment was conducted for a vocabulary richness method called ‘average word frequency class’ using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of stop word removal for sliding window document profiling and discusses the use of the selected feature for intrinsic plagiarism detection. The experiment resulted in the recommendation to set the sliding window to around 100 words in length when computing the text profile with the average word frequency class stylometric feature.
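The profiling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used definition of a word's frequency class, floor(log2(f(w*)/f(w))), where w* is the most frequent word in a reference frequency list. The function names, the `freqs` dictionary, and the `step` parameter are hypothetical; only the window length of around 100 words comes from the abstract.

```python
import math


def word_frequency_class(word, freqs, top_freq):
    """Frequency class of a word: 0 for the most frequent word,
    higher values for rarer words (assumed definition, see lead-in)."""
    return math.floor(math.log2(top_freq / freqs[word]))


def awfc_profile(tokens, freqs, window=100, step=50):
    """Average word frequency class for each sliding window over tokens.

    freqs is a hypothetical word -> count mapping from a reference corpus;
    tokens missing from it are skipped. step is an assumed overlap setting.
    """
    top_freq = max(freqs.values())
    profile = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        # Keep only the words for which a reference frequency is known.
        chunk = [t for t in tokens[start:start + window] if t in freqs]
        if chunk:
            classes = [word_frequency_class(t, freqs, top_freq) for t in chunk]
            profile.append(sum(classes) / len(chunk))
    return profile
```

Calling `awfc_profile(tokens, freqs)` on a tokenized document returns one value per window. Under this sketch, stop word removal would amount to filtering `tokens` before the call, and a window whose value deviates strongly from the rest of the profile would be a candidate for closer inspection in intrinsic plagiarism detection.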
Links
LG13010, research and development project. Name: Zastoupení ČR v European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
Type Name Uploaded/Created by Uploaded/Created Rights
Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf Licence Creative Commons  File version Suchomel, Š. 16/11/2015

Properties

Address within IS
https://is.muni.cz/auth/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf
Address for the users outside IS
https://is.muni.cz/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf
Address within Manager
https://is.muni.cz/auth/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf?info
Address within Manager for the users outside IS
https://is.muni.cz/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf?info
Uploaded/Created
Mon 16/11/2015 11:31, RNDr. Šimon Suchomel, Ph.D.

Rights

Right to read
  • anyone on the Internet
Right to upload
 
Right to administer:
  • a concrete person doc. Ing. Michal Brandejs, CSc., učo 2116
  • a concrete person RNDr. Šimon Suchomel, Ph.D., učo 98949

Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.pdf

File type
PDF (application/pdf)
Size
272,5 KB
Hash md5
f756015f85e300ee0e3670485a138dee
Uploaded/Created
Mon 16/11/2015 11:31

Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.txt

Address within IS
https://is.muni.cz/auth/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.txt
Address for the users outside IS
http://is.muni.cz/publication/1317554/Determining_Window_Size_from_Plagiarism_Corpus_for_Stylometric_Features.txt
File type
plain text (text/plain)
Size
17 KB
Hash md5
63b9ecd8e5c558c1468190492f8c2017
Uploaded/Created
Mon 16/11/2015 11:34