SUCHOMEL, Šimon and Michal BRANDEJS. Determining Window Size from Plagiarism Corpus for Stylometric Features. In Mothe, Josiane and Savoy, Jacques and Kamps, Jaap and Pinel-Sauvagnat, Karen and Jones, Gareth J.F. and SanJuan, Eric and Cappellato, Linda and Ferro, Nicola. Experimental IR Meets Multilinguality, Multimodality, and Interaction. Toulouse, France: Springer International Publishing, 2015, pp. 293-299. ISBN 978-3-319-24026-8. Available from: https://dx.doi.org/10.1007/978-3-319-24027-5_31.
Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1317554,
  author = {Suchomel, Šimon and Brandejs, Michal},
  address = {Toulouse, France},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
  doi = {http://dx.doi.org/10.1007/978-3-319-24027-5_31},
  editor = {Mothe, Josiane and Savoy, Jacques and Kamps, Jaap and Pinel-Sauvagnat, Karen and Jones, Gareth J.F. and SanJuan, Eric and Cappellato, Linda and Ferro, Nicola},
  keywords = {plagiarism; average word frequency class; stylometry; text classification; intrinsic plagiarism},
  howpublished = {print version},
  language = {eng},
  location = {Toulouse, France},
  isbn = {978-3-319-24026-8},
  pages = {293-299},
  publisher = {Springer International Publishing},
  title = {Determining Window Size from Plagiarism Corpus for Stylometric Features},
  url = {http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31},
  year = {2015}
}
TY  - JOUR
ID  - 1317554
AU  - Suchomel, Šimon
AU  - Brandejs, Michal
PY  - 2015
TI  - Determining Window Size from Plagiarism Corpus for Stylometric Features
PB  - Springer International Publishing
CY  - Toulouse, France
SN  - 9783319240268
KW  - plagiarism
KW  - average word frequency class
KW  - stylometry
KW  - text classification
KW  - intrinsic plagiarism
UR  - http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31
L2  - http://link.springer.com/chapter/10.1007%2F978-3-319-24027-5_31
N2  - The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with stylometric word-based feature in order to determine an optimal size of the sliding window. It was conducted for a vocabulary richness method called ‘average word frequency class’ using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of the stop words removal for the sliding window document profiling and discusses the utilization of the selected feature for intrinsic plagiarism detection. The experiment resulted in the recommendation of setting the sliding windows to around 100 words in length for computing the text profile using the average word frequency class stylometric feature.
ER  -
SUCHOMEL, Šimon and Michal BRANDEJS. Determining Window Size from Plagiarism Corpus for Stylometric Features. In Mothe, Josiane and Savoy, Jacques and Kamps, Jaap and Pinel-Sauvagnat, Karen and Jones, Gareth J.F. and SanJuan, Eric and Cappellato, Linda and Ferro, Nicola. \textit{Experimental IR Meets Multilinguality, Multimodality, and Interaction}. Toulouse, France: Springer International Publishing, 2015, pp.~293-299. ISBN~978-3-319-24026-8. Available from: https://dx.doi.org/10.1007/978-3-319-24027-5\_{}31.