SOTOLÁŘ, Ondřej, Jaromír PLHÁK and David ŠMAHEL. Towards Personal Data Anonymization for Social Messaging. In Kamil Ekštein, František Pártl, Miloslav Konopík. Text, Speech, and Dialogue. Cham: Springer, 2021, p. 281-292. ISBN 978-3-030-83526-2. Available from: https://dx.doi.org/10.1007/978-3-030-83527-9_24.
Basic information
Original name Towards Personal Data Anonymization for Social Messaging
Authors SOTOLÁŘ, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Jaromír PLHÁK (203 Czech Republic, belonging to the institution) and David ŠMAHEL (203 Czech Republic, belonging to the institution).
Edition Cham, Text, Speech, and Dialogue, p. 281-292, 12 pp. 2021.
Publisher Springer, Cham
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Switzerland
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/21:00119196
Organization unit Faculty of Informatics
ISBN 978-3-030-83526-2
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-030-83527-9_24
Keywords in English Text anonymization; Personal data; Sanitization; De-identification; Privacy protection
Tags firank_B
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 9/9/2021 13:23.
Abstract
We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, like named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations reveal that our solution has recall comparable to human annotators. On the other hand, precision is much lower because of the low efficiency of the named entity recognition in the domain of social messaging conversations. The resulting anonymized text is of high utility because of the replacement methods that produce natural text.
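The detect, classify, and replace steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: regular expressions stand in for named-entity recognition, the entity-grouping step is omitted, and every name (PATTERNS, SURROGATES, detect, anonymize) and every surrogate value is a hypothetical chosen for illustration.

```python
import re

# Assumption: simple regexes stand in for a trained NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}

# Assumption: small hand-written pools of natural-looking replacements.
SURROGATES = {
    "EMAIL": ["jana@example.com", "petr@example.org"],
    "PHONE": ["+420 555 000 001", "+420 555 000 002"],
}


def detect(text):
    """Return non-overlapping (start, end, label) spans, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()
    kept, last_end = [], 0
    for start, end, label in spans:
        if start >= last_end:  # drop spans that overlap an earlier one
            kept.append((start, end, label))
            last_end = end
    return kept


def anonymize(text):
    """Replace each detected entity with a surrogate of the same class.

    The same original value always maps to the same surrogate, so
    repeated mentions stay consistent within one text.
    """
    mapping = {}
    counters = {label: 0 for label in SURROGATES}
    pieces, cursor = [], 0
    for start, end, label in detect(text):
        key = (label, text[start:end])
        if key not in mapping:
            pool = SURROGATES[label]
            mapping[key] = pool[counters[label] % len(pool)]
            counters[label] += 1
        pieces.append(text[cursor:start])
        pieces.append(mapping[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


message = "Write to eva.novak@mail.cz or call +420 777 123 456. Again: eva.novak@mail.cz"
print(anonymize(message))
```

Replacing an entity with a surrogate of the same class, rather than a placeholder tag, is what keeps the output reading like natural text, which is the property the abstract credits for the high utility of the anonymized result.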
Links
GX19-27828X, research and development project
Name: Pohled do budoucnosti: Porozumění vlivu technologií na “well-being” adolescentů (Acronym: FUTURE)
Investor: Czech Science Foundation