Towards Personal Data Anonymization for Social Messaging

SOTOLÁŘ, Ondřej, Jaromír PLHÁK and David ŠMAHEL. Towards Personal Data Anonymization for Social Messaging. In Kamil Ekštein, František Pártl and Miloslav Konopík (eds.). Text, Speech, and Dialogue. Cham: Springer, 2021, p. 281-292. ISBN 978-3-030-83526-2. Available from: https://dx.doi.org/10.1007/978-3-030-83527-9_24.

Basic information
Original name	Towards Personal Data Anonymization for Social Messaging
Authors	SOTOLÁŘ, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Jaromír PLHÁK (203 Czech Republic, belonging to the institution) and David ŠMAHEL (203 Czech Republic, belonging to the institution).
Edition	Cham, Text, Speech, and Dialogue, p. 281-292, 12 pp. 2021.
Publisher	Springer, Cham

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Switzerland
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
Impact factor	0.402 in 2005
RIV identification code	RIV/00216224:14330/21:00119196
Organization unit	Faculty of Informatics
ISBN	978-3-030-83526-2
ISSN	0302-9743
Doi	http://dx.doi.org/10.1007/978-3-030-83527-9_24
Keywords in English	Text anonymization; Personal data; Sanitization; De-identification; Privacy protection
Tags	firank_B
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo (university ID number) 3880. Changed: 9/9/2021 13:23.

Abstract

We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, such as named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations reveal that our solution has recall comparable to human annotators. On the other hand, precision is much lower because of the low efficiency of named-entity recognition in the domain of social messaging conversations. The resulting anonymized text is of high utility because of the replacement methods that produce natural text.

Links
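GX19-27828X, research and development project	Name: Pohled do budoucnosti: Porozumění vlivu technologií na “well-being” adolescentů (literally: A Look into the Future: Understanding the Impact of Technology on the Well-Being of Adolescents) (Acronym: FUTURE)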
GX19-27828X, research and development project	Investor: Czech Science Foundation
