Detailed Information on Publication Record
2016
Semi-automatic mining of correlated data from a complex database: Correlation network visualization
LEXA, Matej and Radovan LAPÁR
Basic information
Original name
Semi-automatic mining of correlated data from a complex database: Correlation network visualization
Authors
LEXA, Matej (703 Slovakia, guarantor, belonging to the institution) and Radovan LAPÁR (703 Slovakia, belonging to the institution)
Edition
New York, Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on, p. 1-2, 2 pp. 2016
Publisher
IEEE
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
United States of America
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/16:00092617
Organization unit
Faculty of Informatics
ISBN
978-1-5090-4199-2
ISSN
UT WoS
000392416700019
Keywords in English
data mining; biomedical database; denormalization; visualization; correlation network
Tags
International impact, Reviewed
Changed: 13/5/2020 19:33, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
In previous work we have addressed the issue of frequent ad-hoc queries in deeply-structured databases. We wrote a library of functions AutodenormLib.py for issuing proper JOIN commands to denormalize an arbitrary subset of stored data for downstream processing. This may include statistical analysis, visualization or machine learning. Here, we visualize the content of the Thalamoss biomedical database as a correlation network. The network is created by calculating pairwise correlations through all pairs of variables, whether they be numerical, ordinal or nominal. We subsequently construct the network over the entire set of variables, clustering variables with similar effects to discover group relationships between the various biomedical characteristics. We use a semi-automatic procedure that makes the selection of all pairs possible and discuss issues of dealing with different types of variables. This is done either by limiting the analysis to numerical and ordinal ones, or by binning their values into intervals of values. Knowledge extracted from the data in this mode can be used to select variables for statistical models, or as markers of medically interesting conditions.
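The procedure outlined in the abstract (bin the values, compute an association for every pair of variables, link strongly associated pairs, then group the variables) can be illustrated with a minimal sketch. It is not the authors' implementation and does not use AutodenormLib.py. The function names, the use of Cramér's V as the pairwise measure, the equal-width binning, the 0.5 threshold, and connected components as a stand-in for clustering are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bin_values(values, n_bins=3):
    """Map numeric values to equal-width interval indices (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (assumed measure, 0 to 1)."""
    n = len(xs)
    row, col, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def correlation_network(table, threshold=0.5):
    """Return {(a, b): strength} for variable pairs at or above the threshold."""
    # Numeric columns are binned into intervals; nominal columns are used as-is.
    cats = {
        name: bin_values(vals)
        if all(isinstance(v, (int, float)) for v in vals)
        else list(vals)
        for name, vals in table.items()
    }
    edges = {}
    for a, b in combinations(sorted(cats), 2):
        strength = cramers_v(cats[a], cats[b])
        if strength >= threshold:
            edges[(a, b)] = strength
    return edges


def clusters(names, edges):
    """Group variables into connected components of the network (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical denormalized table: column name -> list of values.
table = {
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "size": [10, 20, 30, 40, 50, 60],
    "color": ["red", "green", "red", "green", "red", "green"],
}
edges = correlation_network(table)
groups = clusters(table, edges)
```

In this toy table, "weight" and "size" fall into the same bins and end up linked, while "color" is independent of both and stays in its own group. A real analysis would likely pick measures suited to each variable type and a proper clustering method.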
Links
7E13011, research and development project