Semi-automatic mining of correlated data from a complex database: Correlation network visualization

LEXA, Matej and Radovan LAPÁR

Basic information

Original name

Semi-automatic mining of correlated data from a complex database: Correlation network visualization

Authors

LEXA, Matej (703 Slovakia, guarantor, belonging to the institution) and Radovan LAPÁR (703 Slovakia, belonging to the institution)

Edition

New York, Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on, p. 1-2, 2 pp. 2016

Publisher

IEEE

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

RIV identification code

RIV/00216224:14330/16:00092617

Organization unit

Faculty of Informatics

ISBN

978-1-5090-4199-2

DOI

http://dx.doi.org/10.1109/ICCABS.2016.7802783

UT WoS

000392416700019

Keywords in English

data mining; biomedical database; denormalization; visualization; correlation network

Abstract

In the original

In previous work we have addressed the issue of frequent ad-hoc queries in deeply-structured databases. We wrote a library of functions AutodenormLib.py for issuing proper JOIN commands to denormalize an arbitrary subset of stored data for downstream processing. This may include statistical analysis, visualization or machine learning. Here, we visualize the content of the Thalamoss biomedical database as a correlation network. The network is created by calculating pairwise correlations through all pairs of variables, whether they be numerical, ordinal or nominal. We subsequently construct the network over the entire set of variables, clustering variables with similar effects to discover group relationships between the various biomedical characteristics. We use a semi-automatic procedure that makes the selection of all pairs possible and discuss issues of dealing with different types of variables. This is done either by limiting the analysis to numerical and ordinal ones, or by binning their values into intervals of values. Knowledge extracted from the data in this mode can be used to select variables for statistical models, or as markers of medically interesting conditions.
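The pipeline described in the abstract (bin values into intervals, compute an association for every pair of variables, link strongly associated variables, then group them) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use AutodenormLib.py: the choice of Cramér's V as the association measure, equal-width binning, the 0.5 threshold, connected components as the grouping step, and all function names, column names and values below are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def bin_values(values, n_bins=3):
    """Map numbers to equal-width interval indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Cramér's V (0..1) between two categorical columns of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def correlation_network(table, threshold=0.5):
    """Return {(var_a, var_b): V} for every pair with V >= threshold."""
    cols = {}
    for name, values in table.items():
        numeric = all(isinstance(x, (int, float)) for x in values)
        # Numerical columns are binned so all pairs share one measure.
        cols[name] = bin_values(values) if numeric else list(values)
    edges = {}
    for a, b in combinations(sorted(cols), 2):
        v = cramers_v(cols[a], cols[b])
        if v >= threshold:
            edges[(a, b)] = v
    return edges


def clusters(names, edges):
    """Group variables into connected components of the network."""
    neighbours = {n: set() for n in names}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(names):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up toy table (hypothetical names and values, not Thalamoss data).
sample = {
    "num_a": [7.0, 7.5, 8.0, 11.0, 11.5, 12.0],
    "num_b": [900, 950, 1000, 300, 250, 200],
    "cat_c": ["p", "p", "p", "q", "q", "q"],
    "cat_d": ["F", "M", "F", "M", "F", "M"],
}
```

Calling `correlation_network(sample)` and passing the result to `clusters(sample, edges)` groups `num_a`, `num_b` and `cat_c` together and leaves `cat_d` on its own. Binning everything to categories keeps a single measure for all variable types, at the cost of discarding order information that a rank correlation would retain for numerical and ordinal pairs.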

Links

7E13011, research and development project

Name: THALAssaemia MOdular Stratification System for personalized therapy of beta-thalassemia (Acronym: THALAMOSS)

Investor: Ministry of Education, Youth and Sports of the CR

Cite

LEXA, Matej and Radovan LAPÁR. Semi-automatic mining of correlated data from a complex database: Correlation network visualization. Online. In Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on. New York: IEEE, 2016, p. 1-2. ISBN 978-1-5090-4199-2. Available from: https://dx.doi.org/10.1109/ICCABS.2016.7802783.

@inproceedings{1366364,
   author = {Lexa, Matej and Lapár, Radovan},
   address = {New York},
   booktitle = {Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on},
   doi = {10.1109/ICCABS.2016.7802783},
   keywords = {data mining; biomedical database; denormalization; visualization; correlation network},
   howpublished = {electronic version available online},
   language = {eng},
   location = {New York},
   isbn = {978-1-5090-4199-2},
   pages = {1-2},
   publisher = {IEEE},
   title = {Semi-automatic mining of correlated data from a complex database: Correlation network visualization},
   url = {https://www.researchgate.net/publication/309157638_Semi-automatic_mining_of_correlated_data_from_a_complex_database_Correlation_network_visualization},
   year = {2016}
}

TY  - CONF
ID  - 1366364
AU  - Lexa, Matej
AU  - Lapár, Radovan
PY  - 2016
TI  - Semi-automatic mining of correlated data from a complex database: Correlation network visualization
PB  - IEEE
CY  - New York
SN  - 9781509041992
KW  - data mining
KW  - biomedical database
KW  - denormalization
KW  - visualization
KW  - correlation network
UR  - https://www.researchgate.net/publication/309157638_Semi-automatic_mining_of_correlated_data_from_a_complex_database_Correlation_network_visualization
L2  - http://ieeexplore.ieee.org/document/7802783/
N2  - In previous work we have addressed the issue of frequent ad-hoc queries in deeply-structured databases. We wrote a library of functions AutodenormLib.py for issuing proper JOIN commands to denormalize an arbitrary subset of stored data for downstream processing. This may include statistical analysis, visualization or machine learning. Here, we visualize the content of the Thalamoss biomedical database as a correlation network. The network is created by calculating pairwise correlations through all pairs of variables, whether they be numerical, ordinal or nominal. We subsequently construct the network over the entire set of variables, clustering variables with similar effects to discover group relationships between the various biomedical characteristics. We use a semi-automatic procedure that makes the selection of all pairs possible and discuss issues of dealing with different types of variables. This is done either by limiting the analysis to numerical and ordinal ones, or by binning their values into intervals of values. Knowledge extracted from the data in this mode can be used to select variables for statistical models, or as markers of medically interesting conditions.
ER  -

LEXA, Matej and Radovan LAPÁR. Semi-automatic mining of correlated data from a complex database: Correlation network visualization. Online. In \textit{Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on}. New York: IEEE, 2016, p.~1-2. ISBN~978-1-5090-4199-2. Available from: https://dx.doi.org/10.1109/ICCABS.2016.7802783.
