D 2015

A flexible denormalization technique for data analysis above a deeply-structured relational database: biomedical applications

ŠTEFANIČ, Stanislav and Matej LEXA

Basic information

Original name

A flexible denormalization technique for data analysis above a deeply-structured relational database: biomedical applications

Authors

ŠTEFANIČ, Stanislav (703 Slovakia, belonging to the institution) and Matej LEXA (703 Slovakia, guarantor, belonging to the institution)

Edition

Cham, Lecture Notes in Computer Science 9043, Bioinformatics and Biomedical Engineering, Third International Conference, IWBBIO 2015, Granada, Spain, April 15-17 2015, Proceedings, Part I, p. 120-133, 14 pp. 2015

Publisher

Springer International Publishing

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Switzerland

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/15:00082481

Organization unit

Faculty of Informatics

ISBN

978-3-319-16482-3

ISSN

Keywords in English

relational database; PostgreSQL; NoSQL; data flattening; automatic data denormalization

Tags

International impact, Reviewed
Changed: 3/9/2015 13:37, doc. Ing. Matej Lexa, Ph.D.

Abstract

In the original

Relational databases are sometimes used to store biomedical and patient data in large clinical or international projects. This data is inherently deeply structured, and records for individual patients contain a varying number of variables. When ad-hoc access to data subsets is needed, standard database access tools do not allow for rapid command prototyping and variable selection to create flat data tables. In the context of Thalamoss, an international research project on beta-thalassemia, we developed and experimented with an interactive variable selection method addressing these needs. Our newly-developed Python library sqlAutoDenorm.py automatically generates SQL commands to denormalize a subset of database tables and their relevant records, effectively generating a flat table from arbitrarily structured data. The denormalization process can be controlled by a small number of user-tunable parameters. Python and R/Bioconductor are used for any subsequent data processing steps, including visualization, and Weka is used for machine learning on the generated data.
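
The general idea of generating denormalizing SQL from schema metadata can be illustrated with a minimal sketch. It is not the sqlAutoDenorm.py interface, which this record does not describe. The schema dictionary, the function build_flat_query and the max_depth parameter are all hypothetical, and SQLite stands in for PostgreSQL so that the example is self-contained. The sketch only follows many-to-one foreign keys; handling a varying number of variables per patient would need additional pivoting logic.

```python
import sqlite3

# Hypothetical schema description (an assumption, not taken from the paper):
# table -> (column names, {foreign-key column: (referenced table, referenced column)})
SCHEMA = {
    "patient": (["id", "sex"], {}),
    "sample": (["id", "patient_id", "tissue"], {"patient_id": ("patient", "id")}),
    "measurement": (
        ["id", "sample_id", "variable", "value"],
        {"sample_id": ("sample", "id")},
    ),
}


def build_flat_query(schema, root, max_depth=2):
    """Build a SELECT that joins `root` to the tables it references.

    Foreign keys are followed breadth-first for at most `max_depth` hops,
    and every column is aliased as <table>_<column> to keep names unique.
    """
    select, joins = [], []
    seen = {root}
    frontier = [(root, 0)]
    while frontier:
        table, depth = frontier.pop(0)
        columns, foreign_keys = schema[table]
        select += [f"{table}.{col} AS {table}_{col}" for col in columns]
        if depth == max_depth:
            continue  # depth limit reached: do not follow further foreign keys
        for fk_col, (parent, parent_col) in foreign_keys.items():
            if parent in seen:
                continue  # join each table only once
            seen.add(parent)
            joins.append(
                f"LEFT JOIN {parent} ON {table}.{fk_col} = {parent}.{parent_col}"
            )
            frontier.append((parent, depth + 1))
    return f"SELECT {', '.join(select)} FROM {root} " + " ".join(joins)


# Small in-memory database to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (id INTEGER PRIMARY KEY, sex TEXT);
    CREATE TABLE sample (id INTEGER PRIMARY KEY, patient_id INTEGER, tissue TEXT);
    CREATE TABLE measurement (id INTEGER PRIMARY KEY, sample_id INTEGER,
                              variable TEXT, value REAL);
    INSERT INTO patient VALUES (1, 'F');
    INSERT INTO sample VALUES (10, 1, 'blood');
    INSERT INTO measurement VALUES (100, 10, 'HbF', 4.2);
    """
)
query = build_flat_query(SCHEMA, "measurement", max_depth=2)
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

Here max_depth plays the role of a user-tunable parameter: lowering it to 1 stops the join chain at the sample table, so patient columns are left out of the flat table.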

Links

7E13011, research and development project
Name: THALAssaemia MOdular Stratification System for personalized therapy of beta-thalassemia (Acronym: THALAMOSS)
Investor: Ministry of Education, Youth and Sports of the CR