a 2019

ValTrendsDB: bringing Protein Data Bank validation information closer to the user

HORSKÝ, Vladimír, Radka SVOBODOVÁ VAŘEKOVÁ, Veronika BENDOVÁ, Dominik TOUŠEK, Jaroslav KOČA et al.

Basic information

Original name

ValTrendsDB: bringing Protein Data Bank validation information closer to the user

Authors

HORSKÝ, Vladimír (203 Czech Republic, guarantor, belonging to the institution), Radka SVOBODOVÁ VAŘEKOVÁ (203 Czech Republic, belonging to the institution), Veronika BENDOVÁ (203 Czech Republic, belonging to the institution), Dominik TOUŠEK (203 Czech Republic, belonging to the institution) and Jaroslav KOČA (203 Czech Republic, belonging to the institution)

Edition

ELIXIR - EXCELERATE All Hands meeting 2019, 2019

Other information

Language

English

Type of outcome

Conference abstract

Field of Study

10608 Biochemistry and molecular biology

Country of publisher

United Kingdom of Great Britain and Northern Ireland

Confidentiality degree

is not subject to a state or trade secret

RIV identification code

RIV/00216224:14740/19:00110335

Organization unit

Central European Institute of Technology

Keywords in English

PDB; PDBe; Protein Data Bank; three-dimensional macromolecular structure; validation; wwPDB validation pipeline; ligands; ValTrendsDB; X-ray crystallography; NMR spectroscopy; 3DEM; database; trends in quality; visualization; statistical analysis

Tags

International impact
Changed: 26/3/2020 16:51, Mgr. Pavla Foltynová, Ph.D.

Abstract

In the original

Biomacromolecular structural data is one of the most interesting and important results of modern life sciences. However, this treasure trove is inevitably plagued by errors and discrepancies. The issue of structure data reliability has stimulated the research community to concentrate more on data quality improvement. This prompted us to ask a number of questions that concern the macro perspective of structure quality: How do these validation efforts influence the real quality of structural data? How is structure quality changing over time, and which factors affect it? The micro perspective is, however, equally interesting to the community. We wanted to provide an interactive web-based tool that would enable users to visualize the quality and features of one or more structures that represent, e.g., a protein family, a fold, structures of an author, or structures published in a journal. We have carried out an analysis of the state of data quality and validation trends. Our research has been based on data from the Protein Data Bank (PDB) and ligand validation data from our validation database ValidatorDB. All entries in the PDB database have been considered. 1,852 meaningful pairs of factors have been assessed for the existence of a correlation between them. 88 factors have been considered, including structure metadata factors (e.g., year of release, ligand count, residue count), structure quality factors (e.g., clashscore, Ramachandran outlier ratio), and ligand quality factors (e.g., ratio of ligands with topological and chiral problems, average RSCC and RSR). Results are available in the weekly updated ValTrendsDB database.

Links

LQ1601, research and development project
Name: CEITEC 2020 (Acronym: CEITEC2020)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1503/2018, internal MU code
Name: Matematické statistické modelování 3 (Acronym: MaStaMo3)
Investor: Masaryk University, Category A