
3DZD: Protein structural embeddings of ESM Atlas

NOVOTNÁ, Lucie; Terézia SLANINÁKOVÁ; David PROCHÁZKA; Lukáš HEJTMÁNEK; Adrián ROŠINEC et al.

Basic information

Original name

3DZD: Protein structural embeddings of ESM Atlas

Edition

2024

Other information

Language

English

Type of outcome

Research and development projects

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

References:

Organization unit

Faculty of Informatics

Keywords in English

protein structure; similarity search; AlphaFold; embeddings; AlphaFind

Tags

International impact
Changed: 31/3/2025 09:41, Mgr. Eva Špillingová

Abstract

In the original language

The dataset contains proteins from AlphaFold DB v4 (https://alphafold.ebi.ac.uk/) encoded as one-dimensional vectors (embeddings) that capture each protein's tertiary structure. Additional information is available in our GitHub repository (https://github.com/Coda-Research-Group/ProteinEmbeddingBenchmark).

Links

GF23-07040K, research and development project
Name: Learned Indexing for Similarity Searching (Naučené indexy pro podobností hledání)
Investor: Czech Science Foundation, Lead Agency
LM2023055, research and development project
Name: ELIXIR-CZ: Czech National Infrastructure for Biological Data (Česká národní infrastruktura pro biologická data)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1590/2023, internal MU code
Name: Using artificial intelligence techniques for data processing, complex analysis and visualization of large-scale data (Využití technik umělé inteligence pro zpracování dat, komplexní analýzy a vizualizaci rozsáhlých dat)
Investor: Masaryk University
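752/2024, internal MU code
Name: A tool for automatic annotation and searching of large protein sets based on the similarity of their structures (Nástroj na automatickou anotaci a prohledávání velkých sad proteinů na základě podobnosti jejich struktur)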
Investor: CESNET
90254, large research infrastructures
Name: e-INFRA CZ II