The EMBL-European Bioinformatics Institute
A whistlestop tour

What is bioinformatics?
• The science of storing, retrieving and analysing large amounts of biological information
• An interdisciplinary science, involving biologists, computer scientists and mathematicians
• At the heart of modern biology

Biology is changing
• Data explosion and new types of data
• High-throughput biology
• Emphasis on systems, not reductionism
• Growth of applied biology: molecular medicine, agriculture, food, environmental sciences…

At the EBI we are increasingly aware of the explosion in biological data, driven by high-throughput technologies, new data types and the move towards integrative, systems-level analysis. For example, in just three weeks in 2008, the 1000 Genomes Project deposited as much data as the EBI had previously gathered in its 13-year history. As technologies advance, we can only expect to receive more and more data. Bioinformatics is at the heart of modern molecular biology, with applications in medicine, the environment and food: all areas in which science can deliver benefits for society.

New types of data
• Genomes
• DNA & RNA sequence
• Gene expression
• Protein sequence
• Protein families, motifs and domains
• Protein structure
• Protein interactions
• Chemical entities
• Pathways
• Systems
• Literature and ontologies

The EBI is probably unique in the world for its range of data resources and tools, spanning everything from DNA and protein sequence to complex pathways and networks. At the EBI we separate resource development and provision, which we call services, from research, although the two are closely related. Both the research areas and the services follow the areas of focus shown on the slide.
These are some of the types of data now being collected in a high-throughput way, which presents new challenges for how we organise and store them.

What is EMBL-EBI?
• Based on the Wellcome Trust Genome Campus near Cambridge, UK
• Part of the European Molecular Biology Laboratory
• Non-profit organisation

But first, some background. The EBI is based on the Wellcome Trust Genome Campus in Hinxton, near Cambridge in the UK. We share the campus with the Sanger Institute. The EBI is part of the European Molecular Biology Laboratory and, as such, is a non-profit organisation.

The five branches of EMBL
• Monterotondo: mouse biology
• Grenoble: structural biology
• Hinxton: bioinformatics
• Hamburg: structural biology
• Heidelberg: basic research in molecular biology, administration, EMBO

• EMBL is a basic research institute funded by public research monies from 20 member states.
• 1400 staff, over 60 nationalities.

We’re the second largest of the five EMBL sites. There is the main lab and administrative centre in Heidelberg; structural biology labs in Hamburg and Grenoble; mouse biology in Monterotondo, near Rome; and bioinformatics in Hinxton. There are around 1,400 staff within EMBL, and about 330 of those work at the EBI.

EMBL-EBI’s mission
• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
• To help disseminate cutting-edge technologies to industry

The EBI shares its four central mission objectives with EMBL, but with a focus on bioinformatics rather than molecular biology.
The EBI is at the centre of Europe’s efforts to collect, organise and make available all types of biological data. We do this by providing services so that researchers can access and make sense of the information, by being active in bioinformatics research, by providing training and by working closely with industry.

The Wellcome Trust Genome Campus
• EMBL-EBI
• Sanger Institute Sulston Building
• Cairns Pavilion (shared)
• Sanger Labs/informatics
• Data centre
• Sanger Research Support Facility

Thanks to Don Powell, Wellcome Trust Sanger Institute, for providing this image.

We’re based on the Wellcome Trust Genome Campus in Hinxton, south of Cambridge, UK, which we share with the Wellcome Trust Sanger Institute. This is a good strategic fit, as Sanger is a major sequencing centre (most famous for sequencing 1/3 of the human genome) with a strong programme in functional genomics.

EMBL-EBI funding
Sources of funding for the year as of November 2008. The Wellcome Trust also supports us through provision of our buildings.

The EBI is supported by money from the 20 EMBL member states, which contributes more than half of our funding. We are also supported by grants from the EC, the UK research councils, the Wellcome Trust and our database collaborators, such as the National Institutes of Health.

but…

Unfortunately, these funds are not secure in the long term, and this slide highlights the insecure future of our data resources. This is a situation we are trying to change by coordinating the preparatory phase of an EC FP7-funded project called ELIXIR.
• The preparatory phase of ELIXIR, an EU-funded project to agree upon the future bioinformatics infrastructure for Europe, began on 1 November 2007
• Anyone involved with bioinformatics in Europe is a stakeholder in this process
• Outcome: the resulting memorandum of understanding among EU member states will pave the way towards a more stable footing for Europe’s core data resources in the future
• The next stakeholder meeting will be held in Copenhagen, Denmark, on 19–20 May 2009
See www.elixir-europe.org/ to register

ELIXIR is a campaign to unite the providers and users of life science information in Europe in order to develop a sustainable infrastructure for biological data. You can get involved by completing the online surveys on the resources you use and providing feedback on whether these meet your needs, by signing the letter of support available on the website, and by informing your national funding agencies about the project. The project aims to safeguard the funding for all of Europe’s data resources, not just those at the EBI, so it really is important for everyone to get involved.

Services
www.ebi.ac.uk/services

Key facts about services
• European node for globally coordinated data collection and dissemination projects
• Core databases produced in collaboration with other world leaders, including NCBI (US), National Institute of Genetics (Japan), Swiss Institute of Bioinformatics, Cold Spring Harbor Laboratory (US)
• The world’s most comprehensive collection of molecular databases

The EBI is the European centre for the collection and dissemination of biological data. We do this in collaboration with other global centres (primarily in the US and Japan, but different for each data type). The EBI is probably unique in the world for its range of data resources, spanning everything from DNA and protein sequence to complex pathways and networks.
Principles of service provision
• Accessibility: all data and tools freely available without restriction
• Compatibility: we develop and promote the use of standards in bioinformatics
• Comprehensive data sets: agreements with other data providers ensure that our resources contain comprehensive and up-to-date data; agreements with publishers ensure that published data are placed in a public repository at the earliest opportunity
• Portability: data and software can be downloaded and installed locally
• Quality: our databases are enhanced through annotation and cross-referencing

The EBI’s services meet the needs of researchers for data deposition, access, analysis and integration. Our data resources differ in detail, but they all uphold the same five principles.

Databases: molecules to systems
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Microarray & gene expression data: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: MSD
• Protein interactions: IntAct
• Chemical entities: ChEBI
• Pathways: Reactome
• Systems: BioModels
• Literature and ontologies: CiteXplore, GO

The slide shows the core resources at the EBI mapped on to the same arrow, to show the range of data you can access through the EBI. The EBI is the European centre for the collection and dissemination of biological data; we do this in collaboration with other global centres such as NCBI, the National Institute of Genetics in Japan, the Swiss Institute of Bioinformatics and Cold Spring Harbor.

Database collaborations

Many of the EBI’s data resources are members of international consortia. Some, such as the International Nucleotide Sequence Database Collaboration, exchange data on a regular basis; others, such as the UniProt Consortium and the GO Consortium, work together to produce a single resource.
Standards development – international collaborations
• Genome annotation: www.geneontology.org
• Microarray and Gene Expression Data (MGED): www.mged.org
• Protein sequence: www.uniprot.org
• HUPO Proteomics Standards Initiative (PSI): Psidev.sf.net
• Protein structure: www.wwpdb.org
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.reactome.org, www.biopax.org
• Systems modelling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Genomics Standards Consortium (GSC): gensc.org
• Nucleotide sequence: www.insdc.org

EBI website and search engine EB-eye
• Search all main databases in one go
• Refine your search
• Advanced search: drill down to specific fields in specific databases

We launched a new website and search engine just over a year ago. Our website gets over 2 million hits a day and is the gateway to the information you want. The search engine, the EB-eye, allows integrated searching of all our core data resources from a single search box; it’s like a Google for all the information held at the EBI.

Genomes 1: Ensembl
Pick a genome, then view data across species and within species:
• Synteny
• Orthology
• Genomic alignments
• Gene families
• SNPs
• Genes
• Chromosomes

Ensembl provides a framework for working with the genomes of higher animals (metazoans). It presents, via an interactive website, the human genome together with other genomes that are important for addressing questions in medical research and molecular biology. It uses automated methods for gene prediction and annotation to provide a consistent view of completely sequenced genomes. Users can view the data at many levels, from entire chromosomes down to single nucleotide polymorphisms. As well as accessing a wealth of data for each species, users can also perform cross-species comparisons.
Genomes 2: Ensembl Genomes
Ensembl-like genome browser for non-vertebrate species
• Ensembl Metazoa
• Ensembl Bacteria
• Across species
• View options: using view options, you can select to view only the current gene or the entire expanded gene tree. Select Orthologue view to see putative orthologues.

Ensembl Genomes is the combined repository for non-vertebrate genome data. It consists of five resources (Ensembl Bacteria, Ensembl Fungi, Ensembl Metazoa, Ensembl Plants and Ensembl Protists), bringing the power of the Ensembl system to all branches of life. Ensembl Genomes re-uses and extends software developed for vertebrate genomes in the context of the Ensembl project, and replaces several pre-existing resources (Integr8, Genome Reviews and ASTD), thereby unifying services and simplifying data access for users.

Nucleotides: EMBL-Bank
EMBL-Bank, DDBJ, GenBank
www.insdc.org

• Keyword and sequence searching
• Map-based search of environmental samples
• Downloads

• Direct submissions
• Patents
• Genome-sequencing projects

• Updates
• Third-party annotation

EMBL-Bank is Europe’s primary nucleotide sequence resource. The database is produced in an international collaboration (the International Nucleotide Sequence Database Collaboration, INSDC) with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). The main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. Users can search the data either with keyword-based searches or with sequence homology tools such as BLAST and FASTA, which compare their own sequence with the contents of EMBL-Bank. There’s also a map-based search (EMBLWorld) for exploring sequences derived from environmental genome sequencing projects.
The data belong to the submitter and can only be updated by the submitter, but other researchers can submit ‘third party annotations’ to EMBL-Bank if they’re associated with a peer-reviewed publication.

Transcriptomes: ArrayExpress
• Search by experiment
• Search by keyword
• Link to sample properties and experiment design
• View experiment
• Search by gene across experiments
• Browse results summary
• Search by gene name, species and experimental condition
• View expression under different conditions and profiles

ArrayExpress is the world’s first and largest MIAME-compliant repository for microarray-based data (mostly gene expression data, but it also takes CGH and ChIP-chip data). You can search the repository to view and download experiments. A subset of the data in the repository is hand-picked for the Data Warehouse, which can be searched on the basis of gene names and allows you to view gene expression data for different time points or experimental conditions.

Protein sequence: UniProt
• Manual curation
• Literature-based annotation
• Sequence analysis
• Automated annotation

Cross-referenced resources:
• PRIDE: protein identification data
• GO: functional info
• InterPro: protein families and domains
• IntAct: molecular interactions
• IntEnz: enzymes
• HAMAP: microbial protein families
• RESID: post-translational modifications

Automated annotation and classification:
• Transmembrane prediction
• InterPro classification
• Signal prediction
• Other predictions
• Protein classification

UniProt is the gold-standard resource for information on proteins. It comprises three different databases, but I haven’t shown all three here for the sake of simplicity. UniProtKB is the central database of protein sequences with accurate, consistent and rich sequence and functional annotation. It comprises the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section.
The UniProt Archive (UniParc) holds all the protein sequences in the public domain, and the UniRef databases are a series of three databases that group sequences at 100%, 90% and 50% identity into the same records, to speed up searching without losing information. UniProtKB contains more than 29 million cross-references to over 100 other data resources; a few key ones are shown here.

Protein families, motifs and domains: InterPro
Powerful tool for protein classification, integrating several methods into one resource
• View architectures of proteins containing a signature
• Compare methods of protein signature prediction
• Visualize the taxonomic range for a protein signature

InterPro amalgamates several different databases of protein signatures, which use different methods, into a single resource, providing a powerful tool for protein classification.

Proteomics services
• IntAct: molecular interactions
• INTENZ: enzyme classification
• ChEBI: small molecules
• PRIDE: protein identifications from proteomics experiments

The proteomics services team produces a range of resources for the proteomics research community. The team is also heavily involved in developing standards for proteomics research, and its resources adhere to these standards. IntAct provides an open-source database and toolkit for the storage, presentation and analysis of molecular interactions. PRIDE is an open-source public repository of protein identifications, peptide identifications, post-translational modifications and mass spectra. The Integrated relational Enzyme database (IntEnz) provides a complete, freely available database storing the most up-to-date version of the Enzyme Nomenclature approved by the NC-IUBMB.
Chemical Entities of Biological Interest (ChEBI) aims to provide standardised descriptions of molecular entities that enable other databases at the EMBL-EBI and worldwide to annotate their entries in a consistent fashion.

Structures: PDBe
• Ligands
• Sequence mapping
• Linking to domain data
• Assemblies
• Surface matching
• Fold matching
• Active sites
• Electron density visualization

The Macromolecular Structure Database (MSD) is the European resource for the collection, organisation and dissemination of data about biological macromolecular structures. The MSD is one of three partners in the worldwide Protein Data Bank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data. The MSD team has developed a wide range of resources for the analysis of data in the PDB.

Pathways: Reactome
• Select a pathway
• View reactions and events in detail
• Compare events in different species
• Link to source databases

Reactome aims to develop a curated resource of core pathways and reactions in human biology, although other species are also covered. It treats metabolic and signalling pathways in the same way, providing both a human-readable and a computer-readable account of all the processes in a pathway. It makes extensive use of cross-links to other data resources. The data in Reactome can also be exported to a number of different types of modelling software, so that they can be incorporated into computer models of living systems.

Data management
• Over 2 million web requests per day, over 3 million if Ensembl is included
• More than 230,000 unique hosts served per month, excluding Ensembl
• Total disk space reached 2.5 petabytes in 2008. This is expected to double in 2009.
• 400 million cross-references in the databases we serve

This gives a feel for the volume of data that the EBI manages.
Roughly 30% of our users are from the UK.

User support
• 2Can bioinformatics user support: www.ebi.ac.uk/2Can
• Online help pages: www.ebi.ac.uk/help
• E-mail support: www.ebi.ac.uk/support

If you need help using any of our databases, it’s available. If our online support pages can’t answer your question, we offer e-mail support and promise to get back to you within 2 working days.

Research
www.ebi.ac.uk/groups

As well as providing services, the EBI does research…

Key facts about research
• The EBI provides a unique environment for bioinformatics research
• Eight dedicated research groups aim to understand biology through new approaches to interpreting biological data
• Services teams also carry out R&D to enhance existing services and develop new ones
• Research programme complements services and the two are mutually supportive

Research groups
• Transcriptome analysis: Brazma, Huber
• Text mining: Rebholz-Schuhmann
• Protein annotation: Apweiler
• Structural bioinformatics: Thornton
• Pathways, networks, systems: Le Novère
• Cheminformatics: Steinbeck, Overington
• Genome analysis: Birney, Flicek, Enright, Goldman
• Regulatory networks: Luscombe
• Differentiation and development: Bertone

Training
www.ebi.ac.uk/training

Our third mission is to provide training…

Predoc and postdoc training
• Annual Open Days for bioinformatics masters’ students
• PhD studentships through EMBL International PhD Programme
• Short-term placements for visiting PhD students through EU-funded Marie Curie Fellowships

As well as training our own PhD students, we offer 3–6-month placements to students through the Marie Curie mobility programme; we also run a one-day intro to the EBI for Masters’
students. The next one is on March 19th 2007; contact training@ebi.ac.uk if interested.

A tripartite user-training programme
• Training comes to you: www.ebi.ac.uk/training/roadshow
• Training any time, anywhere, at any pace: www.ebi.ac.uk/training/elearning
• Hands-on user training on all our core data resources for lab-based researchers: www.ebi.ac.uk/training/handson

To train researchers in using the EBI’s resources, we have a tripartite training programme. It encompasses training courses in-house at the EBI; the Bioinformatics Roadshow, where EBI trainers travel out to host organisations to provide hands-on training on resources requested by the host; and, most recently, an elearning programme that we have launched so that anyone can undertake some training in their own time.

Hands-on training for all levels of experience
• Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of talks and practical exercises
• Take a tour of all our core data resources, or focus in on specific data types
• Full programme at www.ebi.ac.uk/training/handson

http://www.ebi.ac.uk/training/handson/
Genomics, proteomics, transcriptomics, protein structures…

The training programme aims to cover all the core EBI resources, at both an introductory and an advanced level. For example, the two-day dip course shown here gives an overview of different resources and is a way to orientate yourself and become familiar with the range of resources, whereas a more specific course, such as one on transcriptomics, links the use of several different resources together.
What our trainees say…
Comments from trainees on the From sequence to gene, Transcriptomics, Protein to Proteomes and Proteomes courses:
• “this course gave me just what I was looking for”
• “the hands on sessions were clear”
• “A very valuable experience. I'll definitely tell my colleagues about EBI's courses”
• “Great facilities, very good presentations, interesting content”
• “it’s been a great learning experience”
• “the best I have attended”
• “superb course ran by attentive tutors”
• “very nice to hear about tools that biologists usually are not aware of”

Moodle-based eLearning platform
Courses available at www.ebi.ac.uk/training/elearning
• EBI and EB-eye
• Sequence searching
• Patent searching
• Literature searching
• Ensembl
• Transcriptomics

As an alternative to coming to us, you can now access EBI training from anywhere in the world via our Moodle-based elearning, which has just been launched. We currently offer three courses, which have been developed with an external consultancy company; we are going to use the same format to develop additional courses covering the use of other EBI resources. The elearning is free to use; you just need to register to get a username and password.

Each course is modular
A course contains 3–5 modules (~30 min each). Each module contains…
• Video tutorial: learn by watching and listening
• Print tutorial: learn by reading
• Quiz: learn by testing your understanding
• Reflective task: learn by practicing
Please beta-test and provide feedback!
Each course is modular, and the different parts allow people to use a combination of learning methods: computer-based elements such as the video tutorial and quiz, and computer-independent parts such as the print version of the tutorial, so you can learn on the go, and a guided reflective task, so you can learn by doing.

Industry support
www.ebi.ac.uk/industry

Finally, we support industry through two programmes.

The EBI Industry Programme
• Enables industry to adapt quickly to, and maximise the benefit from, innovations in bioinformatics.
• Membership benefits include:
  • Research of benefit to industry
  • Expert training
  • Standards development
  • Technical development
  • Networking opportunities
• Membership is by invitation and members subscribe on an annual basis

Industry Programme members
• AstraZeneca
• Bayer Schering Pharma AG
• Boehringer Ingelheim Pharma GmbH & Co. KG
• Eli Lilly & Company
• Galderma
• GlaxoSmithKline
• F. Hoffmann-La Roche
• Johnson & Johnson Pharmaceutical Research & Development
• Merck KGaA
• Nestlé Research Centre
• Orion Pharma
• Philips Research
• Pfizer Ltd
• Syngenta Limited
• Sanofi-Aventis Recherche & Développement
• Unilever

Consolidating Bioinformatics in Europe
EU-funded projects coordinated by the EBI

We’re getting into politics here; I’d exclude this series of slides for a roadshow audience.
SLING – Serving life science information in the next generation
• Providing unrestricted access to some of the world’s most important biological databases
• Bioinformatics roadshows provide hands-on training for users
• Funded by the European Commission within the Research Infrastructure Programme of its FP7 Programme
• 4 partners in 4 countries

ELIXIR – European life sciences infrastructure for biological information
• To build a sustainable European infrastructure for biological information supporting life science research and its translation to:
  • medicine,
  • the environment,
  • the bioindustries, and
  • society
• 32 participants in 13 countries

ENFIN Network of Excellence
• Brings together experimentalists and computational biologists to develop the next generation of informatics resources for systems biology
• Funded by the European Commission within its FP6 programme under the thematic area ‘Life sciences, genomics and biotechnology for health’
• 20 partners in 13 countries
• www.enfin.org

EMBRACE Network of Excellence
• Aims to enable bioinformatics research through better interoperability of servers, databases and services
• Funded by the European Commission within its FP6 programme under the thematic area ‘Life sciences, genomics and biotechnology for health’
• 17 partners in 11 countries
• www.embracegrid.info

BioSapiens Network of Excellence
• A large-scale, concerted effort to annotate genome data through a virtual institute for genome annotation and a European School of Bioinformatics
• Funded by the European Commission within its FP6 programme under the thematic area ‘Life sciences, genomics and biotechnology for health’
• 24 partners in 14 countries
• www.biosapiens.info