Genome annotation Sequences ● Sequencing ● Mapping ● Assembly Annotation ● Identifying the locations of genes and all of the coding regions ● Determining their functions Annotation steps ● Identify non-coding part of genome ● Identify genome elements ● Assign functional information Genome elements ● Coding – genes – Transcription and translation ● Non-coding – Structural DNA (not transcribed) – Functional RNA (not translated) – Introns (removed before translation) Non-coding ● Structural – Telomeres, centromeres, repetetives ● Functional – tRNA, rRNA ● Introns – Removed from mRNA Structural annotation ● Open Reading Frames ● Gene structure ● Coding regions ● Regulatory motifs ORFs ● Part of genome between start and stop codons ● 6 frames for one sequence ● 1, 2, 3, -1, -2, -3 Search potential genes ● BLAST+, HMM search, KRAKEN ● Comparing with known set of genes ● Score the similarity Search potential product ● Compare translated product with known database ● DIAMOND Identification ab initio ● Search for patterns ● AI to derive function ● GLIMMER, GeneScan Databases ● ENCODE ● Entrez Gene ● Ensembl ● GENCODE ● Gene Ontology Consortium ● RefSeq ● Uniprot ● Vega ● ... Useful tools ● Prokka ● MicrobeAnnotator ● NCBI Prokaryotic Genome Annotation Pipeline Prokka ● Illumina BaseSpace app ● Vendor lock MicrobeAnnotator NCBI Annotation pipeline NCBI Annotation pipeline