Molecular diagnostics

The human genome
The total genetic information (DNA content) in human cells. It comprises two genomes:
• nuclear
• mitochondrial: double-stranded DNA organized into one circular molecule; exclusively maternal inheritance

The human nuclear and mitochondrial genomes
Nuclear genome: 3,000 Mb, approx. 22,000 genes; 1-2 % of the DNA is coding
• Genes
  - coding DNA
  - non-coding DNA: pseudogenes, gene fragments, introns, untranslated regions
• Extragenic DNA
  - unique sequences
  - repetitive sequences: repeating sequences, interspersed sequences
Mitochondrial genome: 16.6 kb, 37 genes
• 22 tRNA genes
• 13 structural genes
• 2 rRNA genes

The human genome: superstructure
Johann Gregor Mendel published the results of his investigations of the inheritance of "factors" in pea plants.
• DNA (deoxyribonucleic acid) contains the complete genetic information that defines the structure and function of an organism.
• Proteins are formed using the genetic code of the DNA.

DNA is a double-stranded helix
• 1953: James Watson and Francis Crick worked out the three-dimensional structure of DNA, based on work by Rosalind Franklin.
• DNA is a nucleic acid, made of long chains of nucleotides.
[Figure: DNA nucleotide, made of a phosphate group, a sugar (deoxyribose) and a nitrogenous base (A, G, C or T; thymine shown); nucleotides are joined into a polynucleotide by the sugar-phosphate backbone]
• The two strands run in opposite directions to each other and are therefore antiparallel.
• The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands.
• Each base pairs with a complementary partner
• A pairs with T
• G pairs with C
• The sugars are joined together by phosphate groups that form phosphodiester bonds

RNA is also a nucleic acid
• RNA has a slightly different sugar (ribose)
• RNA has U (uracil) instead of T
[Figure: RNA nucleotide, made of a phosphate group, a sugar (ribose) and a nitrogenous base (A, G, C or U)]

Gene
• is the basic unit of heredity in a living organism
• is a segment within a very long strand of DNA with specific instructions for the production of one specific protein
• contains both "coding" sequences, which determine what the gene does, and "non-coding" sequences, which determine when the gene is active (expressed)
• The gene coding region (about 1.5 % of our DNA) codes for polypeptides (around 25,000 proteins).
• The function of the non-coding region remains unclear, but it can make up as much as 5-45 % of the total genome.

Gene structure
• Exons: protein-coding DNA sequences of a gene
• Introns: non-coding DNA sequences located between exons
• Transcription control regions of a gene: binding sites for transcription factors and RNA polymerase

One gene, one polypeptide
• Theory: one gene is transcribed and translated to produce one polypeptide.
• Some proteins are composed of a number of polypeptides, and in this theory each polypeptide has its own gene.
• E.g. hemoglobin is composed of 4 polypeptides (2 of each type), and there is a gene for each type of polypeptide.
This theory, like so many in biology, has exceptions:
1. Some genes code for types of RNA which do not produce polypeptides.
2. Some genes control the expression of other genes.

DNA, the genetic makeup
• Genes are inherited and are expressed
  - genotype (genetic makeup)
  - phenotype (physical expression)
[Figure: eye phenotypes produced by the green-eye and black-eye genes]

Genome x genotype
• Individuals of the same species have the same genome.
• Individuals of the same species have different genotypes.
• Dominant gene: when present, this gene will express itself in the phenotype.
• Recessive gene: when present together with a dominant gene, this trait will not be shown in the phenotype.
• Allele: one of the different versions of the same gene. There are two alleles for every gene, one from the mother and one from the father (e.g. the gene for eye color or blood type).
• Heterozygous: the two alleles for the same trait in the organism are different from each other.
• Homozygous: the two alleles for the same trait in the organism are the same as each other.

Human Genome Project (HGP)
Goals:
• Identify all of the genes in human DNA
• Determine the sequence of the 3 billion chemical nucleotide bases that make up human DNA
• Store this information in databases
• Develop faster, more efficient sequencing technologies
• Develop tools for data analysis
• Address the ethical, legal, and social issues (ELSI) that may arise from the project
A $3-billion project, launched in 1990 by the United States Department of Energy and the U.S. National Institutes of Health. The international consortium also included geneticists in the United Kingdom, France, Germany, Japan, China and India.
A parallel project was conducted outside of government by the Celera Corporation.
• On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that together they had completed ~97 % of the human genome.

Key findings of the Human Genome Project:
1. There are approx. 22,000 genes in human beings, the same range as in mice and twice that of roundworms. Understanding how these genes express themselves will provide clues to how diseases are caused.
2. All human races are 99.99 % alike, so racial differences are genetically insignificant.
3. Most genetic mutations occur in the male of the species, and as such males are agents of change. They are also more likely to be responsible for genetic disorders.
4.
Genomics has led to advances in genetic archaeology and has improved our understanding of how we evolved as humans and diverged from apes 25 million years ago. It also tells us how our body works, including the mystery behind how the sense of taste works.

Transfer of genetic information
• Replication
• Transcription
• Translation

DNA replication
• In DNA replication, the strands separate.
• Enzymes assemble the new strands, with each parental strand serving as a template.
• DNA replication is semiconservative.
• DNA replication begins at specific sites (origins of replication).
[Figure: parental molecule of DNA; both parental strands serve as templates; two identical daughter molecules of DNA]
[Figure: origin of replication and replication bubble on the parental strand; daughter strands; two daughter DNA molecules]

Genetic code
• The "words" of the DNA "language" are triplets of bases called codons.
• The codons in a gene specify the amino acid sequence of a polypeptide.
[Figure: DNA molecule with genes 1-3; DNA strand, TRANSCRIPTION to RNA, TRANSLATION to polypeptide; each codon corresponds to one amino acid]
• The genetic code is the set of rules by which a gene is translated into a functional protein.
• It is a correspondence between nucleotides, the basic building blocks of genetic material, and amino acids, the basic building blocks of proteins.
• Three codons are known as "stop codons".
• There are 64 possible codons and only 20 standard amino acids, so the code is redundant: multiple codons can specify the same amino acid.
• Virtually all organisms share the same genetic code.

Eukaryotic RNA is processed before leaving the nucleus
• Noncoding segments called introns are cut out.
• Exons are the coding regions.
• A cap and a tail are added to the ends.
[Figure: in the NUCLEUS, DNA (exons and introns) is transcribed, a cap and tail are added to the RNA transcript, introns are removed and exons are spliced together; the mRNA with its coding sequence moves to the CYTOPLASM]

RNA
Several types exist, classified by function:
• mRNA: this is what is usually meant when a bioinformatician says "RNA".
It is used to carry a gene's message out of the nucleus.
• tRNA: transfers genetic information from mRNA to an amino acid sequence
• rRNA: ribosomal RNA, part of the ribosome, which is involved in translation

Translation
(Including the roles of mRNA, tRNA, codons, anticodons, ribosomes and amino acids.)
• Translation: the process of assembling polypeptides from the information encoded in the mRNA
• Each codon specifies the addition of a particular amino acid to the growing polypeptide.

The flow of genetic information in the cell is DNA → RNA → protein
A gene is expressed in two steps:
• Transcription: RNA synthesis
• Translation: protein synthesis

The central dogma of molecular biology
The central dogma describes the transfer of sequence information between sequential information-carrying biopolymers: DNA and RNA (both nucleic acids), and protein. The general transfers describe the normal flow of biological information:
- DNA can be copied to DNA (DNA replication),
- DNA information can be copied into mRNA (transcription),
- proteins can be synthesized using the information in mRNA as a template (translation).

Mutations
Any alteration in a gene from its natural state; it may be disease-causing or a benign, normal variant. Frequency: less than 1 %.
Mutations can be:
- positive (variability, selection)
- negative (4,500 monogenic diseases, ageing)
- neutral
Each human carries 5-10 pathological mutations.
Mutations are changes in the DNA base sequence. They are caused by errors in DNA replication or by mutagens.

Types of mutations
[Figure: mRNA and protein for each case]
NORMAL GENE          Met Lys Phe Gly Ala
BASE SUBSTITUTION    Met Lys Phe Ser Ala
BASE DELETION        Met Lys Leu Ala His   (one base missing)

• Silent mutations do not alter the amino acid sequence of the polypeptide.
• Missense mutations: an amino acid change does occur.
  - Example: sickle-cell anemia
  - If the substituted amino acids have similar chemistry, the mutation is said to be neutral.
• Nonsense mutations change a normal codon to a termination codon.
• Frameshift mutations
involve the addition or deletion of nucleotides in numbers that are not multiples of three (for example one or two)
  - This shifts the reading frame, so that a completely different amino acid sequence occurs downstream from the mutation.

Mutations in the coding sequence of a structural gene

Classification of mutations according to their effect on the gene product
1. Product with lower to zero function (loss-of-function)
   - the typical product is an enzyme
   - the type of mutation is frequently a deletion
2. Product with abnormal function (gain-of-function)
   - the typical product is a nonenzymatic protein
   - frequent in tumours (somatic mutations), rare in monogenic diseases
   - deletions do not lead to a new function
Type 1 mutations are frequently recessive, type 2 dominant. In some genes both types of mutations occur.

Type of mutation      Example
Normal                THE ONE BIG FLY HAD ONE RED EYE
Missense              THQ ONE BIG FLY HAD ONE RED EYE
Nonsense              THE ONE BIG
Frameshift            THE ONE QBI GFL YHA DON ERE DEY
Deletion              THE ONE BIG HAD ONE RED EYE
Duplication           THE ONE BIG FLY FLY HAD ONE RED EYE
Insertion             THE ONE BIG WET FLY HAD ONE RED EYE
Expanding mutation    generation 1: THE ONE BIG FLY HAD ONE RED EYE
                      generation 2: THE ONE BIG FLY FLY FLY HAD ONE RED EYE
                      generation 3: THE ONE BIG FLY FLY FLY FLY FLY HAD ONE RED EYE

Major types of genetic diseases
a.) chromosomal diseases
• are the result of the addition or deletion of entire chromosomes or parts of chromosomes
• most major chromosome disorders are characterised by growth retardation, mental retardation and a variety of somatic abnormalities
• typical examples of major chromosomal diseases are Down syndrome (trisomy 21), Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13)
b.) monogenic diseases (single gene defects)
• only a single gene is altered (mutant) → flawed protein → manifestation (development) of a disease
• inherited in simple Mendelian fashion
• some 6,000 distinct disorders are now known (sickle cell anemia, familial hypercholesterolemia, cystic fibrosis, hemophilia A, Duchenne muscular dystrophy, Huntington disease...)
c.)
polygenic diseases
• result from the interaction of multiple genes, each of which may have a relatively minor effect
• environmental factors contribute to the manifestation of these diseases (e.g. nutrition, exercise)
• for this group of illnesses, the contribution of the genes can be thought of as a "predisposition"
• examples: diabetes mellitus, hypertension, schizophrenia and congenital defects such as cleft lip, cleft palate and most congenital heart diseases
• very common in the population

Human pedigree

Autosomal dominant inheritance
Only one of the two homologous genes is mutated, and although a normal copy of the gene is also present (heterozygosity), the illness still appears (dominant gene effect). If, therefore, one of the parents carries this gene, there is a 50 % probability that it will be transmitted to each child. Both men and women can be affected. This inheritance pattern accounts for over 60 % of monogenic diseases, which makes it by far the most common mode of inheritance. In such cases the mutated protein evidently has a pathological effect on the organism even though it makes up only half of the gene product. E.g. achondroplasia.

Autosomal recessive inheritance
In this inheritance pattern, both homologous genes must be mutated (homozygosity) in order to produce an illness in the affected person. Individuals who receive only one copy of the mutated gene are called carriers. Both sexes can be affected. If, for example, both parents are carriers, there is a 25 % chance that a child will receive both mutated genes and so develop the illness. Many metabolic diseases fall into this category (e.g. cystic fibrosis, phenylketonuria, adrenogenital syndrome, haemochromatosis).

X chromosome inheritance (sex-linked inheritance)
Women have two X chromosomes. If they have a recessively acting mutated gene on one X chromosome, they are carriers for the corresponding illness. Men have only one X chromosome, since the other sex chromosome is a Y chromosome.
If they have the mutated gene on the X chromosome, they will as a rule develop the illness. If a woman is a carrier for an X-linked illness, there is a 50 % chance that she will pass on this illness to each son. Each of her daughters has a 50 % chance of becoming a carrier for this illness.

Identification of inherited diseases
1.) Phenotype analysis
Genes are directly responsible for the production of hormones, enzymes and other proteins.
Investigation procedure: diagnostic measurement of altered or missing proteins using blood or urine analysis. This provides indirect evidence of a mutation in the gene responsible.
Examples: phenylketonuria, alpha1-antitrypsin deficiency
2.) Chromosome analysis (cytogenetic investigations)
This includes microscope examinations to investigate chromosome alterations in terms of number (duplication or loss of individual chromosomes = numeric chromosome aberration) and in terms of structure (wrong composition, chromosome breakage = structural chromosome aberration). There is no detailed investigation of individual genes in such cases.
Indication: anomalies in children (malformations, retarded development), prenatal diagnosis, tendency to miscarriages, infertility.
3.) Molecular genetic testing (DNA analysis, genome analysis, DNA tests)
This provides evidence of a gene mutation responsible for producing the illness. Here it is determined whether the sequence of the DNA bases (nucleotide sequence) has changed within the affected gene.

DNA/RNA diagnosis of genetic diseases
Not all mutation tests use DNA. Testing RNA by RT-PCR has advantages when screening genes with many exons (NF1 gene, DMD gene...) or when looking for splicing mutations.
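The transmission probabilities quoted for autosomal dominant (50 %), autosomal recessive (25 %) and X-linked inheritance (50 % of sons) can be checked by enumerating the gametes of both parents, as in a Punnett square. The following Python sketch is only an illustration: the function name `offspring_distribution` and the allele symbols are assumptions introduced here, and the model ignores penetrance, new mutations and recombination.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Return {genotype: probability} for the children of two parents.

    Each parent is a pair of alleles and passes on either one with
    probability 1/2. Genotypes are stored as sorted tuples, so that
    ("a", "A") and ("A", "a") are counted as the same genotype.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Autosomal dominant: affected heterozygote ("D" = mutated) x unaffected parent.
dominant = offspring_distribution(("D", "d"), ("d", "d"))

# Autosomal recessive: both parents are carriers ("a" = mutated).
recessive = offspring_distribution(("A", "a"), ("A", "a"))

# X-linked recessive: carrier mother ("x" = mutated X) x unaffected father.
x_linked = offspring_distribution(("X", "x"), ("X", "Y"))
sons = {g: p for g, p in x_linked.items() if "Y" in g}
affected_sons = sum(p for g, p in sons.items() if "x" in g) / sum(sons.values())

print(dominant[("D", "d")])    # share of children who are affected
print(recessive[("a", "a")])   # share of children who are affected
print(affected_sons)           # share of sons who are affected
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the percentages given in the text.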
A protein-based functional assay is very important in molecular genetic testing: it may classify the gene products into two simple groups, functional and nonfunctional, which is the essential question in most diagnostics.

Limitations of DNA analysis
Monogenic and also polygenic diseases sometimes do not occur in both twins, even though the genetic information is the same in identical twins. This is due to several factors:
• Penetrance: not every pathogenic mutation leads to the manifestation of a disease in the lifetime of a person.
• Expressivity, on the other hand, describes quantitative differences in the manifestation of the disease or its symptoms. Sometimes the two concepts are difficult to separate, for example when a disease is so weakly manifested that it can no longer be diagnosed.
• The age at which the disease manifests itself can vary strongly. An example of this is Huntington's chorea. Differences in the onset of diseases are sometimes explained by so-called dynamic mutations: when passed on to the next generation, the disease-inducing mutation can lead to an earlier onset of the illness (anticipation), which involves the extension of a mutated sequence of bases.
• In many cases, genetic information is manifested differently when it is inherited from the mother than when it is inherited from the father. This is called imprinting.

Molecular genetic testing (DNA analysis, genome analysis, DNA tests)
A.) Direct testing: DNA from a patient is tested to see whether or not it carries a given pathogenic mutation.
B.) Indirect testing (gene tracking): linked markers are used in family studies to discover whether or not the consultand inherited the disease-carrying chromosome from a parent.

A.) Direct testing
• provides evidence of a gene mutation responsible for producing the illness.
It is determined whether the sequence of the DNA bases (nucleotide sequence) has changed
• shows whether the tested person's copy of the gene is normal or mutant
• detection of a mutation in the relevant gene always confirms the clinical diagnosis
• prerequisites: we must know which gene to examine, and we must know the relevant "normal" (wild-type) sequence

Mutation testing methods can be divided into two groups:
1. Mutation detection methods (scoring): test the DNA for the presence or absence of one specific mutation, i.e. search for known mutations.
2. Mutation screening methods (scanning): screen a sample for any deviation from the standard sequence.

1. Mutation detection methods: test a DNA sample for the presence or absence of one specific mutation
Searching for a known sequence change is possible for:
- diseases where all affected people in the population have one particular mutation
- diseases where most affected people in the population have one of a limited number of specific mutations
- diagnosis within a family: once the mutation is characterized, other family members need to be tested only for that particular mutation

2. Mutation screening methods: screen a sample for any deviation from the standard sequence
Mutation screening is suitable for diseases where a good proportion of patients carry independent mutations. Laboratory testing for unknown mutations suffers from two limitations:
- the methods are quite laborious and expensive for use in a diagnostic service, which needs to produce answers quickly
- the methods detect differences between the patient's sequence and the published normal sequence, but do not distinguish between pathogenic and nonpathogenic changes

Polymerase chain reaction (PCR)
PCR amplifies a single copy or a few copies of a piece of DNA across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence.
The method relies on thermal cycling, consisting of cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA.
Kary Mullis developed the PCR procedure in 1983, for which he was awarded the Nobel Prize.

PCR: selective amplification of a specific target DNA sequence within a heterogeneous collection of DNA (total genomic DNA or complex cDNA)
It requires:
- sequence information from the target sequence for the construction of two oligonucleotide primers (15-30 nucleotides long)
- denatured genomic DNA
- heat-stable DNA polymerase
- DNA precursors (the four deoxynucleotide triphosphates dATP, dCTP, dGTP and dTTP)
PCR involves sequential cycles composed of three steps:
- Denaturation (typically at about 93-95 °C)
- Reannealing (at temperatures usually from about 50 to 70 °C, depending on the Tm of the expected duplex)
- DNA synthesis (typically at about 70-75 °C)

The sensitivity of PCR allows us to use a wide range of samples:
• blood samples
• mouthwashes or buccal scrapes
• chorionic villus biopsy samples
• amniocentesis specimens
• one or two cells (removed from eight-cell stage embryos)
• hair, semen
• archived pathological specimens
• Guthrie cards (spots of dried blood)

Sequence analysis (synonym: sequencing): process by which the nucleotide sequence is determined for a segment of DNA

Principle of the DGGE method
DGGE: the sequence-specific denaturation characteristics in a chemical gradient (in the gel) lead to partial separation of the strands. This in turn leads to differential mobility and results in a single band per variant.
[Figure: ds DNA and ss DNA in the gradient gel]

SSCP in gel (single-strand conformation polymorphism)
[Figure: gel lanes, running from - to +, for the genotypes non mt/non mt, non mt/mutation and mutation/mutation]
SSCP: after denaturation, single strands form a sequence-specific structure.
This structure leads to differential mobility in a non-denaturing matrix and to two bands per variant.

SSCP in capillary
[Figure: capillary traces (mV against time) for the genotypes non mt/non mt, non mt/mutation and mutation/mutation]

RFLP
• Unique sequence primers are used to amplify a mapped DNA sequence from two related individuals, A/A and B/B, and from the heterozygote A/B. In the case of the heterozygote A/B, two different PCR products will be obtained, one which is cleaved three times and one which is cleaved twice.

Mutation scanning (synonym: mutation screening): a process by which a segment of DNA is screened via one of a variety of methods to identify variant gene region(s). Variant regions are further analyzed (by sequence analysis or mutation analysis) to identify the sequence alteration.

Some clinical implications
• Mutation scanning is used when mutations are distributed throughout a gene, when most families have different mutations, and when sequence analysis would be excessively time-consuming due to the size of a given gene.
• Mutation scanning may cover the entire gene or selected regions.
• The sequence alteration identified in a segment of DNA may be a benign variant (polymorphism), a disease-causing mutation, or an alteration of undetermined significance.
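The idea behind mutation scanning, screening a sample for any deviation from the standard sequence, can be illustrated with a minimal Python sketch. It assumes that the patient and reference sequences are already aligned and of equal length, so it reports only base substitutions; the function name `scan_for_variants` and the example sequences are hypothetical and not taken from a real gene.

```python
def scan_for_variants(reference, patient):
    """List substitutions as (position, reference_base, patient_base).

    Positions are 1-based. Both sequences must have the same length;
    insertions and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, pat_base)
        for position, (ref_base, pat_base) in enumerate(zip(reference, patient), start=1)
        if ref_base != pat_base
    ]


# Hypothetical wild-type fragment and a patient fragment with one substitution.
wild_type = "ATGAAGTTTGGCGCA"
patient = "ATGAAGTTTAGCGCA"
variants = scan_for_variants(wild_type, patient)
print(variants)
```

As stated in the text, such a comparison only finds differences from the published normal sequence. Whether a reported change is a benign polymorphism, a disease-causing mutation or an alteration of undetermined significance has to be decided separately.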
Types of sequence alterations that may be detected:
- Pathogenic sequence alteration reported in the literature
- Sequence alteration predicted to be pathogenic but not reported in the literature
- Unknown sequence alteration of unpredictable clinical significance
- Sequence alteration predicted to be benign but not reported in the literature
- Benign sequence alteration reported in the literature

Possibilities if a sequence alteration is not detected:
- The patient does not have a mutation in the tested gene (e.g. a sequence alteration exists in another gene at another locus)
- The patient has a sequence alteration that cannot be detected by sequence analysis (e.g. a large deletion)
- The patient has a sequence alteration in a region of the gene (e.g. an intron or regulatory region) not covered by the laboratory's test

B.) Indirect testing
• historically the first type of DNA diagnostic method
• most of the Mendelian diseases went through a phase of gene tracking and moved on to direct testing once the genes were cloned
• with some diseases, even though the gene has been cloned, mutations are hard to find because of:
  - mutations scattered widely over a large gene
  - the existence of homologous pseudogenes
  - the lack of mutational hot spots
• indirect testing never confirms the clinical diagnosis!

Linkage analysis (synonym: indirect DNA analysis): testing DNA sequence polymorphisms (normal variants) that are near or within a gene of interest in order to track, within a family, the inheritance of a disease-causing mutation in a given gene

DNA sequence polymorphisms
• Single nucleotide polymorphism (SNP): substitution of bases. The genome contains approx. 30 million
SNPs.
• Minisatellites (VNTR) consist of repetitive, generally GC-rich, variant repeats (> 6 bp) that range in length from 10 to over 100 bp; these variant repeats are tandemly intermingled.
• Microsatellites, or short tandem repeats (STR), consist of a short sequence, typically 2 to 6 nucleotides long, tandemly repeated several times (2-100x), and are characterised by many alleles.

Use of polymorphic regions
• Identification of persons / DNA samples
• Paternity testing (VNTR, STR)
• Indirect diagnostics of monogenic diseases
• Searching for new genes
• SNPs and multifactorial diseases

The three steps of linkage analysis
• Establish haplotypes: multiple DNA markers lying on either side of (flanking) or within (intragenic) a gene region of interest are tested to determine the set of markers (haplotype) of each family member.
• Establish phase: the haplotypes are compared between family members whose genetic status is known (e.g. affected, unaffected) in order to establish the haplotype associated with the disease-causing allele.
• Determine genetic status: once the disease-associated haplotype is established, it is possible to determine the genetic status of at-risk family members.

Indirect DNA analysis
Gene CFTR, intron 8, polymorphic site (CA)n on chromosome 7 (one copy from the mother, one from the father); sequence context: GTATCACACACATTCGG
allele A1: ------GTATCACATTCGG----        length of this allele: 130 bp
allele A2: -----GTATCACACACATTCGG---     length of this allele: 134 bp
[Figure: two pedigrees showing chromosome 7 with the CFTR mutation status (dF508 / non, non / ?, dF508 / ?, non / non). Marker genotypes A1/A3, A1/A2, A1/A2, A1/A3: informative. Marker genotypes A1/A3, A1/A1, A1/A1, A1/A3: non-informative.]

Linkage analysis is often used when direct DNA analysis is not possible because the gene of interest is unknown or a mutation within that gene cannot be detected in a specific family. In most instances, the haplotype itself has no significance; it has meaning only in the context of a family study.
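The allele lengths in the CFTR intron 8 example differ because the alleles carry different numbers of CA repeats. The following Python sketch counts the longest uninterrupted (CA)n run in the two allele fragments shown above; the function name `longest_ca_repeat` is an assumption introduced for illustration, and real genotyping measures the length of the PCR product rather than counting repeats in a known sequence.

```python
import re


def longest_ca_repeat(sequence):
    """Return the number of CA units in the longest uninterrupted (CA)n run."""
    runs = re.findall(r"(?:CA)+", sequence)
    return max((len(run) // 2 for run in runs), default=0)


# Allele fragments as printed in the example (flanking sequence omitted).
allele_a1 = "GTATCACATTCGG"
allele_a2 = "GTATCACACACATTCGG"

repeats_a1 = longest_ca_repeat(allele_a1)
repeats_a2 = longest_ca_repeat(allele_a2)

# Each additional CA unit adds 2 bp to the amplified fragment.
length_difference_bp = 2 * (repeats_a2 - repeats_a1)
print(repeats_a1, repeats_a2, length_difference_bp)
```

The computed difference corresponds to the 4 bp between the stated allele lengths of 130 bp and 134 bp.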
The accuracy of linkage analysis depends on:
• The accuracy of the clinical diagnosis in the affected family member(s).
• The distance between the disease-causing mutation and the markers. Linkage analysis may yield false positive or false negative results if recombination of markers between the maternally and paternally inherited chromosomes occurs during gamete formation. The risk of recombination is proportional to the distance between the disease-causing mutation and the markers. The risk of recombination is lowest if intragenic markers are used.
• The informativeness of genetic markers in the patient's family. If the DNA sequence for a given variant differs on the maternally inherited and paternally inherited chromosomes, that marker is informative. If the DNA sequence for a given variant does not differ on the two chromosomes, that marker is not informative.

Indirect diagnosis: neurofibromatosis type 1
Autosomal dominant, unknown mutation
Polymorphic systems: GXAlu / i27b, IVS38GT / i38
[Figure: pedigree with marker allele sizes (131/135 for the first system; 179, 181, 185 and 187 for the second) for each family member; the haplotype associated with the unknown mutation is marked]

Indirect diagnosis: cystic fibrosis
Autosomal recessive, F508del and an unknown mutation
Polymorphic systems: IVS17BTA (alleles 1-6), IVS8BTA (alleles A-D)
[Figure: pedigree with IVS8BTA and IVS17BTA alleles for each family member; the haplotype associated with the unknown mutation is marked]

Indirect diagnosis: cystic fibrosis, de novo mutation
[Figure: pedigree with mutation genotypes [F508]+[=], [=]+[=], [F508]+[=], [=]+[=], [F508]+[G542X] and marker haplotypes [A1]+[A1], [A1]+[A3], [A2]+[A5], [A1]+[A2], [A3]+[A5]]

Retinoblastoma (RB1)
Mutation analysis of Rb1 was done. No pathology in the Rb1 gene was detected.
Polymorphic markers:
• extragenic (DS13S 1307, DS13S 272, DS13S 164)
• intragenic (Rb1.20B)
Haplotypes:
A1: DS13S 1307 [141] DS13S 272 [133] DS13S 164 [179] Rb1.20B [3]
A2: DS13S 1307 [151] DS13S 272 [133] DS13S 164 [188] Rb1.20B [4]
A3: DS13S 1307 [139] DS13S 272
[127] DS13S 164 [179] Rb1.20B [1]
A4: DS13S 1307 [139] DS13S 272 [133] DS13S 164 [186] Rb1.20B [1]
A5: DS13S 1307 [139] DS13S 272 [131] DS13S 164 [188] Rb1.20B [1]
A6: DS13S 1307 [126] DS13S 272 [133] DS13S 164 [188] Rb1.20B [2]
A7: DS13S 1307 [126] DS13S 272 [129] DS13S 164 [188] Rb1.20B [4]
A8: DS13S 1307 [139] DS13S 272 [127] DS13S 164 [178] Rb1.20B [5]
[Figure: RB1 pedigree with haplotype pairs [A1]+[A2], [A3]+[A4], [A7]+[A8], [A5]+[A6], [A1]+[A3], [A6]+[A7]]

Retinoblastoma: indirect diagnostics
The haplotype carrying the pathology cannot be established.
Explanation:
• occurrence of a mutation in another system of cell division and growth regulation
• nonhereditary form of retinoblastoma in both cousins