Molecular Biology -2
The structure and function of biopolymers during the transitions of
genetic information
Department of Molecular Biology and Pharmaceutical Biotechnology (45-308)
Mgr. Marie Brázdová, Ph.D. (FAF, BFU AVČR) Mgr. Denis Šubert (BFU AVČR) Mgr. Daniel Renčiuk (BFU AVČR) PharmDr. Jakub Treml, Ph.D. (FAF)
brazdovam@pharm.muni.cz
Genetic information is coded by DNA
The experiment combining two strains of Streptococcus pneumoniae bacteria.
(A) Transformation
• S strain - smooth pathogenic bacterium, causes pneumonia
• R strain - rough nonpathogenic mutant bacterium, arises from the S strain by RANDOM MUTATION
• live R strain cells are grown in the presence of either heat-killed S strain cells or a cell-free extract of S strain cells
• TRANSFORMATION - some R strain cells are transformed to S strain cells, whose daughters are pathogenic and cause pneumonia
CONCLUSION: Molecules that can carry heritable information are present in S strain cells.
(B) Identification of the transforming molecule
• the cell-free extract of S strain cells is fractionated into classes of purified molecules: RNA, protein, DNA, lipid, carbohydrate
• each class of molecules is tested for transformation of R strain cells
• only the DNA fraction yields S strain cells; with RNA, protein, lipid and carbohydrate the cells remain R strain
CONCLUSION: The molecule that carries the heritable information is DNA.
Molecular Biology of the Cell (© Garland Science 2008)
MB 2 2023
O. Avery, C. MacLeod, M. McCarty
STUDIES ON THE CHEMICAL NATURE OF THE SUBSTANCE INDUCING TRANSFORMATION OF PNEUMOCOCCAL TYPES
Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III
By Oswald T. Avery, M.D., Colin M. MacLeod, M.D., and Maclyn McCarty, M.D.
(From the Hospital of The Rockefeller Institute for Medical Research)
(Received for publication, November 1, 1943)
Biologists have long attempted by chemical means to induce in higher organisms predictable and specific changes which thereafter could be transmitted in series as hereditary characters. Among microorganisms the most striking example of inheritable and specific alterations in cell structure and function that can be experimentally induced and are reproducible under well defined and adequately controlled conditions is the transformation of specific types of Pneumococcus. This phenomenon was first described by Griffith (1) who succeeded in transforming an attenuated and non-encapsulated (R) variant derived from one specific type into fully encapsulated and virulent (S) cells of a heterologous specific type. A typical instance will suffice to illustrate the techniques originally used and serve to indicate the wide variety of transformations that are possible within the limits of this bacterial species.
Nuclein
Nuclein - acidic substance rich in nitrogen and phosphorus
J. F. Miescher - 1869
Isolated from white blood cells (pus of wounded patients) after digestion of the cell proteins with pepsin (proteolytic enzyme)
Courtesy of Herrn Courvofeler Portrait-Sammlung, University of Basel. Noncommercial, educational use only.
Roles of genetic material
Genotype - storage of genetic information and its transmission to the offspring
Phenotype - expression of genetic information to particular properties of an individual
Evolutionary role - adaptation of an organism/species to the environment through the changes in genetic information
Terminology
Gene - several "definitions" depending on the point of view:
classic genetics (Mendel) - elementary unit of hereditary genetic information
molecular genetics - part of DNA coding for RNA (and as a consequence coding for some property of the individual)
structural genes coding for mRNA/protein (+ regulatory regions)
genes coding for functional RNA (miRNA,...)
strict - structural gene - part of DNA that codes for protein sequence
Allele - particular variant of the respective gene
Genome - complete DNA of organism (molecular) x complete genetic information of organism x sum of genes (classic)
Genotype - the combination of particular alleles of all genes in individual
Phenotype - the sum of actual individual properties (as a result of expression of particular genotype in the respective environment)
Genophore - the carrier of genetic information, usually a molecule of DNA (often used for bacteria)
Information biopolymers
Deoxyribonucleic acid (DNA)
• linear heteropolymer composed of 2-deoxyribonucleotides connected by phosphodiester bonds
• usually forms a stable and resistant double helix
• serves as a storage of genetic information, as a template for its reproduction (replication) and as a template for the expression of genetic information to the phenotype (transcription)
Ribonucleic acid (RNA)
• linear heteropolymer composed of ribonucleotides connected by phosphodiester bonds
• usually a single-stranded molecule of variable length, structure and reactivity
• many functions depending on the type of RNA
Protein
• linear heteropolymer composed of 20 (21) amino acids connected by peptide bonds
• highly variable structures, properties and functions
Central dogma of molecular biology
Nucleic acids: DNA, RNA
• DNA -> DNA: Replication
• DNA -> RNA: Transcription
• RNA -> DNA: Reverse transcription (retroviruses, ...)
• RNA -> RNA: Replication
• RNA -> Protein: Translation
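The sequence-level part of this information flow can be sketched in a few lines of Python. This is only a minimal illustration: it assumes sequences are plain strings written 5'->3' and that the DNA given is the coding (sense) strand. The function names are hypothetical and do not come from the lecture.

```python
# Minimal sketch of two arrows of the central dogma at the sequence level.
# Assumption: the DNA string is the coding (sense) strand, written 5'->3'.

def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Reverse transcription: return the DNA (coding-strand) version of an RNA."""
    return rna.upper().replace("U", "T")


mrna = transcribe("ATGGCTTAA")
print(mrna)                      # AUGGCUUAA
print(reverse_transcribe(mrna))  # ATGGCTTAA
```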
Gene
POU5F1 - POU class 5 homeobox 1 [Homo sapiens (human)]
Gene ID: 5460
Official Symbol: POU5F1 (provided by HGNC)
Official Full Name: POU class 5 homeobox 1 (provided by HGNC)
Primary source: HGNC:9221
See related: Ensembl:ENSG00000204531; MIM:164177
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Homo sapiens
Also known as: OCT3; OCT4; OTF3; OTF4; OTF-3; Oct-3; Oct-4
Summary: This gene encodes a transcription factor containing a POU homeodomain that plays a key role in embryonic development and stem cell pluripotency. Aberrant expression of this gene in adult tissues is associated with tumorigenesis. This gene can participate in a translocation with the Ewing's sarcoma gene on chromosome 21, which also leads to tumor formation. Alternative splicing, as well as usage of alternative AUG and non-AUG translation initiation codons, results in multiple isoforms. One of the AUG start codons is polymorphic in human populations. Related pseudogenes have been identified on chromosomes 1, 3, 8, 10, and 12. [provided by RefSeq, Oct 2013]
https://www.ncbi.nlm.nih.gov/gene/5460
(Figure: NCBI graphical view of the POU5F1 locus on NC_000006.12; Genes, NCBI Homo sapiens Annotation Release 108, 2016-06-07)
Annotated parts of the gene:
• 5' UTR
• Introns = noncoding regions
• Regulatory regions: promoter + enhancers
• Exons = coding regions
Genes
Most eukaryotic genes contain introns. The introns are transcribed into the primary RNA transcript and are subsequently removed by splicing on the spliceosome to form the final mRNA.
Intron = noncoding region
Exon = coding region
Prokaryotic genes do not contain introns; they are transcribed directly into mRNA, which serves as a template for translation into protein.
Exon = coding region
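The removal of introns can be illustrated with a short Python sketch. It assumes the exon positions are already known and given as 0-based, end-exclusive coordinates; the example transcript and its coordinates are invented for illustration and do not describe a real gene.

```python
# Minimal sketch of splicing: keep the exons, drop the introns.
# Assumption: exon coordinates are known (0-based, end-exclusive).

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a primary transcript into the mature mRNA."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guaugcacag" + "UAA"
exons = [(0, 6), (18, 24), (34, 37)]

print(splice(pre_mrna, exons))  # AUGGCCUUUGGAUAA
```

A prokaryotic gene corresponds to the special case of a single exon covering the whole transcript.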
Nucleic acids
NUCLEOTIDE = NUCLEOSIDE + phosphate
NUCLEOSIDE = 5-carbon sugar + nitrogenous base
5-carbon sugar - pentose
• β-D-2-deoxyribose
• β-D-ribose
Base
Purine: Adenine, Guanine
Pyrimidine: Cytosine, Thymine, Uracil
Nucleic acid nomenclature
Symbol	Base	Nucleoside	Nucleotide (NTP)
G	Guanine	Guanosine	Guanosine triphosphate
A	Adenine	Adenosine	Adenosine triphosphate
T	Thymine	Thymidine	Thymidine triphosphate
C	Cytosine	Cytidine	Cytidine triphosphate
U	Uracil	Uridine	Uridine triphosphate
Modified bases
tRNA: pseudouracil, dihydrouracil
Oxidative damage: 8-oxo adenine, 8-oxo guanine
Metabolism: xanthine, hypoxanthine, inosine
Epigenetics: 5-methyl cytosine, 5-hydroxymethyl cytosine
Conformation of N-glycosidic bond
Figure 2-8
The glycosidic torsion angle χ is defined by O4'-C1'-N9-C4 for purines and O4'-C1'-N1-C2 for pyrimidines. When χ = 0° the O4'-C1' bond is eclipsed by the N9-C4 bond in purines and by the N1-C2 bond in pyrimidines. The syn conformations correspond to 0° ± 90°; anti conformations correspond to 180° ± 90°. In nucleotides, steric hindrance limits the conformations actually found to a much narrower range of angles that depend on sugar pucker and base. The syn conformations are usually found with χ = 45° ± 45°; anti conformations are usually found with χ = -135° ± 45°.
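The syn/anti definition from the caption can be written as a small Python function. This is a sketch under two assumptions: the angle is supplied in degrees, and the boundary values of exactly ±90° are arbitrarily assigned to syn, since the caption does not say where they belong. The function name is hypothetical.

```python
# Minimal sketch: classify the glycosidic torsion angle chi as syn or anti.
# syn = 0 deg +/- 90 deg, anti = 180 deg +/- 90 deg (see the figure caption).
# Assumption: exactly +90 or -90 degrees is counted as syn.

def glycosidic_conformation(chi_deg: float) -> str:
    """Return 'syn' or 'anti' for a torsion angle given in degrees."""
    chi = (chi_deg + 180.0) % 360.0 - 180.0  # wrap the angle into [-180, 180)
    return "syn" if -90.0 <= chi <= 90.0 else "anti"


print(glycosidic_conformation(45))    # syn
print(glycosidic_conformation(-135))  # anti
print(glycosidic_conformation(225))   # anti (225 deg is the same angle as -135 deg)
```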
Formation of sugar-phosphate backbone
(Figure: growth of the sugar-phosphate backbone)
• 5' end (phosphate)
• phosphodiester bond
• the incoming nucleotide is a nucleoside triphosphate; pyrophosphate is released
REPLICATION - DNA-dependent DNA polymerase
TRANSCRIPTION - DNA-dependent RNA polymerase
REVERSE TRANSCRIPTION - RNA-dependent DNA polymerase
Base reactivity - hydrogen bonds
Hydrogen bond - weak electrostatic interaction of two polar groups: one covalently binds hydrogen (DONOR - usually -N-H and -O-H); the second (ACCEPTOR) is usually N or O
Length: ~ 2.8 Å (2 - 3.4 Å); Energy: < 1 kcal/mol; both depend on the particular atoms and on the environment
(Figure: hydrogen bond donors and acceptors of adenine, guanine and thymine)
Watson-Crick base pairing
A-T: 2 hydrogen bonds; G-C: 3 hydrogen bonds
Chargaff rules: ΣA = ΣT, ΣG = ΣC, ΣPu = ΣPy
DNA double helix
• two molecules (strands) of DNA
• the helix is right-handed
• the strands are antiparallel - their 5'->3' directions run opposite to each other along the helix
• similar content of purines and pyrimidines; content of A = T, G = C (Chargaff rules)
• result - the strands are complementary, i.e. by the Watson-Crick base pairing rules the sequence of one strand can be predicted from the sequence of the other
• on average the double helix contains 10.5 base pairs per turn of the helix; one turn is about 3.4 nm long
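Complementarity, the Chargaff rules and the helix geometry can be checked with a short Python sketch. It assumes sequences are plain strings written 5'->3' and uses the values given above (10.5 bp per turn, about 3.4 nm per turn); the example sequence and the function names are made up for illustration.

```python
from collections import Counter

# Minimal sketch: predict the partner strand and check the Chargaff rules.
# Assumption: sequences are written 5'->3' and contain only A, C, G, T.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the antiparallel complementary strand, written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def helix_length_nm(n_bp: int, bp_per_turn: float = 10.5, nm_per_turn: float = 3.4) -> float:
    """Approximate length of a B-DNA duplex of n_bp base pairs."""
    return n_bp / bp_per_turn * nm_per_turn


strand = "ATGCGGCTTA"                    # hypothetical example sequence
partner = reverse_complement(strand)
print(partner)                           # TAAGCCGCAT

counts = Counter(strand + partner)       # base content of the whole duplex
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
print(helix_length_nm(21))               # two turns, about 6.8 nm
```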
DNA structure - Watson and Crick model
Nature, No. 4356, April 25, 1953, p. 737
MOLECULAR STRUCTURE OF NUCLEIC ACIDS
A Structure for Deoxyribose Nucleic Acid
(Scan of the original paper; legible key passages:)
"We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest."
"This structure has two helical chains each coiled round the same axis ... Both chains follow right-handed helices, but owing to the dyad the sequences of the atoms in the two chains run in opposite directions."
"There is a residue on each chain every 3.4 A. in the z-direction. We have assumed an angle of 36° between adjacent residues in the same chain, so that the structure repeats after 10 residues on each chain, that is, after 34 A."
"... if the sequence of bases on one chain is given, then the sequence on the other chain is automatically determined."
"It has been found experimentally that the ratio of the amounts of adenine to thymine, and the ratio of guanine to cytosine, are always very close to unity for deoxyribose nucleic acid."
"It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."
(Portraits: M. Wilkins, E. Chargaff)
Franklin's X-ray photograph shows DNA's "B"-form (1952)
DNA structure - Pauling model
Linus Pauling
A PROPOSED STRUCTURE FOR THE NUCLEIC ACIDS
By Linus Pauling and Robert B. Corey
Gates and Crellin Laboratories of Chemistry, California Institute of Technology
Communicated December 31, 1952
The nucleic acids, as constituents of living organisms, are comparable in importance to the proteins. There is evidence that they are involved in the processes of cell division and growth, that they participate in the transmission of hereditary characters, and that they are important constituents of viruses. An understanding of the molecular structure of the nucleic acids should be of value in the effort to understand the fundamental phenomena ...
(p. 92, CHEMISTRY: PAULING AND COREY, Proc. N. A. S.)
... which are involved in ester linkages. This distortion of the phosphate group from the regular tetrahedral configuration is not supported by direct experimental evidence; unfortunately no precise structure determinations have been made of any phosphate di-esters. The distortion, which corresponds to a larger amount of double bond character for the inner oxygen atoms than for the oxygen atoms involved in the ester linkages, is a reasonable one, and the assumed distances are those indicated by the observed values for somewhat similar substances, especially the ring compound S3O9, in which each sulfur atom is surrounded by a tetrahedron of four oxygen atoms, two of which are shared with adjacent tetrahedra, and two unshared. The O-O distances within the phosphate tetrahedron are 2.32 Å (between the two inner oxygen atoms), 2.46 Å, 2.55 Å, and 2.60 Å. The ...
FIGURE 6
Plan of the nucleic acid structure, showing several nucleotide residues.
Various types of double helix
B-DNA
• DNA in water/salt solutions
A-DNA
• DNA in crowding solutions
• RNA
Z-DNA
• CpG sequences in crowding conditions
• left-handed, zig-zag step
Reversed Watson-Crick pairing
Base protonation
(Figure: protonation and deprotonation sites of adenosine, guanosine, uridine, thymidine and cytidine with their pK values)
• base protonation might alter the base reactivity
• free bases have pK far from physiological pH
• pK of bases in DNA might be closer to pH 7.4
• cytosine in Cn sequences has pK ~ 7 - cytosine i-motif
DNA double helix x ions / water
• phosphates in DNA backbone are negatively charged - repulsion
• this is compensated by interaction with ions (Na+, K+, Mg2+, ...) or water (hydrogen bonds)
Stability of DNA double helix
Tm = melting temperature
• hydrogen bonds - AT = 2 x, GC = 3 x - Tm increases with GC content and length
• base stacking - various - Tm increases with length and ions
• repulsion of backbone phosphates - Mg2+ > Na+ - Tm increases with ions
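The dependence of Tm on GC content and length can be illustrated with the Wallace rule of thumb, Tm ≈ 2 °C x (A+T) + 4 °C x (G+C). This rule is not taken from the slides; it is a commonly quoted rough estimate for short oligonucleotides only, and it ignores ions and base stacking. The function name is hypothetical.

```python
# Minimal sketch: rough Tm estimate for a short DNA oligonucleotide.
# Assumption: the Wallace rule, Tm ~ 2*(A+T) + 4*(G+C) in degrees Celsius;
# a rule of thumb for short oligos, not a thermodynamic model.

def wallace_tm(seq: str) -> int:
    """Estimate the melting temperature (deg C) of a short oligonucleotide."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATATATAT"))      # 16
print(wallace_tm("GCGCGCGC"))      # 32  (same length, higher GC content)
print(wallace_tm("GCGCGCGCGCGC"))  # 48  (longer duplex)
```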
Base-pair parameters in double helix
(Figure: base-pair parameters and their coordinate frame; Lu et al., 2003, NAR)
• Shear (Sx), Stretch (Sy), Stagger (Sz)
• Buckle (κ), Propeller (π), Opening (σ)
• Shift (Dx), Rise (Dz)
• Tilt (τ), Twist (ω)
Types of nucleic acids
• linear (human chromosome) x circular (bacterial genome)
• single-stranded (most RNAs) x double-stranded (human DNA)
Superhelicity
Supercoiling influences higher levels of chromatin organisation:
• overwound topological domains form compact large-scale chromatin structures
• underwound topological domains have a decompacted large-scale structure
Superhelicity arises mostly as a result of the passage of the polymerase complex and the unwinding of DNA (helicase, ...) during replication and transcription.
Topoisomerases
• Enzymes that relax the superhelicity
• Topo I - cuts one strand of the DNA duplex
• Topo II - cuts both strands of the DNA duplex
Reactivity of bases with amino acids
Double-stranded NA: Interaction of Hoogsteen side with amino acid in major groove.
(Figure: asparagine (or glutamine), serine (or threonine) and arginine bound to bases)
Figure 2-16
Interactions involving two hydrogen bonds between amino acids and bases that can occur through the major groove of a double helix.
Reactivity of bases with amino acids
Single-stranded NA: Interaction of Watson-Crick side with amino acid.
(Figure: aspartic (or glutamic) acid, asparagine (or glutamine) and arginine bound to bases)
Figure 2-17
Interactions involving two hydrogen bonds between amino acids and bases that take the place of Watson-Crick base pairing.
Genome composition
(Figure: genome composition on a percentage scale from 0 to 100)
REPEATED SEQUENCES
• TRANSPOSONS: LINEs, SINEs, retroviral-like elements, DNA-only transposon 'fossils'
• simple sequence repeats
• segmental duplications
UNIQUE SEQUENCES
• GENES: introns, protein-coding regions
• non-repetitive DNA that is neither in introns nor codons
Molecular Biology of the Cell (© Garland Science 2008)
Repetitive sequences - repeats
Some sequences in the genome are unique, usually the gene sequences (both coding and non-coding). In contrast, other sequences exist in many copies - repetitive sequences (repeats). The length of a repeat (microsatellites 2-6 bp x LINE 6-7 kbp), as well as the number of copies (from several up to 1.5M for SINE in human), is highly variable.
Structure:
• direct repeats
5'...AGTC...AGTC...3'
3'...TCAG...TCAG...5'
• mirror repeats
5'...AGTC...CTGA...3'
3'...TCAG...GACT...5'
• inverted repeats + palindromes
5'...AGTC...GACT...3'
3'...TCAG...CTGA...5'
Position:
• tandem repeats
• interspersed repeats
Inverted repeats
Palindrome: 5'-AGTCGACT-3' - folds into a hairpin (stem AGTC paired with GACT)
Inverted repeat: 5'-AGTCTGAGCTGACT-3' - folds into a hairpin with loop (stem AGTC paired with GACT, the sequence in between forms the loop)
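Whether a sequence is a palindrome, and which part of an inverted repeat forms the stem and which the loop, can be checked with a short Python sketch. It assumes perfect Watson-Crick pairing only and ignores the minimal loop size needed in a real hairpin; the function names are hypothetical.

```python
# Minimal sketch: palindromes and hairpin stems in single-stranded DNA.
# Assumptions: only perfect A-T / G-C pairs count; steric limits on the
# loop size of real hairpins are ignored.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def is_palindrome(seq: str) -> bool:
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


def stem_and_loop(seq: str) -> tuple[str, str]:
    """Return (5' stem arm, loop) for the longest stem closing both ends."""
    seq = seq.upper()
    stem = 0
    # Extend the stem while the next 5' base pairs with the matching 3' base.
    while 2 * (stem + 1) <= len(seq) and PAIR[seq[stem]] == seq[-1 - stem]:
        stem += 1
    return seq[:stem], seq[stem:len(seq) - stem]


print(is_palindrome("AGTCGACT"))        # True
print(is_palindrome("AGTCTGAGCTGACT"))  # False
print(stem_and_loop("AGTCTGAGCTGACT"))  # ('AGTC', 'TGAGCT')
```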
Inverted repeats
Cruciform
Inverted repeat in double-stranded DNA:
5'-AGTCTGAGCTGACT-3'
3'-TCAGACTCGACTGA-5'
Each strand folds into a hairpin with loop; the two opposite hairpins form the cruciform.
Special types of repeats - transposons
Interspersed repeats with various lengths and numbers of copies.
• LTR - long terminal repeat - 100 bp - 5 kbp - variant of retrotransposons
• LINE - long interspersed nuclear elements - up to 6 kbp - human > 500k copies - 3 types (L1, L2, L3) - only some L1 are able to transpose
• SINE - short interspersed nuclear elements (Alu, ...) - up to 500 bp - human ~ 1.5M copies
Loops and hairpins in RNA
16S RNA
(Figure: secondary structure of 16S rRNA with many hairpins and loops; regions R1: 1-250, R2: 251-500, R3: 501-750, R4: 751-1,050, R5: 1,051-1,300, R6: 1,301-1,54...)
tRNA (Lys)
(Figure: cloverleaf structure of tRNA with acceptor stem (5' pG ... 3' A-OH), D-loop, anticodon loop, variable loop and TψC loop)
Nature Reviews | Microbiology
Functional types of RNA
DNA -> hnRNA -> mRNA - Translation; codes for protein sequence
DNA -> pre-rRNA -> rRNA - Translation; part of ribosome
DNA -> pre-tRNA -> tRNA - Translation; amino acid carrier
DNA -> snRNA, snoRNA, ... - Splicing/modification of RNA
DNA -> siRNA, miRNA, ... - Gene expression regulation
Hoogsteen pairing - triplexes
Hoogsteen pairing - G-quadruplexes
Guanine quadruplexes
GGGNnGGGNnGGGNnGGG
• gene expression regulation (transcriptional activation, altered transcription)
• telomere structure
(Huppert J.L., Chem Soc Rev, 2008)
(Biffi G., et al., Nat Chem, 2013)
VEGF-A (Brooks T. A., et al., FEBS J, 2010)
(Figure: percentage scale 5-20 % across gene regions between TSS and poly-A: 5'-UTR, exon 1, intron 1, exon 2, intron 2, exon 3, 3'-UTR; regions surveyed: 25,747; 13,909; 28,239; 26,321; 24,963; 21,838; 20,613; 20,882; scale 250 bp) (Maizels N. and Gray L.T., PLoS Genet., 2013)
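The motif GGGNnGGGNnGGGNnGGG can be searched for with a regular expression. The sketch below assumes loops (Nn) of 1-7 nucleotides and runs of at least three guanines, a convention often used for putative quadruplex sequences; the loop limits are an assumption and are not stated on the slide. Only the given strand is scanned.

```python
import re

# Minimal sketch: find putative G-quadruplex motifs G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# Assumptions: loop length 1-7 nt, G-runs of 3 or more, DNA alphabet ACGT,
# only the given strand is searched (the C-rich complement is ignored).

G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start position, matched sequence) for non-overlapping matches."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


telomere = "TTAGGG" * 4  # four human telomeric repeats
print(find_g4_motifs(telomere))        # [(3, 'GGGTTAGGGTTAGGGTTAGGG')]
print(find_g4_motifs("ATATATCGCGAT"))  # []
```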
Base reactivity
Hydrophobic bases with a high ability to form hydrogen bonds are reluctant to be freely exposed to the surrounding water environment. Whenever the exposure of a base to the environment can be lowered by any type of base pairing or base stacking, the bases tend to form a structure. Even "single-stranded" RNA or DNA forms, in fact, a compact structure with a number of base pairs.
Packing of DNA into chromosome
• At the simplest level, chromatin is a double-stranded helical structure of DNA (DNA double helix).
• DNA is complexed with histones to form nucleosomes. Each nucleosome consists of eight histone proteins around which the DNA wraps 1.65 times (nucleosome core of eight histone molecules).
• A chromatosome consists of a nucleosome plus the H1 histone.
• The chromatosomes fold up to produce a 30-nm fiber...
• ... that forms loops averaging 300 nm in length.
• The 300-nm fibers are compressed and folded to produce a 250-nm-wide fiber.
• Tight coiling of the 250-nm fiber produces the chromatid of a chromosome (1400 nm).
(Nature Education, 2013)
Binding of DNA to a histone octamer
• "beads-on-a-string" form of chromatin: core histones of the nucleosome connected by linker DNA; the nucleosome includes ~200 nucleotide pairs of DNA
• NUCLEASE DIGESTS LINKER DNA -> released nucleosome core particle (11 nm)
• DISSOCIATION WITH HIGH CONCENTRATION OF SALT -> octameric histone core + 147-nucleotide-pair DNA double helix (50 nm)
• DISSOCIATION -> H2A, H2B, H3, H4
Molecular Biology of the Cell (© Garland Science 2008)
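The numbers in this figure can be checked with simple arithmetic. The sketch below assumes a rise of 0.34 nm per base pair (about 3.4 nm per helical turn of roughly 10 bp) and one nucleosome per ~200 nucleotide pairs; the haploid human genome size of 3.2 x 10^9 bp is the value from the genome table in this lecture. The function names are hypothetical.

```python
# Minimal sketch: back-of-the-envelope numbers for DNA packing.
# Assumptions: 0.34 nm rise per base pair, one nucleosome per ~200 bp.

NM_PER_BP = 0.34


def dna_length_nm(n_bp: int) -> float:
    """Contour length of a B-DNA duplex of n_bp base pairs, in nm."""
    return n_bp * NM_PER_BP


def nucleosome_count(genome_bp: int, repeat_bp: int = 200) -> int:
    """Approximate number of nucleosomes needed to pack genome_bp of DNA."""
    return genome_bp // repeat_bp


HUMAN_HAPLOID_BP = 3_200_000_000

print(round(dna_length_nm(147)))                       # 50 (nm of DNA on one core particle)
print(nucleosome_count(HUMAN_HAPLOID_BP))              # 16000000
print(round(dna_length_nm(HUMAN_HAPLOID_BP) / 1e9, 2)) # 1.09 (metres of DNA)
```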
Folding of nucleosomes into 30 nm fiber
Molecular Biology of the Cell (© Garland Science 2008)
30 nm fiber binds to protein scaffold
• folded 30-nm fiber
• looped domain - high-level expression of genes in loop
• histone modifying enzymes, chromatin remodeling complexes, RNA polymerase
• proteins forming chromosome scaffold
Molecular Biology of the Cell (© Garland Science 2008)
Chromosome
One chromosome (two identical chromatids): short arm (p), long arm (q)
Centromere - the site where the chromosome is attached to the system of cellular microtubules - important for chromosome segregation during cell division
Telomere - terminal part of a chromatid that protects the end from being recognised as a double-strand break by the DNA repair machinery
Molecular Biology of the Cell (© Garland Science 2008)
Nature Reviews | Cancer
Chromosome
Fully condensed chromosomes are present only during cell division; otherwise they are more or less decondensed to lower levels of structure, especially at transcriptionally active sites (euchromatin). Transcriptionally inactive parts of DNA, as well as repetitive regions and telomeres, are much more condensed (heterochromatin). The various types of chromatin differ in the epigenetic marks of both DNA (5-methyl cytosine) and histones (methylation and acetylation).
(Figure: range of genome sizes in different groups of organisms; axis: number of nucleotide pairs per haploid genome, logarithmic scale from 10^5 to 10^12)
• BACTERIA AND ARCHAEA (Mycoplasma)
• PROTISTS
• PLANTS (Arabidopsis)
• INSECTS (Drosophila)
• MOLLUSKS
• CARTILAGINOUS FISH (shark)
• BONY FISH (Fugu, zebrafish)
• AMPHIBIANS
• REPTILES
• BIRDS
• MAMMALS (human)
Figure 1-37 Molecular Biology of the Cell, Fifth Edition (© Garland Science 2008)
Table 1-1 Some Genomes That Have Been Completely Sequenced
SPECIES	SPECIAL FEATURES	HABITAT	GENOME SIZE (1000s OF NUCLEOTIDE PAIRS PER HAPLOID GENOME)	ESTIMATED NUMBER OF GENES CODING FOR PROTEINS
ARCHAEA				
Methanococcus jannaschii	lithotrophic, anaerobic, methane-producing	hydrothermal vents	1664	1750
Archaeoglobus fulgidus	lithotrophic or organotrophic, anaerobic, sulfate-reducing	hydrothermal vents	2178	2493
Nanoarchaeum equitans	smallest known archaean; anaerobic; parasitic on another, larger archaean	hydrothermal and volcanic hot vents	491	552
EUCARYOTES				
Saccharomyces cerevisiae (budding yeast)	minimal model eucaryote	grape skins, beer	12,069	~6300
Arabidopsis thaliana (Thale cress)	model organism for flowering plants	soil and air	~142,000	~26,000
Caenorhabditis elegans (nematode worm)	simple animal with perfectly predictable development	soil	~97,000	~20,000
Drosophila melanogaster (fruit fly)	key to the genetics of animal development	rotting fruit	~137,000	~14,000
Homo sapiens (human)	most intensively studied mammal	houses	~3,200,000	~24,000
Genome size and gene number vary between strains of a single species, especially for bacteria and archaea. The table shows data for particular strains that have been sequenced. For eucaryotes, many genes can give rise to several alternative variant proteins, so that the total number of proteins specified by the genome is substantially greater than the number of genes.
Table 1-1 (part 2 of 2) Molecular Biology of the Cell, Fifth Edition (© Garland Science 2008)
Levels of structure of biopolymers
(Figure: primary, secondary and tertiary structure shown for DNA, RNA and protein)
Genetic code
Set of rules that assign a sequence of amino acids in the protein to the sequence of nucleotides in DNA or RNA.
DNA -> (Transcription) -> RNA -> (Translation) -> Protein
RNA CODEWORDS AND PROTEIN SYNTHESIS, III. ON THE NUCLEOTIDE SEQUENCE OF A CYSTEINE AND A LEUCINE RNA CODEWORD
By Philip Leder and Marshall W. Nirenberg
NATIONAL HEART INSTITUTE, NATIONAL INSTITUTES OF HEALTH
Communicated by Richard B. Roberts, October 1, 1964
Previous studies utilizing randomly ordered synthetic polynucleotides to direct amino acid incorporation into protein in E. coli extracts indicated that RNA codewords corresponding to valine, leucine, and cysteine contain the bases (UUG). The activity of chemically defined trinucleotides in stimulating the binding of a specific C14-aminoacyl-sRNA to ribosomes, prior to peptide bond formation, provided a means of investigating base sequence of RNA codewords and showed that the sequence of a valine RNA codeword is GpUpU.
Properties of genetic code
• genetic code is based on triplets - one amino acid in protein is coded by a sequence of three nucleotides in DNA (RNA)
Triplet = Codon x anticodon = complementary sequence on the particular tRNA that carries the respective amino acid
mRNA	CGU	GGU	ACG	AUU	GGA	UGU
Protein	Arg	Gly	Thr	Ile	Gly	Cys
• genetic code is universal - individual triplets code for the same amino acid in almost all organisms (x mitochondria)
CGU = Arginine (the same in each of the organisms shown)
• genetic code is degenerate - one amino acid might be coded by several different triplets (but the opposite is not true)
CGC = Arginine, AGA = Arginine
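The degeneracy of the code can be made concrete by grouping the 64 codons by the amino acid they encode. The sketch below builds the standard genetic code from a compact 64-letter string (codon order U, C, A, G at each position, one-letter amino acid symbols, "*" for stop); this encoding is a common convention chosen here for brevity and is not from the slides.

```python
from collections import defaultdict

# Minimal sketch: count how many codons encode each amino acid.
# Assumption: standard genetic code, one-letter amino acid symbols, '*' = stop.

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

print(sorted(codons_for["R"]))  # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGU']
print(codons_for["M"])          # ['AUG']
print(len(codons_for["*"]))     # 3
```

Arginine (R) is encoded by six triplets, including CGU, CGC and AGA from the slide, while each triplet maps to exactly one amino acid.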
Genetic code
First nt	Second nt: U	Second nt: C	Second nt: A	Second nt: G	Third nt
U	Phe	Ser	Tyr	Cys	U
U	Phe	Ser	Tyr	Cys	C
U	Leu	Ser	STOP	STOP/Sec	A
U	Leu	Ser	STOP	Trp	G
C	Leu	Pro	His	Arg	U
C	Leu	Pro	His	Arg	C
C	Leu	Pro	Gln	Arg	A
C	Leu	Pro	Gln	Arg	G
A	Ile	Thr	Asn	Ser	U
A	Ile	Thr	Asn	Ser	C
A	Ile	Thr	Lys	Arg	A
A	Met/START	Thr	Lys	Arg	G
G	Val	Ala	Asp	Gly	U
G	Val	Ala	Asp	Gly	C
G	Val	Ala	Glu	Gly	A
G	Val	Ala	Glu	Gly	G
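The table can be used programmatically to translate an mRNA. The following Python sketch assumes the standard code, one-letter amino acid symbols, reading from the first nucleotide and stopping at the first stop codon; it ignores selenocysteine and the search for a start codon. The compact table encoding and the function name are illustrative choices, not part of the lecture.

```python
# Minimal sketch: translate an mRNA with the standard genetic code.
# Assumptions: reading starts at position 0, '*' marks a stop codon,
# translation ends at the first stop; UGA is treated as stop only.

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') into a one-letter protein sequence."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # step through complete codons only
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CGUGGUACGAUUGGAUGU"))  # RGTIGC = Arg Gly Thr Ile Gly Cys
print(translate("AUGGCUUAAGGG"))        # MA (stops at UAA)
```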
Reading frames
Genetic code is based on triplets - there are three possible ways of reading (reading frames), but only one is correct.
Frame 1
mRNA	CGU	GGU	ACG	AUU	GGA	UGU
Protein 1	Arg	Gly	Thr	Ile	Gly	Cys
Frame 2
mRNA	C	GUG	GUA	CGA	UUG	GAU	GU
Protein 2		Val	Val	Arg	Leu	Asp	
Frame 3
mRNA	CG	UGG	UAC	GAU	UGG	AUG	U
Protein 3		Trp	Tyr	Asp	Trp	Met	
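The three reading frames of the example mRNA can be reproduced with a few lines of Python. The sketch assumes the standard genetic code and one-letter amino acid symbols; incomplete codons at the end of a frame are skipped and a stop codon would be shown as "*". The helper names are hypothetical.

```python
# Minimal sketch: translate an mRNA in all three forward reading frames.
# Assumptions: standard genetic code, '*' = stop, trailing incomplete
# codons are ignored.

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(mrna: str, offset: int) -> str:
    """Translate every complete codon starting at the given offset (0, 1 or 2)."""
    mrna = mrna.upper()
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(offset, len(mrna) - 2, 3)
    )


mrna = "CGUGGUACGAUUGGAUGU"
for offset in range(3):
    print(offset + 1, translate_frame(mrna, offset))
# 1 RGTIGC  (Arg Gly Thr Ile Gly Cys)
# 2 VVRLD   (Val Val Arg Leu Asp)
# 3 WYDWM   (Trp Tyr Asp Trp Met)
```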
Genetic code
Although the genetic code is universal, the usage of particular codons, as well as the amounts of particular tRNAs and aminoacyl-tRNA synthetases, differ between organisms.
Adapting the codons of synthetic genes to the expression system used (bacteria, human cells, ...) might be highly beneficial for recombinant protein production.
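The idea of adapting a synthetic gene to its expression host can be sketched as a back-translation that always picks the host's preferred codon. The preference table below is purely hypothetical and covers only a few amino acids; real work would use measured codon usage tables for the chosen host and would usually balance further constraints. The names are illustrative.

```python
# Minimal sketch: naive codon optimisation by back-translation.
# Assumption: PREFERRED_CODON is a made-up, incomplete preference table for
# an unnamed host; it is NOT real codon usage data.

PREFERRED_CODON = {
    "M": "ATG",
    "R": "CGT",
    "G": "GGC",
    "L": "CTG",
    "K": "AAA",
    "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Encode a one-letter protein sequence as DNA using the preferred codons."""
    return "".join(PREFERRED_CODON[amino_acid] for amino_acid in protein.upper())


print(back_translate("MRGLK*"))  # ATGCGTGGCCTGAAATAA
```

Because the code is degenerate, the protein sequence stays the same whichever synonymous codon is chosen; only the DNA sequence of the synthetic gene changes.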