PA152: Efficient Use of DB
11. Replication and High Availability
Vlastislav Dohnal, FI MUNI, 2024

Credits
◼ This presentation is based on:
  - Microsoft MSDN library
  - Course NoSQL databases and Big Data management
    - Irena Holubová, Charles University, Prague
    - http://www.ksi.mff.cuni.cz/~holubova/NDBI040/
  - PostgreSQL documentation
    - http://www.postgresql.org/docs/9.3/static/high-availability.html
  - 451 Research
    - M. Aslett: CAP Theorem – two out of three ain't right

Contents
◼ Availability
◼ Data distribution & replication
◼ High availability
◼ Failover
◼ Recommendations

Availability
◼ High availability (HA) is the ability of a system to operate continuously without failing for a specified period of time.
[Figure: two DB servers] (Source: Microsoft)

Determining Availability
◼ Hours of Operation
  - Business hours vs. all of the time
    - intranet service vs. web service
    - shift workers vs. all-around-the-world customers
◼ Connectivity Requirements
  - Tight/loose coupling of app and DBMS
    - synchronous vs. asynchronous data updates
  - Online vs. offline applications – so response time can be important!

Availability
◼ Definition using operation hours
  - Av = "up time" / "total time"
    - "up time" = the system is up and operating
  - More practical definition:
    - Av = (total time - down time) / total time
◼ Down time
  - Scheduled – reboot, SW/HW upgrade, …
  - Unscheduled – HW/SW failure, security breaches, network unavailability, power outage, disasters, …
  - Non-functional app requirements – response time
◼ For "true" high availability, down time is not distinguished

Quantifying Availability: Nines
◼ Availability as percentage of uptime
◼ Class of nines: c = -log10(1 - Av)
◼ Assuming 24/7 operation (a small calculation sketch follows the Scalability slide below):

  Class  Availability  Downtime/year   Downtime/month  Downtime/week
  1      90%           36.5 days       72 hours        16.8 hours
  2      99%           3.65 days       7.20 hours      1.68 hours
  3      99.9%         8.76 hours      43.8 minutes    10.1 minutes
  4      99.99%        52.56 minutes   4.32 minutes    1.01 minutes
  5      99.999%       5.26 minutes    25.9 seconds    6.05 seconds
  6      99.9999%      31.5 seconds    2.59 seconds    0.605 seconds
  7      99.99999%     3.15 seconds    0.259 seconds   0.0605 seconds
  (Source: Wikipedia.org)

Scalability
◼ Providing access to a number of concurrent users
◼ Handling growing amounts of data without losing performance
◼ With acceptable latency!
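To make the class-of-nines formula and the downtime table above concrete, here is a minimal Python sketch (not part of the original slides) that derives the class and the allowed downtime from an availability value:

    # Availability "nines": class and allowed downtime, assuming 24/7 operation
    # (reproduces the table above; not part of the original slides).
    import math

    def nines_class(availability: float) -> float:
        """Class of nines: c = -log10(1 - Av)."""
        return -math.log10(1.0 - availability)

    def downtime_hours(availability: float, period_hours: float) -> float:
        """Allowed downtime (in hours) within a period of the given length."""
        return (1.0 - availability) * period_hours

    if __name__ == "__main__":
        hours_per_year = 365 * 24
        for av in (0.99, 0.999, 0.9999, 0.99999):
            print(f"Av = {av:.5%}  class = {nines_class(av):.0f}  "
                  f"downtime/year = {downtime_hours(av, hours_per_year):.2f} h")

Running it reproduces the table rows, e.g. 99.99% availability is class 4 with about 0.88 hours (52.56 minutes) of downtime per year.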
Scalability: Solutions
◼ Scaling up – vertical scaling
  - increasing RAM
  - multiprocessing
  - → vendor dependence
◼ Scaling out – horizontal scaling
  - server federations/clusters → data distribution

Horizontal Scaling
◼ Systems are distributed across multiple machines or nodes
  - commodity machines → cost effective
  - often surpasses the scalability of the vertical approach
◼ Fallacies of distributed computing by Peter Deutsch
  - the network
    - is reliable, secure, homogeneous
    - its topology does not change
    - latency and transport cost are zero
    - bandwidth is infinite
  - there is one administrator
  (Source: https://blogs.oracle.com/jag/resource/Fallacies.html)

Need for Distributing Data
◼ Bring data closer to its users – geographic scalability
◼ Allow site independence
◼ Separate online transaction processing from read-intensive applications
◼ Reduce conflicts during user requests
◼ Process large volumes of data

Distributing Data
◼ Approaches
  - Data partitioning
    - sharding
    - e.g., peer-to-peer networks
  - Replication
    - copies of data
    - e.g., one writeable and many read-only (standby) servers

Replication / Distribution Model
◼ Model of distributing data
  - Replication
    - the same data stored on multiple nodes
  - Filtering data (sharding)
    - the data is partitioned and stored separately
    - helps avoid replication conflicts when multiple sites are allowed to update data

Filtering Data (in general)
[Figure: a publisher table replicated to subscribers – horizontal filtering publishes only selected rows, vertical filtering publishes only selected columns (e.g., columns A, B, E of a table with columns A–F).] (Source: Microsoft)

Distribution Model: Replication
◼ Master-slave model
  - load balancing of read-intensive queries
◼ Master node
  - manages data
  - distributes changes to slaves
◼ Slave node
  - stores data
  - queries data
  - no modifications to data
[Figure: one master / many slaves]

Distribution Model: Replication
◼ Master-master model
  - consistency requires resolving update conflicts
    - in the "real" master-master (peer-to-peer) model
  - typically combined with sharding (filtering) of data
    - each node is master for a subset of the data
    - and slave for the rest
[Figure: multiple masters, each acting as master/slave]

Master-Master Model with Sharding
[Figure: three nodes each hold the full Orders table (Area Id, Order_no, Qty), but each node is the writable master only for the rows of one Area Id – master A for area 1, master B for area 2, master C for area 3 – and a slave for the remaining rows.] (Source: Microsoft)
(A routing sketch in code follows the next slide.)

Replication Types (for the "real" multi-master model)
[Figure: a spectrum of replication types – Distributed Transactions → Transactional Replication → Snapshot Replication → Merge Replication, with lower latency and lower autonomy on the left and higher latency and higher autonomy on the right.] (Source: Microsoft)
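The master-master-with-sharding slide above shows each node owning the Orders rows of one Area Id while replicating the rest. Below is a minimal, illustrative Python sketch of that routing idea; the node names and the shard map are assumptions, real systems typically handle this routing internally.

    # Master-master with sharding (Orders partitioned by Area Id), as on the
    # slide above: every node replicates the whole table, but writes for an
    # area go only to the master that owns it. Node names are hypothetical.
    SHARD_MAP = {1: "master_a", 2: "master_b", 3: "master_c"}

    def master_for(area_id: int) -> str:
        """Return the node that is authoritative (writable) for this area."""
        try:
            return SHARD_MAP[area_id]
        except KeyError:
            raise ValueError(f"no master configured for area {area_id}")

    def route_write(order: dict) -> str:
        """Decide where an insert/update of an Orders row must be sent."""
        return master_for(order["area_id"])

    # Sample rows taken from the slide's Orders table
    orders = [
        {"area_id": 1, "order_no": 1000, "qty": 15},
        {"area_id": 2, "order_no": 2380, "qty": 8},
        {"area_id": 3, "order_no": 1070, "qty": 19},
    ]
    for o in orders:
        print(o["order_no"], "->", route_write(o))

The point of the slide is that every node can answer reads over the whole table, while each row has exactly one node that accepts writes for it.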
Replication Types
◼ Distributed Transactions
  - for the "real" master-master model; ensures consistency
  - low latency, high consistency
◼ Transactional Replication
  - replication of incremental changes
  - minimal latency (typically online)
  - conflicts are resolved using shared locks

Replication Types
◼ Snapshot Replication
  - periodic bulk data transfer as new snapshots
    - intermediate updates to data may go unnoticed by "subscribers"
    - so copies can be out of date
  - data changes are substantial but infrequent
  - replica servers are read-only
  - high latency is acceptable

Replication Types
◼ Merge Replication
  - autonomous changes to replicated data are merged later
    - default and custom conflict-resolution rules (a sketch of such a rule follows the CAP discussion below)
  - does not guarantee transactional consistency, but converges
  - advantage: nodes can update data offline and synchronize later
  - disadvantage: changes to the schema are needed
    - e.g., row version, row originator

Issues of Distributed Systems
◼ Consistency
  - after an update, all readers in a distributed system see the same data
  - all nodes are supposed to always contain the same data
  - e.g., with multiple instances, all writes must be duplicated before the write operation is completed
◼ Availability
  - every request receives a response
    - about whether it was successful or failed
◼ Partition Tolerance
  - the system continues to operate despite arbitrary message loss or failure of part of the system

Brewer's CAP Theorem
◼ Only 2 of the 3 guarantees can be given in a "shared-data" system
  - proved by Gilbert and Lynch in 2002
◼ ACID
  - provides Availability and Consistency
  - e.g., replication with distributed transactions
◼ BASE
  - provides Availability and Partition tolerance
  - reality: you can trade a little consistency for some availability
  - e.g., sharding with merge replication
[Figure: CAP triangle with database categories, e.g., NewSQL DB] (Source: http://bigdatanerd.wordpress.com)

NewSQL
◼ Distributed database systems that scale out
◼ CP systems – trade availability for consistency when a partition occurs
◼ MySQL Cluster, Google Spanner, VoltDB, …
  - in fact, master-master replication with data sharding

BASE Properties
◼ Basically Available
  - partial failures can occur, but without total system failure
◼ Soft state
  - the system is in flux / non-deterministic
    - changes occur all the time
◼ Eventual consistency (replica convergence)
  - is a liveness guarantee
    - read requests eventually return the same value
  - is not a safety guarantee
    - reads can return any value before the replicas converge

Two out of three ain't right
◼ Can a distributed system choose not to tolerate partitions?
◼ Partition tolerance is mandatory in distributed systems!
  - network partitions are a given; therefore, consistency and availability cannot both be achieved at all times

Two out of three ain't right
◼ In reality, there are two types of systems when a partition occurs:
  - those that give up availability
  - those that give up consistency
◼ Incorrect conclusion: consistent, partition-tolerant databases cannot be available and are therefore not feasible.
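The merge-replication slides above mention per-row metadata (row version, row originator) used to resolve conflicting offline updates. The following is an illustrative, DBMS-independent sketch of one possible rule (last writer wins, with the originator as a tie-breaker); all names are hypothetical and do not correspond to an actual replication API.

    # Illustrative merge rule for merge replication: each replicated row carries
    # a version counter and the id of the node that last changed it (the schema
    # additions mentioned on the slide). All names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Row:
        key: int
        value: str
        row_version: int   # incremented on every local update
        originator: str    # id of the node that produced this version

    def merge(local: Row, incoming: Row) -> Row:
        """Resolve a conflict between two versions of the same row.

        Rule: the higher version wins (last writer wins); on a tie the
        lexicographically smaller originator wins, so every node applies
        the same deterministic decision and the replicas converge.
        """
        if incoming.row_version != local.row_version:
            return incoming if incoming.row_version > local.row_version else local
        return incoming if incoming.originator < local.originator else local

    # Two nodes updated the same row while working offline:
    a = Row(key=42, value="qty=15", row_version=7, originator="node-A")
    b = Row(key=42, value="qty=22", row_version=7, originator="node-B")
    print(merge(a, b))   # both nodes end up keeping node-A's version

Real products ship default resolvers and allow custom ones, as the slide notes; the essential property is that all replicas apply the same deterministic rule and therefore converge, without guaranteeing transactional consistency.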
Consistency or Availability?
◼ NoSQL databases relax consistency in favor of availability, but are not inconsistent.
◼ NewSQL databases sacrifice availability for consistency, but are not unavailable.
◼ When a partition occurs, the decision has to be made.
  - Better: make that choice already during the development process.

Tunable Consistency
◼ Allows developers to adopt the consistency model required by a specific application, workload or query.
◼ Strong vs. eventual consistency

Consistency
◼ Strong (ACID) vs. eventual (BASE) consistency
◼ Example:
[Figure: two timelines for servers A, B, C around write(A,2) executed on server A.
 Strong consistency – after the write, every server returns read(A)=2.
 Eventual consistency – server B may still return read(A)=1 for a while (inconsistent state) before all servers converge to read(A)=2.]

Tunable Consistency
◼ Would allow developers to adopt the availability model required by a specific application, workload or query.
◼ But tunable availability is the trade-off of tunable consistency:
  - it is called latency.

Beyond CAP Theorem
◼ It's not "pick two"
  - or even "pick one"
◼ Think of them as dials on a dashboard used to achieve the most appropriate balance of consistency, availability and partition tolerance for specific workloads.

Maintaining High Availability of DBMS
◼ Standby server
  - shared disk failover (NAS)
  - file system replication (DRBD)
  - transaction log shipping
  - trigger-based replication
  - statement-based replication middleware
[Figure: clients connect to a cluster consisting of a primary node and a secondary/standby node]

Log-shipping Standby Server
◼ Primary node
  - serves all queries
  - runs in permanent archiving mode
    - continuously sends WAL records to the standby servers
◼ Standby server
  - serves no queries
  - runs in permanent recovery mode
    - continuously processes WAL records arriving from the primary node
  - also called a warm standby
◼ Log shipping can be synchronous or asynchronous
◼ Disadvantage: typically all tables are replicated
◼ Advantage: no schema changes, no trigger definitions

Log-shipping: Failure of a Node
◼ If the standby fails, no action is taken.
  - After it comes back online, a catch-up procedure is started.
◼ If the primary fails, the standby server begins failover:
  1. the standby applies all pending WAL records,
  2. marks itself as the primary,
  3. starts to serve all queries.
◼ A heartbeat mechanism continually verifies the connectivity between the two nodes and the viability of the primary server (a simplified sketch of this loop follows the Primary and Standby Servers slide).

Log-shipping: After Failover
◼ When the standby has successfully failed over, a new standby should be configured.
◼ When the original primary node becomes available again
  - it must detect that it is no longer the primary,
  - i.e., do so-called STONITH (Shoot The Other Node In The Head),
  - otherwise serious data corruption/loss may occur
◼ The old primary usually becomes the new standby.

Primary and Standby Servers
◼ Swap primary and standby regularly
  - to verify the recovery steps
  - to do necessary maintenance on the standby server
    - SW/HW upgrades, …
    - mind limits on SW upgrades – see the DBMS docs
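The failover procedure above (heartbeat, then promotion of the standby) can be summarized as a small control loop. This is only a simplified sketch under stated assumptions; check_primary_alive() and promote_standby() are hypothetical placeholders for the deployment's real heartbeat and promotion mechanism, not PostgreSQL functions.

    # Simplified standby control loop for log-shipping failover (illustrative).
    # check_primary_alive() and promote_standby() are placeholders for the real
    # heartbeat and promotion mechanism of the deployment, not PostgreSQL APIs.
    import time

    HEARTBEAT_INTERVAL = 5   # seconds between heartbeat checks
    MAX_MISSED = 3           # missed heartbeats before declaring the primary dead

    def check_primary_alive() -> bool:
        """Placeholder: ping the primary or run a trivial query against it."""
        raise NotImplementedError

    def promote_standby() -> None:
        """Placeholder: apply pending WAL, mark this node as the new primary."""
        raise NotImplementedError

    def monitor() -> None:
        missed = 0
        while True:
            if check_primary_alive():
                missed = 0
            else:
                missed += 1
                if missed >= MAX_MISSED:
                    # Fence the old primary (STONITH) before serving writes,
                    # so that two writable primaries never coexist.
                    promote_standby()
                    return
            time.sleep(HEARTBEAT_INTERVAL)

In practice the promotion step must also fence the failed primary (the STONITH step from the slides) before the new primary starts accepting writes; otherwise two writable primaries could corrupt the data.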
PostgreSQL: Replication
[Figure: TPC Benchmark B results – scale factor 1, 1 transaction = 5 queries, two servers (AMD Opteron 8439 SE, 1024 MB RAM, 20 GB HDD).]

Recommended HA Practices
◼ Maximize availability at each tier of the application
  - independent power supply for the primary server
  - keep standby servers on a different subnet
◼ Test whether your availability solution works

Lecture Takeaways
◼ The term availability and its classification
◼ Possible techniques (sharding / replication)
◼ CAP theorem
  - ACID & BASE systems
◼ Possible implementations in a relational DBMS
◼ The future of databases is distributed.