PA152: Efficient Use of DB 12. Replication and High Availability Vlastislav Dohnal PA152, Vlastislav Dohnal, FI MUNI, 2018 2 Credits ◼ This presentation is based on:  Microsoft MSDN library  Course NoSQL databases and Big Data management ◼ Irena Holubová ◼ Charles University, Prague ◼ http://www.ksi.mff.cuni.cz/~holubova/NDBI040/ PostgreSQL documentation ◼ http://www.postgresql.org/docs/9.3/static/high- availability.html PA152, Vlastislav Dohnal, FI MUNI, 2018 3 Contents ◼ Availability ◼ Data distribution & Replication ◼ High availability ◼ Failover ◼ Recommendations Availability PA152, Vlastislav Dohnal, FI MUNI, 2018 4 DB Server DB Server Source: Microsoft Determining Availability Requirements ◼ Hours of Operation Business hours vs. all of the time ◼ intranet service vs. web services ◼ shift workers vs. all-around the world customers ◼ Connectivity Requirements Online vs. offline applications – so response time can be important! ◼ Tight/Loose coupling of app and DBMS Synchronous vs. asynchronous data updates PA152, Vlastislav Dohnal, FI MUNI, 2018 5 Availability ◼ Definition in operation hours  Av = “up time” / “total time” ◼ “up time” = the system is up and operating  More practical def. ◼ Av = (total time - down time) / total time ◼ Down time  Scheduled – reboot, SW/HW upgrade, …  Unscheduled – HW/SW failure, security breaches, network unavailability, power outage, disasters, …  Non-functional app requirements – response time ◼ For “true” high-availability, down time is not distinguished PA152, Vlastislav Dohnal, FI MUNI, 2018 6 Nines ◼ Availability as percentage of uptime Class of nines: 𝑐 = − log10 1 − 𝐴𝑣 ◼ Assuming 24/7 operation: PA152, Vlastislav Dohnal, FI MUNI, 2018 7 Nine class Availability Downtime per year Downtime per month Downtime per week 1 90% 36.5 days 72 hours 16.8 hours 2 99% 3.65 days 7.20 hours 1.68 hours 3 99.9% 8.76 hours 43.8 minutes 10.1 minutes 4 99.99% 52.56 minutes 4.32 minutes 1.01 minutes 5 99.999% 5.26 minutes 25.9 seconds 6.05 seconds 6 99.9999% 31.5 seconds 2.59 seconds 0.605 seconds 7 99.99999% 3.15 seconds 0.259 seconds 0.0605 seconds Source: Wikipedia.org Scalability ◼ Providing access to a number of concurrent users ◼ Handling growing amounts of data without losing performance ◼ With acceptable latency! PA152, Vlastislav Dohnal, FI MUNI, 2018 8 Need for Distributing Data ◼ Brings data closer to its user ◼ Allows site independence ◼ Separates Online transaction processing Read-intensive applications ◼ Can reduce conflicts during user requests ◼ Process big data PA152, Vlastislav Dohnal, FI MUNI, 2018 9 Scalability: Solutions ◼ Scaling Up – vertical scaling → vendor dependence Increasing RAM Multiprocessing ◼ Scaling Out – horizontal scaling Replication Read-only standby servers Server federations / clusters / data distribution PA152, Vlastislav Dohnal, FI MUNI, 2018 10 Horizontal Scaling ◼ Systems are distributed across multiple machines or nodes  Commodity machines → cost effective  Often surpasses scalability of vertical approach ◼ Fallacies of distributed computing by Peter Deutsch  Network ◼ Is reliable, secure, homogeneous ◼ Topology does not change ◼ Latency and transport cost is zero ◼ Bandwidth is infinite  One administrator PA152, Vlastislav Dohnal, FI MUNI, 2018 11 Source: https://blogs.oracle.com/jag/resource/Fallacies.html Brewer’s CAP Theorem ◼ Consistency  After an update, all readers in a distributed system see the same data  All nodes are supposed to contain the same data at all times  E.g. in multiple instances, all writes must be duplicated before write operation is completed. ◼ Availability  Every request receives a response ◼ about whether it was successful or failed ◼ Partition Tolerance  System continues to operate despite arbitrary message loss or failure of part of the system. PA152, Vlastislav Dohnal, FI MUNI, 2018 12 Brewer’s CAP Theorem ◼ Only 2 of 3 guarantees can be given in a “shareddata” system.  Proved by Nancy Lynch in 2002 ◼ ACID  provides Availability and Consistency  E.g. database on a single machine ◼ BASE  provides Availability and Partition tolerance  Reality: you can trade a little consistency for some availability  E.g. distributed database PA152, Vlastislav Dohnal, FI MUNI, 2018 13 Source: http://bigdatanerd.wordpress.com NewSQL DB NewSQL ◼ Distributed database system that scales out ◼ CP system trades availability for consistency when partition happens ◼ MySQL cluster, Google Spanner, VoltDB, … In fact, master-master replication with data sharding PA152, Vlastislav Dohnal, FI MUNI, 2018 14 BASE Properties ◼ Basically Available Partial failures can occur, but without total system failure ◼ Soft state System is in flux / non-deterministic ◼ Changes occur all the time ◼ Eventual consistency (replica convergence) is a liveness guarantee ◼ reads eventually return the same value is not safety guarantee ◼ can return any value before it converges PA152, Vlastislav Dohnal, FI MUNI, 2018 15 Consistency ◼ Strong (ACID) vs. Eventual (BASE) consistency ◼ Example: PA152, Vlastislav Dohnal, FI MUNI, 2018 16 Server A: read(A)=1 write(A,2) read(A)=2 Server B: read(A)= 1 read(A)=1 read(A)=2 time Server C: read(A)= 1 read(A)=2 Server A: read(A)=1 write(A,2) read(A)=2 Server B: read(A)= 1 read(A)=2 read(A)=2 Server C: read(A)= 1 read(A)=2 Inconsistent state EventualStrong Replication / Distribution Model ◼ Model of distributing data Replication ◼ The same data stored in more nodes. Filtering data (sharding) ◼ The data is partitioned and stored separately ◼ Helps avoid replication conflicts when multiple sites are allowed to update data. PA152, Vlastislav Dohnal, FI MUNI, 2018 17 Filtering Data Subscriber Vertical Filtering Horizontal Filtering 2 3 4 5 6 7 1 A B C D E F 2 3 4 5 6 7 1 A B C D E F 2 3 4 5 6 7 1 A B E 3 6 2 A B C D E F Table A Table BPublisher Source: Microsoft PA152, Vlastislav Dohnal, FI MUNI, 2018 18 Distribution Model: Replication ◼ Master-slave model  Load-balancing of read-intensive queries ◼ Master node  manages data  distributes changes to slaves ◼ Slave node  stores data  queries data  no modifications to data PA152, Vlastislav Dohnal, FI MUNI, 2018 19 Slaves Master One master / many slaves Distribution Model: Replication ◼ Master-master model Typically with filtering data ◼ Master for a subset of data ◼ Slave for the rest Consistency needs resolving of update conflicts PA152, Vlastislav Dohnal, FI MUNI, 2018 20 Multiple Masters Master/Slave Master/Slave Master/Slave Master-master Model PA152, Vlastislav Dohnal, FI MUNI, 2018 21 Orders (Master A) Primary Key Area Id Order_no 1 1 2 2 3 3 1000 3100 1000 2380 1000 1070 ~ ~ ~ ~ ~ ~ Qty 15 22 32 8 7 19 1 1 1000 3100 ~ ~ 15 22 Orders (Master B) Primary Key Area Id Order_no 1 1 2 2 3 3 1000 3100 1000 2380 1000 1070 ~ ~ ~ ~ ~ ~ Qty 15 22 32 8 7 19 2 2 1000 2380 ~ ~ 32 8 Orders (Master C) Primary Key Area Id Order_no 1 1 2 2 3 3 1000 3100 1000 2380 1000 1070 ~ ~ ~ ~ ~ ~ Qty 15 22 32 8 7 19 3 3 1000 1070 ~ ~ 7 19 Master/Slave Master/SlaveMaster/Slave Source: Microsoft Replication Types for “real” multi-master model PA152, Vlastislav Dohnal, FI MUNI, 2018 22 Snapshot Replication Transactional Replication Distributed Transactions Lower Autonomy Lower Latency Higher Autonomy Higher Latency Merge Replication Source: Microsoft Replication Types ◼ Distributed Transactions For “real” master-master model, ensures consistency Low latency, high consistency ◼ Transactional Replication Replication of incremental changes Minimal latency (typically online) Conflicts solves using shared locks PA152, Vlastislav Dohnal, FI MUNI, 2018 23 Replication Types ◼ Snapshot Replication Periodic bulk transfer of new snapshots of data ◼ Intermediate updates to data might be unnoticed by “subscribers” Data changes – substantial but infrequent Slaves are read-only High latency is acceptable PA152, Vlastislav Dohnal, FI MUNI, 2018 24 Replication Types ◼ Merge Replication Autonomous changes to replicated data are later merged Does not guarantee transactional consistency, but converges Default and custom conflict resolution rules Adv: Nodes can update data offline, sync later Disadv: Changes to schema needed. PA152, Vlastislav Dohnal, FI MUNI, 2018 25 Maintaining High-Availability ◼ Standby server Shared disk failover (NAS) File system replication (DRBD) Transaction log shipping Trigger-based replication Statement-Based Replication Middleware PA152, Vlastislav Dohnal, FI MUNI, 2018 26 Clients Primary Node Secondary/ Standby Node Cluster Log-shipping Standby Server ◼ Also called warm standby ◼ Primary node  serves all queries  in permanent archiving mode ◼ Continuous sending of WAL records to standby servers ◼ Standby server  serves no queries  in permanent recovery mode ◼ Continuous processing of WAL records arriving from primary node ◼ Log shipping can be synchronous/asynchronous ◼ Disadvantage: all tables are replicated typically ◼ Advantage: no schema changes, no trigger definitions PA152, Vlastislav Dohnal, FI MUNI, 2018 27 Failover ◼ If primary fails, standby server begins failover. Standby applies all WAL records pending, marks itself as primary, starts to serve all queries. ◼ If standby fails, no action taken. After becoming online, catch-up procedure is started. ◼ Heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary server PA152, Vlastislav Dohnal, FI MUNI, 2018 28 Failover ◼ Failover by standby succeeded New standby should be configured Original primary node becomes available → inform it that it is no longer the primary ◼ do so-called STONITH (Shoot The Other Node In The Head), ◼ otherwise serious data corruption/loss may occur Typically old primary becomes new standby PA152, Vlastislav Dohnal, FI MUNI, 2018 29 Primary and Standby Servers ◼ Swap primary and standby regularly To verify recovery steps To do necessary maintenance on standby server ◼ SW/HW upgrades, … PA152, Vlastislav Dohnal, FI MUNI, 2018 30 PostgreSQL: Replication PA152, Vlastislav Dohnal, FI MUNI, 2018 ◼ TPC Benchmark B 31 scale factor 1 1 trans. = 5 queries Recommended Practices ◼ Maximize availability at each tier of the application ◼ Keep standby servers on a different subnet ◼ Independent power supply to the primary server ◼ Test whether your availability solution works PA152, Vlastislav Dohnal, FI MUNI, 2018 32 Lectures are over… What next? ◼ Courses  PV003 Architektura relačních databázových systémů  PA128 Similarity Searching in Multimedia Data  PA212 Advanced Search Techniques for Large Scale Data Analytics  PA195 NoSQL Databases ◼ Master thesis  From DBs to App Development ◼ PhD study  Research in Motion Capture Data  Similarity Operators in Databases  … ◼ Or come me to have a chit-chat ((-: PA152, Vlastislav Dohnal, FI MUNI, 2018 33