PA152: Efficient Use of DB 12. New SQL Vlastislav Dohnal Credits ◼ This presentation is based on:  S. Harizopoulos et al: OLTP Through the Looking Glass, and What We Found There. SIGMOD, 2008  M. Stonebraker et al: H-Store: The End of an Architectural Era. VLDB, 2007  J. DeBradant et al: Anti-Caching: A New Approach to Database Management System Architecture. VLDB, 2013. PA152, Vlastislav Dohnal, FI MUNI, 2024 2 Contents ◼ Features of OLTP ◼ Trends in OLTP ◼ Performance study of individual bottlenecks ◼ H-store PA152, Vlastislav Dohnal, FI MUNI, 2024 3 Main Features of OLTP ◼ Buffer Management to facilitate data transfer between memory and disk ◼ B-Tree for on-disk data storage ◼ Logging for recovery ◼ Locking to support concurrency ◼ Latching for accessing shared data structure PA152, Vlastislav Dohnal, FI MUNI, 2024 4 Motivation ◼ Is the OLTP database optimized nowadays, given the hardware advancement? ◼ Request from outside the DB community for alternative DB architecture PA152, Vlastislav Dohnal, FI MUNI, 2024 5 Motivation: Hardware advancement In 1980s Nowadays HW cost In millions Few thousands Storage size DB size >> Memory Memory > DB size Processing time for most of the transactions \ In microseconds PA152, Vlastislav Dohnal, FI MUNI, 2024 6 Motivation: Request from outside ◼ “Database-like” storage system proposal from Operating System and networking conference varying forms of ◼ concurrency, ◼ consistency, ◼ reliability, ◼ replication, ◼ queryability. PA152, Vlastislav Dohnal, FI MUNI, 2024 7 Trends in OLTP 1. Cluster computing 2. Memory resident databases Data in OLTP doesn’t grow as fast as memory size. 3. Single threading 4. High availability vs. Logging 5. Transaction Variants PA152, Vlastislav Dohnal, FI MUNI, 2024 8 Trend 3: Single Threading ◼ A step backward from multithread to single thread? ◼ Why multithreading? Prevent idle of CPU while waiting data from disk Prevent long-running transactions from blocking short transaction ◼ Not valid for memory resident DB No disk wait Long-running transactions run in warehouse PA152, Vlastislav Dohnal, FI MUNI, 2024 9 Trend 3: Single Threading ◼ What about multi processors? Dynamic locking was experimentally the best concurrency control with disk. What concurrency control protocol is best? ◼ Goal: Achieve shared-nothing processor by virtual machine So, concurrency control code gets removed. PA152, Vlastislav Dohnal, FI MUNI, 2024 10 Trend 3: Single Threading ◼ What about network disk? Feasible to partition transaction to run in “single-site”. Intra-query parallelism: each processor running on a part of a single query. PA152, Vlastislav Dohnal, FI MUNI, 2024 11 Trend 4: HA vs. Logging ◼ 24x7 service achieved by using multiple sets of hardware. ◼ Perform recovery by copying missing states from other database replicas. Log for recovery can be avoided. PA152, Vlastislav Dohnal, FI MUNI, 2024 12 Production Server Standby Server Recovery from Trend 5: Transaction Variants ◼ Why transaction variants? 2-phase commit protocol harm performance of large-scale distributed DB system 2-phase commit involves commit-request and commit phase which involves all server to participate. PA152, Vlastislav Dohnal, FI MUNI, 2024 13 Trend 5: Transaction Variants ◼ Trade consistency for performance ◼ Eventual consistency, all writes propagate among the database servers. But not immediately. PA152, Vlastislav Dohnal, FI MUNI, 2024 14 Trend in OLTP - Summary PA152, Vlastislav Dohnal, FI MUNI, 2024 15 Impact on DBMS ◼ (1) memory resident DB can get rid of buffer management Buffers PA152, Vlastislav Dohnal, FI MUNI, 2024 16 Impact on DBMS ◼ (2) single thread can avoid locking and latching PA152, Vlastislav Dohnal, FI MUNI, 2024 17 Impact on DBMS ◼ (3) cluster computing helps avoid locking. Instead of single processor and multithreading, each processor is responsible for each own thread. PA152, Vlastislav Dohnal, FI MUNI, 2024 18 Impact on DBMS ◼ (4) high availability without replication mgr. Active-passive replication scheme (log shipping) 1. Replica may not be consistent with the primary  unless on two-phase commit protocol 2. Failover in not instantaneous 3. Log is required  It takes about 20% of CPU cycles. Active-active replication scheme with transactions ◼ Two-phase commit introduces large latency for distributed replication yet. PA152, Vlastislav Dohnal, FI MUNI, 2024 19 Impact on DBMS ◼ (5) being “transaction less” avoids book keeping, i.e., logging. ◼ (5+) Cache-conscious B-Trees Cache misses in the B-tree code may well be the new bottleneck for the stripped-down system. ◼ Related to utilization of the first-level data cache of the CPU. PA152, Vlastislav Dohnal, FI MUNI, 2024 20 TPC-C Benchmark ◼ TPC-C is industry standard used to measure ecommerce performance ◼ TPC-C is designed to represent any industry that must manage, sell, or distribute a product or service ◼ Vendors includes Microsoft, Oracle, IBM, Sybase, Sun, HP, DELL etc. http://www.tpc.org/tpcc/default.asp PA152, Vlastislav Dohnal, FI MUNI, 2024 21 TPC-C Benchmark ◼ 1 warehouse(~100M) serves 10 districts, and each district serve 3000 customers. PA152, Vlastislav Dohnal, FI MUNI, 2024 22 TPC-C Benchmark ◼ 5 concurrent business transactions New Order Transaction Payment Deliver Order Check status of Order Monitor Stock Level of warehouse PA152, Vlastislav Dohnal, FI MUNI, 2024 23 Experiment setup ◼ 40,000 transactions run for types New Order Transaction and Payment ◼ Results measured in Throughput (Time, Transactions completed) Instruction count ◼ Single-core Pentium 4, 3.2GHz, with 1MB L2 cache, hyper threading disabled, 1GB RAM, running Linux 2.6. PA152, Vlastislav Dohnal, FI MUNI, 2024 24 Effect of removing components (1) ◼ Payment transaction: PA152, Vlastislav Dohnal, FI MUNI, 2024 25 Effect of removing components (2) PA152, Vlastislav Dohnal, FI MUNI, 2024 26 Effect of removing components (3) PA152, Vlastislav Dohnal, FI MUNI, 2024 27 Effect of removing components (4) PA152, Vlastislav Dohnal, FI MUNI, 2024 28 Effect of removing components (5) PA152, Vlastislav Dohnal, FI MUNI, 2024 29 Effect of removing components (6) ◼ Instructions of useful work is only <2% of a memory resident DB PA152, Vlastislav Dohnal, FI MUNI, 2024 30 Effect of removing components (7) ◼ The same for New Order Transaction PA152, Vlastislav Dohnal, FI MUNI, 2024 31 Effect of removing components (8) ◼ Comparison of CPU instructions and cycles New order transaction PA152, Vlastislav Dohnal, FI MUNI, 2024 32 remaining overhead Experiment Results ◼ Memory resident DBMS 640 transactions per second. ◼ Stripped-down DBMS 12,700 transactions per second. ◼ Stripped-down DBMS gave a 20 times improvement in throughput PA152, Vlastislav Dohnal, FI MUNI, 2024 33 Conclusion ◼ Most significant overhead contributors buffer management and locking operation, followed by logging and latching. ◼ A fully stripped-down system’s performance is orders of magnitude better than an unmodified system. “One size fits all” DBMSes excel at nothing ◼ Need for specialized databases and languages PA152, Vlastislav Dohnal, FI MUNI, 2024 34 Conclusion ◼ Welcome to NewSQL PA152, Vlastislav Dohnal, FI MUNI, 2024 35 NewSQL DBMS ◼ Highly concurrent, latch-free data structures ◼ Partitioning into single-threaded executors ◼ H-store Distributed, shared-nothing, main mem DBMS Row-store based relational DBMS PA152, Vlastislav Dohnal, FI MUNI, 2024 36 ZDNet interview, Feb. 2008 PA152, Vlastislav Dohnal, FI MUNI, 2024 37 H-Store ◼ Logging overhead Replication for recovery → no redo log Transient undo log sufficient for tx rollback ◼ Transaction classes Optimize concurrency control protocols ◼ Incremental scalability Shared nothing architecture ◼ Remove knobs/tuning parameters Personnel costs higher than machine costs Automatic physical database design PA152, Vlastislav Dohnal, FI MUNI, 2024 38 H-Store Architecture PA152, Vlastislav Dohnal, FI MUNI, 2024 39 H-Store Cluster ◼ Cluster = multiple computers (nodes) ◼ Node = multi-core CPUs, RAM hosts multiple sites ◼ Site = process of H-Store dedicated CPU core and RAM, data partition PA152, Vlastislav Dohnal, FI MUNI, 2024 40 Node A Node B … Transaction Classes 1. Single-sited transactions 2. One-shot transactions 3. Two-phase transactions 4. Sterile transactions 5. General transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 41 1. Single-sited transactions ◼ All queries hit the same partition ◼ Constrained Tree Schemas Root table can be horizontally hash-partitioned Collocate corresponding shards of child tables No communication between partitions PA152, Vlastislav Dohnal, FI MUNI, 2024 42 2. One-shot transactions ◼ No inter-query dependencies ◼ Execute in parallel without communication Replicate read-only parts Vertical partitioning Can be decomposed into single-sited plans Local decisions → No redo log required PA152, Vlastislav Dohnal, FI MUNI, 2024 43 3. Two-phase class ◼ Two-phase classes Phase 1: Read-only operations Phase 2: Updates cannot violate integrity No undo log required PA152, Vlastislav Dohnal, FI MUNI, 2024 44 4. Sterile classes ◼ Sterile classes Operate independently Do not depend on results / state of other concurrent transactions No concurrency control needed ◼ i.e., no coordination among transactions is necessary. PA152, Vlastislav Dohnal, FI MUNI, 2024 45 5. General transactions ◼ Require coordination with other transactions read/write shared data; update data in more partitions PA152, Vlastislav Dohnal, FI MUNI, 2024 46 Concurrency control is needed Concurrency Control ◼ Run sterile, single-sited and one-shot transactions with no controls ◼ Other transactions with basic strategy can escalate to intermediate or advanced ◼ Timestamp ordering of all transactions PA152, Vlastislav Dohnal, FI MUNI, 2024 47 Concurrency Control ◼ Basic Strategy Coordinator sends tx subplans to “workers” Worker waits for “small period of time” ◼ to preserve timestamp order (network delay). Worker executes the subplan ◼ if there is not any uncommited, conflicting transaction ◼ otherwise aborts. Coordinators wait for “ok” from all sites and commits. PA152, Vlastislav Dohnal, FI MUNI, 2024 48 Concurrency Control ◼ Intermediate Strategy if there are too many aborts with basic one Increase wait latency in workers ◼ Advanced Strategy if there are too many aborts with intermediate == Optimistic concurrency control Tracks read and write sets of each tx on each site ◼ Aborts if a conflict between write and write is detected. PA152, Vlastislav Dohnal, FI MUNI, 2024 49 Database Layout ◼ Table replication  Read-only tables are on all sites  i.e., no communication → no latency ◼ Data partitioning  Horizontal partitioning into 4 partitions and 2 replicas  i.e., allow transaction execution in parallel ◼ K-safety of 2  Not enough RAM to replicate all tables  Every site is given a unique set of three partitions per table, thus preventing any pair of two sites from holding the only copies of a partition. PA152, Vlastislav Dohnal, FI MUNI, 2024 50 Performance comparison PA152, Vlastislav Dohnal, FI MUNI, 2024 51 Anti-caching (Durability) ◼ No logging is performed ◼ Cold data moved from RAM to disk In a transactional-safe way PA152, Vlastislav Dohnal, FI MUNI, 2024 52 Anti-caching ◼ Fine-grained eviction Like in virtual memory mgmt, pages are copied. Cold pages are written out. A single hot tuple marks the page (block) hot. ◼ Non-blocking fetches Abort transactions instead of waiting ◼ for an I/O operation PA152, Vlastislav Dohnal, FI MUNI, 2024 53 Anti-caching ◼ Non-blocking fetches: ◼ Abort transactions instead of waiting Reschedule them Occurs if a transaction needs to operate on a tuple on disk “pre-pass” tx to identify all evicted blocks. PA152, Vlastislav Dohnal, FI MUNI, 2024 54 H-store implementations ◼ Volt Active Data (VoltDB) ensure “five 9’s” uptime ◼ SAP HANA ◼ SingleStore (MemSQL) ◼ eXtremeDB PA152, Vlastislav Dohnal, FI MUNI, 2024 55 Lecture Takeaways ◼ Trends in DBMS with current HW ◼ Main bottlenecks in full ACID systems ◼ NewSQL as H-Store principle transaction classes durability PA152, Vlastislav Dohnal, FI MUNI, 2024 56