PA152: Efficient Use of DB
12. New SQL
Vlastislav Dohnal
Credits
◼ This presentation is based on:
 S. Harizopoulos et al: OLTP Through the Looking
Glass, and What We Found There. SIGMOD, 2008
 M. Stonebraker et al: H-Store: The End of an
Architectural Era. VLDB, 2007
 J. DeBradant et al: Anti-Caching: A New Approach to
Database Management System Architecture. VLDB,
2013.
PA152, Vlastislav Dohnal, FI MUNI, 2024 2
Contents
◼ Features of OLTP
◼ Trends in OLTP
◼ Performance study of individual bottlenecks
◼ H-store
PA152, Vlastislav Dohnal, FI MUNI, 2024 3
Main Features of OLTP
◼ Buffer Management
to facilitate data transfer between memory
and disk
◼ B-Tree for on-disk data storage
◼ Logging for recovery
◼ Locking to support concurrency
◼ Latching for accessing shared data
structure
PA152, Vlastislav Dohnal, FI MUNI, 2024 4
Motivation
◼ Is the OLTP database optimized
nowadays, given the hardware
advancement?
◼ Request from outside the DB community
for alternative DB architecture
PA152, Vlastislav Dohnal, FI MUNI, 2024 5
Motivation: Hardware advancement
In 1980s Nowadays
HW cost In millions Few thousands
Storage size DB size >>
Memory
Memory >
DB size
Processing time
for most of the
transactions
\ In microseconds
PA152, Vlastislav Dohnal, FI MUNI, 2024 6
Motivation: Request from outside
◼ “Database-like” storage system proposal
from Operating System and networking
conference
varying forms of
◼ concurrency,
◼ consistency,
◼ reliability,
◼ replication,
◼ queryability.
PA152, Vlastislav Dohnal, FI MUNI, 2024 7
Trends in OLTP
1. Cluster computing
2. Memory resident databases
Data in OLTP doesn’t grow as fast as memory
size.
3. Single threading
4. High availability vs. Logging
5. Transaction Variants
PA152, Vlastislav Dohnal, FI MUNI, 2024 8
Trend 3: Single Threading
◼ A step backward from multithread to single
thread?
◼ Why multithreading?
Prevent idle of CPU while waiting data from
disk
Prevent long-running transactions from
blocking short transaction
◼ Not valid for memory resident DB
No disk wait
Long-running transactions run in warehouse
PA152, Vlastislav Dohnal, FI MUNI, 2024 9
Trend 3: Single Threading
◼ What about multi processors?
Dynamic locking was experimentally the best
concurrency control with disk.
What concurrency control protocol is best?
◼ Goal: Achieve shared-nothing processor
by virtual machine
So, concurrency control code gets removed.
PA152, Vlastislav Dohnal, FI MUNI, 2024 10
Trend 3: Single Threading
◼ What about network disk?
Feasible to partition transaction to run in
“single-site”.
Intra-query parallelism: each processor
running on a part of a single query.
PA152, Vlastislav Dohnal, FI MUNI, 2024 11
Trend 4: HA vs. Logging
◼ 24x7 service achieved by using multiple
sets of hardware.
◼ Perform recovery by copying missing
states from other database replicas.
Log for recovery can be avoided.
PA152, Vlastislav Dohnal, FI MUNI, 2024 12
Production Server Standby Server
Recovery from
Trend 5: Transaction Variants
◼ Why transaction variants?
2-phase commit protocol harm performance
of large-scale distributed DB system
2-phase commit involves commit-request and
commit phase which involves all server to
participate.
PA152, Vlastislav Dohnal, FI MUNI, 2024 13
Trend 5: Transaction Variants
◼ Trade consistency for performance
◼ Eventual consistency, all writes propagate
among the database servers.
But not immediately.
PA152, Vlastislav Dohnal, FI MUNI, 2024 14
Trend in OLTP - Summary
PA152, Vlastislav Dohnal, FI MUNI, 2024 15
Impact on DBMS
◼ (1) memory resident DB can get rid of
buffer management
Buffers
PA152, Vlastislav Dohnal, FI MUNI, 2024 16
Impact on DBMS
◼ (2) single thread can avoid locking and
latching
PA152, Vlastislav Dohnal, FI MUNI, 2024 17
Impact on DBMS
◼ (3) cluster computing helps avoid locking.
Instead of single processor and multithreading,
each processor is responsible for each own
thread.
PA152, Vlastislav Dohnal, FI MUNI, 2024 18
Impact on DBMS
◼ (4) high availability without replication mgr.
Active-passive replication scheme (log shipping)
1. Replica may not be consistent with the primary
 unless on two-phase commit protocol
2. Failover in not instantaneous
3. Log is required
 It takes about 20% of CPU cycles.
Active-active replication scheme with
transactions
◼ Two-phase commit introduces large latency for
distributed replication yet.
PA152, Vlastislav Dohnal, FI MUNI, 2024 19
Impact on DBMS
◼ (5) being “transaction less” avoids book
keeping, i.e., logging.
◼ (5+) Cache-conscious B-Trees
Cache misses in the B-tree code may well be
the new bottleneck for the stripped-down
system.
◼ Related to utilization of the first-level data cache of
the CPU.
PA152, Vlastislav Dohnal, FI MUNI, 2024 20
TPC-C Benchmark
◼ TPC-C is industry standard used to
measure ecommerce performance
◼ TPC-C is designed to represent any
industry that must manage, sell, or
distribute a product or service
◼ Vendors includes Microsoft, Oracle, IBM,
Sybase, Sun, HP, DELL etc.
http://www.tpc.org/tpcc/default.asp
PA152, Vlastislav Dohnal, FI MUNI, 2024 21
TPC-C Benchmark
◼ 1 warehouse(~100M) serves 10 districts,
and each district serve 3000 customers.
PA152, Vlastislav Dohnal, FI MUNI, 2024 22
TPC-C Benchmark
◼ 5 concurrent business transactions
New Order Transaction
Payment
Deliver Order
Check status of Order
Monitor Stock Level of warehouse
PA152, Vlastislav Dohnal, FI MUNI, 2024 23
Experiment setup
◼ 40,000 transactions run for types
New Order Transaction and Payment
◼ Results measured in
Throughput (Time, Transactions completed)
Instruction count
◼ Single-core Pentium 4, 3.2GHz, with 1MB
L2 cache, hyper threading disabled, 1GB
RAM, running Linux 2.6.
PA152, Vlastislav Dohnal, FI MUNI, 2024 24
Effect of removing components (1)
◼ Payment transaction:
PA152, Vlastislav Dohnal, FI MUNI, 2024 25
Effect of removing components (2)
PA152, Vlastislav Dohnal, FI MUNI, 2024 26
Effect of removing components (3)
PA152, Vlastislav Dohnal, FI MUNI, 2024 27
Effect of removing components (4)
PA152, Vlastislav Dohnal, FI MUNI, 2024 28
Effect of removing components (5)
PA152, Vlastislav Dohnal, FI MUNI, 2024 29
Effect of removing components (6)
◼ Instructions of useful work is only <2% of a
memory resident DB
PA152, Vlastislav Dohnal, FI MUNI, 2024 30
Effect of removing components (7)
◼ The same for New Order Transaction
PA152, Vlastislav Dohnal, FI MUNI, 2024 31
Effect of removing components (8)
◼ Comparison of CPU instructions and cycles
New order transaction
PA152, Vlastislav Dohnal, FI MUNI, 2024 32
remaining overhead
Experiment Results
◼ Memory resident DBMS
640 transactions per second.
◼ Stripped-down DBMS
12,700 transactions per second.
◼ Stripped-down DBMS gave a 20 times
improvement in throughput
PA152, Vlastislav Dohnal, FI MUNI, 2024 33
Conclusion
◼ Most significant overhead contributors
buffer management and
locking operation,
followed by logging and latching.
◼ A fully stripped-down system’s
performance is orders of magnitude better
than an unmodified system.
“One size fits all” DBMSes excel at nothing
◼ Need for specialized databases and languages
PA152, Vlastislav Dohnal, FI MUNI, 2024 34
Conclusion
◼ Welcome to NewSQL
PA152, Vlastislav Dohnal, FI MUNI, 2024 35
NewSQL DBMS
◼ Highly concurrent, latch-free data
structures
◼ Partitioning into single-threaded executors
◼ H-store
Distributed, shared-nothing, main mem DBMS
Row-store based relational DBMS
PA152, Vlastislav Dohnal, FI MUNI, 2024 36
ZDNet interview, Feb. 2008
PA152, Vlastislav Dohnal, FI MUNI, 2024 37
H-Store
◼ Logging overhead
Replication for recovery → no redo log
Transient undo log sufficient for tx rollback
◼ Transaction classes
Optimize concurrency control protocols
◼ Incremental scalability
Shared nothing architecture
◼ Remove knobs/tuning parameters
Personnel costs higher than machine costs
Automatic physical database design
PA152, Vlastislav Dohnal, FI MUNI, 2024 38
H-Store Architecture
PA152, Vlastislav Dohnal, FI MUNI, 2024 39
H-Store Cluster
◼ Cluster = multiple computers (nodes)
◼ Node = multi-core CPUs, RAM
hosts multiple sites
◼ Site = process of H-Store
dedicated CPU core and RAM, data partition
PA152, Vlastislav Dohnal, FI MUNI, 2024 40
Node A Node B
…
Transaction Classes
1. Single-sited transactions
2. One-shot transactions
3. Two-phase transactions
4. Sterile transactions
5. General transactions
PA152, Vlastislav Dohnal, FI MUNI, 2024 41
1. Single-sited transactions
◼ All queries hit the same partition
◼ Constrained Tree Schemas
Root table can be horizontally hash-partitioned
Collocate corresponding shards of child tables
No communication between partitions
PA152, Vlastislav Dohnal, FI MUNI, 2024 42
2. One-shot transactions
◼ No inter-query dependencies
◼ Execute in parallel without communication
Replicate read-only parts
Vertical partitioning
Can be decomposed into single-sited plans
Local decisions → No redo log required
PA152, Vlastislav Dohnal, FI MUNI, 2024 43
3. Two-phase class
◼ Two-phase classes
Phase 1: Read-only operations
Phase 2: Updates cannot violate integrity
No undo log required
PA152, Vlastislav Dohnal, FI MUNI, 2024 44
4. Sterile classes
◼ Sterile classes
Operate independently
Do not depend on results / state of other
concurrent transactions
No concurrency control needed
◼ i.e., no coordination among transactions is necessary.
PA152, Vlastislav Dohnal, FI MUNI, 2024 45
5. General transactions
◼ Require coordination with other transactions
read/write shared data;
update data in more partitions
PA152, Vlastislav Dohnal, FI MUNI, 2024 46
Concurrency
control
is needed
Concurrency Control
◼ Run sterile, single-sited and one-shot
transactions with no controls
◼ Other transactions with basic strategy
can escalate to intermediate or advanced
◼ Timestamp ordering of all transactions
PA152, Vlastislav Dohnal, FI MUNI, 2024 47
Concurrency Control
◼ Basic Strategy
Coordinator sends tx subplans to “workers”
Worker waits for “small period of time”
◼ to preserve timestamp order (network delay).
Worker executes the subplan
◼ if there is not any uncommited, conflicting transaction
◼ otherwise aborts.
Coordinators wait for “ok” from all sites and
commits.
PA152, Vlastislav Dohnal, FI MUNI, 2024 48
Concurrency Control
◼ Intermediate Strategy
if there are too many aborts with basic one
Increase wait latency in workers
◼ Advanced Strategy
if there are too many aborts with intermediate
== Optimistic concurrency control
Tracks read and write sets of each tx on each
site
◼ Aborts if a conflict between write and write is detected.
PA152, Vlastislav Dohnal, FI MUNI, 2024 49
Database Layout
◼ Table replication
 Read-only tables are on all sites
 i.e., no communication → no latency
◼ Data partitioning
 Horizontal partitioning into 4 partitions and 2
replicas
 i.e., allow transaction execution in parallel
◼ K-safety of 2
 Not enough RAM to replicate all tables
 Every site is given a unique set of three partitions
per table, thus preventing any pair of two sites
from holding the only copies of a partition.
PA152, Vlastislav Dohnal, FI MUNI, 2024 50
Performance comparison
PA152, Vlastislav Dohnal, FI MUNI, 2024 51
Anti-caching (Durability)
◼ No logging is performed
◼ Cold data moved from RAM to disk
In a transactional-safe way
PA152, Vlastislav Dohnal, FI MUNI, 2024 52
Anti-caching
◼ Fine-grained eviction
Like in virtual memory mgmt, pages are
copied.
Cold pages are written out.
A single hot tuple marks the page (block) hot.
◼ Non-blocking fetches
Abort transactions instead of waiting
◼ for an I/O operation
PA152, Vlastislav Dohnal, FI MUNI, 2024 53
Anti-caching
◼ Non-blocking
fetches:
◼ Abort transactions
instead of waiting
Reschedule them
Occurs if a transaction needs to operate
on a tuple on disk
“pre-pass” tx to identify all evicted blocks.
PA152, Vlastislav Dohnal, FI MUNI, 2024 54
H-store implementations
◼ Volt Active Data (VoltDB)
ensure “five 9’s” uptime
◼ SAP HANA
◼ SingleStore (MemSQL)
◼ eXtremeDB
PA152, Vlastislav Dohnal, FI MUNI, 2024 55
Lecture Takeaways
◼ Trends in DBMS with current HW
◼ Main bottlenecks in full ACID systems
◼ NewSQL as H-Store
principle
transaction classes
durability
PA152, Vlastislav Dohnal, FI MUNI, 2024 56