PA152: Efficient Use of DB
10. Schema Tuning
Vlastislav Dohnal
PA152, Vlastislav Dohnal, FI MUNI, 2022 2
Schema
 Relation schema
relation name and a list of attributes, their types
and integrity constraints
E.g.,
 Table student(uco, name, last_name, day_of_birth)
 Database schema
Schema of all relations
PA152, Vlastislav Dohnal, FI MUNI, 2022 3
Differences in Schema
 Same data organized differently
 Example of business requirements
Suppliers
 Address
Orders
 Part/product, quantity, supplier
PA152, Vlastislav Dohnal, FI MUNI, 2022 4
Differences in Schema
 Alternatives
Schema 1
 Order1(supplier_id, part_id, quantity,
supplier_address)
Schema 2
 Order2(supplier_id, part_id, quantity)
 Supplier(id, address)
 Differences
Schema 2 saves space.
Schema 1 may not keep address when there
is no order.
PA152, Vlastislav Dohnal, FI MUNI, 2022 5
Differences in Schema
 Performance trade-off
Frequent access to address of supplier given
an ordered part
  schema 1 is good (no need for join)
Many new orders
  schema 1 wastes space (address duplicates)
  relation will be stored in more blocks
PA152, Vlastislav Dohnal, FI MUNI, 2022 6
Theory of Good Schema
 Normal forms
1NF, 2NF, 3NF, Boyce-Codd NF, …
 Functional dependency
A  B
 B functionally depends on A
 Value of attr. B is determined if we know the value
of attr. A
 Let t, s be rows of a relation,
then t[A] = s[A]  t[B] = s[B]
PA152, Vlastislav Dohnal, FI MUNI, 2022 7
Theory of Good Schema
 Order1(supplier_id, part_id, quantity,
supplier_address)
 Functional dependency example:
supplier_id  supplier_address
supplier_id, part_id  quantity
PA152, Vlastislav Dohnal, FI MUNI, 2022 8
Theory of Good Schema
 K is a primary key
K  R
L  R for any L  K
 i.e., for each attribute A in R holds:
K  A and L  A
 Example
Supplier(id, address)
id  address
id is the (primary) key
PA152, Vlastislav Dohnal, FI MUNI, 2022 9
Theory of Good Schema
 Example
Order1(supplier_id, part_id, quantity,
supplier_address)
supplier_id  supplier_address
supplier_id, part_id  quantity
supplier_id, part_id is the primary key
PA152, Vlastislav Dohnal, FI MUNI, 2022 10
Schema Normalization
 1NF – all attributes are atomic
 2NF – all attributes depend on a whole
super-key
 3NF – all attributes depend directly on a
candidate key
no transitive dependency
 Normalization
= transformation to BCNF/3NF
PA152, Vlastislav Dohnal, FI MUNI, 2022 11
Schema Normalization
 A relation R is normalized if
every functional dependency X  A involving
attributes in R has the property that X is a
(super-)key.
 Example
Order1(supplier_id, part_id, quantity,
supplier_address)
 supplier_id  supplier_address
 supplier_id, part_id  quantity
Is not normalized
PA152, Vlastislav Dohnal, FI MUNI, 2022 12
Schema Normalization
 Example
Order2(supplier_id, part_id, quantity)
 supplier_id, part_id  quantity
Supplier(id, address)
 id  address
Schema is normalized
PA152, Vlastislav Dohnal, FI MUNI, 2022 13
Schema Normalization: Example
 Bank
Customer has an account
Customer has an address
Account is open at a branch of the bank
 Is relation normalized?
Bank(customer, account, address, branch)
PA152, Vlastislav Dohnal, FI MUNI, 2022 14
Schema Normalization: Example
 Relation
Bank(customer, account, address, branch)
customer  account
customer  address
account  branch
 Primary key is customer
Proven by functional dependencies…
 Relation is not normalized
There is a transitive dependency.
PA152, Vlastislav Dohnal, FI MUNI, 2022 15
Schema Normalization: Example
 Relation decomposition
Bank(customer, account, address, branch)
 customer  account
 customer  address
Account(account, branch)
 account  branch
Normalized now…
PA152, Vlastislav Dohnal, FI MUNI, 2022 16
Practical Schema Design
 Identify entities
Customer, supplier, order, …
 Each entity has attributes
Customer has an address, phone number, …
 There are two constraints on attributes:
1. An attribute cannot have attribute of its own (is
atomic).
2. The entity associated with an attribute must
functionally determine that attribute.
 A functional dependency for each non-key attribute.
PA152, Vlastislav Dohnal, FI MUNI, 2022 17
Practical Schema Design
 Each entity becomes a relation
 To these relations, add relations that
reflect relationships between entities
E.g., WorksOn(emp_id, project_id)
 Identify the functional dependencies
among all attributes and check that the
schema is normalized
If functional dependency AB  C, then ABC
should be part of the same relation.
PA152, Vlastislav Dohnal, FI MUNI, 2022 18
Vertical Partitioning
 Example: Telephone Provider
Customer entity has id, address and
remaining credit value.
 Deps:
 id  address
 id  credit
Normalized schema design
 Customer(id, address, credit)
 Or
 CustAddr(id, address)
 CustCredit(id, credit)
Which design is better?
PA152, Vlastislav Dohnal, FI MUNI, 2022 19
Vertical Partitioning
 Which design is better, depends on the
query pattern:
The application that sends a monthly
statement.
The credit is updated or examined several
times a day.
  The second schema might be better
Relation CustCredit is smaller
 Fewer blocks; may fit in main memory
  faster table/index scan
PA152, Vlastislav Dohnal, FI MUNI, 2022 20
Vertical Partitioning
 Single relation is better than two
if attributes are queried together
 no need for join
 Two relations are better if
Attributes queried separately (or some much
more often)
Attributes are large (long strings, …)
 Caveat: LOBs are stored apart of the relation.
Or some attributes are updated more often
than the others.
PA152, Vlastislav Dohnal, FI MUNI, 2022 21
Vertical Partitioning
 Another example
Customer has id and address (street, city, zip)
 Is this normalization convenient?
CustStreet(id, street)
CustCity(id, city, zip)
PA152, Vlastislav Dohnal, FI MUNI, 2022 22
Vertical Partitioning: Performance
 R(X,Y,Z) - X integer, Y and Z large strings
Performance depends on query pattern
0
0.005
0.01
0.015
0.02
No
Partitioning Query
XYZ
Vertical
Partitioning Query
XYZ
No
Partitioning Query
XY
Vertical
Partitioning Query
XY
Througput(queries/sec)Table-scan
No partitioning:
R(X,Y,Z)
Vert. part.:
R1(X,Y)
R2(X,Z)
SQLServer 2k
Windows 2k
PA152, Vlastislav Dohnal, FI MUNI, 2022 23
Vertical Partitioning: Performance
 R(X,Y,Z) - X integer, Y and Z long strings
Selection X=?, project XY or XYZ
0
200
400
600
800
1000
0 20 40 60 80 100
Throughput(queries/sec)
% of access that only concern XY
no vertical
partitioning
vertical
partitioning
Index Scan
Vert. part.
gives advantage if
proportion of
accessing XY is
greater than 25%.
Join requires 2
index accesses.
PA152, Vlastislav Dohnal, FI MUNI, 2022 24
Vertical Antipartitioning
 Start with normalized schema
 Add attributes of a relation to the other
 Example
Stock market (brokers)
 Price trends for last 3 000 trading days
 Broker’s decision based on last 10 day mainly
Schema
 StockDetail(stock_id, issue_date, company)
 StockPrice(stock_id, date, price)
PA152, Vlastislav Dohnal, FI MUNI, 2022 25
Vertical Antipartitioning
 Schema
StockDetail(stock_id, issue_date, company)
StockPrice(stock_id, date, price)
 Queries for all 10-day prices are
expensive
Even though there is an index on stock_id,
date
Join is needed for further information from
StockDetail
PA152, Vlastislav Dohnal, FI MUNI, 2022 26
Vertical Antipartitioning
 Replicate some data
 Schema
StockDetail(stock_id, issue_date, company,
price_today, price_yesterday, …,
price_10d_ago)
StockPrice(stock_id, date, price)
 Queries for all 10-day prices
1x index scan; no join
PA152, Vlastislav Dohnal, FI MUNI, 2022 27
Vertical Antipartitioning
 Disadvantage
Data replication
 Not high
 Can diminish by not storing in StockPrice
  queries for average price get complicated, …
PA152, Vlastislav Dohnal, FI MUNI, 2022 28
Tuning Denormalization
 Denormalization
violating normalization
for the sake of performance!
 Good for
Attributes from different normalized relations
are often accessed together
 Bad for
Updates are frequent
  locate “source” data to update replicas
PA152, Vlastislav Dohnal, FI MUNI, 2022 29
Tuning Denormalization
 Example (TPC-H)
 region(r_regionkey, r_name, r_comment)
 nation(n_nationkey, n_name, n_regionkey, n_comment)
 supplier(s_suppkey, s_name, s_address,
s_nationkey, s_phone, s_acctbal, s_comment)
 item(i_orderkey, i_partkey, i_suppkey, i_linenumber,
i_quantity, i_extendedprice, i_discount, i_tax,
i_returnflag, i_linestatus, i_shipdate, i_commitdate,
i_receiptdate, i_shipmode, i_comment)
 T(item) = 600 000
T(supplier) = 500, T(nation) = 25, T(region) = 5
 Query: Find items of European suppliers
PA152, Vlastislav Dohnal, FI MUNI, 2022 30
Tuning Denormalization
 Denormalization of item
 itemdenormalized (i_orderkey, i_partkey , i_suppkey,
i_linenumber, i_quantity, i_extendedprice,
i_discount, i_tax, i_returnflag, i_linestatus,
i_shipdate, i_commitdate, i_receiptdate,
i_shipmode, i_comment, i_regionname);
 600 000 rows
PA152, Vlastislav Dohnal, FI MUNI, 2022 31
Tuning Denormalization
 Queries:
SELECT i_orderkey, i_partkey, i_suppkey, i_linenumber,
i_quantity, i_extendedprice, i_discount, i_tax,
i_returnflag, i_linestatus, i_shipdate, i_commitdate,
i_receiptdate, i_shipinstruct, i_shipmode, i_comment, r_name
FROM item, supplier, nation, region
WHERE i_suppkey = s_suppkey AND s_nationkey = n_nationkey AND
n_regionkey = r_regionkey AND r_name = 'Europe';
SELECT i_orderkey, i_partkey, i_suppkey, i_linenumber,
i_quantity, i_extendedprice, i_discount, i_tax,
i_returnflag, i_linestatus, i_shipdate, i_commitdate,
i_receiptdate, i_shipinstruct, i_shipmode, i_comment, i_regionname
FROM itemdenormalized
WHERE i_regionname = 'Europe';
PA152, Vlastislav Dohnal, FI MUNI, 2022 32
Tuning Denormalization: Performance
 Query:
Find items of European suppliers
0.0000
0.0005
0.0010
0.0015
0.0020
normalized denormalized
Throughput(Queries/sec)
Normalized:
join of 4 relations
Denormalized:
one relation
54% perf. gain
Oracle 8i EE
Windows 2k
3x 18GB disk
(10 000 rpm)
54% gain
PA152, Vlastislav Dohnal, FI MUNI, 2022 33
Clustered Storage of Relations
 An alternative to denormalization
 Not always supported by DB system
 Oracle
Clustered storage of two relations
 Order(supplier_id, product_id, quantity)
 Supplier(id, address)
Storage
 Order records stored at the corresponding supplier
record
PA152, Vlastislav Dohnal, FI MUNI, 2022 34
Clustered Storage of Relations
 Example
 Order(supplier_id, product_id, quantity)
 Supplier(id, address)
10, Inter-pro.cz Hodonín
10, 235, 5
10, 545, 10
11, Unikov Bzenec
11, 123, 30
11, 234, 2
11, 648, 10
11, 956, 1
12, Školex Modřice
12, 12, 50
12, 34, 120
…
PA152, Vlastislav Dohnal, FI MUNI, 2022 35
Horizontal Partitioning
 Divides table by its rows
Vertical partitioning = by columns
 Motivation
Smaller volume of data to process
Rapid deletions
 Use
Data archiving
Spatial partitioning
…
PA152, Vlastislav Dohnal, FI MUNI, 2022 36
Horizontal Partitioning
 Automatically
Modern (commercial) DB systems
 MS SQL Server 2005 and later
 Oracle 9i and later, …
 PostgreSQL 10
 Manually
With DB support
 Query optimizer
Without DB support
PA152, Vlastislav Dohnal, FI MUNI, 2022 37
Horizontal Partitioning
 Query rewrites
Automatic partitioning
 No rewrites necessary
Manual partitioning
 With DB support
 No rewrites necessary
 Table inheritance / definition of views with UNION ALL
 Without DB support
 Manual query rewrite
 List of tables in FROM clause must be changed
PA152, Vlastislav Dohnal, FI MUNI, 2022 38
Horizontal Partitioning: SQL Server
 MS SQL Server 2005 and later
 Define partitioning function
 CREATE PARTITION FUNCTION
 Partitioning to intervals
 Define partitioning scheme
 CREATE PARTITION SCHEME
 Where to store data (what storage partitions)
 Create partitioned table
 CREATE TABLE … ON partitioning scheme
 Stored data are automatically split into partitions
 Create indexes
 CREATE INDEX
 Indexes are created on table partitions, i.e., automatically
partitioned
PA152, Vlastislav Dohnal, FI MUNI, 2022 39
Horizontal Partitioning: Oracle
 Oracle 9i and later
Partitioning by intervals, enums, hashing
 Composite partitioning supported
 Partitions split into subpartitions
Included in syntax of CREATE TABLE
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_7002.htm#i2129707
 PostgreSQL 10 and later
Partitioning by intervals, enums, hashing
 CREATE TABLE … ( … ) PARTITION BY RANGE
(…);
Horizontal Partitioning: MariaDB
 Part of SQL syntax, applies to indexes
 Types:
 hash, range, list; also double partitioning
 Limitation on UNIQUE constraints
 All columns used in the table's partitioning
expression must be part of every unique key the
table may have.
PA152, Vlastislav Dohnal, FI MUNI, 2022 40
CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE) ENGINE=MyISAM
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6
CREATE TABLE ti …
PARTITION BY RANGE (MONTH(tr_date)) (
PARTITION spring VALUES LESS THAN (4),
PARTITION summer VALUES LESS THAN (7),
PARTITION fall VALUES LESS THAN (10),
PARTITION winter VALUES LESS THAN MAXVALUE );
Including primary key
PA152, Vlastislav Dohnal, FI MUNI, 2022 41
Horizontal Partitioning: PostgreSQL
 PostgreSQL 8.2 and later
Partitioning by intervals, enums
 Principle (http://www.postgresql.org/docs/current/static/ddl-partitioning.html)
Table inheritance
 Create a base table
 No data stored, no indexes, …
 Individual partitions are inherited tables
 For each table, a CHECK constraint to limit data is defined
 Create necessary indexes
Disadvantage: ref. integrity cannot be used
PA152, Vlastislav Dohnal, FI MUNI, 2022 42
Horizontal Partitioning: PostgreSQL
 Principle
Inserting records
 Inserted into base table
 Insert rules defined on the base table
 Insertion to the “newest” partition only  one RULE
 In general, one rule per partition is defined
 Triggers can be used too…
In case views are used,
 Define INSTEAD OF triggers
PA152, Vlastislav Dohnal, FI MUNI, 2022 43
Horizontal Partitioning: PostgreSQL
 Example in xdohnal schema (db.fi.muni.cz)
Not partitioned table account
 Primary key id
 R(account) = 200 000
 V(account,home_city) = 5
Partitioned table account_parted
 by home_city (5 partitions)
 Partitions: account_parted1 .. account_parted5
home_city | count
home_city1 | 40020
home_city2 | 40186
home_city3 | 39836
home_city4 | 39959
home_city5 | 39999
PA152, Vlastislav Dohnal, FI MUNI, 2022 44
Horizontal Partitioning: PostgreSQL
 Statistics
Table Rows Sizes Indexes
account 200 000 41 984 kB 4 408 kB
account_parted 0 0 kB 8 kB
account_parted1 40 020 8 432 kB 896 kB
account_parted2 40 186 8 464 kB 896 kB
account_parted3 39 836 8 392 kB 888 kB
account_parted4 39 959 8 416 kB 896 kB
account_parted5 39 999 8 424 kB 896 kB
Totals: 200 000 42 128 kB 4 472 kB
PA152, Vlastislav Dohnal, FI MUNI, 2022 45
Horizontal Partitioning: PostgreSQL
 Query optimizer
Allow checking constraint on partitions
 Queries (compare execution plans)
select * from account where id=8;
select * from account_parted where id=8;
select count(*) from account where home_city='home_city1';
select count(*) from account_parted where home_city='home_city1';
select * from account where home_city='home_city1' and id=8;
select * from account_parted where home_city='home_city1' and id=8;
set constraint_exclusion=on;
Transaction Tuning
 Application view on a transaction
It runs isolated – without any concurrent
activity.
 Database view on a transaction
Atomic and consistent change of data; many
can be run concurrently.
So, correctness of result must be ensured.
PA152, Vlastislav Dohnal, FI MUNI, 2022 46
Transaction Concurrency
 Two transactions are concurrent if their
executions overlap in time.
Can happen on a single thread/processor too,
e.g., one waiting for I/O to complete.
 Concurrency control
Control activity of transactions and make the
result appear as equivalent of serial
execution.
Typically achieved by mutual exclusion
 E.g., semaphore
PA152, Vlastislav Dohnal, FI MUNI, 2022 47
Transaction Concurrency
 A semaphore on entire database (one
transaction at a time)
Good for in-memory databases.
 Locking mechanism is good for secondary
memory databases.
Read (shared) locks and write (exclusive)
locks.
Record level and relation (table) level
PA152, Vlastislav Dohnal, FI MUNI, 2022 48
Concurrency through locking
 Rules
1. A transaction must hold a lock on x before
accessing it.
2. A transaction must not acquire a lock on any
item y after releasing a lock on any item x.
 This ensures correctness
no update can be made to data read (and
locked) by someone else.
PA152, Vlastislav Dohnal, FI MUNI, 2022 49
Duration of Transaction
 Duration effects on performance
More locks a transaction requests, more likely
it is that it will wait for some other transaction
to finish.
The longer T executes, the longer some other
transaction may wait if it is blocked by T.
 In operational DBs, shorter transactions
are preferred.
since updates are frequent
PA152, Vlastislav Dohnal, FI MUNI, 2022 50
Transaction Design
 Avoid user-interaction during a transaction
 Lock only what you need
 E.g., do not filter recs in an app
 Chop transaction
 E.g., T accesses x and y. Any other T’ accesses
at most one of x or y and nothing else. T can be
divided into two transaction (each modifying x
and y separately).
 Weaken isolation level
 Many DBMSes default to releasing read locks on
completing the read IO.
PA152, Vlastislav Dohnal, FI MUNI, 2022 51
Levels of Isolation
 Serializable
 Repeatable read
Phantom reads (newly inserted recs)
 Read committed
Non-repeatable reads (a transaction has
committed an update)
 Read uncommitted
Dirty reads (non-committed recs); writes are
still atomic
 No locking
PA152, Vlastislav Dohnal, FI MUNI, 2022 52
Query Tuning: Takeaways
 Five basic principles
Think globally; fix locally
Break bottlenecks by partitioning
 transactions, relations, also more HW ((-:
Start-up costs are high; running costs are low
 E.g., it is expensive to begin a read operation on a
disk.
Render unto server what is due unto server
Be prepared for trade-offs
PA152, Vlastislav Dohnal, FI MUNI, 2022 53