PA036: DB Project 1. PostgreSQL Vlastislav Dohnal Relational DBMS ◼ PostgreSQL open-source relational database management system ver. 16 – docs ◼ Accessing DB a command-line client: psql ◼ Benchmarking DB a command-line tool: pgbench PA036, Vlastislav Dohnal, FI MUNI, 2024 2 Query Plan ◼ command EXPLAIN shows how a database will execute a query PA036, Vlastislav Dohnal, FI MUNI, 2024 3 EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; QUERY PLAN ------------------------------------------------------------ Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) Filter: (unique1 < 7000) EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100; QUERY PLAN ----------------------------------------------------------------- Bitmap Heap Scan on tenk1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) Query Plan (cont.) PA036, Vlastislav Dohnal, FI MUNI, 2024 4 EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ----------------------------------------------------------------------------- Hash Join (cost=230.47..713.98 rows=101 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=229.20..229.20 rows=101 width=244) -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) Recheck Cond: (unique1 < 100) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 < 100) Query Plan (cont.) ◼ Add analyze to show real execution times PA036, Vlastislav Dohnal, FI MUNI, 2024 5 EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; QUERY PLAN ------------------------------------------------------------------------------ Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1) -> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1) Index Cond: (unique2 > 9000) Filter: (unique1 < 100) Rows Removed by Filter: 287 Planning time: 0.096 ms Execution time: 0.336 ms It may signal inefficiency of filter. Planner may show discrepancy in the number of rows If planner is wrong in estimates, try to update statistics – vacuum command. Query Plan (cont.) ◼ Beware of data modifying queries under inspection -> use transactions PA036, Vlastislav Dohnal, FI MUNI, 2024 6 BEGIN; EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100; QUERY PLAN ------------------------------------------------------------------------------ Update on tenk1 (cost=5.08..230.08 rows=0 width=0) (actual time=3.791..3.792 rows=0 loops=1) -> Bitmap Heap Scan on tenk1 (cost=5.08..230.08 rows=102 width=10) (actual time=0.069..0.513 rows=100 loops=1) Recheck Cond: (unique1 < 100) Heap Blocks: exact=90 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.05 rows=102 width=0) (actual time=0.036..0.037 rows=300 loops=1) Index Cond: (unique1 < 100) Planning Time: 0.113 ms Execution Time: 3.850 ms ROLLBACK; Query Plan (cont.) ◼ Add buffers – shows how much data was read PA036, Vlastislav Dohnal, FI MUNI, 2024 7 EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; QUERY PLAN ------------------------------------------------------------------------------ Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1) Recheck Cond: ((unique1 < 100) AND (unique2 > 9000)) Buffers: shared hit=15 -> BitmapAnd (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1) Buffers: shared hit=7 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1) Index Cond: (unique1 < 100) Buffers: shared hit=2 -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1) Index Cond: (unique2 > 9000) Buffers: shared hit=5 Planning time: 0.088 ms Execution time: 0.423 ms PA152, Vlastislav Dohnal, FI MUNI, 2023 8 Query Rewriting ◼ Modifying the query text to eliminate unnecessary operations  Keeping the result identical!!! ◼ Techniques  Use of indexes ◼ attribute selection, inclusion of attributes, clustering the table by an index  Elimination unnecessary ops (DISTINCT, GROUP BY, …)  (Correlated) subqueries ◼ changing subquery to a join, “with” clause  Temporary tables  Incorrect use of having  Materialized views and its maintenance PA152, Vlastislav Dohnal, FI MUNI, 2023 9 Statistics in PostgreSQL ◼ Relation hotel PA152, Vlastislav Dohnal, FI MUNI, 2023 10 Statistics in PostgreSQL ◼ Attribute hotel.id ◼ Attribute hotel.name PA152, Vlastislav Dohnal, FI MUNI, 2023 11 Statistics in PostgreSQL ◼ Attribute hotel.state ◼ Attribute hotel.distance_to_center Indexes in PostgreSQL ◼ Index types in docs  B-tree, Hash  GiST, SP-GiST ◼ for several two-dimensional geometric data types, ◼ supports “nearest neighbors” queries  GIN ◼ inverted file, for indexing arrays  BRIN (Block Range Index) ◼ stores summaries about the values stored in consecutive physical block ranges of a table, good for values well-correlated with the physical order of the table rows PA036, Vlastislav Dohnal, FI MUNI, 2024 12 Indexes in PostgreSQL (cont.) ◼ Multicolumn indexes Mind the order of attributes ◼ Expression based indexes Evaluate a function to be stored in the index ◼ Partial indexes Index only a subset of rows PA036, Vlastislav Dohnal, FI MUNI, 2024 13 ◼ That’ all, folks. PA036, Vlastislav Dohnal, FI MUNI, 2024 14