PA152: Efficient Use of DB 13. Advanced Topics sequences, security, spatial indexes Vlastislav Dohnal PA152, Vlastislav Dohnal, FI MUNI, 2024 2 Credits ◼ Materials are based on presentations: Courses CS245, CS345, CS345 ◼ Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom ◼ Stanford University, California  Course CS145 following the book ◼ Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom: Database Systems: The Complete Book  Book ◼ Andrew J. Brust, Stephen Forte: Mistrovství v programování SQL Serveru 2005  MSDN library by Microsoft PA152, Vlastislav Dohnal, FI MUNI, 2024 3 Contents ◼ Generating IDs ◼ DB security Access control in DB Stored procedures Attacking DBMS ◼ Spatial data Data types, indexing PA152, Vlastislav Dohnal, FI MUNI, 2024 4 Generating PK values ◼ Typically, a sequence of numbers  Increasing monotonically ◼ Example:  student(učo, first_name, last_name) ◼ Ad-hoc solution 1: ◼ Getting current maximum maxučo := SELECT max(učo) FROM student; ◼ Incrementing and using in new record INSERT INTO student VALUES (maxučo+1, ‘Mad’, ‘Max’);  Disadvantage: ◼ Concurrent use → duplicate values PA152, Vlastislav Dohnal, FI MUNI, 2024 5 Generating PK values ◼ Ad-hoc solution 2: Combining INSERT and SELECT in a statement INSERT INTO student VALUES ( (SELECT max(učo) FROM student)+1, ‘Mad’, ‘Max’); Updates to index are atomic ◼ Looks promising…. ◼ Nested select may be evaluated on “stale data” Duplicate values are less probable. ◼ Improved performance only  i.e., sending one statement to DB PA152, Vlastislav Dohnal, FI MUNI, 2024 6 Generating PK values ◼ Ad-hoc solution 2: Concurrency Issues Always in transaction Depends on the way of locking DB uses: ◼ SELECT locks data (but request exclusive lock)  Others are blocked  Locks are always released after commit ◼ INSERT ◼ → values are correct (no dups), but others are waiting PA152, Vlastislav Dohnal, FI MUNI, 2024 7 Generating PK values ◼ Ad-hoc solution 3: Auxiliary table keys(table VARCHAR, id INTEGER) 1. UPDATE keys SET id=id+1 WHERE table=‘student’; 2. newid := SELECT id FROM keys WHERE table=‘student’;  Or one statements: newid := UPDATE keys SET id=id+1 WHERE table=‘student’ RETURNING id; 3. INSERT INTO student VALUES (newid , ‘Mad’, ‘Max’); PA152, Vlastislav Dohnal, FI MUNI, 2024 8 Generating PK values ◼ Ad-hoc solution 3: Inconvenience in concurrency when in transaction: ◼ UPDATE locks the record in keys ◼ Locks get released after commit (after INSERT) ◼ → values are correct (no dups), but others are waiting Advantage: ◼ If combined with Solution 1  i.e., two consecutive transactions ◼ → values are correct (no dups) and nobody is blocked! PA152, Vlastislav Dohnal, FI MUNI, 2024 9 Generating PK values ◼ Recommended to use DB tools Data types ◼ PostgreSQL: SERIAL, BIGSERIAL ◼ SQLServer: IDENTITY Sequences ◼ Oracle, PostgreSQL Toggle at attribute ◼ MySQL ◼ Support for getting last generated number Good for inserting to tables with foreign keys ◼ E.g., inserting first item into e-shopping basket  Creating a new basket & inserting goods PA152, Vlastislav Dohnal, FI MUNI, 2024 10 Generating PK values ◼ CREATE SEQUENCE … Numeric sequence generator Is parameterized: ◼ Min / max value, cyclic ◼ Functions in PostgreSQL Nextval – generate new value Currval – get last generated value Can be imbedded in INSERT ◼ INSERT INTO table_name VALUES (nextval(‘sequence_name’), …); PA152, Vlastislav Dohnal, FI MUNI, 2024 11 Generating PK values: Performance ◼ Example for Solution 3:  accounts(number, branchnum, balance); ◼ Clustered index on number  counter(nextkey); ◼ One record with value 1 ◼ For generating values of id by Solution 3 ◼ Configuration:  Transaction isolation: READ COMMITTED ◼ Only committed data are visible.  Dual Xeon (550MHz,512Kb), 1GB RAM, RAID controller, 4x 18GB drives (10000RPM), Windows 2000. PA152, Vlastislav Dohnal, FI MUNI, 2024 12 Generating PK values: Performance ◼ Batch of 100 000 insertions into accounts ◼ Generating ID values:  DB support: ◼ SQLServer 7 (identity)  insert into accounts (branchnum, balance) values (94496, 2789); ◼ Oracle 8i (sequence)  insert into accounts values (seq.nextval, 94496, 2789);  Solution 3: begin transaction update counter set nextkey = nextKey+1; :nk := select nextkey from counter; commit transaction begin transaction insert into accounts values( :nk, 94496, 2789); commit transaction PA152, Vlastislav Dohnal, FI MUNI, 2024 13 Generating PK values 0 200 400 600 800 1000 1200 1400 0 10 20 30 40 50 Throughput (statements/sec) Number of concurrent insertion threads SQLServer system ad-hoc 0 50 100 150 200 250 300 0 10 20 30 40 50 Throughput(statements/sec) Number of concurrent insertion threads Oracle system ad-hoc ◼ X axis: Increasing number of parallel insertions ◼ DB tools outperforms ad-hoc solution. PA152, Vlastislav Dohnal, FI MUNI, 2024 14 Generating PK values ◼ PostgreSQL CREATE TABLE product ( id SERIAL PRIMARY KEY, title VARCHAR(10) ); Internal implementation ◼ Create new sequence  product_id_seq ◼ Attribute id has defaults value:  nextval(‘product_id_seq’) PA152, Vlastislav Dohnal, FI MUNI, 2024 15 Generating PK values ◼ PostgreSQL (hand-crafted)  CREATE SEQUENCE product_id_seq;  CREATE TABLE product ( id INT PRIMARY KEY DEFAULT nextval(‘product_id_seq’), title VARCHAR(10) ); ◼ Usage:  INSERT INTO product (title) VALUES (‘Coil’);  INSERT INTO product (id, title) VALUES (DEFAULT, ‘Coil’); PA152, Vlastislav Dohnal, FI MUNI, 2024 16 Contents ◼ Generating IDs ◼ DB security Access control in DB Stored procedures Attack on DB ◼ Spatial data Data types, indexing PA152, Vlastislav Dohnal, FI MUNI, 2024 17 Access Control – Authorization ◼ Analogy to file systems Objects ◼ File, directory, … Subject ◼ Typically: owner, group, others (all users) Access Right ◼ Defined on an object O for a subject S ◼ Typically: read, write, execute PA152, Vlastislav Dohnal, FI MUNI, 2024 18 Privileges ◼ Database systems  Typically, finer granularity than the typical file system  Access rights vary for objects ◼ Tables, views, procedures, sequences, schema, database, …  Views are an important tool for access control  Subjects are typically user and group ◼ Often referred as authorization id or role ◼ Subject “others“ is denoted as PUBLIC  Granting access for PUBLIC means allowing access to anyone. PA152, Vlastislav Dohnal, FI MUNI, 2024 19 Privileges ◼ For relations/tables: SELECT ◼ Query the table’s content (i.e., list rows) ◼ Sometimes can be limited to selects attributes INSERT ◼ Sometimes can be limited to selects attributes DELETE UPDATE ◼ Sometimes can be limited to selects attributes REFERENCES ◼ Create foreign keys referencing this table PA152, Vlastislav Dohnal, FI MUNI, 2024 20 We add beers that do not appear in Beers; leaving manufacturer NULL. Privileges ◼ Example INSERT INTO Beers(name) SELECT beer FROM Sells WHERE NOT EXISTS (SELECT * FROM Beers WHERE name = beer); Requirements for privileges: ◼ INSERT on the table Beers ◼ SELECT on Sells and Beers PA152, Vlastislav Dohnal, FI MUNI, 2024 21 Privileges ◼ Views as Access Control Relation ◼ Employee(id, name, address, salary) Want to make salary confidential: ◼ CREATE VIEW EmpAddress AS SELECT id, name, address FROM Employee; ◼ Privileges:  Grant SELECT on EmpAddress  Revoke SELECT from table Employee PA152, Vlastislav Dohnal, FI MUNI, 2024 22 Privileges ◼ Granting privileges GRANT ON TO ; ◼ You may also grant “grant privilege” By appending clause “WITH GRANT OPTION“ ◼ GRANT SELECT ON TABLE EmpAddress TO joe WITH GRANT OPTION PA152, Vlastislav Dohnal, FI MUNI, 2024 23 Privileges ◼ Example (to be run as owner of sells) GRANT SELECT, UPDATE(price) ON sells TO sally; ◼ User sally can Read (select) from table sells Update values in attribute price PA152, Vlastislav Dohnal, FI MUNI, 2024 24 Privileges ◼ Example (to be run as owner of sells) GRANT UPDATE ON sells TO sally WITH GRANT OPTION; ◼ User sally can Update values of any attribute in sells Grant access to other users ◼ Only UPDATE can be granted but can be limited to some attributes. PA152, Vlastislav Dohnal, FI MUNI, 2024 25 Privileges – Diagram ◼ Diagram depict privileges granted by a grantor to a grantee  Each object has its diagram  Node is specified by 1. Role (user / group) 2. Granted privilege 3. Flag of ownership or granting option  Edge from X to Y ◼ Role X has granted the privilege to role Y. root,all,** joe,INSERT,* eva,INSERT, * eva, INSERT** ownership, * grant option Table T PA152, Vlastislav Dohnal, FI MUNI, 2024 26 Privileges – Diagram ◼ „root,all “ denotes  user root has privilege all. ◼ Privilege „all“ on table means  = insert, update, delete, select, references ◼ Grant option “*“  The privilege can by granted by the user ◼ Option “**“  Object owner (root node of each diagram) ◼ Object owner  All is granted by default  Can pass the privileges to other users PA152, Vlastislav Dohnal, FI MUNI, 2024 27 Privileges – Testing for Access ◼ DBMS grants User C the privilege Q as long as there is a path from XP** to OP, OP* or OP**. where ◼ P is a superprivilege of Q or the same as Q, and ◼ O = C or C is a member of group O root,all,** joe,INSERT,* eva,INSERT, * eva, INSERT** ownership, * grant option Table T joe$ SELECT * FROM T WHERE A=5; PA152, Vlastislav Dohnal, FI MUNI, 2024 28 Privileges ◼ Revoking statement REVOKE ON FROM ; ◼ Can listed users no longer use the privileges? But they may still have the privilege → because they obtained it independently from elsewhere. ◼ Or they are members of a group or PUBLIC is applied PA152, Vlastislav Dohnal, FI MUNI, 2024 29 Privileges ◼ Revoking privileges  Appending to REVOKE statement: ◼ CASCADE – Now, any grants made by a revokee are also not in force, no matter how far the privilege was passed ◼ RESTRICT (implicit)  If the privilege has been passed to others, the REVOKE fails as a warning  So, something else must be done to “chase the privilege down.”  REVOKE GRANT OPTION FOR [select, update, …] … ◼ Removes only the “grant option”.  Omitting this prefix leads to removing the privilege and also the grant option! PA152, Vlastislav Dohnal, FI MUNI, 2024 30 Contents ◼ Generating IDs ◼ DB security Access control in DB Stored procedures Attack on DB ◼ Spatial data Data types, indexing PA152, Vlastislav Dohnal, FI MUNI, 2024 31 Stored Procedures ◼ User-defined program implementing an activity E.g., factorial computation, distance between GPS coords, inserting rows to multiple tables, … ◼ PostgreSQL CREATE FUNCTION name ([parameters,…]) [RETURNS type] …code… PA152, Vlastislav Dohnal, FI MUNI, 2024 32 Stored Procedures ◼ Example: Compute average salary without revealing the individual salaries ◼ Table Employee(id, name, address, salary) PostgreSQL: ◼ CREATE FUNCTION avgsal() RETURNS real AS ‘SELECT avg(salary) FROM employee’ LANGUAGE SQL; User executes the procedure (function): ◼ SELECT avgsal(); PA152, Vlastislav Dohnal, FI MUNI, 2024 33 Stored Procedures ◼ Example (cont.): Salaries are not secured To secure we need to ◼ REVOKE SELECT ON Employee FROM … ◼ GRANT EXECUTE ON FUNCTION avgsal() TO … By running “SELECT avgsal();” the procedure is executed with privileges of current user. ◼ → it needs SELECT on Employee! PA152, Vlastislav Dohnal, FI MUNI, 2024 34 Stored Procedures ◼ Context of execution Can be set during procedure creation Types: ◼ INVOKER – run in the context of user that calls the function (typically current user) ◼ DEFINER– run in the context of the owner of function ◼ „particular user“ – run in the context of the selected user ◼ … PA152, Vlastislav Dohnal, FI MUNI, 2024 35 Stored Procedures ◼ Execution context in PostgreSQL SECURITY INVOKER SECURITY DEFINER ◼ Solution: set the context to owner CREATE FUNCTION …. LANGUAGE SQL SECURITY DEFINER; ◼ Assumption: owner has the SELECT privilege to Employee PA152, Vlastislav Dohnal, FI MUNI, 2024 36 Attacks to DB system ◼ Network connection DB port open to anyone → use firewall Unsecured connection ◼ Apply SSL ◼ Logging in Weak password Limit users to logging in ◼ Allow selected user accounts, IP addresses and databases Using one generic (admin) DB account PA152, Vlastislav Dohnal, FI MUNI, 2024 37 Attacks to DB system ◼ SQL injection Attack by sending SQL commands in place of valid data in forms. Typically related to using only one DB account ◼ which is admin )-: PA152, Vlastislav Dohnal, FI MUNI, 2024 38 SQL injection: Example ◼ App presents a form to enter string to update customer’s note in DB:  Internally the app use the following DB statement: UPDATE customer SET note=‘$note’ WHERE id=‘$login’; ◼ Malicious user ‘johnd’ enters to the form: Vader’; -- ◼ After variable expansion we get string: UPDATE customer SET note=’Vader’; --’ WHERE id=’johnd’; PA152, Vlastislav Dohnal, FI MUNI, 2024 39 SQL injection – another example ◼ App presents a form to enter string to update customer’s note in DB:  Internally the app use the following DB statement: UPDATE customer SET note=‘$note’ WHERE id=‘$login’; ◼ Malicious user ‘johnd’ enters to the form: Vader’; DROP TABLE customer; -- ◼ After variable expansion we get string: UPDATE customer SET note=’Vader’; DROP TABLE customer; --’ WHERE id=’johnd’; All in one line! SQL Injection: Countermeasures ◼ Use specific user account  Avoid using admin account ◼ Check input values  Input length, escape characters,… ◼ Functions in programming language  mysql_real_escape_string(), add_slashes()  $dbh->quote($string) ◼ Functions in DBMS  quote_literal(str) ◼ returns a string str suitably quoted to be used as a string literal in an SQL statement PA152, Vlastislav Dohnal, FI MUNI, 2024 40 SQL Injection: Countermeasures ◼ Prepared statements Parsed statements prepared in DB ◼ i.e., compiled templates ready for use Values are then substituted ◼ Parameters do not need to be quoted then May be used repetitively Example: PA152, Vlastislav Dohnal, FI MUNI, 2024 41 $st = $dbh->prepare("SELECT * FROM emp WHERE name LIKE ?"); $st->execute(array( "%$_GET[name]%“ )); SQL Injection: Countermeasures ◼ Prepared statements at server-side programming  The same concept, but stored in DB  Typically, in procedural languages in DB  PostgreSQL ◼ PREPARE emp_row(text) AS SELECT * FROM emp WHERE name LIKE $1; EXECUTE emp_row(‘%John%’); ◼ Query is planned in advance  Planning time can be amortized  But: the plan is generic! ◼ i.e., without any optimization induced by knowing the parameter  Lasts only for the duration of the current db session PA152, Vlastislav Dohnal, FI MUNI, 2024 42 Prepared Statements: Performance ◼ Prepared execution yields better performance when the query is executed more than once: No compilation No access to catalog. ◼ Experiment performed on Oracle8iEE on Windows 2000. PA152, Vlastislav Dohnal, FI MUNI, 2024 43 0 0,2 0,4 0,6 0 1 2 3 4 5 6 Throughput(queries/sec) No. of repetitions direct prepared Attacking Views ◼ Views protect data rows…  even if permissions are correctly set  E.g., student(studentid, firstname, lastname, fieldofstudy) ◼ CREATE OR REPLACE VIEW studentssme AS SELECT * FROM student WHERE fieldofstudy = 'N-SSME‘;  But, creating a “cheap” function ◼ CREATE OR REPLACE FUNCTION test(name text, study text) RETURNS boolean AS $$ begin raise notice 'Name: %, Study: %', name, study; return true; end; $$ LANGUAGE plpgsql VOLATILE COST 0.00001;  The query leaks other students in a side channel… ◼ SELECT * FROM studentssme WHERE test(lastname, fieldofstudy)  NOTICE: Name: Nový, Study: N-AplInf NOTICE: Name: Dlouhý, Study: N-Inf NOTICE: Name: Svoboda, Study: N-AplInf NOTICE: Name: Starý, Study: N-SSME NOTICE: Name: Lukáš, Study: N-SSME … ◼ Countermeasures:  Ban creating new DB objects.  Use security_barrier in Pg.conf or in create view. PA152, Vlastislav Dohnal, FI MUNI, 2024 44 PA152, Vlastislav Dohnal, FI MUNI, 2024 45 Contents ◼ Generating IDs ◼ DB security Access control in DB Stored procedures Attack on DB ◼ Spatial data Data types, queries Indexing – Quad-tree, Grid index, R-tree PA152, Vlastislav Dohnal, FI MUNI, 2024 46 Processing Spatial Data ◼ Spatial data Typically geographic, 2d geometry ◼ X, Y coordinates x y E.g., … PA152, Vlastislav Dohnal, FI MUNI, 2024 47 Processing Spatial Data ◼ Spatial queries What city is at position ? What is in neighborhood of 5km from position ? What is the closest site to ? ◼ Without DBMS support How to measure distance?  E.g., for GPS coordinates ◼ We can create a user-defined function (Traditional) Index on X, or on XY, … ◼ May not help for some queries PA152, Vlastislav Dohnal, FI MUNI, 2024 48 Processing Spatial Data ◼ Geometric constructs: lines, rectangles, polygons, … ◼ Operations: Is point inside a polygon? Do polygons intersect? … PA152, Vlastislav Dohnal, FI MUNI, 2024 49 Processing Spatial Data ◼ DBMS support is convenient Special data types and functions/operators ◼ PostgreSQL  Types: point, line, box, circle, …  Functions: area(), center(), length(), …  Operators: ~= same as, ~ contains, ?# intersects, …  Index: R-tree ◼ SQL Server 2008  Types: point, linestring, polygon, geography, …  Index: Grid ◼ Oracle 9i  Types: SDO_GEOMETRY (SDO_POINT, SDO_LINE,…)  Index: R-tree, Quad-tree PA152, Vlastislav Dohnal, FI MUNI, 2024 50 Processing Spatial Data ◼ Quad-tree Search tree, where each node splits data space into 2d regions of equal size ◼ e.g., 2d data → 4 regions Leaf nodes may be of larger capacity than 1. PA152, Vlastislav Dohnal, FI MUNI, 2024 51 Processing Spatial Data ◼ Quad-tree Supports points only Extension to complex data: ◼ Item stored in many regions ◼ Complex objects wrapped in rectangle PA152, Vlastislav Dohnal, FI MUNI, 2024 52 Processing Spatial Data ◼ Grid Bounded data space: xmin, ymin, xmax, ymax SQL Server ◼ Grid of fixed dimensions: 4x4, 8x8, 16x16 cells ◼ Multiple levels Source: Microsoft MSDN, http://msdn.microsoft.com/en-us/library/bb964712.aspx PA152, Vlastislav Dohnal, FI MUNI, 2024 53 Processing Spatial Data ◼ R-tree (Rectangle Tree)  Extension of B+ trees to d-dimensional data ◼ Insertion, deletion – almost identical to B+ tree  Leaves may contain more data items ◼ List is represented by minimum bounding rectangle (MBR)  Internal nodes ◼ References to child nodes and their MBRs  Node MBRs may overlap → search procedure has to follow more colliding tree branches.  Each data item stored exactly once ◼ Advantage over Grid and Quad-tree PA152, Vlastislav Dohnal, FI MUNI, 2024 54 Processing Spatial Data ◼ R-tree  Organizing complex spatial data done by wrapping them in MBR. (an object is represented as a rectangle) Lecture Takeaways ◼ Primary key value generation ◼ Securing DB Avoid using admin account for general use Mind “no-action” revoke command and recheck the resulting graph of grants. ◼ Extensions to more complex data with indexing support PA152, Vlastislav Dohnal, FI MUNI, 2024 55