NoSQL
Marián Rusnák
Seminar of DISA Laboratory
9.12.2013

Outline
•Motivation
•What is NoSQL
•Key-Value stores
•Wide-Column stores
•Document stores
•Summary

Motivation
•Massive data volumes
•Extreme query workload
•Schema evolution

Traditional RDBMS
•Transactions – ACID
•Integrity
•Complex queries

But…
•Modern web apps may have different needs
  –Low latency
  –Scalability & elasticity
  –High availability
  –Flexible schemas / semi-structured data
  –Distributed

What is NoSQL
•Term NoSQL
  –Not Only SQL
  –Non-relational database
•Databases different from traditional RDBMS
•Meets the requirements of modern web apps
•Scalable
  –huge volumes of data
•Distributed
  –multiple nodes
  –multiple datacenters
•Flexible
  –no strict schema
•Umbrella term for different types of datastores
  –Key-Value stores
  –Wide column stores / column families
  –Document stores
  –Graph databases
  –Other…
•http://nosql-database.org/
  –150 NoSQL databases
•Almost every big company has its own solution

Key-Value Stores
•Simplest
•Largest group
•Something like distributed hash tables
•Examples
  –Amazon Dynamo
  –Amazon DynamoDB
  –Voldemort
  –Redis
  –Oracle NoSQL Database
  –SimpleDB

Dynamo
•Key-Value distributed storage system
•Pioneer in the area
•Many features incorporated by others
•Peer-to-Peer
•Simple data model – unique keys, no schema
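
Example (sketch)
Key-value stores are described above as something like distributed hash tables. The sketch below shows one way keys might be spread over nodes, using the consistent-hashing ring with virtual nodes and successor replicas that the Dynamo slides mention. It is an illustration under stated assumptions, not Dynamo's actual implementation: the names HashRing, ring_position, preference_list and vnodes, the node labels, and the choice of MD5 are all hypothetical.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumed choice, used only to spread keys)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring; every physical node owns several virtual positions."""

    def __init__(self, nodes, vnodes=8):
        # Each node is placed on the ring `vnodes` times under different labels.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._node_count = len(set(nodes))

    def preference_list(self, key, n=3):
        """Return up to n distinct nodes found by walking clockwise from the key's position."""
        n = min(n, self._node_count)
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:  # skip further virtual positions of the same node
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("user:42", n=3)
print(replicas)  # first entry is the owner, the rest are its successors
```

Adding or removing a node only moves the keys adjacent to that node's virtual positions, which is the main reason to prefer a ring over a plain hash(key) % number_of_nodes scheme.
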

Dynamo
•Consistent hashing
  –“Ring” of nodes
  –Virtual nodes – each node has several positions in the ring
•Replication
  –Successors in the ring
•Quorum
  –Coordinator
  –R+W > N
•Gossip
  –Failure detection
•Object versioning

Voldemort
•LinkedIn, now open-source
•Based on Dynamo
•Written in Java

Wide-Column Stores
•Key is associated with multiple attributes
•Inspired by Google BigTable
•Examples
  –Google BigTable
  –HBase
  –Cassandra

BigTable
•Proprietary
  –Only a short informational paper released
•Sparse, distributed multi-dimensional sorted map
•Rows
  –Data sorted by row key
•Tablets
  –Sequence of rows
  –Distributed
•Columns, Column families
  –Unlimited number of columns
•Versioning
  –Timestamps
  –Garbage collection
•Stores data on Google File System

HBase
•Inspired by BigTable
•Open-source – Apache
•Uses Hadoop Distributed File System

Cassandra
•Facebook
  –Fast inbox search
•Now open-source Apache project
•Data model of BigTable
•Infrastructure of Dynamo

Document Stores
•Value is more than a string
  –JSON
  –BSON
•Inspired by IBM Lotus Notes
•Very flexible schema
•Examples
  –MongoDB
  –CouchDB

CouchDB
•Open-source Apache project
•Schema free
•JSON format
•B-Tree storage
•MVCC, no locking
•No joins, no primary or foreign keys
•Written in Erlang
•REST API

MongoDB
•Open-source
•BSON format – similar to JSON
•Queries can be objects
•Multiple types of indexing
•Master/slave replication
•Written in C++
•Drivers in many languages
•Most popular
  –23 000 questions on Stack Overflow

Trends
•Gaining in popularity…
•…but still has a long way to go

Summary
•NoSQL is suitable for modern web apps
  –Scalable
  –Distributed
  –Flexible
•4 main types
  –Key-Value stores
    •Key to value mapping
  –Wide column stores / column families
    •More attributes associated with key
  –Document stores
    •Key to document mapping, not only to string
  –Graph databases
    •Connection between objects

References
•SlideShare
  –http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
  –http://www.slideshare.net/dstainer/introduction-to-nosql-databases
  –http://www.slideshare.net/quipo/nosql-databases-why-what-and-when
•Google BigTable paper
  –http://static.googleusercontent.com/media/research.google.com/sk//archive/bigtable-osdi06.pdf
•Stanford course
  –http://infolab.stanford.edu/~widom/cs145/NoSQL_activity.pdf
•http://www.mongodb.com/learn/nosql