SQL
Data modelling and storage is one of the primary aspects in system design. How you model and visualise the data entities, their relationship and behaviour with each other and the scale of read/writes determines your storage choice. The following lessons focusses on the two major storage choices available, viz. SQL and NoSQL.
This lesson will help you answer questions like:
What is SQL/RDBMS databases?
What properties are associated with SQL databases?
What are the pros and cons of choosing SQL ?
When would you use SQL databases ?
What is RDBMS
An RDBMS is a database, where data can be based on a relational model, described in the form of tables and columns. Each row of data is an individual data point, with each column value as an attribute. RDBMS provides CRUD operations- Create, Read, Update and Delete. SQL (Structured Query language) is a declarative language that is used to operate and manage relational databases. Hence commonly relational databases are also known as SQL databases
Properties
Data is stored in the form of tables and rows.
SQL database operations are ACID compliant:
(A)Atomicity: All changes to data are operated, as if they were a single operation. Atomicity ensures that either all the operations for a change are operated or none.
(C)Consistency: Data consistency is guaranteed when a database transaction finishes due to atomicity. This implies that all the data updates have finished successfully and the database is in a valid state, maintaining the data invariant. All subsequent read requests will return the correct data state.
(I)Isolation: Each transaction happens on a database, in an isolated environment, thereby implying that multiple transactions executing concurrently would result in the database in a state where each transaction was executed separately sequentially.
(D)Durability: Durability means that once a transaction has been completed, the changes will persist even in case of system failure.
Vertically Scalable: Higher specs of the system like memory, cpu and disk can increase the scale of the database.
Normalisation: Normalisation is the process of removing data redundancy, by splitting data in multiple tables and establishing relationships between them and enforcing integrity constraints. Denormalisation is the process of adding redundancy by adding duplicate data in the data to reduce joins. Normalisation has various levels; lower level of normalisation implies higher denormalisations, hence better query performance
Relational databases provide keys and indices to improve data organization and query performance.
Relational databases also support distributed databases, however joins, and partitioning will have to be performed at application level. Keys and indices prevent replication in a single database
Quick Q&A
How do you scale up your database, if amount of data (the number of inserts) increase ?
A simple way of improving the number of inserts is by introducing partitioning. Partitioning is dividing data into multiple tables in a single database. Horizontal partitioning allows data to be spread across multiple tables based on a partitioning key. For example in a database about users, partitioning can be done on user name, where all users with names starting with A could be in one partition and so on. Vertical partitioning allows tables to be normalised by dividing columns in multiple tables. Sharding, where data is split across multiple machines, can help as well. However this won't allow joins and referential integrity across tables to be maintained. Joins can then only be achieved at application level, where the application fetches data from the database and then performs the join.
How do you scale up your database, if the number of read requests increase ?
We can setup replication of SQL databases at the cost of stale data, and redirect read requests to these replicas. This is also known as master-slave architecture, where the write requests are handled by one instance, and the reads happen from slaves. This allows read requests to be distributed and prevents the data base from going down.
Let's say you have a distributed sql database, that is hosted on two nodes. How will it work ?
As mentioned above, read and write requests will have to be redirected based on sharding key. Also, foreign key constraints will be hard to maintain and joins will not be possible. You can setup indices and partitioning on individual nodes, but that would only improve the query performance on the single node.
Is ACID the only determining factor to choose a SQL database ?
SQL databases are easy to setup, and easy to model. The SQL query language is widely known and for all normal purposes with thousands of queries per second, SQL databases serve the purpose.
For More Reading
Last updated