What is a Database Snapshot?
Databases do more than store and serve data. They also support work such as reporting and auditing, to mention just two examples.
When you want to use a data set from a database for reporting or analysis, it helps to be able to go to the specific period you are interested in and get the data set as it was at that time. Database snapshots are useful in such moments.
In this simple guide, we explore database snapshots with a focus on how they work.
IMPORTANT NOTE: Database snapshots are not exclusive to SQL databases, although that is a common assumption. The concept applies to other types of databases as well, including graph databases.
Also Read: What is a Graph Database?
What is a database snapshot?
A database snapshot is essentially a static, read-only view of the source database. The key principle is that the snapshot always stays consistent with the source database as it was at the moment the snapshot was created.
The snapshot always resides on the same server instance as the source database. As more changes are made to the source database, the size of the snapshot file increases.
Database snapshots are the closest we’ve come to time travel in data operations. They offer a powerful complement to backups, playing a crucial role without compromising the integrity of the source database.
Picture this example: you are in charge of a website for an e-commerce store that relies heavily on a database for customer and product information, orders, and inventory management.
While you are installing new storefront features such as a new shopping cart plugin, disaster strikes. An incorrect setting in the plugin installation script causes a misconfiguration.
Panic sets in as your database spirals into chaos. Product images and descriptions are nowhere to be found, and customer complaints start pouring in.
To quickly restore order, you turn to database snapshots. Database snapshots will help you to:
- Restore any accidental deletion of product images and descriptions
- Revert the database to a snapshot taken just before the update, effectively restoring the site to a stable state
- Use the snapshot to identify the incorrect setting on the plugin script
- Test the update against a different snapshot before deploying it to the live store
From this example, it’s clear that database snapshots are extremely important. So, how do they work?
Also Read: Snapshots in NebulaGraph - An Introduction
How database snapshots work
A snapshot captures the state of the database at a specific moment without duplicating the entire dataset.
For a simple demonstration, suppose it is 2:00 PM and some updates need to be made to a data set in the database. Before the changes are made, a snapshot captures the data set as it is at 2:00 PM. The changes are then applied and finish at 3:00 PM. At 4:00 PM another change is due, so another snapshot is taken, showing the state of the data as of 4:00 PM. There are now two snapshots. If more changes are needed at 6:00 PM, the same procedure is followed.
To simplify it further, we can say that database snapshots work in a similar manner to how version control works. The focus is on taking a ‘snap’ of the current changes, and not the entire thing.
There are two main techniques at the center of the workings of database snapshots:
- Copy-on-write
- Redirect-on-write
Copy-on-write
In this technique, the original data blocks that are due for modification are copied into a snapshot area. This is done before the blocks are overwritten. This way, the original data will be retained in the snapshot even as changes take place in the source database.
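The copy-on-write steps above can be sketched in a few lines of Python. This is a toy in-memory model and not how any particular database implements it: the class name `CopyOnWriteVolume`, its methods, and the use of a list of strings as "blocks" are all assumptions made for illustration.

```python
class CopyOnWriteVolume:
    """Toy block store with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live source data
        self.snapshots = []         # one dict per snapshot: {block index: original value}

    def create_snapshot(self):
        # A new snapshot starts empty: nothing is copied up front.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, value):
        # Before overwriting, copy the original block into every snapshot
        # that has not preserved this block yet.
        for preserved in self.snapshots:
            if index not in preserved:
                preserved[index] = self.blocks[index]
        self.blocks[index] = value

    def read_snapshot(self, snapshot_id, index):
        # Changed blocks come from the snapshot area,
        # unchanged blocks are still read from the source.
        preserved = self.snapshots[snapshot_id]
        return preserved.get(index, self.blocks[index])


cow_volume = CopyOnWriteVolume(["a", "b", "c"])
cow_snap = cow_volume.create_snapshot()
cow_volume.write(1, "B")  # the original "b" is copied into the snapshot first
```

After the write, the source holds the new value for block 1 while `read_snapshot(cow_snap, 1)` still returns the original one. Only the modified block occupies space in the snapshot, which is why the snapshot grows as the source changes.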
Redirect-on-write
In this technique, the new data is written to a different block (location) while the original data remains in place. Once the write is completed and recorded, the snapshot keeps referring to the original block while the source database refers to the newly written block.
The key difference between these two mechanisms is the order of operations. In copy-on-write, the process starts by copying the original blocks to the snapshot. In redirect-on-write, the process starts by writing the new data to a different block, i.e., redirecting the write.
In both methods, the newly written data ends up in the source database while the old data ends up in the snapshot. The result is the same, but the processes differ slightly.
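For comparison, here is a minimal redirect-on-write sketch in Python. It is a simplified model under stated assumptions: `RedirectOnWriteVolume` and its methods are hypothetical names, storage is an append-only list, and every write is redirected even when no snapshot exists.

```python
class RedirectOnWriteVolume:
    """Toy block store with redirect-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.storage = list(blocks)               # physical blocks, append-only
        self.live_map = list(range(len(blocks)))  # logical index -> physical location
        self.snapshots = []                       # frozen copies of the block map

    def create_snapshot(self):
        # Only the map of pointers is copied, never the data itself.
        self.snapshots.append(list(self.live_map))
        return len(self.snapshots) - 1

    def write(self, index, value):
        # The new data goes to a fresh location; the original block is untouched.
        self.storage.append(value)
        self.live_map[index] = len(self.storage) - 1

    def read(self, index):
        return self.storage[self.live_map[index]]

    def read_snapshot(self, snapshot_id, index):
        return self.storage[self.snapshots[snapshot_id][index]]


row_volume = RedirectOnWriteVolume(["a", "b", "c"])
row_snap = row_volume.create_snapshot()
row_volume.write(1, "B")  # "b" stays where it is; "B" lands in a new block
```

Here the original block is never copied. The snapshot simply keeps its old pointer, and the live map is redirected to the new location, so a write costs one block write instead of a copy plus a write.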
Besides these two standard approaches, the following techniques can also be employed to create a database snapshot.
- Differential snapshots: This technique captures the data changes that have been made since the last snapshot was taken. In other words, the focus is on storing the difference between the current state and the last snapshot. This mechanism also reduces the storage space requirements.
- Clone snapshots: A clone of the existing data blocks is made at the time of making changes. The original data is preserved in the snapshot.
- Thin provisioning: Here, snapshots are created without reserving storage space up front. Space is allocated only as changes are made, which keeps space utilization efficient.
- Chain snapshots: In this mechanism, snapshots are organized into a chain. Each snapshot is created based on the previous snapshot. This method makes it easy to track changes over time and thus preserve the consistency of data.
- Log snapshots: In this method, the data blocks are not copied. Instead, the snapshot will capture the changes by making records of the operations that make modifications to the data.
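To make the differential and chain ideas more concrete, the following Python sketch stores only the difference between consecutive states and rebuilds a past state by replaying the chain. The function names `make_delta`, `apply_delta`, and `rebuild`, the delta layout, and the inventory data are all hypothetical; real systems work at the block or page level rather than on dictionaries.

```python
def make_delta(previous, current):
    """Record only what changed between two states of the data."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_delta(state, delta):
    """Return a new state with one delta applied."""
    result = dict(state)
    result.update(delta["changed"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


def rebuild(base, chain, count):
    """Replay the first `count` deltas of the chain on top of the base state."""
    state = dict(base)
    for delta in chain[:count]:
        state = apply_delta(state, delta)
    return state


# Hypothetical inventory counts at three points in time.
base = {"sku1": 10, "sku2": 5}
state_1 = {"sku1": 8, "sku2": 5, "sku3": 1}
state_2 = {"sku1": 8, "sku3": 2}

# Each link in the chain depends on the one before it.
chain = [make_delta(base, state_1), make_delta(state_1, state_2)]
```

Calling `rebuild(base, chain, 1)` reproduces `state_1` and `rebuild(base, chain, 2)` reproduces `state_2`. The trade-off of chaining is visible here: each delta is small, but restoring a later state requires every earlier link to be intact.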
Key takeaways about how database snapshots work
- Copy-on-write and redirect-on-write are the two main techniques that are used to create database snapshots.
- A read-only static view of the database file of interest is created, instead of a duplicate data set.
- Reading data from the snapshot is essentially equivalent to reading from the primary data source.
Also Read: How to Backup and Restore Data With Snapshots in NebulaGraph
Benefits of database snapshots
The core advantage of database snapshots is that they suit projects that need modifications in real time, i.e., while an application is still running.
Here are the advantages in detail:
Speed
Since snapshots capture the state of the database at a specific moment without copying the entire data set, they are quick to create. This means they can be used in scenarios where speed is of utmost importance, which is why they are highly recommended for real-time implementations.
Efficient space usage
As snapshots only require space equivalent to the modifications made, they do not need a large amount of space all at once. This saves much of the expense that would otherwise go to storage resources.
Database snapshots contribute to consistent performance
Since database snapshots do not require the data set of interest to be duplicated, system disruption is minimal. As a result, performance remains largely consistent even in the course of updates.
Database snapshots enhance analysis
This is possible thanks to the ability to keep snapshots taken at different points in time. By querying the snapshots that bracket a given period, you can compare results across that period and analyze aspects such as customer growth.
Also Read: Graph Query Language
Database snapshots can act as safeguards against errors
It's normal to experience errors in databases. Luckily, snapshots make it possible to revert the database to the state it was in when a snapshot was taken. This minimizes data loss. In such cases the snapshot acts as a recovery option, and this is much faster than restoring from a backup. However, snapshots should not be mistaken for a substitute for backups. You still need to do backups, which have a distinct and irreplaceable role.
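As a rough illustration of reverting, the sketch below assumes a copy-on-write style snapshot that holds only the original values of blocks changed since it was taken. The function name `revert_to_snapshot` and the sample data are hypothetical.

```python
def revert_to_snapshot(live_blocks, preserved_blocks):
    """Return the data as it was when the snapshot was taken.

    live_blocks: current contents of the source, as a list.
    preserved_blocks: {block index: original value} for blocks changed
    since the snapshot was created.
    """
    restored = list(live_blocks)
    for index, original in preserved_blocks.items():
        restored[index] = original
    return restored


# Hypothetical product descriptions, two of which were wiped by a bad update.
live = ["Red mug", "", ""]
preserved = {1: "Blue mug", 2: "Green mug"}
restored = revert_to_snapshot(live, preserved)
```

Only the changed blocks need to be written back, which is why a revert like this can be faster than a full restore. It also shows the limitation: the unchanged block is taken from the live source, so the snapshot alone cannot rebuild the data if the source is lost.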
Drawbacks of database snapshots
Database snapshots are great but, like any other technology, not without drawbacks.
Here are some of the drawbacks or limitations of database snapshots:
- System overload: You can easily end up overloading the system if you create too many snapshots. This normally happens when there is no clear strategy on when and why to create snapshots and how to manage them.
- Overreliance: Risks can crop up if you rely too heavily on snapshots. We advise that they are used in moderation.
- Dependence on source database: Since database snapshots depend on the original source database, they are not redundant storage. You cannot expect them to offer protection against corruption resulting from causes such as disk errors. In other words, if the source files get lost or compromised, it would be impossible to restore from snapshots.
- Read-only: Snapshots cannot be upgraded since they are read-only.
Also Read: Amazon Neptune vs NebulaGraph, plus Memgraph vs NebulaGraph
Scenarios where database snapshots are useful
Snapshots become useful mainly in the following functions:
- Auditing
- Reporting
- Analysis
In terms of auditing, auditors can easily go to specific snapshots and audit the state of the database at that particular moment. This makes work easier especially when certain parts of the audit demand focus on a particular period.
Also Read: NebulaGraph Audit Logs
Snapshots also become instrumental when preparing reports for different purposes. You can easily jump between different periods and extract relevant information, saving lots of time that would have been spent rummaging through the entire database or backup.
The same applies to analysis. If you want to analyze a specific period, you simply go through past snapshots and narrow down to the one that corresponds to that period.
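Narrowing down to the right snapshot can be sketched as a lookup over snapshot creation times. The example below is an assumption-laden illustration: the function `snapshot_as_of`, the snapshot names, and the dates are made up, and the times mirror the 2:00 PM, 4:00 PM, and 6:00 PM example used earlier.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_as_of(snapshots, moment):
    """Return the name of the latest snapshot taken at or before `moment`.

    snapshots: list of (created_at, name) tuples sorted by created_at.
    Returns None when no snapshot existed yet at that moment.
    """
    times = [created_at for created_at, _ in snapshots]
    position = bisect_right(times, moment)
    if position == 0:
        return None
    return snapshots[position - 1][1]


# Hypothetical snapshot catalog for one day.
catalog = [
    (datetime(2024, 1, 15, 14, 0), "snap_1400"),
    (datetime(2024, 1, 15, 16, 0), "snap_1600"),
    (datetime(2024, 1, 15, 18, 0), "snap_1800"),
]

audit_target = snapshot_as_of(catalog, datetime(2024, 1, 15, 15, 30))
```

An auditor interested in the state of the data at 3:30 PM would be directed to the 2:00 PM snapshot, since it is the most recent one taken before that moment.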
Key differences between database snapshots and database backups
Database Snapshots | Database Backups |
---|---|
A database snapshot is a read-only static view of the source data set | A database backup is a readable copy of the source data set |
A database snapshot always resides on the same server as the source database | A database backup can be kept in different locations including local storage, the cloud, or be sharable within the network |
A database snapshot can only be restored to the same location as the source database | A backup can be restored to a different location |
Also Read: How NebulaGraph Supports Incremental Backup
Conclusion
For best results, we advise that you review the snapshots on a regular basis and make sure you are only keeping snapshots that will be needed. This way you don't have to consume a lot of storage resources. Meticulous organization and management are key to getting the most value from snapshots.
Remember, the ultimate goal of database snapshots is to enable efficient changes to the database, save on space utilization, sustain high levels of performance, and aid in activities such as reporting, auditing, and analysis. So, regardless of the mechanism or database you are going to use, keep your eye on these goals of a typical snapshot operation.
NebulaGraph supports database snapshots and is flexible in deployment. You can deploy it anywhere you prefer, including on-premises, public cloud, hybrid, Windows, and macOS. It's available on the AWS and Azure marketplaces.