Issue link: https://insights.oneneck.com/i/1190444
DISASTER RECOVERY GUIDE – Powered by ZERTO

Snapshots

Many solutions use snapshots as a method to enable a quick restore. A snapshot is a way to "freeze" a live storage system or VM at a moment in time. After the snapshot is captured, changes continue to be written to the files. If the VM or storage system then encounters an issue, those changes can be discarded by reverting the VM or storage system to the state it was in when the snapshot was created. A snapshot is especially useful when making changes to a single VM where a rollback may be necessary.

There are two types of snapshots: storage snapshots, which are based on the storage hardware, and hypervisor (VM) snapshots.

Storage snapshots: expensive

Storage snapshots are taken on the entire storage volume as a whole and can grow very large, consuming a lot of first-tier storage space; 30% of the net disk space is not unusual. This is particularly true when a lot of the data on the storage changes after the snapshot is taken. Most storage snapshot technologies also rely on the original disk.

Virtual Machine snapshots: incomplete

VM snapshots apply only to a specific individual VM and do not create copies of VMs. A VM snapshot is just a file that enables an existing virtual machine to be returned to a previous state (the same holds for most storage-based snapshot technologies, except that they cover the entire storage volume rather than a single virtual machine). VM snapshots are not protected in the case of hardware failure: if the files containing a virtual machine are lost, the associated snapshot files are rendered useless.

The question is: are snapshots adequate as a DR solution?

• No real DR – Snapshots save a point in time temporarily; they are not a long-term solution. Storing a copy of a VM requires a backup or a DR site, not a snapshot.
• Performance – Virtual Machine snapshots have a large impact on the performance of a virtual machine, and can also affect the entire environment through additional hypervisor and storage overhead.
• Management – Large numbers of snapshots are difficult to manage.
• Frequency – Because snapshots are typically taken only every 4 hours (taking them more often would have too much impact on performance and storage), up to 4 hours of data can still be lost after a rollback (see Figure 4). Claims that a 15-minute RPO is feasible with snapshots do not hold beyond very small environments. Modern environments need a more continuous replication solution that has no performance impact on the production environment.
• Snapshots & the cloud – Although some DR solutions (DRaaS) use snapshots and store them in a cloud environment, each snapshot still has to be created in the production environment before it is replicated to the cloud, so the storage and performance impact remains the same.

It is also important to know which type of snapshot these solutions use: storage snapshots or VM snapshots. Storage-based snapshot technologies require identical hardware for the two organizations involved, which limits the choice of cloud provider and ties their hardware life cycles together.

Figure 4. Although snapshots shorten the RPO, they are usually taken only every 4 or 8 hours, because more frequent snapshots would have too much impact on performance and use too much disk space. This results in a better RPO than a traditional backup, but up to 4 or 8 hours of work can still be lost when an incident occurs.

[Figure 4 timeline: 00:00 to 24:00 in 4-hour steps, showing a backup at 00:00 followed by snapshots, with data-loss windows labeled 4H and 16H+.]
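The arithmetic behind Figure 4 can be sketched in a few lines. This is a minimal illustration, assuming recovery points (snapshots or backups) are taken at a fixed interval from a known start time, that none of them fail, and that recovery rolls back to the most recent one. The function name `data_lost` and the example date are hypothetical.

```python
from datetime import datetime, timedelta


def data_lost(incident: datetime, first_point: datetime, interval: timedelta) -> timedelta:
    """Work lost when rolling back to the most recent recovery point.

    Hypothetical helper: assumes recovery points are taken every `interval`,
    starting at `first_point`, and that every one of them succeeds.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if incident < first_point:
        raise ValueError("no recovery point exists before the incident")
    # Time since the last recovery point is the remainder of the division.
    return (incident - first_point) % interval


midnight = datetime(2024, 1, 1, 0, 0)   # arbitrary example date
incident = datetime(2024, 1, 1, 15, 30)

print(data_lost(incident, midnight, timedelta(hours=4)))   # 3:30:00
print(data_lost(incident, midnight, timedelta(hours=8)))   # 7:30:00
print(data_lost(incident, midnight, timedelta(hours=24)))  # 15:30:00 (daily backup)
```

An incident just before the next scheduled snapshot loses almost the full interval, so under these assumptions the worst-case RPO of a periodic snapshot schedule is the interval itself.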
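To show why a VM snapshot is not a copy of the VM, here is a toy model of a delta-based snapshot (a redirect-on-write style scheme). It is a simplified, hypothetical sketch, not any hypervisor's actual implementation: taking the snapshot freezes the existing disk as the base, later writes go to a separate delta, reverting discards the delta, and losing the base leaves the delta unable to rebuild the VM. The class and method names are invented for illustration.

```python
from __future__ import annotations


class SnapshottedDisk:
    """Toy model of a delta-based VM snapshot (hypothetical, simplified).

    Taking a snapshot copies nothing: the current disk is frozen as the
    base, and every later write is redirected to a separate delta.
    """

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.base: dict[int, bytes] = dict(blocks)   # the VM's original disk
        self.delta: dict[int, bytes] | None = None   # writes since the snapshot

    def take_snapshot(self) -> None:
        # Freeze the base; from now on writes land in the (empty) delta.
        self.delta = {}

    def write(self, block: int, data: bytes) -> None:
        if self.delta is None:
            self.base[block] = data
        else:
            self.delta[block] = data

    def read(self, block: int) -> bytes:
        # Changed blocks come from the delta; everything else from the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]  # raises KeyError if the base data is gone

    def revert(self) -> None:
        # Rolling back simply discards the changes made since the snapshot.
        if self.delta is None:
            raise RuntimeError("no snapshot to revert to")
        self.delta = {}


disk = SnapshottedDisk({0: b"boot", 1: b"data-v1"})
disk.take_snapshot()
disk.write(1, b"data-v2")
print(disk.read(1))  # b'data-v2'
disk.revert()
print(disk.read(1))  # b'data-v1'

# Simulate losing the VM's base disk files: the snapshot alone cannot
# reconstruct the VM, because it only ever held the changed blocks.
disk.base.clear()
try:
    disk.read(0)
except KeyError:
    print("block 0 unrecoverable")
```

The model mirrors the points above: the snapshot is cheap to take because nothing is copied, which is exactly why it offers no protection once the underlying VM files or hardware are lost.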