Snapshots Are NOT Backups
Comparing Storage-based Snapshot Technologies with Recovery Manager (RMAN) and Fast Recovery Area for Oracle Databases
by Tim Chien, Oracle
While storage snapshots are widely used to quickly create point-in-time virtual copies of data, they are also often marketed as valid “backup solutions”. This is an incorrect and dangerous assumption because snapshots, unless copied to secondary media (e.g. another storage array or tape), do not protect against media failures. While there are benefits of using snapshots for development or testing purposes on non-production systems, they should not be considered as valid data protection or backups of Oracle databases. Instead, customers should look to Recovery Manager (RMAN) and Fast Recovery Area (FRA) as the Oracle-supported solution to create and manage Oracle database backups. Note that since RMAN and Fast Recovery Area are built-in features of the Oracle database, this solution also applies to Oracle Exadata Database Machine, with the additional benefit of extremely high performance.
This article provides a comparison of storage-based snapshot technologies with RMAN and Fast Recovery Area backups.
Since its debut in Oracle8, Recovery Manager (RMAN) has offered a rich and evolving set of database-optimized backup and recovery capabilities, fulfilling a wide range of data protection requirements. For example, Oracle Database 10g Release 1 introduced incrementally updated backups, which allow a full image copy backup of a tablespace, datafile, or database on disk to be updated in place using a fast incremental backup, in effect creating a more current full backup on disk in just the time it takes to apply the incremental. This backup strategy is further enhanced when combined with the Fast Recovery Area (FRA), a single disk location where all recovery-related files (including RMAN backups) can be stored and automatically managed by Oracle. This relieves the DBA of backup space management tasks and ensures that all needed recovery-related files are always available per the user-defined retention policy. The diagram below illustrates this backup strategy:
Figure 1 – Oracle Suggested Backup Strategy
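The strategy in Figure 1 can be modeled in a few lines. The following is a toy sketch in Python, not RMAN itself: a datafile is represented as a dictionary of block numbers to contents, and the function names (`take_image_copy`, `take_incremental`, `apply_incremental`) and the `changed` set standing in for change tracking are illustrative assumptions.

```python
# Toy model of an incrementally updated backup. This is not RMAN code:
# every name here is hypothetical and exists only for illustration.

def take_image_copy(datafile):
    """One-time full copy of every block (cost proportional to datafile size)."""
    return dict(datafile)

def take_incremental(datafile, changed):
    """Back up only the blocks recorded as changed since the last backup."""
    return {block_no: datafile[block_no] for block_no in changed}

def apply_incremental(image_copy, incremental):
    """Roll the image copy forward in place by overwriting the changed blocks."""
    image_copy.update(incremental)
    return image_copy

# Day 0: full image copy of a four-block datafile.
datafile = {1: "a0", 2: "b0", 3: "c0", 4: "d0"}
image_copy = take_image_copy(datafile)

# Day 1: two blocks change; only those two are read for the incremental.
datafile[2] = "b1"
datafile[4] = "d1"
changed = {2, 4}
incremental = take_incremental(datafile, changed)

# Applying the incremental makes the copy current without a new full backup.
apply_incremental(image_copy, incremental)
```

In this model the incremental holds two blocks rather than four, and after it is applied the image copy matches the current datafile while remaining a separate physical copy.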
Storage snapshots have offered development and QA capabilities for database and non-database environments for many years, providing the ability to quickly create point-in-time storage-efficient virtual copies of the data. Snapshots do not require an initial copy, as they are not stored as physical copies of blocks, but rather as pointers to the blocks that existed when the snapshot was created. Because of this tight physical relationship, the snapshot is maintained on the same storage array as the original data. Snapshots are generally implemented either as copy-on-write or redirect-on-write-based methods.
In the copy-on-write case, after a snapshot is taken, and upon the first change to a storage block, the array copies the before-change block to a new location on disk, thus maintaining the before-change block for the snapshot and the new block for the active version of the database. In the diagram below, block C is updated, so the old block is copied to a new location, then the new block (C’) is written to the original location.
Figure 2 – Copy-on-Write Storage Snapshot
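The copy-on-write behavior in Figure 2 can be sketched as a small simulation. This is a simplified model in Python under stated assumptions, not any vendor's implementation: the class name `CopyOnWriteVolume`, its attributes, and the single-snapshot, fixed-block-set design are all hypothetical.

```python
# Simplified copy-on-write model with a single snapshot and a fixed set of blocks.
# All names are hypothetical; real arrays track this at the storage-block level.

class CopyOnWriteVolume:
    def __init__(self, blocks):
        self.active = dict(blocks)   # blocks at their original locations
        self.snapshot_area = {}      # before-change blocks preserved for the snapshot
        self.snapshot_taken = False
        self.physical_writes = 0     # count of storage writes performed

    def take_snapshot(self):
        # Nothing is copied here: the snapshot starts as pure references.
        self.snapshot_taken = True
        self.snapshot_area = {}

    def write(self, block_id, data):
        # First change after the snapshot: preserve the old block first.
        if self.snapshot_taken and block_id not in self.snapshot_area:
            self.snapshot_area[block_id] = self.active[block_id]
            self.physical_writes += 1
        # Then write the new block to the original location.
        self.active[block_id] = data
        self.physical_writes += 1

    def read_snapshot(self, block_id):
        # Changed blocks come from the snapshot area, unchanged ones from the original.
        if block_id in self.snapshot_area:
            return self.snapshot_area[block_id]
        return self.active[block_id]

volume = CopyOnWriteVolume({"A": "a0", "B": "b0", "C": "c0"})
volume.take_snapshot()
volume.write("C", "c1")                      # first change to C: two storage writes
writes_after_first_change = volume.physical_writes
volume.write("C", "c2")                      # later change to C: one storage write
```

The first update to block C costs two storage writes (the preserved copy plus the new block), while later updates to the same block cost one. Reading block B through the snapshot still resolves to the original location, because B never changed.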
In the redirect-on-write case, the new block (C') is directly written to the snapshot storage, as shown in the diagram below. Thus, there are no double writes when a block changes, as in the copy-on-write case, but the active version of the blocks becomes fragmented over time.
Figure 3 – Redirect-on-Write Storage Snapshot
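The redirect-on-write behavior in Figure 3 can be modeled in a similar way. The sketch below is a simplified Python model under stated assumptions: the class name `RedirectOnWriteVolume`, the integer physical locations, and the single-snapshot design are hypothetical and do not describe any specific array.

```python
# Simplified redirect-on-write model with a single snapshot.
# Physical locations are integers; blocks start out laid out contiguously.
# All names are hypothetical and for illustration only.

class RedirectOnWriteVolume:
    def __init__(self, blocks):
        self.store = {}          # physical location -> data
        self.active_map = {}     # block id -> physical location (live volume)
        for location, (block_id, data) in enumerate(blocks.items()):
            self.store[location] = data
            self.active_map[block_id] = location
        self.snapshot_map = None # block id -> physical location (snapshot view)
        self.next_free = len(self.store)
        self.physical_writes = 0

    def take_snapshot(self):
        # The snapshot is only a frozen copy of the pointers.
        self.snapshot_map = dict(self.active_map)

    def write(self, block_id, data):
        location = self.active_map[block_id]
        shared = (self.snapshot_map is not None
                  and self.snapshot_map.get(block_id) == location)
        if shared:
            # Leave the old block in place for the snapshot and redirect
            # the live volume to a newly allocated location.
            location = self.next_free
            self.next_free += 1
            self.active_map[block_id] = location
        self.store[location] = data
        self.physical_writes += 1

volume = RedirectOnWriteVolume({"A": "a0", "B": "b0", "C": "c0"})
volume.take_snapshot()
volume.write("C", "c1")
active_locations = [volume.active_map[b] for b in ("A", "B", "C")]
```

Here the update to block C costs a single storage write, but the live volume now occupies locations 0, 1 and 3 rather than a contiguous range, which is the fragmentation the text describes.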
Snapshots have no awareness of the Oracle block structure (as they operate at a storage block level) and, more importantly, they are inherently physically different from backups (consisting of pointers instead of blocks). As a result, there are significant trade-offs that should be considered before using snapshots to provide data protection for the Oracle database.
The following sections provide details on the advantages/disadvantages of the RMAN incrementally updated backups and FRA solution versus storage snapshots.
RMAN is an integrated data protection solution for the Oracle Database, providing several degrees of protection. At a granular block level, RMAN fully validates the Oracle blocks as they are backed up and restored: blocks are validated via physical checksum comparison and logical checking within the block itself (e.g. verifying that a row piece or index entry is consistent). Backups can be used to recover production data to the last available archived redo log in any data loss or physical corruption scenario, or to a specific point-in-time (per RMAN retention policy). Furthermore, the entire database or individual tablespaces/data files can be validated for physical and logical block correctness at the user’s discretion using the VALIDATE command. Likewise, a backup can be validated at any time, to ensure that it can be successfully restored, using the RESTORE VALIDATE command. RMAN also provides the block media recovery capability, which allows individual block corruptions in the database to be quickly repaired, while the unaffected data remains online and accessible to the user.
As previously mentioned, RMAN in combination with the FRA forms the foundation of Oracle’s recommended backup strategy, involving a one-time image copy backup to the FRA, daily fast incremental backups using RMAN’s block change tracking capability, and regular update of the image copy by applying the incremental backup. When using RMAN to back up the FRA files or the database itself to tape, Oracle Secure Backup provides an Oracle-optimized, RMAN-integrated backup approach, leveraging unused block compression, undo elimination, and shared memory buffers to offer the highest performing database backups to tape. Many leading third party backup vendors have also offered RMAN-integrated tape backup methods over the last several years.
Snapshots, on the other hand, are not designed for Oracle data protection. They have no knowledge of an Oracle block structure, and hence do not and cannot validate Oracle data when they are created. They cannot be used for any data loss or physical corruption scenarios. A block corruption that goes undetected can potentially affect a series of snapshots, if the block does not change over time. Since snapshots reside on the same array as the source database, they are vulnerable to failures that affect the storage array. That is why a snapshot, even though it is created very quickly, does not constitute a backup of the original data. For a snapshot to be used as a valid backup, it must be re-constituted as a full set of blocks to another storage array or to tape, which involves the same performance issues that are characteristic of a full copy. Finally, restoring a snapshot has the side effect of nullifying all snapshots that were taken after it, unless the snapshot is fully restored as a copy of the production data to an alternate location. Given these inherent deficiencies with snapshots, it is evident that only Oracle-aware RMAN backups can offer true data protection "peace of mind".
The RMAN incrementally updated backup method requires an initial image copy backup of the database, i.e. 1X copy of the database minus temp data files. After the full backup is taken, fast incremental backups and the incremental update of the copy are the only required backup operations. RMAN performs sequential Oracle block I/O reads on the database storage during the backup. Consequently, database performance can potentially be impacted during backups due to the additional I/O consumption. Note that fast incremental backups reduce I/O consumption by reading only the blocks that changed since the last full or incremental backup, and they do so in an Oracle-optimized manner using RMAN’s block change tracking capability. In addition, the incremental update operation utilizes I/O only on the FRA storage and not on the production database storage.
With respect to copy-on-write-based snapshots, the database performance impact manifests in two ways. First, after a snapshot is created, the first write to a database block translates to two storage I/O writes: one to copy the original block to a new snapshot storage location and one to write the new block over the original block. The increased I/O usage can have a severe impact on production database performance. Second, after reverting the production database volume to a previous snapshot, the now-active version of the storage blocks includes references to the snapshot blocks, which are likely to be fragmented across the disk instead of being sequentially laid out (which the database still expects when I/O is issued). For example, in the previous diagram of the copy-on-write snapshot, an I/O request for block C is redirected to the snapshot version of block C, while I/O requests for block B are not redirected, since it did not change relative to the time the snapshot was taken. When the database issues a 1 MB I/O, instead of the data being read sequentially in a single large read, the request can turn into as many as 128 random I/Os (1 MB / 8 KB, assuming an 8K block size). As multiple snapshots are created and restored over time, the resulting fragmented block layout can result in a potentially 10-100X slowdown in database performance.
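The 128 figure follows from dividing the 1 MB read by the 8K block size, and it represents the worst case in which no two requested blocks are physically adjacent. The Python sketch below makes that arithmetic explicit. The function name `count_io_requests` and the rule that only physically consecutive blocks can share one read are simplifying assumptions, not a description of how any particular array schedules I/O.

```python
# Estimate how many storage reads one logical read turns into, assuming
# (hypothetically) that only physically consecutive blocks share a read.

def count_io_requests(physical_locations):
    """Count runs of consecutive physical locations in read order."""
    if not physical_locations:
        return 0
    requests = 1
    for previous, current in zip(physical_locations, physical_locations[1:]):
        if current != previous + 1:
            requests += 1
    return requests

READ_BYTES = 1024 * 1024   # one 1 MB database read
BLOCK_BYTES = 8 * 1024     # 8K block size, as in the text
blocks_per_read = READ_BYTES // BLOCK_BYTES

# Sequential layout: every block sits right after the previous one.
sequential_layout = list(range(blocks_per_read))
# Fully fragmented layout: no two requested blocks are adjacent.
fragmented_layout = list(range(0, 2 * blocks_per_read, 2))

sequential_ios = count_io_requests(sequential_layout)
fragmented_ios = count_io_requests(fragmented_layout)
```

With these inputs the sequential layout needs one read and the fully fragmented layout needs one read per block. Real layouts fall somewhere between the two.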
For these reasons, it is never a good idea to create and use snapshots on production database storage. Snapshots, if used for development and QA purposes, should be created on secondary copies of data that do not support production workload. Oracle’s High Availability (HA) Development group has published a highly efficient way to achieve this, using Oracle Data Guard and the ZFS Storage Appliance – see this white paper for more details.
As previously discussed, the RMAN incrementally updated backup method requires an initial image copy backup, then incremental backups and incremental updates to the copy thereafter. Thus, the initial backup time is proportional to the size of the database and backup times thereafter are proportional to the volume of changed blocks between incrementals. If a copy needs to be preserved to satisfy the retention policy before being incrementally updated, RMAN can back up the copy to tape. Backing up the copy and other FRA files to tape also allows disk space to be automatically reclaimed by the FRA when additional space is needed for new files. When a recovery is needed, the full copy can either be restored to the production database storage, or used directly as the production data files via the RMAN SWITCH command (i.e. restore-free recovery). The restored data files are then recovered to a consistent point-in-time via the redo apply process.
For example, if datafile 4 is accidentally deleted or severely corrupted, the DBA can use a few simple RMAN commands to quickly switch to the copy of the datafile maintained in the FRA and make it consistent with the rest of the database: take the datafile offline, run SWITCH DATAFILE 4 TO COPY, run RECOVER DATAFILE 4, and bring the datafile back online. This does not impact the rest of the database and avoids any time-consuming restore operation.
A demonstration of this technique is available on the OTN HA Demonstration page titled "Recovery Manager - Fast Recovery with Switch to Copy".
On the other hand, snapshot creation is indeed a near-instant operation - there is no need to do full or incremental backups. Creating a snapshot is essentially creating a marker to indicate when before-change blocks will begin to be copied to new storage locations (as previously discussed). Reverting to a snapshot is also a near-instant operation - no physical copy is performed and I/O is redirected to the snapshot and current version of blocks as needed. As with all kinds of recovery, a restored database snapshot must be recovered to a consistent point-in-time before it can be used.
RMAN clones a production database using the DUPLICATE command. DUPLICATE restores a full backup to the clone database server and recovers the clone database to a consistent point by applying incrementals / redo as needed. Starting in Oracle Database 11g, Active DUPLICATE clones the database by copying the database files and required archived logs directly over the network to the clone database server, eliminating the need for intermediary backup storage. Using DUPLICATE, the time to create the clone is proportional to the size of the database and the clone will occupy the same amount of storage as the production database.
Snapshot-based clones, on the other hand, can be created near-instantly and occupy a fraction of the production database storage, depending on the storage block change pattern. Just as copy-on-write methods are used to create snapshots, the same methods are used to create snapshot-based clones. The snapshot clone physically occupies space equivalent to the volume of unique blocks that have changed since the clone was created, rather than space proportional to the database size itself. However, just as in the case of snapshots, there is additional database I/O impact due to copy-on-write. This impact is exacerbated for writable snapshot clones, where the clone database block changes are also tracked via copy-on-write. Because of the severe I/O performance impact, snapshot clones should not be used on the production database, but on a secondary copy of the database.
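The space difference between the two cloning approaches can be illustrated with simple arithmetic. The Python sketch below is a rough model only: the function names, the 8 KB block size, and the one-million-block database are assumptions chosen for the example, and real arrays may track changes at a different granularity than the Oracle block.

```python
# Rough space comparison between a snapshot clone and a full copy.
# All names and sizes are hypothetical values chosen for illustration.

def clone_space_bytes(changed_block_writes, block_bytes):
    """A snapshot clone stores one block per unique block changed since creation."""
    return len(set(changed_block_writes)) * block_bytes

def full_copy_bytes(total_blocks, block_bytes):
    """A full clone (for example via DUPLICATE) stores every block."""
    return total_blocks * block_bytes

BLOCK_BYTES = 8 * 1024
# Six writes since the clone was created, touching only three distinct blocks.
writes_since_clone = [10, 11, 10, 12, 11, 10]

snapshot_clone_size = clone_space_bytes(writes_since_clone, BLOCK_BYTES)
full_clone_size = full_copy_bytes(1_000_000, BLOCK_BYTES)
```

Repeated writes to the same block do not grow the snapshot clone in this model, which is why its size depends on the change pattern rather than on the size of the database.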
Storage-based snapshot technologies serve a different purpose compared to backup and data protection solutions. Since snapshots reside on the same array as the production database, they are vulnerable to array failures and thus should not be considered valid "backups" of the data. Snapshots can be effectively utilized for development/QA/test activities on a secondary copy of the production database, but should not be utilized on the production database itself due to the severe I/O impact of copy-on-write. For Oracle database backups, customers should leverage RMAN and Fast Recovery Area, along with Oracle Secure Backup for integrated tape backups, to provide complete data loss and corruption protection.
Further Reading: Related Solutions Available with the Oracle Database
The Oracle Database offers several solutions that provide capabilities for easily reverting to point-in-time copies of the database or database objects, as well as maintaining copies for efficient cloning and testing.
Tim Chien is a Principal Product Manager with the Oracle Database High Availability Development team, focusing on backup and recovery.