FlashGrid cluster on AWS. Backup best practices.

This article covers the backup procedures recommended for different types of data, including:

  • OS and software on the cluster nodes
  • Database files
  • Files on ACFS, if used

Backing up OS and software on the cluster nodes

It is strongly recommended to back up the OS and software volumes of all cluster nodes after the initial cluster configuration, and again before and after applying any changes, such as installing patches or changing security settings. The OS and software on a cluster node can be backed up by creating an AMI for the instance or by creating snapshots of individual EBS volumes. AMI-based backup is recommended because it makes it easier to recover an instance that was terminated because of a failure or human error.

To create a backup AMI for a cluster node:

  1. If the node is a database node, stop Oracle services.

    # crsctl stop crs

  2. Gracefully stop the instance.

    # flashgrid-node poweroff

  3. Create an AMI for the instance.

IMPORTANT! Remove data volumes from the list of volumes that will be backed up to the AMI. Typically only the root volume (/dev/sda1) and the software volume (/dev/xvdz), if present, must be included in the backup AMI. Other volumes must be excluded. Failure to exclude data volumes from the AMI will create inconsistency in ASM disk groups if the data volumes are later restored from the AMI. Data volumes should never be backed up or restored at the volume level. Instead, database/ACFS file backup procedures must be used as described further in this article.
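The exclusion of data volumes can also be scripted with the AWS CLI. The following is a minimal sketch, not an official FlashGrid procedure. It assumes the AWS CLI is installed and configured on an administration host, and that the node has already been powered off as described above. The function names, instance ID, AMI name, and data device names are hypothetical. Each excluded device is passed with NoDevice, which tells EC2 to omit that device from the image. The device names must match the names under which the volumes are attached to the instance.

```shell
# Build a --block-device-mappings JSON value that omits the given devices.
# Each entry uses NoDevice, which leaves that device out of the AMI.
exclude_mappings() {
  local json="" dev
  for dev in "$@"; do
    json="${json:+$json,}{\"DeviceName\":\"$dev\",\"NoDevice\":\"\"}"
  done
  printf '[%s]\n' "$json"
}

# Create a backup AMI for a stopped node and print the new image ID.
# Arguments: instance ID, AMI name, then the data devices to exclude.
create_backup_ami() {
  local instance_id=$1 name=$2
  shift 2
  aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "$name" \
    --block-device-mappings "$(exclude_mappings "$@")" \
    --query ImageId --output text
}

# Example with hypothetical values:
# create_backup_ami i-0123456789abcdef0 "rac1-backup-$(date +%Y%m%d)" /dev/xvdba /dev/xvdbb
```

After the AMI is created, it is worth checking its block device mappings in the AWS console to confirm that only the root volume and, if present, the software volume were included.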

Restoring root or software volume of a cluster node

The root volume of an instance is the EBS volume that has the OS installed on it. The device name of the root volume is /dev/sda1.

The software volume of an instance is the EBS volume where the Oracle software binaries are installed (it contains the /u01 directory). Typically the software volume has the device name /dev/xvdz.

The root volume or the software volume may need to be restored in case the volume fails, has file system corruption, or has logical corruption.

To restore the root volume (sda1) or the software volume (xvdz):

  1. In the backup AMI for the affected instance, identify the snapshot ID for the affected volume.

  2. Using the snapshot ID, create a new volume in the availability zone where the affected instance is located.

  3. Stop the affected instance.
  • If the OS is running, stop the instance gracefully using the flashgrid-node poweroff command.
  • If the OS is not running, stop the instance using the AWS console or CLI.

  4. Detach the affected volume from the instance.

  5. Attach the newly created volume to the instance using the same device name (/dev/sda1 for the root volume, /dev/xvdz for the software volume).

  6. Start the instance.
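The restore procedure above can be sketched with the AWS CLI as follows. This is an illustration under stated assumptions, not a FlashGrid-provided script: the function name and all IDs are hypothetical, the AWS CLI is assumed to be configured on an administration host, and if the OS is still running the node is assumed to have been powered off with flashgrid-node poweroff before the function is called.

```shell
# Replace one volume of a node with a copy restored from the backup AMI.
# Arguments: instance ID, backup AMI ID, device name, availability zone.
# Prints the ID of the newly attached volume.
restore_volume() {
  local instance_id=$1 ami_id=$2 device=$3 az=$4
  local snapshot_id old_volume_id new_volume_id

  # Find the snapshot that backs this device in the backup AMI.
  snapshot_id=$(aws ec2 describe-images --image-ids "$ami_id" \
    --query "Images[0].BlockDeviceMappings[?DeviceName=='$device'].Ebs.SnapshotId | [0]" \
    --output text) || return 1
  [ "$snapshot_id" != "None" ] || return 1

  # Find the volume currently attached under that device name.
  old_volume_id=$(aws ec2 describe-volumes \
    --filters "Name=attachment.instance-id,Values=$instance_id" \
              "Name=attachment.device,Values=$device" \
    --query "Volumes[0].VolumeId" --output text) || return 1

  # Create the replacement volume in the instance's availability zone.
  new_volume_id=$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$az" --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$new_volume_id" || return 1

  # Make sure the instance is stopped before swapping volumes.
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1

  # Detach the affected volume, then attach the new one under the same name.
  aws ec2 detach-volume --volume-id "$old_volume_id" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$old_volume_id" || return 1
  aws ec2 attach-volume --volume-id "$new_volume_id" \
    --instance-id "$instance_id" --device "$device" >/dev/null || return 1

  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return 1
  echo "$new_volume_id"
}

# Example with hypothetical values:
# restore_volume i-0123456789abcdef0 ami-0123456789abcdef0 /dev/xvdz us-east-1a
```

The sketch leaves the detached volume in place rather than deleting it, so that it remains available for inspection until the restored node is confirmed healthy.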

Restoring an instance that was accidentally terminated

Setting instance termination protection is strongly recommended to prevent accidental termination of a cluster node instance.
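Termination protection can be enabled in the AWS console or with the AWS CLI. The following minimal sketch assumes a configured AWS CLI. The function name and instance IDs are hypothetical.

```shell
# Enable termination protection on each given instance,
# then print the resulting setting for each one.
protect_instances() {
  local id
  for id in "$@"; do
    aws ec2 modify-instance-attribute --instance-id "$id" \
      --disable-api-termination || return 1
    aws ec2 describe-instance-attribute --instance-id "$id" \
      --attribute disableApiTermination \
      --query "DisableApiTermination.Value" --output text || return 1
  done
}

# Example with hypothetical IDs for the cluster nodes:
# protect_instances i-0123456789abcdef0 i-0fedcba9876543210
```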

To restore an instance that was terminated:

  1. Launch a new instance using the backup AMI for the instance that was terminated.
  • Make sure that the correct instance type, VPC, subnet, and security group are selected.
  • Configure the same primary IP address that was used on the terminated instance.
  • Specify the placement group corresponding to the cluster.
  • Make sure that only the /dev/sda1 and /dev/xvdz (database nodes only) volumes are configured. Remove any other volumes if they are present in the AMI.

  2. Attach the data volumes to the new instance using the same device names (such as xvdba) that were previously used.

  3. Log in to the instance and bring the data disks online:

    $ flashgrid-node online
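The launch and attach steps can be sketched with the AWS CLI as shown below. This is an illustration only: the function names and every ID, address, and device name are hypothetical, and the sketch assumes that the backup AMI contains only the root volume and, on database nodes, the software volume. The VPC is implied by the subnet. The final flashgrid-node online command is still run on the node itself.

```shell
# Launch a replacement node from its backup AMI and print the new instance ID.
# Arguments: AMI ID, instance type, subnet ID, security group ID,
#            primary private IP of the terminated node, placement group name.
launch_replacement() {
  local ami_id=$1 instance_type=$2 subnet_id=$3 sg_id=$4 private_ip=$5 placement_group=$6
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --private-ip-address "$private_ip" \
    --placement "GroupName=$placement_group" \
    --query "Instances[0].InstanceId" --output text
}

# Attach data volumes under their previous device names.
# Arguments: instance ID, then volume-id=device pairs.
attach_data_volumes() {
  local instance_id=$1 pair
  shift
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  for pair in "$@"; do
    aws ec2 attach-volume --volume-id "${pair%%=*}" \
      --instance-id "$instance_id" --device "${pair#*=}" >/dev/null || return 1
  done
}

# Example with hypothetical values:
# id=$(launch_replacement ami-0123456789abcdef0 r5.2xlarge subnet-0abc sg-0abc 10.0.1.11 mycluster-pg)
# attach_data_volumes "$id" vol-0aaa=xvdba vol-0bbb=xvdbb
```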

Backing up and restoring database files

Use standard RMAN procedures for backing up and restoring database files. The two recommended options for backup storage destination are:

  • Amazon S3. Provides maximum flexibility with easy shared access to the backup files.
  • An EBS volume with a local file system. Provides maximum performance, with up to 500 MB/s of read/write bandwidth on an st1 volume.

For information about backing up to S3 see the following documentation from Oracle and AWS:

To configure an EBS volume as a backup storage destination:

  1. Create an EBS volume in the availability zone where the instance running RMAN is located. st1 volume type is recommended.

  2. Attach the volume to the instance running RMAN. Select a device name in the xvdc to xvdg range. Disks in this name range are treated as local and are not shared by FlashGrid Storage Fabric.

  3. Format the volume with a local file system (XFS recommended) and create a mount point for it.

  4. Use standard RMAN procedures to configure backup to the local file system.
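The formatting, mounting, and RMAN steps can be sketched as follows. This is a minimal example under assumptions that are not part of this article: the function names, the /backup mount point, and the device path are hypothetical. On some instance types the attached volume appears in the OS under a different name than the one chosen at attach time, so verify the device path (for example with lsblk) before formatting. The mount point must also be writable by the OS user that runs RMAN.

```shell
# Print an /etc/fstab line that mounts the backup file system by UUID.
# nofail lets the node boot even if the backup volume is not attached.
fstab_entry() {
  printf 'UUID=%s %s xfs defaults,nofail 0 0\n' "$1" "$2"
}

# Format the volume with XFS, mount it, and make the mount persistent.
# Run as root. Arguments: device path, mount point.
prepare_backup_volume() {
  local device=$1 mount_point=$2
  mkfs.xfs "$device" || return 1
  mkdir -p "$mount_point" || return 1
  mount "$device" "$mount_point" || return 1
  fstab_entry "$(blkid -s UUID -o value "$device")" "$mount_point" >> /etc/fstab
}

# Point RMAN disk backups at the mount point and take a full backup.
# Run as the database software owner. Argument: mount point.
rman_backup_to() {
  local mount_point=$1
  rman target / <<EOF
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '$mount_point/%U';
BACKUP DATABASE PLUS ARCHIVELOG;
EOF
}

# Example with hypothetical values:
# prepare_backup_volume /dev/xvdc /backup
# rman_backup_to /backup
```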

Note that an EBS volume can be moved only between instances in the same availability zone. However, a snapshot of the volume can be used to clone the volume to a different availability zone.
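Cloning the backup volume to another availability zone through a snapshot can be sketched with the AWS CLI as follows. The function name and IDs are hypothetical, and the sketch assumes that no RMAN backup is writing to the volume while the snapshot is taken. For large volumes the snapshot may take longer than the default time limit of the wait command, in which case the wait can simply be repeated.

```shell
# Clone an EBS volume into another availability zone through a snapshot.
# Arguments: source volume ID, target availability zone.
# Prints the ID of the new volume.
clone_volume_to_az() {
  local volume_id=$1 target_az=$2
  local snapshot_id new_volume_id

  snapshot_id=$(aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "Copy of $volume_id for $target_az" \
    --query SnapshotId --output text) || return 1
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1

  new_volume_id=$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$target_az" --volume-type st1 \
    --query VolumeId --output text) || return 1
  echo "$new_volume_id"
}

# Example with hypothetical values:
# clone_volume_to_az vol-0123456789abcdef0 us-east-1b
```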

Backing up Grid Infrastructure configuration files

Please follow Grid Infrastructure Backup and Restore Best Practices

Backing up and restoring files on ACFS

To back up and restore files on ACFS, use the same tools and procedures that you would normally use for file-level backup and restore.