Note: Do not use this procedure if you need to restart the entire cluster. Instead, see instructions for restarting the entire cluster.
To reboot one node in a running cluster
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Reboot the node using the flashgrid-node command. It will gracefully put the corresponding failure group offline.
# flashgrid-node reboot
In some cases it may be desirable to restart all nodes of the cluster simultaneously instead of rebooting them one by one.
Note: Do not reboot all nodes simultaneously using the reboot or flashgrid-node reboot commands. This may lead to CRS failing to start if one node goes down while CRS is already starting on another node.
To restart the entire cluster
Stop all running databases.
Stop Oracle cluster services on all nodes.
# crsctl stop cluster -all
Stop all cluster node VMs using GCP console.
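Instead of the console, the VMs can also be stopped with the gcloud CLI; the instance names and zone below are examples and must match your environment:

```shell
# Stop all cluster node VMs (instance names and zone are examples)
gcloud compute instances stop \
    mycluster-5c9d98fc-rac1 mycluster-5c9d98fc-rac2 mycluster-5c9d98fc-racq \
    --zone us-central1-a
```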
To power off one node in a running cluster
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Stop FlashGrid services on the node. This will gracefully put the corresponding failure group offline, stop CRS, and stop the FlashGrid services.
# flashgrid-node stop
To shut down the entire cluster
Stop Oracle cluster services on all nodes.
# crsctl stop cluster -all
Resizing database node VMs may be needed for performance or cost reasons. Resizing can be done for one node at a time without causing database downtime.
To resize database node VMs in a running cluster repeat the following steps on each database node, one node at a time
Update SGA and PGA sizing parameters for the databases according to the new VM memory size.
If using HugePages, update the vm.nr_hugepages parameter in /etc/sysctl.conf according to the new VM memory size.
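As a sketch of the HugePages sizing (all values here are illustrative, not a recommendation): with the default 2 MiB hugepage size on x86-64, a node that should reserve 30 GiB for the SGA needs 30 * 1024 / 2 = 15360 hugepages.

```shell
# Illustrative calculation: hugepages needed for a 30 GiB SGA with 2 MiB pages
SGA_GIB=30
HUGEPAGE_MIB=2
PAGES=$(( SGA_GIB * 1024 / HUGEPAGE_MIB ))
echo "$PAGES"
# Set vm.nr_hugepages=$PAGES in /etc/sysctl.conf, then apply with: sudo sysctl -p
```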
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
Stop all local database instances running on the node.
Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the VM using the GCP console.
Resize the VM using the GCP console.
Start the VM using the GCP console.
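The stop/resize/start sequence can alternatively be performed with the gcloud CLI; the instance name, machine type, and zone below are examples and must match your environment:

```shell
# Stop, resize, and start the VM (name, machine type, and zone are examples)
gcloud compute instances stop mycluster-5c9d98fc-rac1 --zone us-central1-a
gcloud compute instances set-machine-type mycluster-5c9d98fc-rac1 \
    --machine-type n2-highmem-16 --zone us-central1-a
gcloud compute instances start mycluster-5c9d98fc-rac1 --zone us-central1-a
```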
Wait until all disks are back online and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Start all database instances on the node
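For example, the local instances can be stopped before the resize and started afterwards with srvctl; the database name orcl and node name rac1 below are hypothetical:

```shell
# Stop the local instance before the resize (database and node names are hypothetical)
srvctl stop instance -db orcl -node rac1
# ...after the VM is resized, restarted, and all disk groups have resynced:
srvctl start instance -db orcl -node rac1
```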
When adding new disks make sure that each disk group has disks of the same size and that the number of disks per node is the same.
To add new disks in a running cluster
Create new disks for the database node VMs using the CLI and then attach them with --device-name=shared-NN according to your environment. These disks will be shared automatically by FlashGrid Storage Fabric. Example:
gcloud compute disks create mycluster-5c9d98fc-rac1-shared-7 --size 40 --type pd-ssd
gcloud compute disks create mycluster-5c9d98fc-rac2-shared-7 --size 40 --type pd-ssd
gcloud compute instances attach-disk mycluster-5c9d98fc-rac1 --disk=mycluster-5c9d98fc-rac1-shared-7 --device-name=shared-7
gcloud compute instances attach-disk mycluster-5c9d98fc-rac2 --disk=mycluster-5c9d98fc-rac2-shared-7 --device-name=shared-7
Note: Read-Write must be enabled for all new disks.
Confirm the FlashGrid names of the new drives, e.g. rac2.shared-7:
$ flashgrid-cluster drives
If the new drives are not listed, check that the corresponding devices (e.g. /dev/shared-7) are visible in the OS. If they are visible in the OS, run
# flashgrid-node reload-config
and check the output of flashgrid-cluster drives again. If they are not visible in the OS, or they appear with a different name, double-check that you attached them with the correct device name. If you need to detach the disks, you can use:
gcloud compute instances detach-disk mycluster-5c9d98fc-rac1 --disk=mycluster-5c9d98fc-rac1-shared-7
gcloud compute instances detach-disk mycluster-5c9d98fc-rac2 --disk=mycluster-5c9d98fc-rac2-shared-7
Add the new disks to an existing disk group (or create a new disk group). Example:
$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac[12].shared-[3-5]
To remove disks from a running cluster
Determine FlashGrid names of the drives to be removed, e.g. rac2.shared-7:
$ flashgrid-cluster drives
Remove the disks from the disk group and wait for the rebalance to complete. Example:
SQL> alter diskgroup MYDG
drop disk RAC1$SHARED_7
drop disk RAC2$SHARED_7
rebalance wait;
Prepare the disks for removal. Example:
[fg@rac1 ~] $ sudo flashgrid-node stop-target /dev/flashgrid/rac1.shared-7
[fg@rac2 ~] $ sudo flashgrid-node stop-target /dev/flashgrid/rac2.shared-7
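After the targets are stopped, the disks can be detached from the VMs with the gcloud CLI; the names below follow the earlier example and must match your environment:

```shell
# Detach the removed disks from the VMs (names are examples)
gcloud compute instances detach-disk mycluster-5c9d98fc-rac1 --disk=mycluster-5c9d98fc-rac1-shared-7
gcloud compute instances detach-disk mycluster-5c9d98fc-rac2 --disk=mycluster-5c9d98fc-rac2-shared-7
# Optionally delete the disks once they are detached
gcloud compute disks delete mycluster-5c9d98fc-rac1-shared-7 mycluster-5c9d98fc-rac2-shared-7
```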
ASM will drop a disk from a disk group if the disk stays offline for longer than the disk repair time. If the disk was taken offline because of an intermittent problem, for example a network problem, then you can re-add such a disk to the disk group. The force option must be used when re-adding such a disk because it already contains ASM metadata.
Example of re-adding a regular disk:
# flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.shared-7 -f
Example of re-adding a quorum disk:
# flashgrid-dg add-disks -G MYDG -q /dev/flashgrid/racq.shared-5 -f
SkyCluster Node Update package is a single self-extracting bash script file that allows updating FlashGrid software components and the OS kernel.
Using this package makes it easier to have the update performed to the latest validated set of software components and helps avoid accidental installation of incompatible software versions.
Note: Please review corresponding release notes and check with FlashGrid support before performing any major version update. A major version consists of the first two numbers. The third number represents a revision (hotfix). For example, update from version 19.02.x to 19.05.x is major, but from 19.05.100 to 19.05.200 is a hotfix revision.
Note: Simultaneously updating FlashGrid software and applying Grid Infrastructure patches in rolling fashion is not recommended. FlashGrid services should not be stopped while GI cluster is in rolling patching mode.
To update software using SkyCluster Node Update package on a running cluster, repeat the following steps on each node, one node at a time
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Diagnostics monitoring service:
# systemctl stop flashgrid-node-monitor
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the FlashGrid Cloud Area Network service on the node:
# systemctl stop flashgrid-clan
Run the update script as root.
Example with kernel update:
# bash skycluster_node_update-19.5.17.85011.sh
Example without kernel update:
# bash skycluster_node_update-19.5.17.85011.sh skip-kernel-update
Reboot the node:
# reboot
Before updating the next node, wait until the node boots up, all disks are back online, and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No before it is safe to update the next node.
# flashgrid-cluster
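One way to monitor resync progress between node updates is to re-run the status check periodically, e.g. with watch:

```shell
# Refresh cluster status every 10 seconds; proceed to the next node only
# when all disks are online and Resync = No on all disk groups
watch -n 10 flashgrid-cluster
```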
Note: In most cases using SkyCluster Node Update package is recommended for updating FlashGrid software and OS kernel.
Note: Please review corresponding release notes and check with FlashGrid support before performing any major version update. A major version consists of the first two numbers. The third number represents a revision (hotfix). For example, update from version 19.02.x to 19.05.x is major, but from 19.05.100 to 19.05.200 is a hotfix revision.
Note: Simultaneously updating FlashGrid software and applying Grid Infrastructure patches in rolling fashion is not recommended. FlashGrid services should not be stopped while GI cluster is in rolling patching mode.
To update flashgrid-sf and/or flashgrid-clan RPMs on a running cluster repeat the following steps on each node, one node at a time
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the FlashGrid Cloud Area Network service on the node:
# systemctl stop flashgrid-clan
Update the flashgrid-sf and/or flashgrid-clan RPMs on the node using the yum or rpm tool.
Reboot the node:
# reboot
Wait until all disks are back online and resyncing operations complete on all disk groups before updating the next node. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
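The RPM update step above might look like the following; the exact package file names depend on the release you are installing and are placeholders here:

```shell
# Update both RPMs from a configured yum repository (package names from this procedure)
sudo yum update flashgrid-sf flashgrid-clan
# Or install downloaded RPM files directly (file names are placeholders)
sudo rpm -Uvh flashgrid-sf-<version>.rpm flashgrid-clan-<version>.rpm
```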
To update flashgrid-diags RPM on a node
Update the flashgrid-diags RPM using the yum or rpm tool, then restart the FlashGrid Diagnostics monitoring service:
# systemctl restart flashgrid-node-monitor
Note: Simultaneously updating OS and applying Grid Infrastructure patches in rolling fashion is not recommended. Nodes should not be rebooted while GI cluster is in rolling patching mode.
Note: Running yum update without first stopping Oracle and FlashGrid services may result in the services restarting non-gracefully during the update.
To update OS on a running cluster repeat the following steps on each node, one node at a time
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Install OS updates:
# yum update
Reboot the node:
# reboot
Before updating the next node, wait until the node boots up, all disks are back online, and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No before it is safe to update the next node.
# flashgrid-cluster
For applying single patches or Release Updates / Patch Set Updates to Grid Infrastructure or Database homes follow standard procedures documented by Oracle.
Note: While GI cluster is in rolling patching mode, do not reboot any nodes and do not stop FlashGrid services. Updating OS or FlashGrid software simultaneously with applying Grid Infrastructure patches in rolling fashion is not recommended.
Note: Before applying the latest Release Update from Oracle, we recommend requesting confirmation from FlashGrid support. FlashGrid validates every Release Update to minimize the risk of compatibility or reliability issues. The validation is typically completed within 3 weeks after the Release Update becomes publicly available.