Note: Do not use this procedure if you need to restart the entire cluster. Instead, see the instructions for restarting the entire cluster below.
To reboot one node in a running cluster
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Reboot the node using the flashgrid-node command. It will gracefully put the corresponding failure group offline.
# flashgrid-node reboot
In some cases it may be desirable to restart all nodes of the cluster simultaneously instead of rebooting them one by one.
Note: Do not reboot all nodes simultaneously using the reboot or flashgrid-node reboot commands. This may lead to CRS failing to start if one node goes down while CRS is already starting on another node.
To restart the entire cluster
Stop all running databases.
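For example, each database can be stopped with srvctl as the oracle user (the database name below is a placeholder):
$ srvctl stop database -db <db_name>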
Stop Oracle cluster services on all nodes.
# crsctl stop cluster -all
Stop all cluster node VMs using the Azure console.
To power off one node in a running cluster
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Stop FlashGrid services on the node. This will gracefully put the corresponding failure group offline, stop CRS, and stop the FlashGrid services.
# flashgrid-node stop
To shut the entire cluster down
Stop Oracle cluster services on all nodes.
# crsctl stop cluster -all
Resizing database node VMs may be needed for performance or cost reasons. Resizing can be done for one node at a time without causing database downtime.
To resize database node VMs in a running cluster, repeat the following steps on each database node, one node at a time
Update SGA and PGA sizing parameters for the databases according to the new VM memory size.
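For example, the parameters can be set in the spfile so they take effect at the next instance startup (the 24G and 8G values are placeholders; choose values appropriate for the new VM memory size):
SQL> ALTER SYSTEM SET sga_target=24G SCOPE=SPFILE;
SQL> ALTER SYSTEM SET pga_aggregate_target=8G SCOPE=SPFILE;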
Skip this step unless you have the vm.nr_hugepages parameter manually configured in /etc/sysctl.conf. If you have it manually configured, update the parameter according to the new VM size. Note that starting with Storage Fabric 19.02, HugePages are configured automatically by default and a manual change is not required.
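If you do maintain the parameter manually, a minimal sketch of the change (the new page count is a placeholder and must be recalculated for the new SGA size): set vm.nr_hugepages=<new_value> in /etc/sysctl.conf, then apply it:
# sysctl -p
The new value takes full effect after the node is restarted later in this procedure.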
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
Stop all local database instances running on the node.
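For example, a local instance can be stopped with srvctl (the database and node names are placeholders):
$ srvctl stop instance -db <db_name> -node <node_name>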
Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the VM using the Azure console.
Resize the VM using the Azure console.
Start the VM using the Azure console.
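Alternatively, the stop/resize/start sequence can be performed with the Azure CLI (the resource group, VM name, and target size below are placeholders):
$ az vm deallocate --resource-group <rg> --name rac1
$ az vm resize --resource-group <rg> --name rac1 --size Standard_E16s_v5
$ az vm start --resource-group <rg> --name rac1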
Wait until all disks are back online and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
Start all database instances on the node.
When adding new disks, make sure that each disk group has disks of the same size and that the number of disks per node is the same.
To add new disks in a running cluster
Create and attach new disks to the database node VMs. Attach the disks using LUN numbers 1 through 49 - these LUNs will be shared automatically by FlashGrid Storage Fabric.
Note: Read-only caching must be enabled for all new disks. Read-Write and None modes are not supported and may create reliability problems.
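For example, a new disk can be created and attached with the Azure CLI (the resource group, disk name, size, and LUN below are placeholders; note the ReadOnly caching mode):
$ az disk create --resource-group <rg> --name rac1-lun3 --size-gb 512 --sku Premium_LRS
$ az vm disk attach --resource-group <rg> --vm-name rac1 --name rac1-lun3 --lun 3 --caching ReadOnly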
Confirm the FlashGrid names of the new drives (e.g. rac2.lun3):
$ flashgrid-cluster drives
If the new drives are not listed, check that the corresponding devices (e.g. /dev/lun3) are visible in the OS. If they are visible in the OS, run
# flashgrid-node reload-config
and check the output of flashgrid-cluster drives again. If they are not visible in the OS, double-check that you have attached them with the correct LUN numbers.
Add the new disks to an existing disk group (or create a new disk group).
Example A (adding 2 disks rac1.lun3 and rac2.lun3):
$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac1.lun3 /dev/flashgrid/rac2.lun3
Example B (using wildcards for adding 6 disks lun3/lun4/lun5 on rac1/rac2 nodes):
$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac[12].lun[3-5]
To resize disks in a running cluster
1) Shut down the first database node gracefully, following steps 1-4 from https://kb.flashgrid.io/maintenance/maintenance-azure#powering-off-one-node
2) Resize the disks from the Azure portal
Increase the size of all database disks belonging to the same disk group to the desired size. Make sure disks in the same disk group have the same size (except quorum disks).
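For example, with the Azure CLI (the resource group, disk name, and new size are placeholders):
$ az disk update --resource-group <rg> --name rac1-lun3 --size-gb 1024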
3) Start the database node
4) Wait until flashgrid-cluster shows cluster status "Good"
5) Repeat steps 1-4 for the remaining database nodes (there is no need to resize the quorum node's disks; resizing is only necessary on the database nodes)
6) Check new disk sizes using the following command:
# flashgrid-dg show -G <your_dg>
The Phys_GiB column must show the increased size.
7) Connect to ASM from any database node and run:
# su - grid
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP <your_dg> RESIZE ALL;
The above command resizes all disks in the specified disk group based on the size reported by the OS.
8) Check new size:
# flashgrid-dg show -G <your_dg>
Phys_GiB and ASM_GiB should now show the same increased size.
To remove disks from a running cluster
Determine FlashGrid names of the drives to be removed, e.g. rac2.lun3:
$ flashgrid-cluster drives
Drop the disks from the disk group and wait for the rebalance to complete. Example:
SQL> alter diskgroup MYDG
drop disk RAC1$LUN3
drop disk RAC2$LUN3
rebalance wait;
Prepare the disks for removal. Example:
[fg@rac1 ~] $ sudo flashgrid-node stop-target /dev/flashgrid/rac1.lun3
[fg@rac2 ~] $ sudo flashgrid-node stop-target /dev/flashgrid/rac2.lun3
ASM will drop a disk from a disk group if the disk stays offline for longer than the disk repair time. If the disk was taken offline because of an intermittent problem, for example a network problem, then you can re-add such a disk to the disk group. The force option must be used when re-adding such a disk because it already contains ASM metadata.
Example of re-adding a regular disk:
# flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.lun3 -f
Example of re-adding a quorum disk:
# flashgrid-dg add-disks -G MYDG -q /dev/flashgrid/racq.lun2 -f
FlashGrid Cloud Cluster Node Update package is a single self-extracting bash script file that allows updating the following components: FlashGrid Storage Fabric, FlashGrid Cloud Area Network, FlashGrid Diagnostics, and, optionally, the OS kernel.
Using this package makes it easier to update to the latest validated set of software components and helps avoid accidental installation of incompatible software versions.
Note: Please review corresponding release notes and check with FlashGrid support before performing any major version update. A major version consists of the first two numbers. The third number represents a revision (hotfix). For example, update from version 19.02.x to 19.05.x is major, but from 19.05.100 to 19.05.200 is a hotfix revision.
Note: Simultaneously updating FlashGrid software and applying Grid Infrastructure patches in rolling fashion is not recommended. FlashGrid services should not be stopped while GI cluster is in rolling patching mode.
Note: For OS upgrade from RH/OL 7.5, please check Issues and Mandatory Steps before performing the upgrade.
To update software using FlashGrid Cloud Cluster Node Update package on a running cluster, repeat the following steps on each node, one node at a time
Create a backup snapshot of the OS disk
a. Flush OS buffers:
# sync
b. Create a snapshot of the OS disk using the Azure portal or CLI
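For example, with the Azure CLI (the resource group, snapshot name, and disk reference are placeholders):
$ az snapshot create --resource-group <rg> --name rac1-os-snapshot --source <os_disk_name_or_id>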
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Diagnostics monitoring service:
# systemctl stop flashgrid-node-monitor
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the FlashGrid Cloud Area Network service on the node:
# systemctl stop flashgrid-clan
Run the update script as root.
Example with kernel update:
# bash flashgrid_cluster_node_update-19.5.17.85011.sh
Example without kernel update:
# bash flashgrid_cluster_node_update-19.5.17.85011.sh skip-kernel-update
Reboot the node:
# reboot
Before updating the next node, wait until the node boots up, all disks are back online, and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No before it is safe to update the next node.
# flashgrid-cluster
Note: In most cases, using the FlashGrid Cloud Cluster Node Update package is recommended for updating FlashGrid software and the OS kernel.
Note: Please review corresponding release notes and check with FlashGrid support before performing any major version update. A major version consists of the first two numbers. The third number represents a revision (hotfix). For example, update from version 19.02.x to 19.05.x is major, but from 19.05.100 to 19.05.200 is a hotfix revision.
Note: Simultaneously updating FlashGrid software and applying Grid Infrastructure patches in rolling fashion is not recommended. FlashGrid services should not be stopped while GI cluster is in rolling patching mode.
To update the flashgrid-sf and/or flashgrid-clan RPM on a running cluster, repeat the following steps on each node, one node at a time
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Stop the FlashGrid Cloud Area Network service on the node:
# systemctl stop flashgrid-clan
Update the flashgrid-sf and/or flashgrid-clan RPM on the node using the yum or rpm tool, as shown in the example below. Run sudo flashgrid-clan-cfg deploy-config-local after updating the flashgrid-clan RPM.
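A minimal sketch, assuming the updated packages are available in a configured yum repository (otherwise, install the downloaded RPM files with rpm -U):
# yum update flashgrid-sf flashgrid-clan
# flashgrid-clan-cfg deploy-config-local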
Reboot the node:
# reboot
Wait until all disks are back online and resyncing operations complete on all disk groups before updating the next node. All disk groups must have zero offline disks and Resync = No.
# flashgrid-cluster
To update flashgrid-diags RPM on a node
Update the flashgrid-diags RPM using the yum or rpm tool, for example as shown below.
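For example, assuming the package is available in a configured yum repository:
# yum update flashgrid-diags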
Restart the FlashGrid Diagnostics monitoring service:
# systemctl restart flashgrid-node-monitor
Note: Simultaneously updating OS and applying Grid Infrastructure patches in rolling fashion is not recommended. Nodes should not be rebooted while GI cluster is in rolling patching mode.
Note: Running yum update without first stopping Oracle and FlashGrid services may result in the services restarting non-gracefully during the update.
To update the OS on a running cluster, repeat the following steps on each node, one node at a time
Create a backup snapshot of the OS disk
a. Flush OS buffers:
# sync
b. Create a snapshot of the OS disk using the Azure portal or CLI
Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
If the node is a database node,
a. Stop all local database instances running on the node.
b. Stop Oracle CRS on the node:
# crsctl stop crs
Stop FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
Install OS updates:
# yum update
Reboot the node:
# reboot
Before updating the next node, wait until the node boots up, all disks are back online, and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No before it is safe to update the next node.
# flashgrid-cluster
To apply one-off patches or Release Updates / Patch Set Updates to Grid Infrastructure or Database homes, follow the standard procedures documented by Oracle.
Note: While GI cluster is in rolling patching mode, do not reboot any nodes and do not stop FlashGrid services. Updating OS or FlashGrid software simultaneously with applying Grid Infrastructure patches in rolling fashion is not recommended.
Note: Before applying the latest Release Update from Oracle, we recommend requesting confirmation from FlashGrid support. FlashGrid validates every Release Update to minimize the risk of compatibility or reliability issues. Typical time to complete the validation is 3 weeks after the Release Update becomes publicly available.