Over the last month I have had the opportunity to add 4 nodes to a RAC environment. The existing environment was 4 nodes running Suse 9. Instead of replacing the existing nodes, the decision was made to add the new ones and make it an 8-node cluster. If you had the opportunity to read my post on CPU Utilization, you will understand that what we really did was increase our capacity by making more CPU time available for more transactions. Our arrival rate can increase without impacting our response time, provided of course we can balance our workload across the nodes appropriately. On the horizon is an upgrade to 11gR2, so the new nodes are running Suse 10. Once the new nodes are stable, the older nodes will be removed and upgraded to Suse 10.
The decision to run with mixed OS versions for a short period of time was not without at least one quirk. If you have an OCFS2 mount point, you will not be able to mount it from both OS versions at once; it's either the Suse 9 nodes or the Suse 10 nodes. This is unfortunate, even in the short term. External tables, or jobs that still use UTL_FILE, will fail if they happen to start on one of the nodes where the filesystem is not mounted. One workaround would be service groups that keep those jobs on the nodes that have the mount; however, that may require code changes that don't make sense in the short term.
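To see which external tables are exposed, a query along these lines lists the directory objects that point at the OCFS2 mount and any external tables that read from them. It's a minimal sketch: '/ocfs2/' is a placeholder for your actual mount point, and it won't catch UTL_FILE code that writes through the older utl_file_dir parameter instead of a directory object.

```sql
-- Sketch: directory objects on the OCFS2 mount and the external tables using them.
-- '/ocfs2/' is a placeholder; substitute the real mount point.
-- Directories that return no table rows may still be used by UTL_FILE code.
select d.directory_name,
       d.directory_path,
       l.owner      as table_owner,
       l.table_name
  from dba_directories d
  left join dba_external_locations l
    on l.directory_owner = d.owner
   and l.directory_name  = d.directory_name
 where d.directory_path like '/ocfs2/%'
 order by d.directory_name, l.owner, l.table_name;
```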
A workaround that has been effective in the short term is to set the local_listener on the nodes that do not have the OCFS2 filesystem to a listener on a node where the filesystem is mounted. Let's say I have hosts ORL1 and ORL2 that have the filesystem mounted, and ORL3, where the filesystem cannot be mounted at this time.
Ensure that TNSNAMES.ORA on ORL3 has an entry for the listener on one of the two nodes where the filesystem is mounted:
LISTENER_ORL1 =
  (address = (protocol = tcp)(host = oral.medai.com)(port = 1521))
Then I point the local_listener parameter at one of those two hosts' listeners:
alter system set local_listener=LISTENER_ORL1 scope=memory sid='orl3';
This changes the parameter only in memory and only for instance ORL3, so it reverts to the spfile value the next time that instance restarts.
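To double-check the change, you can compare the running value on each instance with what is stored in the spfile. A minimal sketch, assuming the instance names above: only orl3 should show LISTENER_ORL1 in gv$parameter, while the spfile entries in v$spparameter should be unchanged.

```sql
-- Running value of local_listener on every instance in the cluster.
select inst_id, value
  from gv$parameter
 where name = 'local_listener'
 order by inst_id;

-- Value stored in the spfile (scope=memory should leave this untouched).
select sid, value
  from v$spparameter
 where name = 'local_listener';
```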
Now if any session connects to the listener on ORL3, it will be redirected to ORL1. Of course, this means no jobs will be directed to ORL3, which defeats the purpose of adding the additional nodes. This is only a temporary measure while the new nodes are added and before the older nodes are removed for the OS upgrade.
Manipulating the local_listener parameter is also a way to redirect jobs away from a node on which you are preparing to perform maintenance. Jobs that have already started continue to process, but no additional jobs will start on the instance. Once those jobs finish, you can bounce the instance or node to perform maintenance without having to kill processes. It doesn't solve the problem if you are using inter-instance parallelism and want to take down an instance or node for maintenance; that is a whole other discussion.
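The maintenance version of this trick might look something like the following. It's a sketch rather than a tested runbook: LISTENER_ORL3 is a hypothetical tnsnames alias for ORL3's own listener, and the job queries assume the work runs through DBMS_SCHEDULER or the older DBMS_JOB.

```sql
-- 1. Stop new connections from being handed to orl3 (memory only, orl3 only).
alter system set local_listener=LISTENER_ORL1 scope=memory sid='orl3';

-- 2. Re-register with the listener now instead of waiting.
--    ALTER SYSTEM REGISTER acts on the current instance, so run it connected to orl3.
alter system register;

-- 3. Watch the work already running on each instance drain off.
select running_instance, count(*) as scheduler_jobs
  from dba_scheduler_running_jobs
 group by running_instance;

select instance, count(*) as dbms_jobs
  from dba_jobs_running
 group by instance;

select inst_id, count(*) as user_sessions
  from gv$session
 where type = 'USER'
 group by inst_id
 order by inst_id;

-- 4. Only needed if orl3 was NOT restarted: point it back at its own listener.
--    LISTENER_ORL3 is a hypothetical alias for ORL3's local listener.
alter system set local_listener=LISTENER_ORL3 scope=memory sid='orl3';
alter system register;
```

Because the change is scope=memory, bouncing the instance puts local_listener back to its spfile value on its own; step 4 only matters if you decide not to restart orl3 after all.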
Although it's not supported, and Oracle strongly recommends against running mixed OS versions, it does work and gives you the opportunity to upgrade your OS one host at a time. But it's not without quirks, and you need to be ready to either handle them or research them. OCFS2 is just one quirk; I'm sure there are more, I just haven't been able to pinpoint the actual causes of the issues we have experienced with the different OS versions. I will say that each node I have added has brought different problems, but I think that's just Oracle and the utilities they have created to make things easier.
If anyone has run into other quirks or gotchas, please feel free to list them.