Monday, April 12, 2010

ASM Issues When Adding A New Node

It seems that with every new node we add, a different issue slaps us in the face.  It would be so nice if just one node went in smoothly and without incident... of course, it would be even better if resources were available to test adding the nodes, but that's not the issue I want to document in this post.

There have been a few issues with ASM during this exercise of adding nodes: the ASM instance number being incorrect, and disk partitions not being shared across all nodes.  The second has actually happened twice, for two different reasons.

Issue 1:  ASM Instance Number Incorrect.  The ASM instances are using an spfile

$ORACLE_HOME/dbs/spfile+ASMx.ora

The x is replaced with the number of the ASM instance.

Once the configuration is set in DBCA, the utility notes that this is an ASM database and asks whether you would like to extend the ASM instance.  Of course, the answer is yes, or there will be problems when the database instance is extended -- the disks will not be available.

However, this bombs with a message that the instance number is incorrect.  It seems that the spfile DBCA decides to use is a generic version

$ORACLE_HOME/dbs/spfile+ASM.ora

That's an spfile without the instance number attached.  Since you can't edit the spfile directly, go back to the original install host and, while logged into the ASM instance, create a pfile from the spfile

create pfile from spfile;

I'm good with the default location, but if you want to specify the directory and filename, that can be done with

create pfile='/oracle/products/dbs/init+ASM1.ora' from spfile;

Now edit the pfile.  I add the new instance number as well as any instance numbers that may be missing.  Then recreate the spfile.

create spfile from pfile;
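
The pfile edit that comes before this step could also be scripted.  The following is only a minimal shell sketch, assuming the pfile carries one line per instance of the form +ASMn.instance_number=n (check your own pfile before trusting that).  The function name add_missing_instance_numbers and its arguments are made up for illustration.

```shell
# add_missing_instance_numbers PFILE COUNT
# Appends "+ASMi.instance_number=i" for every i in 1..COUNT that PFILE lacks.
# ASSUMPTION: instance numbers appear as SID-prefixed lines, one per instance.
add_missing_instance_numbers() (
    pfile=$1
    count=$2

    # Make sure the file ends with a newline so appended lines do not get
    # glued onto the last existing parameter.
    if [ -n "$(tail -c1 "$pfile")" ]; then
        echo >> "$pfile"
    fi

    i=1
    while [ "$i" -le "$count" ]; do
        # -F matches the '+' and '.' literally; -i because parameter names
        # are not case sensitive.
        if ! grep -qiF -- "+ASM${i}.instance_number" "$pfile"; then
            printf '+ASM%d.instance_number=%d\n' "$i" "$i" >> "$pfile"
        fi
        i=$((i + 1))
    done
)

# Hypothetical usage, for a cluster that should end up with four ASM instances:
#   add_missing_instance_numbers /oracle/products/dbs/init+ASM1.ora 4
```

The function only appends entries that are absent, so running it twice leaves the pfile unchanged the second time.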

Then I copy this file over to the new node's generic spfile+ASM.ora file.

Rerun DBCA and the ASM instance is extended, as is the database.

Issue 2a:  Disk Partitions not shared across all nodes

The first time the message was received, only one of the partitions wasn't being seen.  The fix:
1.  Ensure that Oracle is in the Disk group (this sets the permissions to access the disks).
2.  Check the raw devices in the /raw/dev file to ensure they are correctly specified and not duplicated.  This
     is done on the new node.

And in case anyone is wondering how I did 1 and 2, I'll have to defer to my boss, as he's actually the one who handles the raw devices and ensures the PowerPath IDs all match up.
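
For the "not duplicated" part of step 2, a rough check could look like the sketch below.  It assumes the raw device file is plain text with two columns per line (raw device, then block device) and # for comments.  That layout is an assumption, not something confirmed here, and the function name find_duplicate_bindings is hypothetical.

```shell
# find_duplicate_bindings FILE
# Prints any raw device or block device that appears on more than one line.
# ASSUMPTION: each binding is "<raw device> <block device>" on its own line.
find_duplicate_bindings() {
    awk '
        # Skip blank lines and comment lines.
        /^[ \t]*(#|$)/ { next }
        # Count how often each raw device and each block device is used.
        { raw[$1]++; blk[$2]++ }
        END {
            for (r in raw) if (raw[r] > 1) print "duplicate raw device: " r
            for (b in blk) if (blk[b] > 1) print "duplicate block device: " b
        }
    ' "$1"
}

# Hypothetical usage on the new node; no output means no duplicates found:
#   find_duplicate_bindings /path/to/raw-device-file
```

This only catches duplicates within one file on one node.  It says nothing about whether the devices match what the other nodes see.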

Issue 2b:  Disk Partitions not shared across all nodes and all of the diskgroups are listed

Metalink Note: describes this as a symbolic link being used on the instance the initial install was performed from; the spfile needs to be re-created without the symbolic link and the ASM instance bounced.
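
A quick way to see whether any symbolic links are in play is to list them.  This is a minimal sketch that assumes the ASM parameter files live in one directory and have +ASM in their names.  The function name list_asm_symlinks is made up, and which file (if any) is actually linked is not something this post establishes.

```shell
# list_asm_symlinks DIR
# Prints "link -> target" for every entry in DIR whose name contains +ASM
# and which is a symbolic link.  Regular files are ignored.
list_asm_symlinks() (
    for f in "$1"/*+ASM*; do
        if [ -L "$f" ]; then
            printf '%s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
)

# Hypothetical usage on the original install host:
#   list_asm_symlinks "$ORACLE_HOME/dbs"
```

No output means nothing matching in that directory is a symbolic link.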

Since this is my fourth node and the only one I have had this issue with, I'm inclined to believe it has something to do with how I created the spfile and the pfile that points to the spfile.  So tomorrow I get to experiment with how to correct this situation.  My goal is to correct it without having to bounce any of the ASM instances.

So tomorrow should be an interesting day.

I still don't have a solution to this problem.  I did open an SR with Oracle, but they haven't provided any useful information to try; they only asked for information that was already provided and files that were already uploaded.  I often wonder if the support analysts read the actual information or just make assumptions so that they can push the issue back to the customer and get the clock off of them.
