Friday, October 17, 2008

A Different Way to Add More Space to ASM DiskGroup in a Linux/RAC Environment

Usually when you need to add more space to an ASM DiskGroup, you add more disks of the same size to the DiskGroup. Here is a different way to accomplish the same goal; it sounds a little scary at first, but it actually works. In a RAC environment with an EMC SAN as storage, we have DiskGroup A that contains one 100GB LUN. To increase the size of this DiskGroup to 200GB, you can create a 200GB LUN in Navisphere, then migrate the existing 100GB LUN to this larger LUN. The migration runs in the background and is invisible to the OS, Oracle, and end users. After the migration is done, you have a larger LUN/disk in DiskGroup A; however, to make ASM "see" the extra 100GB, two more things need to happen:

  1. Make the OS recognize the extra space. Before the LUN migration there was one 100GB partition on that LUN; after the migration the partition is still 100GB even though the LUN is now 200GB, so the LUN must be re-partitioned to expose the added space to the OS. Unfortunately, fdisk doesn't provide a way to extend a partition, which leads to the scary part: the original partition has to be deleted, and a new partition is then created to take up all of the LUN space. As long as the new partition starts at exactly the same point as the old one, no data will be lost. In our case ASM sits on top of raw devices without ASMLib, so the re-partition can be done online without downtime (see the fdisk sketch after this list).


     

  2. Make ASM recognize the extra space. After step 1 the OS recognizes the extra space, but ASM still doesn't. You can verify this by running the following query:

    select name, total_mb, free_mb from v$asm_diskgroup;

    To make ASM recognize the extra space, you'll need to run:

    alter diskgroup {diskgroup name} resize all rebalance power 11;

    Now verify again with the first query, and you'll see that ASM now "sees" all of the space from that LUN.
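
For reference, here is a minimal sketch of the re-partition in step 1, assuming a made-up PowerPath device name /dev/emcpowera holding a single partition; check your own device name first and note the exact start of the existing partition with fdisk -l before deleting anything:

# fdisk -l /dev/emcpowera      (note the starting cylinder of partition 1)
# fdisk /dev/emcpowera
    d                          (delete partition 1; the data blocks on disk are not touched)
    n, p, 1                    (new primary partition 1; the first cylinder must equal the old
                                start, take the default last cylinder to use the whole 200GB LUN)
    w                          (write the new partition table and exit)
# partprobe /dev/emcpowera     (ask the kernel to re-read the partition table; run it on every
                                node, and if a busy device refuses, a rolling reboot is the fallback)

Once the ALTER DISKGROUP ... RESIZE ALL in step 2 kicks off the rebalance, you can watch its progress with:

    select group_number, operation, state, est_minutes from v$asm_operation;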

Wednesday, October 1, 2008

WebUtil 1.0.6 Installation/Configuration

Recently I have been working on upgrading an Oracle Forms/Reports application from Oracle Application Server 10g (9.0.4) to 10g Release 2 (10.1.2.2.0), which requires upgrading WebUtil 1.0.5 to 1.0.6. I followed the Oracle® Forms Developer WebUtil User's Guide Release 1.0.6, but had problems making it work. First, I got this error when the application tried to use WebUtil: "oracle.forms.webutil.file.FileFunctions bean not found. WEBUTIL_FILE.FILE_SELECTION_DIALOG_INT will not work". This turned out to be a configuration issue in formsweb.conf: once I copied all of the [webutil] configuration parameters into the [application] configuration section, the error disappeared and WebUtil started to install/download.

During the download process, I ran into another error: "WUC-24: Error reading URL http://hostname:port/forms/webutil/jacob.dll". I was puzzled at first because the URL seemed to be correct and jacob.dll was in that directory. Finally I re-extracted jacob.dll from jacob_18.zip into my ORACLE_HOME/forms/webutil directory, and that resolved the error.
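
For reference, the fix for the first error amounts to copying the WebUtil parameters from the shipped [webutil] section into the application's own section of formsweb.conf. A minimal sketch is below; the section name myapp is made up, and the parameter values shown are the typical defaults from a 10.1.2 install, so copy the exact lines from your own [webutil] section rather than from here:

[myapp]
form=myapp.fmx
# WebUtil parameters copied from the [webutil] section
WebUtilArchive=frmwebutil.jar,jacob.jar
WebUtilLogging=off
WebUtilErrorMode=Alert
baseHTMLjinitiator=webutiljini.htm
baseHTMLjpi=webutiljpi.htm
baseHTML=webutilbase.htm
archive_jini=frmall_jinit.jar
archive=frmall.jar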

Tuesday, September 16, 2008

Dedicated vs Shared Connection

We have a vendor-canned application installed on the Windows platform, running on Oracle 10g Release 1. It had been running fine with about 200 concurrent connections until we applied a vendor-provided patch. Right after we applied that patch, we started to get phone calls complaining about not being able to log in to the application, even though there were only about 130 connections at that point. It turned out to be a dedicated vs. shared connection issue.

The connections were supposed to be "shared". The vendor configuration ran the listener on the default port 1521, which our policy doesn't allow, so we moved it to a non-default port. However, because local_listener was left unset, PMON could only register the instance and its dispatchers with a listener on the default port 1521; our listener on the non-default port never learned about the dispatchers and could only hand out "dedicated" connections through its static registration. So every supposedly "shared" connection had actually been "dedicated" from day 1, ever since we started to use the application.

So why did we start to see this issue only after the patch? Before the patch, the SGA was configured at 1200M. With the 2GB per-process memory limit on 32-bit Windows, that left about 2048M - 1200M = 848M for everything else, including the server threads that handle connections (on Windows they all live inside the single oracle.exe process, so they share that headroom with the SGA). The patch increased the SGA to 1656M, which left only about 2048M - 1656M = 392M. With 392M instead of 848M available for connections and everything else after the patch, it is no surprise we hit the problem at only about 130 dedicated connections.

The issue was resolved by setting the local_listener parameter. Once we set it as follows, so that PMON registers the service and its dispatchers with the listener on the non-default port, the connections became "shared" and the server was able to handle 200+ connections without any issue.

local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port number))'
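
For completeness, here is a minimal sketch of applying and verifying the change; {listener name}, hostname and port number are placeholders as above, and scope=both assumes the instance runs on an spfile:

alter system set local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port number))' scope=both;
alter system register;

$ lsnrctl services {listener name}

select server, count(*) from v$session group by server;

The lsnrctl output should now show the dispatchers (D000, D001, ...) registered alongside the dedicated handler, and new sessions should show up in v$session as SHARED (or NONE while idle) instead of DEDICATED.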

Sunday, September 14, 2008

Upgrade Linux Kernel on RAC Servers

Every once in a while, you may need to upgrade the Linux kernel on your RAC servers. Upgrading the kernel itself is pretty straightforward; however, you need to check whether other components in your RAC environment need to be upgraded as well, such as PowerPath, OCFS2, ASMLib, etc. We use ASM directly on top of raw devices, so we don't need to worry about an ASMLib upgrade. Here are the detailed steps to upgrade the kernel from 2.6.9-55 to 2.6.9-67, with PowerPath and OCFS2 upgraded to corresponding versions.

Pre-upgrade Steps:


Download kernel-largesmp-2.6.9-67.EL.x86_64.rpm from RedHat

Download PowerPath 5.1 EMCpower.LINUX-5.1.0-194.rhel.x86_64.rpm from EMC

Download OCFS2 ocfs2-2.6.9-67.ELlargesmp-1.2.8-2.el4.x86_64.rpm from Oracle
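
Before downloading, it can help to confirm what each node is currently running; a quick sketch (adjust the grep patterns to your package names):

$ uname -r
$ rpm -qa | grep kernel-largesmp
$ rpm -qa | grep EMCpower
$ rpm -qa | grep ocfs2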

Upgrade Steps:

1. Shut down all RAC components gracefully on the server
a. Set up the environment variables
b. $ srvctl stop instance -d {database name} -i {instance name} -o transactional
c. $ srvctl stop asm -n {node}
d. $ srvctl stop nodeapps -n {node}
e. # crsctl stop crs


2. Comment out the last 3 lines in /etc/inittab, then reboot the server; this disables Oracle Clusterware startup on reboot

#h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1
#h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1
#h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1

3. Install the new kernel
# rpm -ivh kernel-largesmp-2.6.9-67.EL.x86_64.rpm


4. Edit /etc/grub.conf so the server boots the new kernel by default (see the grub.conf sketch after this list), then reboot the server


5. Upgrade PowerPath to 5.1 and reboot the server, then check that all PowerPath device names are mapped correctly
# powermt save
# rpm -Uvh EMCpower.LINUX-5.1.0-194.rhel.x86_64.rpm
# reboot


6. Upgrade OCFS2 to 1.2.8-2 and reboot the server, then check that all cluster filesystems are mounted correctly
# rpm -Uvh ocfs2-2.6.9-67.ELlargesmp-1.2.8-2.el4.x86_64.rpm
# reboot


7. Uncomment the last 3 lines in /etc/inittab, then reboot the server and check that all Oracle services come back up correctly (a few verification commands are sketched below).
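
Two sketches to go with the steps above. First, what the /etc/grub.conf edit in step 4 might look like; installing the kernel RPM normally adds the new title for you, so the edit is mainly making sure default= points at the 2.6.9-67 entry. The disk device, root partition and initrd name below are illustrative and will differ on your servers:

default=0
timeout=5
title Red Hat Enterprise Linux AS (2.6.9-67.ELlargesmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-67.ELlargesmp ro root=/dev/sda2
        initrd /initrd-2.6.9-67.ELlargesmp.img
title Red Hat Enterprise Linux AS (2.6.9-55.ELlargesmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-55.ELlargesmp ro root=/dev/sda2
        initrd /initrd-2.6.9-55.ELlargesmp.img

Second, one way to do the "check" parts of steps 5 through 7; these commands only read state, so they are safe to run at any point:

# uname -r                                    (confirm the new kernel is running)
# powermt display dev=all                     (confirm PowerPath pseudo devices are mapped correctly)
# mount | grep ocfs2                          (confirm the cluster filesystems are mounted)
$ crs_stat -t                                 (confirm all CRS resources are ONLINE)
$ srvctl status database -d {database name}   (confirm instances and services are running)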