Tuesday, September 16, 2008

Dedicated vs Shared Connection

We have a vendor-packaged application installed on a Windows platform, running on Oracle 10g Release 1. It had been running fine with about 200 concurrent connections until we applied a vendor-provided patch. Right after we applied the patch, we started getting phone calls from users who could not log in to the application. At that point, there were only about 130 connections. It turned out to be a dedicated vs. shared connection issue.

The connections were supposed to be "shared". The vendor configuration runs the listener on the default port 1521, which our policy doesn't allow, so we changed it to a non-default port. However, this caused all the supposedly "shared" connections to be "dedicated" connections from the first day we used the application.

So why did we only start to see this issue after the patch? Before the patch, the SGA was configured at 1200M. With the 2GB memory limit per application on 32-bit Windows, that left about 2048M – 1200M = 848M for other processes and memory, including the server processes that handle connections. The patch increased the SGA to 1656M, which leaves about 2048M – 1656M = 392M for everything else. So after the patch we had only 392M instead of 848M to handle connections and other things, which explains why we hit the problem with only about 130 connections.
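The arithmetic above can be checked with a few lines of shell. This is only a sketch: the function name headroom_mb is made up for illustration, and the 2048M limit and the two SGA sizes are the figures quoted in this post.

```shell
# Memory left under the 32-bit Windows limit once the SGA is allocated.
# All values are in MB. headroom_mb is a hypothetical helper name.
headroom_mb() {
    limit_mb=$1
    sga_mb=$2
    echo $(( limit_mb - sga_mb ))
}

before=$(headroom_mb 2048 1200)   # SGA before the patch
after=$(headroom_mb 2048 1656)    # SGA after the patch

echo "headroom before patch: ${before} MB"
echo "headroom after patch:  ${after} MB"
```

Since every dedicated connection needs its own share of that remaining space, cutting the headroom by more than half is consistent with the failures showing up at a much lower connection count.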

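The issue was resolved by setting the local_listener parameter. Without it, the instance registers its services and dispatchers only with a listener on the default port 1521, so a listener on any other port never learns about the dispatchers and hands out dedicated connections instead. Once we set the parameter as follows to register the service with the listener on the non-default port, the connections became "shared" and the server was able to handle 200+ connections without any issue.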

local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port number))'
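As a rough sketch of how the setting might be applied and checked, the helper below just prints the SQL statements for a given host and port so they can be pasted into SQL*Plus as a DBA. The function name, the host dbhost01 and the port 1525 are placeholders, not values from our system, and SCOPE=BOTH assumes the instance uses an spfile.

```shell
# Print the statements that set local_listener, force re-registration with
# the listener, and count sessions by server type.
# local_listener_sql is a hypothetical helper; it only prints text.
local_listener_sql() {
    host=$1
    port=$2
    printf "ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=%s)(PORT=%s))' SCOPE=BOTH;\n" "$host" "$port"
    printf '%s\n' 'ALTER SYSTEM REGISTER;'
    # Single quotes keep the shell from expanding $session.
    printf '%s\n' 'SELECT server, COUNT(*) FROM v$session GROUP BY server;'
}

# Placeholder host and port, for illustration only.
local_listener_sql dbhost01 1525
```

In the query output, connections that went through a dispatcher should no longer be listed as DEDICATED; idle shared-server sessions typically show up as NONE rather than SHARED.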

Sunday, September 14, 2008

Upgrade Linux Kernel on RAC Servers

Every once in a while, you may need to upgrade the Linux kernel on your RAC servers. Upgrading the kernel itself is pretty straightforward; however, you need to check whether other components in your RAC environment need to be upgraded as well, such as PowerPath, OCFS2, ASMLib, etc. We use ASM directly on top of raw devices, so we don't need to worry about an ASMLib upgrade. Here are the detailed steps to upgrade the kernel from 2.6.9-55 to 2.6.9-67, and to upgrade PowerPath and OCFS2 to the corresponding versions.

Pre-upgrade Steps:


Download kernel-largesmp-2.6.9-67.EL.x86_64.rpm from Red Hat

Download PowerPath 5.1 EMCpower.LINUX-5.1.0-194.rhel.x86_64.rpm from EMC

Download OCFS2 ocfs2-2.6.9-67.ELlargesmp-1.2.8-2.el4.x86_64.rpm from Oracle
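The OCFS2 kernel module package is built per kernel, and its file name embeds the kernel release string, which is why a kernel upgrade drags an OCFS2 upgrade along with it. A minimal sketch of how the file name in the list above is put together (the function name is made up; the three arguments are the values taken from that file name):

```shell
# Build the OCFS2 kernel-module rpm file name for a given kernel release.
# ocfs2_module_rpm is a hypothetical helper name.
ocfs2_module_rpm() {
    kernel=$1    # kernel release as `uname -r` would report it after the upgrade
    version=$2   # OCFS2 version and release
    arch=$3
    echo "ocfs2-${kernel}-${version}.${arch}.rpm"
}

ocfs2_module_rpm 2.6.9-67.ELlargesmp 1.2.8-2.el4 x86_64
```

The first argument has to match the kernel you are about to boot into, not the one currently running.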

Upgrade Steps:

Shut down all RAC components gracefully on the server
a. Set up the environment variables

b. $ srvctl stop instance -d {database name} -i {instance name} -o transactional
c. $ srvctl stop asm -n {node}
d. $ srvctl stop nodeapps -n {node}

e. # crsctl stop crs


Comment out the last 3 lines in /etc/inittab, then reboot the server. This keeps Oracle Clusterware from starting when the server reboots.

#h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1

#h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1

#h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1
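If you would rather not edit the file by hand, something like the following could do the commenting. It is a sketch only: the function name is made up, it assumes GNU sed (for -i) and that the three clusterware entries use the ids h1, h2 and h3 as shown above, and it keeps a backup copy next to the file.

```shell
# Comment out the h1/h2/h3 clusterware entries in an inittab-style file.
# Lines that are already commented are left alone, so it is safe to rerun.
# disable_clusterware_inittab is a hypothetical helper name.
disable_clusterware_inittab() {
    inittab=$1
    cp -p "$inittab" "$inittab.bak"        # keep a backup of the original
    sed -i 's/^h[123]:/#&/' "$inittab"     # prefix matching lines with '#'
}

# Usage (as root):
#   disable_clusterware_inittab /etc/inittab
```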

Install the new kernel
# rpm -ivh kernel-largesmp-2.6.9-67.EL.x86_64.rpm


Edit /etc/grub.conf so that the new kernel is the default at boot, then reboot the server
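In /etc/grub.conf the default= line is a zero-based index into the title entries, counted from the top of the file. The two helpers below are a sketch for inspecting and changing that index; the function names are made up, and set_grub_default assumes GNU sed.

```shell
# Show the current default= line and number the title entries from 0,
# so you can see which index the new kernel has.
list_grub_entries() {
    grub_conf=$1
    grep '^default=' "$grub_conf"
    grep '^title' "$grub_conf" | awk '{ printf "%d: %s\n", NR-1, $0 }'
}

# Point default= at the given entry index.
set_grub_default() {
    grub_conf=$1
    index=$2
    sed -i "s/^default=.*/default=${index}/" "$grub_conf"
}

# Usage (as root):
#   list_grub_entries /etc/grub.conf
#   set_grub_default /etc/grub.conf 0
```

After the reboot, uname -r should report the new kernel release.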


Upgrade PowerPath to 5.1 and reboot the server, then check that all PowerPath device names are mapped correctly
# powermt save
# rpm -Uvh EMCpower.LINUX-5.1.0-194.rhel.x86_64.rpm
# reboot


Upgrade OCFS2 to 1.2.8-2 and reboot the server, then check that all cluster filesystems are mounted correctly
# rpm -Uvh ocfs2-2.6.9-67.ELlargesmp-1.2.8-2.el4.x86_64.rpm
# reboot


Uncomment the last 3 lines in /etc/inittab, then reboot the server and check that all Oracle services come up correctly.
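The uncommenting can be scripted in the same spirit. Again this is only a sketch with a made-up function name; it assumes GNU sed and that the clusterware entries use the ids h1, h2 and h3.

```shell
# Remove the leading '#' from the h1/h2/h3 clusterware entries in an
# inittab-style file. Other commented lines are not touched.
# enable_clusterware_inittab is a hypothetical helper name.
enable_clusterware_inittab() {
    inittab=$1
    sed -i 's/^#\(h[123]:\)/\1/' "$inittab"
}

# Usage (as root):
#   enable_clusterware_inittab /etc/inittab
```

Once the server is back up, crs_stat -t gives a quick overview of whether the clusterware resources on the node are online.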