Random Notes

Friday, April 22, 2016

Decommission a Data Node

Existing hadoop cluster: hadoop1-6, decommissioning a data node: hadoop5.

1. Edit hdfs-site.xml on hadoop1 to include:
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop/etc/hadoop/dfs.exclude</value>
</property>

2. Edit /home/hadoop/hadoop/etc/hadoop/dfs.exclude to add a line:
hadoop5

3. Run: distribute-exclude.sh dfs.exclude

4. Run: refresh-namenodes.sh

Thursday, April 21, 2016

Adding Data Node to an Existing Hadoop Cluster

Existing hadoop cluster: hadoop1-5, adding a new data node: hadoop6.

1. Clone an existing data node VM to hadoop6.

2. Edit the /etc/hosts file to include the hadoop6 IP address and hostname, then copy it to the remaining nodes in the cluster.

3. Edit the slaves file to include the hadoop6 hostname, then copy it to the remaining nodes in the cluster.

4. Delete HADOOP_DATA_DIR on hadoop6.

5. Start data node on hadoop6.
hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

6. Rebalance data across the data nodes:
hdfs balancer

Monday, September 14, 2015

OPatch failed with error code = 41 on Windows 2008 R2

I ran into this error while installing patch 17 on 11.2.0.3 on Windows 2008 R2. OPatch reported: "Recommended actions: OPatch needs to modify files which are being used by some processes." Even though I had stopped all of the Oracle-related services, it still failed with the same error. Here is how I resolved it:

1. Reboot server to safe mode
2. Rename ORACLE_HOME directory to something else
3. Reboot server to normal mode
4. Rename ORACLE_HOME directory to what it was
5. Apply patch as usual.

Tuesday, August 25, 2015

Set up TDE in 12c RAC

1. In 12c RAC, the keystore can be located in either ASM or an ACFS filesystem. We are using ASM in this setup.
Edit sqlnet.ora in the RDBMS home to add the lines below:
ENCRYPTION_WALLET_LOCATION=
(SOURCE=
   (METHOD=FILE)
     (METHOD_DATA=
       (DIRECTORY=+PSDATA/WALLET/$ORACLE_UNQNAME/)))

2. Set ORACLE_UNQNAME in .bash_profile:
Set it after ORACLE_SID is set (${ORACLE_SID%?} strips the last character of the SID, i.e. the instance number):
    export ORACLE_UNQNAME=`$ORACLE_HOME/bin/srvctl config database |grep -w ${ORACLE_SID%?}`

3. Set ORACLE_UNQNAME in CRS:
Without this, v$encryption_wallet and gv$encryption_wallet show different information.
    srvctl setenv database -d TDEDEMO -T "ORACLE_UNQNAME=TDEDEMO"

4. Create the keystore:
SQL> administer key management create keystore '+PSDATA/WALLET/TDEDEMO/' identified by "password1";

keystore altered.

5. Open the keystore:
SQL> administer key management set keystore open identified by "password1";

keystore altered.

6. Create the master key:
SQL> administer key management create key identified by "password1" with backup using 'TDEDEMO';

keystore altered.

7. Activate the master key:
SQL> select key_id from v$encryption_keys;

KEY_ID
------------------------------------------------------------------------------
AazYtFb200+Nv7T4i/i5e4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

SQL> administer key management use key 'AazYtFb200+Nv7T4i/i5e4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' identified by "password1" with backup using 'TDEDEMO';

keystore altered.

8. Back up the keystore. A keystore in ASM cannot be backed up to a non-ASM location, or vice versa; the attempt fails with ORA-46620:
SQL> administer key management backup keystore using 'TDEDEMO' identified by "password1" to '/psoft/backup/wallets/';
administer key management backup keystore using 'TDEDEMO' identified by "password1" to '/psoft/backup/wallets/'
*
ERROR at line 1:
ORA-46620: backup for the keystore cannot be taken

SQL> administer key management backup keystore using 'TDEDEMO' identified by "password1" to '+PSFLASH/WALLET/TDEDEMO/';

keystore altered.

9. Create an auto_login keystore:
SQL> administer key management create auto_login keystore from keystore '+PSDATA/WALLET/TDEDEMO/' identified by "password1";

Wednesday, November 26, 2014

Change 12cR1 Public IP/VIP/SCAN

Old IPs:
SCAN   10.100.10.235
       10.100.10.236
       10.100.10.237

Hosts:   10.100.10.231
       10.100.10.232

VIPs:   10.100.10.233
       10.100.10.234

New IPs:
SCAN   10.100.101.125
       10.100.101.126
       10.100.101.127

Hosts:   10.100.101.121
       10.100.101.122

VIPs:   10.100.101.123
       10.100.101.124

Step 1: As grid: (on first node only)
       oifcfg delif -global eth0/10.100.10.0
       oifcfg setif -global eth0/10.100.101.0:public

Step 2: As grid:
       [grid@psdb1 ~]$ srvctl config nodeapps -a (verify the current config)
       Network 1 exists
       Subnet IPv4: 10.100.10.0/255.255.255.0/eth0, static
       Subnet IPv6:
       VIP exists: network number 1, hosting node psdb1
       VIP Name: psdb1-vip.lab.hsc.net.ou.edu
       VIP IPv4 Address: 10.100.10.233
       VIP IPv6 Address:
       VIP exists: network number 1, hosting node psdb2
       VIP Name: psdb2-vip.lab.hsc.net.ou.edu
       VIP IPv4 Address: 10.100.10.234
       VIP IPv6 Address:

       srvctl stop vip -n psdb1 -f
       srvctl relocate scan -scannumber 1 -node psdb2 (relocate scan to psdb2)
       srvctl relocate scan -scannumber 2 -node psdb2 (relocate scan to psdb2)
       srvctl relocate scan -scannumber 3 -node psdb2 (relocate scan to psdb2)


Step 3: As root:
       vi /etc/sysconfig/network-scripts/ifcfg-eth0, change IP/GATEWAY
       Go to VM console:
       ifdown eth0
       ifup eth0
       Change NIC to the new VLAN
       ssh to the new IP
       change hostname/hostname-vip in DNS
       srvctl modify network -k 1 -S 10.100.101.0/255.255.255.0/eth0 (on first node only)
       srvctl modify nodeapps -n psdb1 -A psdb1-vip/255.255.255.0/eth0
       srvctl config nodeapps -a (verify changes)

Step 4: As grid:
       srvctl start vip -n psdb1 (databases still up and servicing from psdb2)

Step 5: As root: (on first node)
       srvctl stop scan_listener
       srvctl stop scan
       srvctl status scan (verify scan is off)
       srvctl status scan_listener (verify scan_listener is off)
       change SCAN IPs in DNS
       srvctl config scan (still shows the old SCAN IPs)
       srvctl modify scan -n psdb-scan

Step 6: Repeat Step 2 to Step 4 for the remaining nodes in the cluster

Step 7: As root:
       srvctl start scan
       srvctl start scan_listener

Step 8: Perform a rolling bounce of all databases
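
Before touching the cluster, it can help to sanity-check that every new address really falls inside the new public subnet that is passed to oifcfg and srvctl modify network. The following is a minimal Python sketch using the addresses from this post. The helper name outside_subnet is made up for illustration and is not part of any Oracle tooling.

```python
import ipaddress

# New public subnet from Step 1 / Step 3 (10.100.101.0/255.255.255.0).
NEW_SUBNET = ipaddress.ip_network("10.100.101.0/255.255.255.0")

# New addresses listed at the top of this post.
NEW_IPS = {
    "scan": ["10.100.101.125", "10.100.101.126", "10.100.101.127"],
    "hosts": ["10.100.101.121", "10.100.101.122"],
    "vips": ["10.100.101.123", "10.100.101.124"],
}


def outside_subnet(ips_by_role, subnet):
    """Return (role, ip) pairs whose address is not inside the given subnet."""
    return [
        (role, ip)
        for role, ips in ips_by_role.items()
        for ip in ips
        if ipaddress.ip_address(ip) not in subnet
    ]


if __name__ == "__main__":
    bad = outside_subnet(NEW_IPS, NEW_SUBNET)
    print("all addresses OK" if not bad else f"outside {NEW_SUBNET}: {bad}")
```

An empty result means all SCAN, host, and VIP addresses belong to the subnet; an old address such as 10.100.10.235 would be reported.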

Friday, September 26, 2014

Prevent Firewall from Closing Idle Connections between App Server and Database Server (2)

In my previous post, I explained how to use TCP keepalive to prevent a firewall from closing idle connections between the app server and the database server. Here I am going to explain another mechanism that accomplishes the same goal: Oracle DCD (Dead Connection Detection). This would in fact be the preferred mechanism because it is simpler to set up, as long as you can confirm that DCD packets are indeed sent out and that the firewall recognizes them as valid traffic. As Oracle note 257650.1 states, "some later firewalls and updated firewall firmware may not see DCD packets as a valid traffic possibly because the packets that DCD sends are actually empty packets."

In one of our environments, knowing that the firewall has a 30-minute timeout, we configured DCD probes to be sent every 25 minutes by setting SQLNET.EXPIRE_TIME=25. However, we still saw the TNS timeout errors in the database alert logs that I described in my previous post. This led me to think that either DCD was not working or our firewall did not recognize it as valid traffic, as Oracle note 257650.1 warned. So I ran strace on an Oracle server process to confirm that DCD packets were indeed sent out (Oracle note 438923.1), and had our security engineer confirm that the firewall recognized the DCD packets as valid traffic and passed them through. So why were we still seeing TNS timeouts in the database alert logs?

Then I found Oracle note 395505.1, which describes how DCD is triggered and states that "the first DCD probe packet would go only after 2 * expire_time and successive one's would be sent every expiry_time provided no activity in that next span too". I ran strace again and confirmed this statement. That explained the TNS timeouts: with SQLNET.EXPIRE_TIME=25 the first probe is not sent until 50 minutes after the connection is made, so if a client connects and stays idle for 30 minutes, the firewall closes the connection before any DCD packet has been sent. To make sure the first DCD packet is sent within the first 30 minutes of a connection, SQLNET.EXPIRE_TIME has to be set to 30/2 - 1 = 14. To prove this, I set SQLNET.EXPIRE_TIME=14 and the TNS timeout errors disappeared from the alert logs; then I set SQLNET.EXPIRE_TIME=16 and the errors came back.

In conclusion, if your firewall policy has a timeout of x minutes, set SQLNET.EXPIRE_TIME to x/2 - 1 or smaller.
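
The rule can be written down as a few lines of Python. This is only a sketch of the arithmetic in this post, assuming integer minutes and the "first probe after 2 * expire_time" behavior from note 395505.1; the function names are made up for illustration.

```python
def first_probe_minutes(expire_time):
    """Idle minutes before the first DCD probe is sent (2 * expire_time)."""
    return 2 * expire_time


def survives_firewall(expire_time, firewall_timeout):
    """True if the first probe goes out strictly before the firewall timeout."""
    return first_probe_minutes(expire_time) < firewall_timeout


def suggested_expire_time(firewall_timeout):
    """The x/2 - 1 rule from this post, in whole minutes."""
    return firewall_timeout // 2 - 1


if __name__ == "__main__":
    # The three settings tried in this post against a 30-minute firewall timeout.
    for value in (25, 16, 14):
        print(value, first_probe_minutes(value), survives_firewall(value, 30))
```

For the 30-minute firewall above, suggested_expire_time(30) gives 14, while both 25 and 16 fail the check because their first probe would arrive after the firewall has already dropped the connection.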

Prevent Firewall from Closing Idle Connections between App Server and Database Server (1)

If you see TNS timeout errors like the ones below in your database alert logs, the connection between your app server and database server was lost.

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=64656))

In my case, the connection was idle and was therefore closed by the firewall policy between my app server and database server. One way to prevent this is to use TCP keepalive on the Linux app server to keep the connection active. Here is how to implement it:

1. At the OS level, three parameters are involved: tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes. The one we need to change is the first. tcp_keepalive_time must be smaller than the timeout of the firewall policy that opened the ports between the app server and the database server. Our firewall timeout was the default 30 minutes, so we want tcp_keepalive_time below 30 minutes, say 25 minutes (1500 seconds). As root, edit /etc/sysctl.conf and add the line below to the end:
net.ipv4.tcp_keepalive_time = 1500
then run "sysctl -p" to apply it immediately; the entry in /etc/sysctl.conf keeps it in effect across reboots.

2. As oracle, edit the database connection string in the tnsnames.ora file and add (ENABLE=BROKEN) to the DESCRIPTION section, for example:

DBServiceName =
  (DESCRIPTION =
    (ENABLE=BROKEN)
    (ADDRESS = (PROTOCOL = TCP)(HOST = YourDBServerIP)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = DBServiceName)
    )
  )

3. Bounce the app server.

4. Run "netstat -ntop | grep ESTAB" and make sure the app server processes are now running with TCP keepalive enabled. The command should return lines like the one below:

tcp 0 0 10.100.10.230:53802 10.100.10.234:1521 ESTABLISHED 5887/PSAPPSRV keepalive (395.84/0/0)

If the processes are running without TCP keepalive enabled, you will see lines like the one below:

tcp 0 0 10.100.10.230:22208 10.100.10.233:1521 ESTABLISHED 3874/PSAPPSRV off (0.00/0/0)

Once this was done, the TNS timeout errors in the database logs disappeared.
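
For context, my understanding is that (ENABLE=BROKEN) makes Oracle Net switch on the SO_KEEPALIVE option on the client socket, and the kernel then applies tcp_keepalive_time to that socket. The Python sketch below is not part of the setup above. It only illustrates the same socket option on a plain, unconnected TCP socket, plus the minutes-to-seconds arithmetic from step 1. The function names and the 5-minute margin are assumptions for illustration.

```python
import socket


def keepalive_seconds(firewall_timeout_min, margin_min=5):
    """Idle seconds before the first keepalive probe, kept below the firewall timeout."""
    return (firewall_timeout_min - margin_min) * 60


def make_keepalive_socket():
    """Create a TCP socket with SO_KEEPALIVE switched on (not connected anywhere)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    print("net.ipv4.tcp_keepalive_time =", keepalive_seconds(30))
    sock = make_keepalive_socket()
    enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("SO_KEEPALIVE on:", enabled)
    sock.close()
```

A socket opened this way is what netstat reports with a "keepalive" timer once it is connected, as opposed to "off" for a socket without the option.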

Monday, June 16, 2014

ORA-12514 while tnsping works okay

I have a 2-node 12cR1 RAC cluster that had been running great. Last Friday we replaced our Juniper firewall with a Palo Alto firewall, and then I rebooted the RAC nodes. When I tried to start my application server, it failed to start. I then noticed that I couldn't connect to my database: the connection failed with ORA-12514, even though tnsping worked fine. Initially we thought the firewall change might have broken it, because everything had been working before the firewall change and server reboot. However, since tnsping worked, nothing was blocking the application server from reaching the database on the defined port. Puzzling. Then I noticed that the remote_listener parameter was empty, which led me to look at the sqlnet.ora file, where I found that EZCONNECT was no longer configured in NAMES.DIRECTORY_PATH. At that point I remembered that a few weeks earlier I had copied a set of sqlnet.ora and ldap.ora files from a different server so that I could use LDAP instead of tnsnames.ora as the repository for database services. Apparently the new sqlnet.ora file didn't have EZCONNECT configured. It didn't break anything until the RAC nodes were rebooted, which happened to coincide with the firewall change.

Thursday, March 13, 2014

Oracle Unbreakable Enterprise Kernel and PeopleSoft

I have been building a PeopleSoft HCM 9.2 demo environment on PeopleTools 8.53.11 this past week. Since Oracle Linux is also certified with PeopleSoft, I chose it for my OS. Everything went smoothly, as I have done this kind of build many times. Once the app server and web server were started, I was eager to get to the sign-on page and log in. However, the sign-on page took a very long time (more than 30 minutes) to show up, and it came with an error message: "CHECK APPSERVER LOGS. THE SITE BOOTED WITH INTERNAL DEFAULT SETTINGS, BECAUSE of: bea.jolt.ServiceException:bea.jolt.JoltRemoteService(.GETALL)call():Timeout\nbea.jolt.SessionException:Connection recv error\nbea.jolt.JoltException:[1]NwHddlr.recv():Timeout Error". I looked through all the app server logs and didn't find anything out of the ordinary. Google showed no hits. Another strange thing was that the app server wouldn't shut down, even right after it had been brought up; it always hung at the second process:

Shutting down server processes ...

Server Id = 250 Group Id = JREPGRP Machine = xxx.xxx.xxx: shutdown succeeded
Server Id = 200 Group Id = JSLGRP Machine = xxx.xxx.xxx:

I opened a case with Oracle Support, which was not much help. I compared all the settings with our other working environments and found no significant difference. It really bothered me that it wasn't working. As a last effort, I switched back to booting with the Red Hat Compatible Kernel, and the issue disappeared! The problem kernel in my case was 3.8.13-26.2.1.el6uek.x86_64. I would never have imagined that the Unbreakable Enterprise Kernel would cause this issue, as I had just built an Oracle 12c RAC with this kernel, and my HCM 9.2 demo database has been running on that 12c RAC very well.

Wednesday, July 24, 2013

scsi_id returns nothing on OEL6 running on VMware

Today, while I was trying to set up UDEV for an Oracle 12cR1 RAC environment, I found that scsi_id returned nothing. After googling around, I found the solution below:

1. Shut down the VM
2. Right-click the VM, then click 'Edit Settings'
3. Click the 'Options' tab
4. Click 'General', then 'Configuration Parameters'
5. Click 'Add Row'
6. Add a parameter 'disk.EnableUUID', set it to 'True', and click 'OK'

Boot up the server; scsi_id -gud /dev/sdx now returns values.

Tuesday, April 2, 2013

Virus Scan of Documents Uploaded to PeopleSoft

Lately I was assigned the task of configuring the PeopleSoft web server with McAfee VirusScan Enterprise for Linux (which we already owned). It turned out that McAfee VSEL doesn't support ICAP, so it won't work with PeopleSoft. I was then able to download a trial version of "Symantec Protection Engine for Cloud Services" and get it to work with the WebLogic web server. The sample VirusScan.xml file that PeopleSoft provides with the PIA installation worked like a charm in a PeopleTools 8.52 environment.

Friday, January 25, 2013

PeopleTools 8.52 and libpsio_dir.so

I recently upgraded PeopleTools for our SA and HRMS systems from 8.50.20 to 8.52.10. The upgrade itself was pretty straightforward and went smoothly. However, it broke LDAP authentication. When you go to "PeopleTools -> Security -> Directory -> Configure Directory", click the "search" button, click the Directory ID, then click the "Test Connectivity" tab, it just hangs. Under the "Directory Setup" tab, all settings were confirmed correct. After some research, the problem turned out to be PS_HOME/bin/libpsio_dir.so. It appears that PeopleTools 8.52 now delivers libpsio_dir.so under PS_HOME/bin/interfacedrivers. If the old 8.50.20 PS_HOME/bin/libpsio_dir.so still exists, it gets called instead of the new 8.52.10 PS_HOME/bin/interfacedrivers/libpsio_dir.so, which causes the hang. So remember to remove the old libpsio_dir.so after the upgrade is done.

Friday, December 14, 2012

Setup PeopleTools 8.52 PIA

Yesterday I was setting up a PIA for a CRM 9.1 demo environment and ran into the warning below:

This version of PeopleSoft PeopleTools requires a 64-bit Oracle WebLogic
installation. The selected Oracle WebLogic is a 32-bit installation. Are you
sure you want to continue?

Since I hadn't installed WebLogic on the server myself, I had no idea whether the WebLogic installation it pointed to was 32-bit or 64-bit. After tweaking a few things and still hitting the same issue, I decided to reinstall WebLogic. To ensure WebLogic is 64-bit, the key is to install it with a 64-bit Java. Here is how to check whether your Java is 32-bit or 64-bit:

./java -d64 -version
Running a 64-bit JVM is not supported on this platform.

This means this java is not 64 bit.

./java -d64 -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Oracle JRockit(R) (build R28.1.0-123-138454-1.6.0_20-20101014-1350-linux-x86_64, compiled mode)

This means this java is 64 bit.

Once I reinstalled WebLogic with a 64-bit Java, the warning no longer appeared when setting up the 8.52 PIA.
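
As a cross-check on Linux, the bitness of a native binary can also be read straight from its ELF header: byte 5 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. This is a different technique from the java -d64 test above, not something from the PeopleSoft or WebLogic documentation, and it assumes the file you point it at is the real ELF launcher rather than a wrapper script. The function names are made up for illustration.

```python
import sys

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # EI_CLASS values from the ELF specification


def elf_class_from_header(header):
    """Classify the first bytes of a file; None if it is not a recognizable ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def elf_class(path):
    """Return '32-bit' or '64-bit' for an ELF file on disk, or None for anything else."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))


if __name__ == "__main__":
    # Usage: python elf_class.py /path/to/jdk/bin/java
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```

A result of None usually means the path is a shell script or some other non-ELF file, in which case the java -d64 -version check is the one to trust.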

Friday, November 9, 2012

RMAN Duplication from Active RAC database to non-RAC database

I ran into the error below when trying to RMAN duplicate from an active RAC database to a non-RAC database. The issue turned out to be an Oracle version mismatch between Target and Auxiliary. In my case, both Target and Auxiliary were at 11.2.0.2.3; however, PSU 11.2.0.2.3 has two portions, one for GI_HOME and one for RDBMS_HOME. On my RAC cluster, both portions of PSU 11.2.0.2.3 had been applied, while on my non-RAC host only the database portion had been applied. I ended up installing a new RDBMS_HOME and applying both portions of PSU 11.2.0.2.3 to it; using this new RDBMS_HOME resolved the issue.

copying current control file
Oracle instance started
Segmentation fault

Another issue I ran into is shown below. I was puzzled for a while because I knew my Auxiliary's SYS password was correct, and I was able to use it to start up and shut down the instance. It turned out that the Auxiliary's SYS password has to be identical to the Target's SYS password. Once I changed it to match, all went well.

contents of Memory Script:
{
   sql clone "alter system set db_name =
''SA9TOLD'' comment=
''Reset to original value by RMAN'' scope=spfile";
   sql clone "alter system reset db_unique_name scope=spfile";
   shutdown clone immediate;
}
executing Memory Script

sql statement: alter system set db_name = ''SA9TOLD'' comment= ''Reset to original value by RMAN'' scope=spfile

sql statement: alter system reset db_unique_name scope=spfile

Oracle instance shut down
released channel: ch1
released channel: ch2
released channel: ch3
released channel: ch4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 11/09/2012 13:27:49
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on ch1 channel at 11/09/2012 13:27:39
ORA-17629: Cannot connect to the remote database server
ORA-17627: ORA-01017: invalid username/password; logon denied
ORA-17629: Cannot connect to the remote database server

Sunday, July 29, 2012

Log Corruption on Standby Database.

This was Oracle 11.2.0.2 on Oracle Enterprise Linux 6.1, where a kernel bug caused log corruption on the standby database. It was not a configuration certified by Oracle (for historical reasons beyond my control). After upgrading to OEL 6.2, the log corruption disappeared. Before the kernel was upgraded, I had to use the note below to roll the standby database forward at times.

Steps to perform for Rolling forward a standby database using RMAN incremental backup when primary and standby are in ASM filesystem [ID 836986.1]

The procedure also worked well on my non-ASM filesystem with OMF.

Tuesday, October 11, 2011

Run 10.2.0.3 and 11.1.0.7 RDBMS on 11.2.0.2 Grid Infrastructure

Recently I was assigned to a project that needs to run 10.2.0.3 and 11.1.0.7 databases on the latest 11.2.0.2 Grid Infrastructure. I thought that would be easy, since the latest Grid Infrastructure should support the older RDBMS versions well, but it turned out to be harder than I expected. Here are my notes on what needs to be done to make it work:

1. Install 10.2.0.3 RDBMS: I got an error stating that the Oracle Clusterware version was not correct. I ignored this error and the installation went just fine.

2. Create 10.2.0.3 database using dbca: I ran into "ORA-29702: error occurred in Cluster Group Service operation" and had to run "crsctl pin css -n node1 node2" to get past it. Then I got "DBCA could not startup the ASM instance configured on this node. To proceed with database creation using ASM you need the ASM instance to be up and running. Do you want to recreate the ASM instance on this node?" and applied patch 8288940 to get past that. Then I got "Encountered file error when copying listeners from home=/opt/app/11.2.0/grid". For this I had to create a symbolic link "listener.ora" under the RDBMS $ORACLE_HOME/network/admin pointing to endpoints_listener.ora under $GI_HOME/network/admin, and set the TNS_ADMIN variable to point to $ORACLE_HOME/network/admin. Once all of this was done, I was finally able to create databases using dbca.

3. To use srvctl to start/stop 10.2.0.3 database, I ran into "/opt/app/oracle/product/10.2.0/db_1/jdk/jre/bin/java: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory". I had to comment out "LD_ASSUME_KERNEL" in $ORACLE_HOME/bin/srvctl to fix this issue.

4. Install 11.1.0.7 RDBMS: This worked flawlessly.

5. Create 11.1.0.7 database using dbca: I hit the same issues as when creating the 10.2.0.3 database and applied the same fixes. The database was created successfully, but at the end dbca failed to start it, with errors PRKP-1001 and CRS-0215. You can still start and stop the database in SQL*Plus.

6. To use srvctl to start/stop 11.1.0.7 database, I ran into the same error PRKP-1001 and CRS-0215. I had to apply patch 9294495 to fix it.

Thursday, August 18, 2011

Install 11.1.0.1.0 Grid Control Agent on Windows 2008R2 using agentDownload.vbs

First download wget for Windows; you can find it easily on the internet. Suppose you put wget.exe under C:\; you need to add this location to the PATH environment variable, otherwise you will run into the following error:

cmd is:WGET.exe http://mksllc01p:4889/agent_download/10.2.0.4.0/agent_download.rsp
C:\WGET\agentDownload.vbs(77, 4) (null): The system cannot find the file specified.

Now you can use wget to download the agentDownload.vbs file from your OMS server as below:

Wget http://:/agentdownload/11.1.0.1.0/windows_x64/agentDownload.vbs

Once this is done, you are ready to install:

C:\>cscript agentDownload.vbs m r b y

It’ll download the response file and get the agent installed.

After the installation finished, the agent failed to start. The logs showed that the cause was a time zone mismatch between the agent and the OMS. In \sysman\config\emd.properties, the time zone was set to agentTZRegion=America/Karachi; after I changed it to the correct time zone, the agent started successfully.

Friday, October 17, 2008

A Different Way to Add More Space to ASM DiskGroup in a Linux/RAC Environment

Usually when you need to add more space to an ASM DiskGroup, you add more disks of the same size to the DiskGroup. Here is a different way to accomplish the same goal. It sounds a little scary at first, but it actually works. In a RAC environment with an EMC SAN as storage, we have a DiskGroup A that contains one LUN of 100GB. To increase the size of this DiskGroup to 200GB, you can create a LUN of 200GB in Navisphere, then migrate the existing 100GB LUN to this larger LUN. The migration runs in the background and is invisible to the OS, Oracle, and end users. After the migration is done, you have a larger LUN/disk in DiskGroup A. However, to make ASM "see" the extra 100GB, two more things need to happen:

1. Make the OS recognize the extra space. Before the LUN migration there was one 100GB partition on that LUN; after the migration the partition is still 100GB even though the LUN is 200GB. Repartitioning is needed to make the OS aware of the added space on the LUN. Unfortunately, fdisk doesn't provide a way to extend a partition, which leads to the scary part: the original partition has to be deleted, and a new partition created that takes all of the LUN space. As long as the new partition starts at the same position as the old one, no data will be lost. In our case, ASM sits on top of raw devices without ASMLib, so the repartitioning can be done online without downtime.
2. Make ASM recognize the extra space. After step 1, the OS recognizes the extra space, but ASM still doesn't. You can verify this by running the following query:

select name, total_mb, free_mb from v$asm_diskgroup;

To make ASM recognize the extra space, you'll need to run:

alter diskgroup {diskgroup name} resize all rebalance power 11;

Now verify again with the first query; you'll see that ASM now "sees" all the space on that LUN.

Wednesday, October 1, 2008

WebUtil 1.0.6 Installation/Configuration

Recently I have been working on upgrading an Oracle Forms/Reports application from Oracle Application Server 10g (9.0.4) to 10g Release 2 (10.1.2.2.0), which requires upgrading WebUtil 1.0.5 to 1.0.6. I followed the Oracle® Forms Developer WebUtil User's Guide Release 1.0.6, but had trouble making it work. First, I got this error when the application tried to use WebUtil: "oracle.forms.webutil.file.FileFunctions bean not found. WEBUTIL_FILE.FILE_SELECTION_DIALOG_INT will not work". This turned out to be a configuration issue in formsweb.cfg. Once I copied the whole [webutil] configuration section into the [application] configuration section, this error disappeared and the client started to download and install WebUtil. During the download, I ran into another error: "WUC-24: Error reading URL http://hostname:port/forms/webutil/jacob.dll". I was puzzled at first because the URL seemed to be correct and jacob.dll was in that directory. Finally I re-extracted jacob.dll from jacob_18.zip into my ORACLE_HOME/forms/webutil directory, and that resolved the error.

Tuesday, September 16, 2008

Dedicated vs Shared Connection

We have a vendor-packaged application installed on the Windows platform, running on Oracle 10g Release 1. It had been running fine with about 200 concurrent connections until we applied a vendor-provided patch. Right after we applied the patch, we started to get phone calls from users complaining that they could not log in to the application. At that point there were only about 130 connections. It turned out to be a dedicated vs shared connection issue.

The connections were supposed to be "shared". The vendor configuration ran on the default port 1521, which our policy doesn't allow, so we changed it to run on a non-default port. However, this caused all the supposedly "shared" connections to become "dedicated" connections, from the first day we started using the application.

So why did we start to see this issue only after the patch? Before the patch, the SGA was configured as 1200M. With the 2GB memory limit per application on 32-bit Windows, that left about 2048M - 1200M = 848M for other processes and memory, including the server processes that handle connections. The patch increased the SGA to 1656M, which left about 2048M - 1656M = 392M. With only 392M instead of 848M to handle connections and everything else after the patch, we hit the problem at only about 130 connections.
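
The arithmetic is easy to replay. Below is a minimal Python sketch using the figures from this post. The per-connection number it prints is only a rough upper bound, since the headroom is shared with everything else in the process, and the function name is made up for illustration.

```python
PROCESS_LIMIT_MB = 2048  # 2GB per-application limit on 32-bit Windows, as in this post


def headroom_mb(sga_mb, limit_mb=PROCESS_LIMIT_MB):
    """Memory left outside the SGA, shared by dedicated connections and everything else."""
    return limit_mb - sga_mb


if __name__ == "__main__":
    cases = (("before patch", 1200, 200), ("after patch", 1656, 130))
    for label, sga_mb, connections in cases:
        free = headroom_mb(sga_mb)
        per_conn = free / connections
        print(f"{label}: {free} MB left, at most {per_conn:.1f} MB per connection "
              f"with {connections} dedicated connections")
```

With dedicated connections, each session needs its own slice of that headroom, which is why shrinking it from 848 MB to 392 MB lowered the number of connections the server could accept.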

The issue was resolved by setting the local_listener parameter. Once we set it as follows to register the service with the listener on the non-default port, the connections became "shared" and the server was able to handle 200+ connections without any issue.

local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=hostname)(PORT=port number))'