Monday, July 27, 2009

R12 AND OC4J STANDARD

1) OC4J is used for executing Servlets, Java Server Pages (JSP), and Enterprise Java Beans (EJB).
2) OC4J replaces the older JServ implementation for running servlets on the web server.
3) Oracle Application Server 10gR3 (10.1.3) is the latest production version.

OC4J is based on J2EE standards: a specific directory structure, file requirements (content & naming conventions), and XML file definitions. OC4J instances run in JVMs and communicate with Apache through mod_oc4j.

R12 creates 3 OC4J instances:
**************************************
Oacore: runs OA Framework-based applications
Forms: runs Forms-based applications
OAFM: runs web services, mapviewer, ascontrol


The number of OC4J instances for each group is determined by the corresponding nprocs context variable (s_oacore_nprocs, s_forms_nprocs/s_frmsrv_nprocs, s_oafm_nprocs), as shown in the sketch below.
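A quick way to see the current values on a running apps tier is to grep the context file (a hedged sketch; it assumes the standard R12 environment file has been sourced so that $CONTEXT_FILE points at the instance context file):

$ grep -E 's_oacore_nprocs|s_forms_nprocs|s_frmsrv_nprocs|s_oafm_nprocs' $CONTEXT_FILE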

Servlets deployed to the OC4J server adhere to the J2EE specification.
Deployment is through an Enterprise Archive (EAR) file, which contains the application definition and Web Application (WAR) files; each WAR in turn includes the web application code (JAR), associated configuration files (servlet definitions), JSP code, and HTML.

Forms.EAR 10.1.2 is deployed to the OC4J container in Application Server 10.1.3. AutoConfig is used for configuration management, similar to Release 11i.

Important files used for configuration of OC4J instances:

oc4j.properties: defines basic Apps directory aliasing; the dbc file location is also defined in this file.
server.xml: defines J2EE applications and their shared libraries for the runtime OC4J.
orion-application.xml: defines the location of the Java classes of all J2EE web modules deployed under the J2EE application.
orion-web.xml: defines servlet-level parameters for J2EE web modules.

These files are analogous to jserv.conf and jserv.properties in 11i.

Oracle Process Manager and Notification server (OPMN) manages AS components and consists of:

Oracle Notification Server (ONS):

Delivers notifications between components
OHS<->OPMN<->OC4J

Process Manager (PM): start, stop, restart, and death detection

($ADMIN_SCRIPTS_HOME contains the Apps equivalent scripts called ad*)
A single configuration file (opmn.xml) is used by OPMN to manage the services.
The config file location is $ORA_CONFIG_HOME/10.1.3/opmn/conf/opmn.xml.
Services managed by OPMN (their status can be checked as shown after this list) are:
1) HTTP_Server
2) oacore
3) forms
4) oafm
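A hedged way to check the status of these services on an R12 apps tier is the OPMN control wrapper shipped in $ADMIN_SCRIPTS_HOME (the exact output layout varies by version, but it lists HTTP_Server, oacore, forms and oafm with their PIDs and status):

$ $ADMIN_SCRIPTS_HOME/adopmnctl.sh status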

Sunday, July 26, 2009

Overview Of Data Guard Concepts

Data Guard is software that maintains a standby database, or real-time copy of a primary database. Data Guard is an excellent High Availability (HA) solution, and can be used for Disaster Recovery (DR) when the standby site is in a different geographical location than the primary site.

When the sites are identical, and the physical location of the production database is transparent to the user, the production and standby roles can easily switch between sites for many different types of unplanned or planned outages.

Oracle Data Guard manages the two databases by providing remote archiving, managed recovery, switchover and failover features. A secondary site that is identical to the primary site allows predictable performance and response time after failing over or switching over from the primary site. An identical secondary site also allows for identical procedures, processes, and management between sites. The secondary site is leveraged for all unplanned outages not resolved automatically or quickly on the primary site, and for many planned outages when maintenance is required on the primary site.

Data Guard with a physical standby database provides benefits, which fall into two broad classes:
Availability and disaster protection - provides protection from human errors, data failures, and from physical corruptions due to device failure. Provides switchover operations for primary site maintenance, and different database protection modes to minimize or create no data loss environments. A specified delay of redo application at the standby database can be configured to ensure that a logical corruption or error such as dropping a table will be detected before the change is applied to the standby database. Using the standby database, most database failures are resolved faster than by using on-disk backups since the amount of database recovery is dramatically reduced. The standby database can be geographically separate from the primary database, a feature that provides Disaster Recovery against local catastrophic events.
Data Guard, therefore, provides a higher degree of availability than other HA methods that do not employ a second database, such as Real Application Clusters (RAC) or Highly Available Disk Arrays (HADA).

Manageability - provides a framework for remote archiving services and managed standby recovery, contains role management services such as switchover and failover, and allows offloading of backups and read-only activities from the production database. The Data Guard broker provides the Data Guard Manager GUI and command-line interface to automate the management and monitoring of the Data Guard environment.
Operational Requirements
Below are operational requirements for maintaining a standby database. Some of these requirements are more lax than Data Guard best practices would dictate (see Best Practices for Data Guard Configurations below).
· The primary database must run in ARCHIVELOG mode (a quick check is sketched after this list).
· The primary and standby databases must be the same database release.
· To use the Data Guard broker, the database server must be licensed for Oracle9i Enterprise Edition or Personal Edition.
· The operating system on the primary and standby sites must be the same, but the operating system release does not need to be the same.
· The hardware and operating system architecture on the primary and standby locations must be the same. For example, a Data Guard configuration with a primary database on a 32-bit Linux system must be configured with a standby database on a 32-bit Linux system.
· The primary database can be a single instance database or a multi-instance Real Application Clusters database. The standby databases can be single instance databases or multi-instance Real Application Clusters databases, and these standby databases can be a mix of both physical and logical types.
· If using a physical standby database, log transport services must be configured to specify a dedicated server process rather than a shared server (dispatcher) process in managed recovery mode. Although the read-only mode allows a shared server process, you must have a dedicated server once you open the database again in managed recovery mode.
· The hardware (for example, the number of CPUs, memory size, storage configuration) can be different between the primary and standby systems.
· Each primary database and standby database must have its own control file.
· If you place your primary and standby databases on the same system, you must adjust the initialization parameters correctly.
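As referenced in the first requirement above, a minimal check of the archiving mode (and of the role a database currently holds) can be run from SQL*Plus; this is only a sketch of the verification, not a full health check:

SQL> select log_mode, database_role from v$database;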

Much of the material in this section is taken from Oracle9i Data Guard Concepts and Administration.
Oracle9i Data Guard Concepts and Administration, Section 5.1 Introduction to Log Transport Services. This requirement is easy to miss, only found referenced in a Note in this section.

Oracle Data Guard

Oracle Data Guard can be deployed to maintain standby databases at a secondary site. These standby databases are maintained as synchronized copies of the production database.
If the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch the standby database to the production role, thus minimizing the downtime associated with the outage, and preventing any data loss.
The ability to create standby databases was first offered as a feature in Oracle 8i, although users had been creating manual standby databases since Oracle 7.3.

Features introduced in Oracle 8i
------------------------------------------
Read-only physical standby database
Managed recovery of standby
Remote archiving of redo log files

Features introduced in Oracle 9i
---------------------------------------
Integrated zero-data-loss capability
Data Guard Broker w/ Data Guard Manager GUI
Command Line Interface (CLI)
Switchover and Failover operations
Automatic gap resolution
Automatic Synchronization
Logical standby databases
Maximum protection / availability
Enhanced Data Guard Broker
Cascaded redo log destinations

Features introduced in Oracle 10g
------------------------------------
Real-time apply
Recovery through OPEN RESETLOGS
Simplified configuration with VALID_FOR attribute
Standby redo log support on logical standby databases
Improved redo transmission security
Improved support for RAC
Zero downtime instantiation of logical standby databases
Fast-start Failover
Flashback Database across Data Guard switchovers
Asynchronous Redo Transmission
Faster Redo Apply failover

Features introduced with Oracle 11g
---------------------------------------------------------
Standby databases can remain open while doing recovery
Heterogeneous platform support (standby can be on a different platform). For example, production on Linux and standby on Windows.

Oracle 10g Laptop/nodeless RAC Howto

Oracle 10g Laptop/nodeless RAC Howto
This procedure is only for experimental purposes, for testing RAC features.

I tried it on a desktop and it works.
=====================

Oracle's model of clustering involves multiple instances (software processes) talking to a single database (physical datafiles).
Let's get started. What will we need to do? Here's a quick outline of the steps involved:
1. setup ip addresses of the virtual servers
2. setup ssh and rsh with autologin configured
3. setup the raw devices Oracle's ASM software will use
4. install the clusterware softare, and then Oracle's 10g software
5. setup the listener and an ASM instance
6. create an instance, start it, and register it with srvctl
7. create a second instance & undo tablespace, & register it

1. Setup IP Addresses
----------------------
Oracle wants to have a few interfaces available to it. To follow our analogy of a hitchhiker traveling across America, we'll name our server route66.

So add that name to your /etc/hosts file along with the private and VIP names:
192.168.0.19 route66
192.168.0.75 route66-priv
#192.168.0.76 route66-vip

Notice that we've commented out route66-vip. We'll explain more about this later, but suffice it to say now that the clusterware installer is very finicky about this.

In order for these two additional names to be reachable, we need ethernet devices to associate with those IPs.
It's a fairly straightforward thing to create with ifconfig as follows:

$ /sbin/ifconfig eth0:1 192.168.0.75 netmask 255.255.255.0 broadcast 192.168.0.255
$ /sbin/ifconfig eth0:2 192.168.0.76 netmask 255.255.255.0 broadcast 192.168.0.255

If your IPs or network are configured differently, adjust the IP or broadcast address accordingly.

2. setup ssh and rsh with autologin
------------------------------------
Most modern Linux systems do *NOT* come with rsh installed. That's for good reason, because it's completely insecure and shouldn't be used at all. Why Oracle's installer requires it is beyond me, but you'll need it. You can probably disable it once the clusterware is installed.
Head over to http://rpmfind.net and see if you can find a copy for your distro. You might also have luck using up2date or yum if you already have those configured, as they handle dependencies and always download the *right* version. With rpm, install this way:

$ rpm -Uvh rsh-server-0.17-34.1.i386.rpm
$ rpm -Uvh rsh-0.17-34.1.i386.rpm
Next enable autologin by adding names to your /home/oracle/.rhosts file. After starting rsh, you should be able to login as follows:
$ rsh route66-priv

Once that works, move on to the sshd part. Most likely ssh is already on your system, so just start it (as root):
$ /etc/rc.d/init.d/sshd start
Next, as the "oracle" user, generate the keys:
$ ssh-keygen -t dsa
Normally you would copy id_dsa.pub to a remote system, but for us we just want to login to self. So copy as follows:

$ cd .ssh
$ cp id_dsa.pub authorized_keys
$ chmod 644 authorized_keys

Verify that you can login now:
$ ssh route66-priv
3. setup the raw devices
-------------------------
Most of the time when you think of files on a Unix system, you're thinking of files as represented through a filesystem. A filesystem provides you a way to interact with the underlying disk hardware through the use of files. A filesystem provides buffering to improve I/O performance automatically. However, in the case of a database like Oracle, it already has a sophisticated mechanism for buffering which is smart in that it knows everything about its files and how it wants to read and write to them. So for an application like Oracle, unbuffered I/O is ideal. It bypasses a whole layer of software, making your overall throughput faster! You achieve this feat of magic using raw devices. We're going to hand them over to Oracle's Automatic Storage Manager in a minute, but first let's get to work creating the device files for our RAC setup.
Create three 2G disks. These will be used as general storage space for our ASM instance:
$ mkdir /asmdisks
$ dd if=/dev/zero of=/asmdisks/disk1 bs=1024k count=2000
$ dd if=/dev/zero of=/asmdisks/disk2 bs=1024k count=2000
$ dd if=/dev/zero of=/asmdisks/disk3 bs=1024k count=2000

Create two more smaller disks, one for the Oracle Cluster Registry, and another for the voting disk:
$ dd if=/dev/zero of=/asmdisks/disk4 bs=1024k count=100
$ dd if=/dev/zero of=/asmdisks/disk5 bs=1024k count=20

Now we use a loopback device to make Linux treat these FILES as raw devices.
$ /sbin/losetup /dev/loop1 /asmdisks/disk1
$ raw /dev/raw/raw1 /dev/loop1
$ chown oracle.dba /dev/raw/raw1
You'll want to run those same three commands on disk2 through disk5 now.
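A small loop saves the typing (a hedged sketch; it assumes the same /asmdisks file names and free loop/raw device numbers 2 through 5 as above, and it must be run as root):

$ for i in 2 3 4 5
> do
>   /sbin/losetup /dev/loop$i /asmdisks/disk$i
>   raw /dev/raw/raw$i /dev/loop$i
>   chown oracle.dba /dev/raw/raw$i
> done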

4. Install the Clusterware & Oracle's 10g Software
--------------------------------------------------
Finally we're done with the Operating System setup, and we can move on to Oracle. The first step will be to install the clusterware. I'll tell you in advance that this was the most difficult step in the entire RAC-on-a-laptop saga. Oracle's installer tries to *HELP* you all along the way, which really means standing in front of you!

First let's make a couple of symlinks to our OCR and voting disks:
$ ln -sf /dev/raw/raw4 /home/oracle/product/disk_ocr
$ ln -sf /dev/raw/raw5 /home/oracle/product/disk_vot

As with any Oracle install, you'll need a user and group already created, and you'll want to set the usual environment variables such as ORACLE_HOME, ORACLE_SID, etc. Remember that previous to this point you already have ssh and rsh autologin working. If you're not sure, go back and test again. That will certainly hold you up here, and give you all sorts of confusing error messages.
If you're running on an uncertified version of Linux, you may want to fire up the clusterware installer as follows:

$ ./runInstaller -ignoreSysPrereqs

If your Linux distro is still giving you trouble, you might try downloading from centos.org where you can find complete ISOs for RHEL, various versions. You can also safely ignore memory warnings during startup. If you're short on memory, it will certainly slow things down, but we're hitchhikers, right?

You'll be asked to specify the cluster configuration details. You'll want route66-vip to be commented out, so if you haven't done that and get an error to the effect of "route66-vip already in use", go ahead and edit your /etc/hosts file.
I also got messages saying "route66-priv not reachable". Check again that sshd is running, and possibly disable your firewall rules:
$ /etc/rc.d/init.d/iptables stop
Also verify that eth0:1 and eth0:2 are created. Have you rebooted since you created them? Be sure they're still there with:
$ /sbin/ifconfig -a
Specify the network interface. This defaults to PRIVATE; just edit and specify PUBLIC.
The next two steps ask for the OCR disk and voting disk. Be sure to specify external redundancy. This is your way of telling Oracle that you'll take care of mirroring these important disks yourself, as loss of either of them will get you in deep doodoo. Of course we're hitchhikers, so we're not trying to build a system that is never going to break down; rather, we want to get the feeling of the wind blowing in our hair. Click through to install and you should be in good shape. At the completion, the installer will ask you to run the root.sh script. I found this worked fine up until the vipca (virtual IP configuration assistant). I then ran this one manually. You'll need to uncomment route66-vip from your /etc/hosts file as well. Once all configuration assistants have completed successfully, return to the installer and click continue, and it will do various other sanity checks of your cluster configuration.
Since the clusterware install is rather testy, you'll probably be doing it a few times before you get it right.

Friday, July 24, 2009

Autoconfig Reverts to old Context File Values

Symptoms
=========
Each time the AutoConfig context file is amended and AutoConfig is run, the changes made to the context file are not implemented in the instance, and the changed values in the context file revert to their previous values.
Cause
========
Each context file version has a serial number, given in the s_contextserial parameter.

When AutoConfig is run, this serial number is written to the table FND_OAM_CONTEXT_FILES against the version of the context file.

For example, a context file might contain the following information:
file name: apps_contextfile
serial number: 67

However, it is possible for the apps tier context file serial number to become unsynchronised with the serial number information in the table. If the serial number in the filesystem reads a lower value compared with the one showing in the table for the same context file version, then each time AutoConfig is run it will see that the table holds the higher value and will replace the context file on the filesystem with the parameter values from the context file associated with the serial number in the table. This results in any changes made to the filesystem context file being ignored.
The adconfig log file will show the following:
----------------------------------------------------------------
File system Context file : /u11/app//appl/admin/apps_contextfile.xml
Checking the Context file for possible updates from the Database
Comparing the Context file stored in database with the Context file in the file system
Result : File system Context is below par with respect to the data base Context
Action to be taken : Copy the Data Base Context onto the file system
Result : Context file successfully copied
Solution
1. Manually change the serial number parameter "s_contextserial" in the apps tier context file on the filesystem to a value which is one more than the highest serial number in the FND_OAM_CONTEXT_FILES table for the relevant context file version.
2. Run AutoConfig.
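To see which context file versions the database currently knows about before bumping s_contextserial, a query like the following can help (a hedged sketch; the column names are taken from the standard FND_OAM_CONTEXT_FILES definition and may differ slightly between releases):

SQL> select name, version, path, status, last_update_date
     from fnd_oam_context_files
     where name not in ('TEMPLATE','METADATA');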

Thursday, July 23, 2009

OC4J expands to Oracle Containers For Java.

Originally based on the IronFlare Orion Application Server, OC4J has been developed solely under Oracle's control since Oracle Corporation acquired the source (this is the reason why you see config files named orion-web.xml, orion-application.xml, etc.).

In layman's terms, OC4J can be described as "Oracle's implementation of the J2EE specification set". A sample specification set can be found here. For example, you have a finance application which was developed using the JSP and servlet specifications.

You can package them as WAR/EAR files and deploy them in OC4J containers, which will run the applications as per the clients' requests from a web browser. There is a lot more done by OC4J, but the above is a simple example.

It has XML-based config files.

In R12 we have 3 groups of OC4Js. OC4J replaces JServ (the Java servlet container) which came with the earlier 11i techstack.
oacore OC4J - Supports framework based applications

forms OC4J - Supports forms based applications

oafm OC4J - expands to Oracle Application Fusion Middleware - for mapviewer, webservices, ascontrol

The number of OC4J instances for each group is determined by the corresponding nprocs context variable (s_oacore_nprocs, s_forms_nprocs/s_frmsrv_nprocs, s_oafm_nprocs).

To be more precise, the forms.ear application is deployed in the forms OC4J to serve forms-based applications ($IAS_ORACLE_HOME/j2ee/forms/applications/forms.ear). The oafm.ear, mapviewer.ear, and ascontrol.ear applications (found at $IAS_ORACLE_HOME/j2ee/oafm/applications/) are deployed under the oafm OC4J container.
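A quick way to confirm what is deployed on your own instance is simply to list those directories (hedged; the paths assume the standard R12 layout described above):

$ ls $IAS_ORACLE_HOME/j2ee/forms/applications
$ ls $IAS_ORACLE_HOME/j2ee/oafm/applications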

But remember, even a single file change in an EAR file leads to the creation of a new EAR file and redeployment, which is time consuming. Hence a dummy EAR file is used and the config files are tweaked to support adpatching in E-Business Suite R12. OC4J deployment creates a specific directory structure and similar config files. I will cover them in detail in forthcoming posts. I hope you understood a bit about OC4J.

Why does R12 have both 10.1.2 and 10.1.3 homes

R12 Ebiz and Application Server 10g
Application server versions 9iAS:
Oracle 9iAS R1: 1.0.2.0 - 1.0.2.2.2
Oracle 9iAS R2: 9.0.2.0.0 - 9.0.3.0.0

Oracle 10g AS:
Oracle 10gAS R1: 9.0.4.0.0 - 9.0.4.3
Oracle 10gAS R2: 10.1.2 - 10.1.2.2.0
Oracle 10gAS R3: 10.1.3.0 - 10.1.3.3

In R12, the 10.1.2 AS and 10.1.3 AS homes are newly introduced in lieu of the 8.0.6 and iAS (1.0.2.2) homes of the 11i architecture.

You may ask why we have both 10.1.2 AS and 10.1.3 AS.
Well, here is the answer. The 10.1.2 AS installation supports forms-based applications.
It is a standalone 10.1.2 forms/reports server installation; other components are not included. The 10.1.3 AS techstack is used by Java-based applications. The 10.1.3 AS instance brings the latest OC4J code, which is the successor of the 10.1.2 AS code.
The 10.1.3 AS release doesn't contain the forms/reports products. Hence, to take advantage of the latest OC4J code, 10.1.3 AS was introduced, and to support E-Business Suite forms applications, 10.1.2 AS was introduced.
Recently 10.1.3.3 was certified with the R12 suite.

Saturday, July 11, 2009

New Feature of 10gR2: Eliminate Control File Re-Creation

Before Oracle 10gR2, if we needed to change the limit of MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES, or MAXINSTANCES, the possible solutions were either to re-create the control file or to create a new database.

But from Oracle 10gR2, all sections of the control file are automatically extended when they run out of space. This means that there is no longer a requirement to re-create the control file when changes to MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES, or MAXINSTANCES are needed.

Two different sections of the control file:
--------------------------------------------
1) Circularly reusable sections: The CONTROL_FILE_RECORD_KEEP_TIME parameter specifies the minimum number of days before a reusable record in the control file can be reused. Examples of circularly reusable records are archive log records and various backup records.

2) Not circularly reusable sections: Records such as datafile, tablespace, and redo thread records, which are never reused unless the corresponding object is dropped. For the circularly reusable sections the behaviour remains the same as in previous versions. The new feature in Oracle 10gR2 is that for the non-reusable records, the control file size is now also extended if we go over the previous limit.

The values for MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES, and MAXINSTANCES are still useful since they initialize the control file at a certain size, but they no longer set a hard limit for the number of records in the control file.

Therefore, from 10gR2 onwards, we can avoid re-creating the control file whenever we need to change parameters like MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES, and MAXINSTANCES, and keep the database alive.
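To watch the record sections grow, the v$controlfile_record_section view shows how many record slots each section currently has and how many are used; a minimal sketch:

SQL> select type, records_total, records_used
     from v$controlfile_record_section
     order by type;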

Oracle Real Application Clusters is a resource-sharing system that increases availability and performance by distributing the workload across multiple nodes.


Oracle Real Application Clusters allows database files to be accessed from multiple instances running on different nodes of a cluster. These nodes are connected to each other by a high speed interconnect.

Each machine or node performs database processing, and all nodes share access to the same database.


This configuration can be used on systems that share resources such as disks, and that have very fast communication between machines or nodes. Failure of one node does not make data inaccessible for all users; the system ensures continued data availability.

=====================================
Let's begin with a brief overview of RAC architecture.
A cluster is a set of 2 or more machines (nodes) that share or coordinate resources to perform the same task.
A RAC database is 2 or more instances running on a set of clustered nodes, with all instances accessing a shared set of database files.
Depending on the O/S platform, a RAC database may be deployed on a cluster that uses vendor clusterware plus Oracle's own clusterware (Cluster Ready Services), or on a cluster that solely uses Oracle's own clusterware.

Thus, every RAC sits on a cluster that is running Cluster Ready Services. srvctl is the primary tool DBAs use to configure CRS for their RAC database and processes.
Cluster Ready Services, or CRS, is a new feature for 10g RAC. Essentially, it is Oracle's own clusterware. On most platforms, Oracle supports vendor clusterware; in these cases, CRS interoperates with the vendor clusterware, providing high availability support and service and workload management.
On Linux and Windows clusters, CRS serves as the sole clusterware. In all cases, CRS provides a standard cluster interface that is consistent across all platforms.
CRS consists of four processes (crsd, ocssd, evmd, and evmlogger) and two disks: the Oracle Cluster Registry (OCR) and the voting disk.
CRS manages the following resources:
The ASM instances on each node
Databases
The instances on each node
Oracle Services on each node

The cluster nodes themselves, including the following processes, or "nodeapps":
VIP
GSD
The listener
The ONS daemon

CRS stores information about these resources in the OCR. If the information in the OCR for one of these resources becomes damaged or inconsistent, then CRS is no longer able to manage that resource. Fortunately, the OCR automatically backs itself up regularly and frequently.
Interacting with CRS and the OCR:
srvctl is the tool Oracle recommends that DBAs use to interact with CRS and the cluster registry. Oracle does provide several tools to interface with the cluster registry and CRS more directly, at a lower level, but these tools are deliberately undocumented and intended only for use by Oracle Support. srvctl, in contrast, is well documented and easy to use. Using other tools to modify the OCR or manage CRS without the assistance of Oracle Support runs the risk of damaging the OCR.

2) Important Components in RAC
a)Node
b)Interconnect
c)Shared Storage
d)Clusterware


3) Interconnect


The interconnect links the nodes together; RAC needs a high-speed interconnect network for cluster communication and Cache Fusion, and the interconnect must be low latency. Some of the interconnects used in RAC are Gigabit Ethernet, high-speed switches, Memory Channel, or InfiniBand (IB).


4) Shared Disk Storage.


RAC requires shared disk access to the following files

control files, datafiles, redo log files, temp files, undo files, OCR and voting disk, etc.


To allow reads and writes by all members in a cluster at the same time, a shared disk storage system must be used. Available options in this case are:


Raw volumes: These are directly attached raw devices that require storage that operates in block mode such as fiber channel or iSCSI.


NFS attached storage: Network file storage can be used in a supported configuration to provide a shared repository for all RAC database files, preferably through a high-speed private network. For example, a NetApp filer offers CFS functionality via NFS to the server machines. These file systems are mounted by using special mount options.

Cluster File System: One or more cluster file systems can be used to hold all RAC files. Cluster file systems require block-mode storage such as fibre channel or iSCSI and cannot be used on top of NAS (NFS). (Some options: AIX GPFS, Red Hat GFS, OCFS2, Veritas storage solutions.)


Automatic Storage Management (ASM): is a portable, dedicated, and optimized storage for Oracle database files.


Storage Area Network (SAN): a shared, dedicated, high-speed network connecting storage elements and the back end of the servers.


5) Clusterware


To make sure Real Application Clusters can work properly on the nodes as a single database, clusterware must be installed. Before Oracle 10g, vendor clusterware was needed except on Windows and Linux. With Oracle 10g, Oracle introduced Oracle Clusterware, which offers a complete, integrated clusterware management solution on all platforms Oracle Database 10g runs on.

This clusterware functionality includes mechanisms for cluster connectivity, messaging and locking, cluster control and recovery, and a services provisioning framework. No 3rd-party clusterware management software needs to be purchased. If wanted, it can still be used, but Oracle Clusterware still needs to be installed.

6)RAC Software Principles for Oracle 10g
There are a few additional background processes associated with a RAC instance that are not there for a single-instance database. These processes are primarily used to maintain database coherency among the instances. They manage what are called the global resources:

LMON: Global Enqueue Service Monitor
LMD0: Global Enqueue Service Daemon
LMSx: Global Cache Service Processes, where x can range from 0 to 10
LCK0: Lock process
DIAG: Diagnosability process

At the cluster level, you find the main processes of the Clusterware software. They provide a standard cluster interface on all platforms and perform high-availability operations. You find these processes on each node of the cluster: crsd, ocssd (CSS) and evmd.
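A hedged way to see both sets of processes on a running node (the ora_* names follow the standard background-process naming; the clusterware daemon names are as described in the CRS section, and the exact list varies by version):

$ ps -ef | grep -E 'ora_(lmon|lmd|lms|lck|diag)' | grep -v grep
$ ps -ef | grep -E 'crsd|cssd|evmd' | grep -v grep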

How to Clean Up After a Failed Oracle Clusterware (CRS) Installation

Source:-
Please refer to the steps in Metalink Note 239998.1.
Broad Level Steps:-
=================
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/*
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
rm -rf /oracrs/oracle/product
rm -rf /oracrs/oracle/oraInventory

Clusterware installation Steps:

http://onlineappsdba.com/index.php/2009/06/06/oracle-rac-clusterware-installation-overview/

Listener Configuration for RAC LOCAL_LISTENER REMOTE_LISTENER

The TNS listener (tnslsnr process on *nix) process listens on a specific network address for connection requests to one of the services from one of the database instances that it services. When requested, it either spawns a server process (dedicated server environment) and connects the user to that process or forwards the connection request to a dispatcher (shared server environment) for service to the database service requested.

Alternatively, if the listener knows of more than one instance providing the requested service, it may direct the client to an alternate listener (usually on a different node) that will service the request.

In any Oracle database configuration, listeners define the instances as local or remote (in single-instance environments, normally everything is local). You can see this behavior when examining the “lsnrctl services ” output (lsnrctl syntax reference here). A listener’s services are those services that have been registered with it by instances. A listener will accept registration from any instance (this may be a weak point of security, but that’s another topic) and listeners have no outbound communication with any other entity in the Oracle environment (or beyond).

The remote_listener parameter specifies a list of listening endpoints that the local instance should contact to register its services. This list is usually defined in a TNS entry in the tnsnames.ora file and then the TNS alias set as the value of the remote_listener parameter. Here’s a sample of what that entry might look like:
LISTENERS_CLUSTERNAME =
  (ADDRESS_LIST =
    (ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=node2-vip)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=node3-vip)(PORT=1521))
  )

The local_listener parameter is sometimes confusing. It defines where to connect to the local instance, but its most important function is related to remote listeners. The contents of the local_listener parameter are passed along to the remote listeners during remote registration so that when those remote listeners wish to refer a connection request to the local instance, they refer the client (requestor) to the proper listening endpoint so it can get connected.

The local_listener should contain the ADDRESS section of the TNS entry and the HOST portion should reference the VIP address, like this:
(ADDRESS=(PROTOCOL=TCP)(HOST=node3-vip)(PORT=1521))

To properly (manually) configure listeners in a RAC environment, follow the steps like the ones below. Note that in most cases, the Oracle Net Configuration Assistant (netca) will do this for you as part of the database creation process.

Create individual listener.ora files for each listener. Make sure that the HOST= lines in the listener.ora definition reference the VIP addresses (and only the VIP address). I prefer to specify IP addresses instead of hostnames or DNS names here to avoid possible lookup issues and/or confusion.

Create a TNS entry (on each node) that looks like the one below to specify a single TNS entry that references all the listeners in the cluster. Note that the HOST= parts reference the VIP addresses of each node (I used names instead of IP addresses here to avoid reader confusion; I'd put IP addresses in the HOST= attributes when using this for a real configuration).

LISTENERS_CLUSTERNAME =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
  )

Set the remote_listener parameter in the instances (a global parameter, not an instance-specific parameter) to be the name of the TNS entry you created in the previous step. This is done with “alter system set remote_listener = 'LISTENERS_CLUSTERNAME'; “

Set the local_listener parameter to be the ADDRESS string for the local instance. This parameter must be an instance-specific parameter with each instance having a similar, but unique value since each instance runs on a different HOST. If the local instance (called inst1 in the example here) runs on a node with the node VIP of 10.3.121.54, then set the local_listener parameter accordingly for each instance (it is instance-specific, so use the sid= syntax):
alter system set local_listener = '(ADDRESS=(PROTOCOL=TCP)(HOST=10.3.121.54)(PORT=1521))' sid='inst1';

On each instance, you can run “alter system register;” to force immediate registration with the listeners. If you don’t do this, the listener registration will usually be updated within a minute or two anyway (automatically), but this command can help shorten debugging cycles when necessary.

Friday, July 10, 2009

Oracle Clusterware and network bond for redundancy i.e bond0,bond2 etc

The private and public interfaces used for RAC 10g should have redundancy at the network interface level, so that if a network interface fails, the backup interface can take over.

To create a channel bonding interface, first create a file in the /etc/sysconfig/network-scripts/ directory.

You can copy an existing ifcfg-eth# to ifcfg-bond#, where # is a number for the interface.

In general # starts with 0. Update ifcfg-bond#; the DEVICE= directive must be bond#.

When the bonding device config file is created, you need to update the interfaces which you want to bond: add MASTER=bond# to the interface configuration file, as well as SLAVE=yes.

Remove the IP address information. Repeat this step for both interfaces. Below is an example of the configuration files of the interfaces.


Example: interface bond0
GATEWAY=192.168.100.1
DEVICE=bond0
BONDING_OPTS="mode=1 miimon=500"
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=192.168.100.10
ONBOOT=yes
USERCTL=no
Example: interface ifcfg-eth2
GATEWAY=192.168.100.1
TYPE=Ethernet
DEVICE=eth2
BOOTPROTO=none
NETMASK=255.255.255.0
MASTER=bond0 <=== here the bond0 is defined
SLAVE=yes <== here the interface role is defined
ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes
Besides the interface configuration, you need to update /etc/modprobe.conf and add the following lines:

alias bond0 bonding
install bond0 /sbin/modprobe bonding -o bonding0 mode=1 miimon=500
When the above steps are performed you can start the interfaces and check the result:

ifup bond0
ifup eth0
ifup eth2
cat /proc/net/bonding/bond0
example output:
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:8a:5e:21

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0c:29:8a:5e:0d

Wednesday, July 8, 2009

Viewing definition of an object using DBMS_METADATA

set long 50000
set pagesize 50

select dbms_metadata.get_ddl('PACKAGE', 'ZPB_BUILD_METADATA', 'APPS') from dual;
select dbms_metadata.get_ddl('PACKAGE_BODY', 'ZPB_BUILD_METADATA', 'APPS') from dual;

You can also use:

select text from dba_source where name='' and owner='';

ipcrs ipcrm shmid semid

When you are forced to terminate Oracle on a UNIX server, you must perform the following steps:

1)Kill all Oracle processes associated with the ORACLE_SID

2) Use the ipcs -pmb command to identify all held RAM memory segments.

3)ipcs -a gives the following information
a)------ Shared Memory Segments --------
b)------ Semaphore Arrays --------

4)ipcrm -m shmid removes the shared memory segments.

5)ipcrm -s semid removes the semaphores

6) We can create a single command to terminate all Oracle processes associated with your hung database instance:
root> ps -ef | grep $ORACLE_SID | grep -v grep | awk '{print $2}' | xargs -i kill -9 {}
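If the instance is still partially alive, Oracle's sysresv utility ($ORACLE_HOME/bin/sysresv) can first map the ORACLE_SID to its shared memory and semaphore IDs, so that you remove only the right ones with ipcrm; a hedged sketch (run with ORACLE_SID and ORACLE_HOME set):

$ $ORACLE_HOME/bin/sysresv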

Tuesday, July 7, 2009

The Voting disk is one of the required parts of the Oracle Cluster environment.
Another name for the Voting disk is quorum disk.

The voting disk is used to determine quorum in case of failure and provides a second heartbeat mechanism to validate cluster health. The Disk heartbeat is maintained in the voting disk.

The voting disk is used to ascertain cluster state.

If a node eviction needs to take place the voting disk is updated with the "eviction message".

If we look at the structure of the voting disk, each node of the cluster has its own part in the voting disk. When a new node is added, its information is added to a new part of the voting disk.

From Oracle 10g Release 2 you can define multiple voting disks, whereas in Oracle 10g Release 1 you need to make sure the voting disk is mirrored so that it is not a single point of failure.

How to add, remove or backup the voting disk see:
http://download-east.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm
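As a quick illustration of the 10g-style check and backup (hedged: the raw device is the voting disk from the laptop RAC example earlier in this blog, and the backup target path is only a placeholder; follow the documentation above for the supported procedure):

$ crsctl query css votedisk
$ dd if=/dev/raw/raw5 of=/backup/votedisk.bak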

Cluster Registry is one of the required components in an Oracle Cluster environment.

It is a registry which contains all the information about the cluster environment: think of node names, IP addresses, and application resources like the listener, VIP, and GSD, but also the databases/instances. Parameters such as auto-start settings and dependencies are also stored in the OCR.


The OCR is created during the CRS installation when the root.sh script is executed. When root.sh is executed, it reads the ocr.loc file, which is created during installation and points to the OCR file/device.

To make sure all the nodes in the cluster can read the OCR, the OCR location must be on shared storage.

The location of the ocr.loc depends on the platform used:
Linux: /etc/oracle/ocr.loc
Aix: /etc/oracle/ocr.loc
Solaris : /var/opt/oracle
Windows : HKEY_LOCAL_MACHINE\SOFTWARE\Oracle\OCR

If we look in the ocr.loc file we see the following.

bash-3.00$ cat /etc/oracle/ocr.loc
ocrconfig_loc=/oracrs/oradata/data01/ocrdisk1

ocrmirrorconfig_loc=/oracrs/oradata/data02/ocrdisk2
local_only=FALSE

The value of local_only = true indicates "Single instance only" and false means using "RAC"


The ocr.loc is the location where the CRS stack will check for the OCR during startup. When the OCR is found, it will be read for the voting disk location and the other information. If for some reason the ocr.loc or the location given in the ocr.loc is not available, the cluster will not be started.

From oracle 10g release 2, it is possible to define more OCR locations (mirroring).
Clients of the OCR are srvctl, css, crs, dbua, vipca and em.

Tools which can be used:
ocrconfig - configuration tool for Oracle Cluster Registry
ocrdump – utility to dump the contents of the OCR in a file.
ocrcheck – utility to verify the OCR integrity.

Ocrconfig:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/ocrsyntax.htm#RACAD835

ocrdump:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/appsupport.htm#sthref1123

ocrcheck:
http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/appsupport.htm#BEHJIJIB

VIPCA
=========
Oracle Clusterware requires a virtual IP address for each node in the cluster.

This IP address must be on the same subnet as the public IP address for the node and should be an address that is assigned a name in the Domain Name Service, but is unused and cannot be pinged in the network before installation of Oracle Clusterware.

The VIP is a node application (nodeapp) defined in the OCR that is managed by Oracle Clusterware. The VIP is configured with the VIPCA utility. The root script calls the VIPCA utility in silent mode.


Why Oracle 10g has a VIP?

Protects database clients from long TCP/IP timeouts (>10 minutes).

During normal operation it works the same as the hostname.

During a failure it removes the network timeout from the connection request time; the client fails over immediately to the next address in the list.

Oracle RAC Details

=================

About Virtual IP
Why is there a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?
It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.

The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately. This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
Going one step further is making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to avoid ORA-3113 errors altogether! TAF will be discussed in more detail in Section 28 ("Transparent Application Failover - (TAF)").
Without using VIPs, clients connected to a node that died will often wait a 10-minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs (Source - Metalink Note 220970.1).

Background Processes in a RAC database

LMS

This background process copies read-consistent blocks from the holding instance's buffer cache to the requesting instance. LMSn also performs rollback of uncommitted transactions for blocks that are being requested for consistent read by another instance.

This background process is also called Global Cache Services.
This is the name you often see back in wait events (GCS).
By default, 2 LMS background processes are started.

LMON

This background process monitors the entire cluster database. LMON checks and manages instance deaths and performs recovery for the Global Cache Service/LMS.
Joining and leaving instances are managed by LMON. LMON also manages all the global resources in the RAC database. LMON registers the instance/database with the node-monitoring part of the cluster (CSSD).

This background process is also called Global Enqueue Monitoring.
The services LMON provides are also referred to as Cluster Group Services (CGS).

LMD

This background process manages access to the blocks and global enqueues. Global deadlock detection and remote resource requests are also handled by LMD. LMD also manages lock requests for GCS/LMS.

This background process is also called the Global Enqueue Service Daemon. In wait events you will see GES.

DIAG

The diagnostic Daemon (DIAG) captures diagnostic data in case there is a process failure within the instance. The diag log can be used to investigate why there was a process failure.

LCK

The Lock Process (LCK) also exists in non-RAC environments. LCK manages local non-cache requests (row cache requests, lock requests, library locks) and also manages shared resource requests across instances. It keeps a list of invalid and valid lock elements and, if needed, passes information to the GCS.

Interconnect setup in RAC environment.

The interconnect is a very important part of the cluster environment; it is one of the aortas of a cluster environment. The interconnect is used as the physical layer between the cluster nodes to perform heartbeats, and Cache Fusion uses it as well. The interconnect must be a private connection. A crossover cable is not supported.

In day-to-day operation it has been proven that when the interconnect is configured correctly, it will not be the bottleneck in case of performance issues. The rest of this article will focus on how to validate that the interconnect is really used. A DBA must be able to validate the interconnect settings in case of performance problems. The physical attachment of the interconnect is out of scope.

Although you should treat performance issues in a cluster environment the way you normally would in non-cluster environments, here are some areas you can focus on.
Normally the average interconnect latency using gigabit must be < 5ms.

Latencies around 2 ms are normal.

Solution: how do we validate the interconnect?

There are several ways to validate the interconnect.

This can be done using the x$ksxpia table, by using oradebug, or by using queries on gv$ views (not possible in Oracle 9i). Besides the queries, it is also possible to validate the use of the interconnect from the alert.log of your instance. Below we list the options and how to use them.

Available interfaces? This query shows all the interfaces which are known within the Oracle database instances.

The query will work on Oracle 10g and 11g, not in 9i.

set linesize 120
col name for a22
col ip_address for a15

select inst_id,name,ip_address,is_public from gv$configured_interconnects order by 1,2;

This next query shows the interconnects the instances are actually using:

set linesize 120
col name for a22
col ip_address for a15

select inst_id,name,ip_address,is_public from gv$cluster_interconnects order by 1,2;

SQL> connect / as sysdba
SQL> alter session set tracefile_identifier='oradebug_interc';
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> exit

Now if you open the tracefile, in the bdump location, you can find the IP address used for the interconnect. Here is the result of the above oradebug ipc command.

dmno 0x7902775e admport:
SSKGXPT 0x10569c44 flags SSKGXPT_READPENDING active network 0
info for network 0
socket no 7 IP 145.72.220.83 UDP 53032
HACMP network_id 0 sflags SSKGXPT_UP
context timestamp 0
no ports
sconno accono ertt state seq# sent async sync rtrans acks

Query x$ksxpia
The last option is to query x$ksxpia, which is an instance-specific view. A query on this view provides the information on which setting the interconnect is picked up from. Depending on the environment this can be useful to indicate if and where the configuration went wrong. This query will work in Oracle 9i, 10g and 11g.

Below is an example of the output.

col picked_ksxpia format a15
col indx format 9999
col name_ksxpia format a5
col ip_ksxpia format a20

select * from x$KSXPIA

ADDR             INDX INST_ID PUB_KSXPIA PICKED_KSXPIA   NAME_ IP_KSXPIA
---------------- ----- ------- ---------- --------------- ----- --------------------
00000001105D6540     0       1 N          OCR             en7   145.72.220.83
00000001105D6540     1       1 Y          OCR             en6   145.72.220.10

Note: pub_ksxpia indicates whether the interface is a public or a private one. picked_ksxpia indicates where the information was collected from; in the example the interface information from the OCR is used. Another possible value is OSD, which means third-party clusterware is used, and CI is also possible, which means the cluster_interconnects parameter is set (which I recommend not doing).

Using Alert.log
Instead of using one or more of the above queries, you can also check the alert.log of the instances involved. During the startup of the RDBMS instance, the interfaces used for the public and private connections are mentioned. Note this starts from Oracle 10g and is not available in Oracle 9i.

Example: alert.log

Interface type 1 eth1 10.10.10.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 192.168.2.0 configured from OCR for use as a public interface

Using Oracle Cluster Registry to validate settings.
This method can be used to validate the settings in the OCR, but it does not mean these are also used by the database instances. To validate the settings in the OCR, you use the oifcfg command to retrieve the information from the OCR.

oifcfg iflist lists all the interfaces available on the operating system and does not get this from the OCR.
oifcfg getif lists the configuration from the OCR.

An example:

racworkshop1:/export/home/oracle$ oifcfg iflist
eth0 192.168.2.0
eth1 10.10.10.0
racworkshop1:/export/home/oracle$ oifcfg getif
eth0 192.168.2.0 global public
eth1 10.10.10.0 global cluster_interconnect

Source: http://www.rachelp.nl/index_kb.php?menu=articles&actie=show&id=35

Oracle Cluster Ready Services (CRS)

Oracle Database 10g contains many enhancements, but one of the most interesting is the introduction of Oracle Cluster Ready Services (CRS), Oracle's full-stack clusterware.

Oracle CRS is Oracle's own clusterware tightly coupled with Oracle Real Application Clusters (RAC).

CRS must be installed prior to the installation of Oracle RAC.

It can also work over any third-party clustering software but there is no longer a requirement to buy and deploy such software.

In short, Oracle CRS is primarily responsible for managing the high-availability (HA) architecture of Oracle RAC with the help of Cluster Ready Services Daemon (CRSD), Oracle Cluster Synchronization Server Daemon (OCSSD) and the Event Manager Daemon (EVMD).

The CRSD manages the HA functionality by starting, stopping, and failing over the application resources and maintaining the profiles and current states in the Oracle Cluster Registry (OCR).

The OCSSD manages the participating nodes in the cluster by using the voting disk. The OCSSD also protects against the data corruption potentially caused by "split brain" syndrome by forcing a machine to reboot.

Although Oracle CRS replaces the Oracle Cluster Manager (ORACM) in Oracle9i RAC, it does continue support for the Global Services Daemon (GSD), which in Oracle9i is responsible for communicating with the Oracle RAC database.
In Oracle 10g, GSD's sole purpose is to serve Oracle9i clients (such as SRVCTL, Database Configuration Assistant, and Oracle Enterprise Manager).

Source :-
http://download.oracle.com/docs/html/B10766_08/toc.htm

Monday, July 6, 2009

How does Oracle Clusterware start?

Note#
=======
From 10.1.0.4
===============================
/etc/init.d/init.crs start
/etc/init.d/init.crs stop
/etc/init.d/init.crs enable
/etc/init.d/init.crs disable

starts the crsd, cssd and evmd daemons.

From 10.2.X
===============================

crsctl stop crs
crsctl start crs
crsctl enable crs
crsctl disable crs

The startup of the Oracle clusterware daemons is based on start scripts that are executed as root user.

The environment variables are set in these scripts.
The clusterware start scripts can be found in:
· /etc/init.d (Sun, Linux)
· /sbin (HPUX, HP Itanium, Tru64)
· /etc (AIX)
They are named:
· init.cssd: starts ocssd.bin, oclsomon, oprocd and oclsvmon daemons
· init.evmd: starts evmd.bin daemon
. init.crsd: starts crsd.bin daemon
· init.crs : enabler/start/disabler script


Automatic startup of the clusterware daemons relies on these 3 steps (two Unix OS mechanisms and a check of whether the clusterware is startable):

1. Execution of the init.d rc*.d scripts that enable the clusterware to start or not after a reboot

Note: The init.* scripts should normally never be run from the command line. If you do require manual disabling or enabling of CRS startup, other than for the diagnostic purposes listed in this note, then invoke crsctl as the root user as follows:

$ crsctl disable crs
Oracle Clusterware is disabled for start-up after a reboot.

$ crsctl enable crs
Oracle Clusterware is enabled for start-up after a reboot.

Running 'init.crs enable' allows the clusterware to autostart at system reboot (this is the default setting).

Running 'init.crs disable' (does the reverse) prevents the clusterware from autostarting at system reboot.

When the rc*.d scripts are executed at the correct OS runlevel, then the 'init.crs start' execution
(executed in run level 3 or 5 for the Unix versions) will check the automatic setting (enabled or
disabled) and set the clusterware in startable or non startable mode accordingly.



2. The inittab mechanism via the three respawnable clusterware scripts
A successful installation will populate the inittab file with entries to start the CRS stack:
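The entries typically look similar to the following (a hedged illustration; the exact identifiers, run levels and flags vary by platform and clusterware version):

h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null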



3. The prerequisite check with the 'init.cssd startcheck' execution

The clusterware scripts further run a check script before launching the clusterware processes, to know whether the clusterware is startable, that is, to check whether the basic prerequisites are met and permit the clusterware to start:

init.cssd startcheck
That last script needs to return code 0 to permit the clusterware to start. In case of errors
/tmp/crsctl.xxxx logging files are written with the error message.

It executes, as the oracle user via 'su -l oracle', this command:

crsctl check boot


Execution of this command should return nothing in order to permit the clusterware to start.
Note: 'crsctl check boot' may return other errors such as:
'no read access to the ocr', 'clustered ip is not defined' or '$CRS_HOME is not mounted'
and prevent the clusterware from starting.
Once the above three prerequisites are met, the clusterware starts via *.bin executables which can be confirmed using the command:

'ps -ef | grep .bin':
oracle 19611 19610 0 Dec17 /opt/app/oracle/product/crs/bin/oclsomon.bin
oracle 19547 18245 0 Dec17 /opt/app/oracle/product/crs/bin/ocssd.bin
oracle 18215 18005 0 Dec17 /opt/app/oracle/product/crs/bin/evmd.bin
root 18649 16555 0 Dec17 /opt/app/oracle/product/crs/bin/crsd.bin

Syntax of FNDCPASS command in Oracle Apps to change Passwords

FNDCPASS logon 0 Y system/password mode username new_password

Where:
logon is username/password[@connect]
system/password is the password of the SYSTEM account of that database
mode is SYSTEM/USER/ORACLE
username is the username whose password you want to change
new_password is the new password in unencrypted format

Example:
$ FNDCPASS apps/apps 0 Y system/manager SYSTEM APPLSYS WELCOME
$ FNDCPASS apps/apps 0 Y system/manager ORACLE GL GL1
$ FNDCPASS apps/apps 0 Y system/manager USER VISION WELCOME

Example:
$ FNDCPASS apps/apps 0 Y system/manager ALLORACLE WELCOME

To change APPS/APPLSYS password, we need to give mode as SYSTEM

To change product schema passwords, i.e., GL, AP, AR, etc., we need to give mode as ORACLE


To change end user passwords, i.e., SYSADMIN, OPERATIONS, etc., we need to give mode as USER.

Note: FNDCPASS has a new mode, "ALLORACLE", in which all Oracle Applications schema passwords can be changed in one call. Apply the patch (Patch 4745998) to get this option if it is not currently available with your Apps release.

Syntax:
FNDCPASS <logon> 0 Y <system/password> ALLORACLE <new_password>


Note: Up to 11.5.9 there is a bug in FNDCPASS when it is used to change the APPS & APPLSYS passwords; it can corrupt the data in FND metadata tables and leave the application unusable. Because of that, it is recommended to take a backup of the tables FND_USER and FND_ORACLE_USERID before changing the passwords (see the sketch below).

After changing the APPS/APPLSYS or APPLSYSPUB password, the following extra manual steps need to be done.

If you changed the APPS (and APPLSYS) password, update the password in these files:

iAS_TOP/Apache/modplsql/cfg/wdbsvr.app

ORACLE_HOME/reports60/server/CGIcmd.dat

If you changed the APPLSYSPUB password, update the password in these files:

FND_TOP/resource/appsweb.cfg
OA_HTML/bin/appsweb.cfg

FND_TOP/secure/HOSTNAME_DBNAME.dbc

Kernel Upgrade process

1)To check which kernels are installed, run the following command:

$ rpm -qa | grep kernel

2)To check which kernel is currently running, execute the following command:

$ uname -r

3)For example, to install the 2.4.21-32.0.1.ELhugemem kernel, download the kernel-hugemem RPM and execute the following command:

# rpm -ivh kernel-hugemem-2.4.21-32.0.1.EL.i686.rpm

4) Never upgrade the kernel using the RPM option '-U'.

The previous kernel should always be available if the newer kernel does not boot or work properly.

5)To make sure the right kernel is booted, check the /etc/grub.conf file if you use GRUB and change the "default" attribute if necessary.

Here is an example:

default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux AS (2.4.21-32.0.1.ELhugemem)

root (hd0,0)
kernel /vmlinuz-2.4.21-32.0.1.ELhugemem ro root=/dev/sda2
initrd /initrd-2.4.21-32.0.1.ELhugemem.img
title Red Hat Enterprise Linux AS (2.4.21-32.0.1.ELsmp)

root (hd0,0)
kernel /vmlinuz-2.4.21-32.0.1.ELsmp ro root=/dev/sda2
initrd /initrd-2.4.21-32.0.1.ELsmp.img

In this example, the "default" attribute is set to "0" which means that the 2.4.21-32.0.1.ELhugemem kernel will be booted.

If the "default" attribute would be set to "1", then 2.4.21-32.0.1.ELsmp would be booted.

6) After you have installed the newer kernel, reboot the system.

7) Once you are sure that you don't need the old kernel anymore, you can remove it by running (substitute the name of the old kernel package):

# rpm -e <old-kernel-package>

When you remove a kernel, you don't need to update /etc/grub.conf.
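An illustrative removal sequence, using hypothetical package names (adjust them to whatever 'rpm -qa | grep kernel' reports on your system):

# rpm -qa | grep kernel
kernel-hugemem-2.4.21-32.0.1.EL
kernel-2.4.21-27.0.2.ELsmp
# rpm -e kernel-2.4.21-27.0.2.ELsmp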


What are .lct and .ldt files in Patch Directory?

Ans:

The patch metadata LDT files (also called datafiles) are FNDLOAD data files included in the top-level directory of all recent patches. The LDT files contain prerequisite patch information and a manifest of all files in the patch with their version numbers. The Patch Information Bundle metadata also includes information about the relationships between patches, such as which minipacks are contained in the recommended patches.

LCT files (also called configfiles) are the configuration files that are used to download/upload data. Without configfiles, datafiles are useless.
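As an illustration of how the two file types work together (this example is not taken from a patch; the application short name XX and program name XX_MYPROG are hypothetical), FNDLOAD uses an .lct configfile to describe the entity and an .ldt datafile to hold the data being downloaded or uploaded:

FNDLOAD apps/<apps_password> 0 Y DOWNLOAD $FND_TOP/patch/115/import/afcpprog.lct XX_MYPROG.ldt PROGRAM APPLICATION_SHORT_NAME="XX" CONCURRENT_PROGRAM_NAME="XX_MYPROG"
FNDLOAD apps/<apps_password> 0 Y UPLOAD $FND_TOP/patch/115/import/afcpprog.lct XX_MYPROG.ldt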

Troubleshooting & Important CRSCTL commands

Note#
Any command that only queries information about the cluster can be run as the oracle user, whereas any command that changes the configuration of the cluster has to be run as the root user.

Start CRS
==========
crsctl start crs (as root)
init.crs start (as root)
crs_start -all (as oracle)

Stop CRS
==========
crsctl stop crs (as root)
crs_stop -all (as oracle)
init.crs stop (as root)

Check CRS
==========
crsctl check crs

Enable Oracle Clusterware
===================
crsctl enable crs

Disable Oracle Clusterware
====================
crsctl disable crs

Check location of Voting Disk
=====================
crsctl query css votedisk
0 /oracrs/oradata/data01/vdisk1
0 /oracrs/oradata/data01/vdisk2
0 /oracrs/oradata/data01/vdisk3

located 3 votedisk(s).

Check location of OCR Disk
=====================
$CRS_ORACLE_HOME/bin/ocrcheck

Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 2024
Available space (kbytes) : 260096
ID : 2110452402
Device/File Name : /oracrs/oradata/data01/ocrdisk1
Device/File integrity check succeeded
Device/File Name : /oracrs/oradata/data02/ocrdisk2

Device/File integrity check succeeded
Cluster registry integrity check succeeded
======================================

All the clusterware processes are normally retrieved via OS commands like:
ps -ef | grep -E 'init|d.bin|ocls|sleep|evmlogger|oprocd|diskmon|PID'

There are general processes, i.e. processes that need to be started on all platforms/releases, and specific processes, i.e. processes that only need to be started on certain CRS versions/platforms.
A.)
The general processes are
ocssd.bin
evmd.bin
evmlogger.bin
crsd.bin
B)
The specific processes are

oprocd: run on Unix when vendor Clusterware is not running. On Linux, only starting with
10.2.0.4.
oclsvmon.bin: normally run when a third party clusterware is running
oclsomon.bin: check program of the ocssd.bin (starting in 10.2.0.1)
diskmon.bin: new 11.1.0.7 process for exadata
oclskd.bin: new 11.1.0.6 process to reboot nodes in case rdbms instances are hanging

There are three fatal processes, i.e. processes whose abnormal halt or kill will provoke a node reboot (see note 265769.1):
1. the ocssd.bin
2. the oprocd.bin
3. the oclsomon.bin


The other processes are automatically restarted when they go away.


When the clusterware is not allowed to start on boot
This state is reached when:
1. 'crsctl stop crs' has been issued and the clusterware is stopped
or
2. the automatic startup of the clusterware has been disabled and the node has been rebooted, e.g.
./init.crs disable
Automatic startup disabled for system boot.

The 'ps' command then only shows the three inittab processes, each with a spawned sleep process in a 30-second loop:

ps -ef | grep -E 'init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID'

UID PID PPID C STIME TTY TIME CMD
root 1 0 0 16:55 ? 00:00:00 init [5]
root 19770 1 0 18:00 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 19854 1 0 18:00 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 19906 1 0 18:00 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 22143 19770 0 18:02 ? 00:00:00 /bin/sleep 30
root 22255 19854 0 18:02 ? 00:00:00 /bin/sleep 30
root 22266 19906 0 18:02 ? 00:00:00 /bin/sleep 30

The clusterware can be re-enabled via './init.crs enable' and/or via 'crsctl start crs'.


When the clusterware is allowed to start on boot, but can't start because some
prerequisites are not met


This state is reached when the node has rebooted and some prerequisites are missing, e.g.

1. OCR is not accessible
2. Cluster interconnect can't accept tcp connections
3. CRS_HOME is not mounted

How to Troubleshoot?
'crsctl check boot' (run as oracle) shows errors, e.g.
$ crsctl check boot
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device:
PROC-26: Error while accessing the physical storage Operating System error [No such file or
directory] [2]

The three inittab processes sleep for 60 seconds in a loop in 'init.cssd startcheck':
ps -ef | grep -E 'init|d.bin|ocls|oprocd|diskmon|evmlogger|sleep|PID'
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 18:28 ? 00:00:00 init [5]
root 4969 1 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 5060 1 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 5064 1 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 5405 4969 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 5719 5060 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 5819 5064 0 18:29 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 6986 5405 0 18:30 ? 00:00:00 /bin/sleep 60
root 6987 5819 0 18:30 ? 00:00:00 /bin/sleep 60
root 7025 5719 0 18:30 ? 00:00:00 /bin/sleep 60

Once 'crsctl check boot' returns nothing (no more error messages), the clusterware processes will start.

Note#
CRS is designed to run at run level 3 or 5 (GUI). This can be checked in /etc/inittab.
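For example, to confirm the default run level (illustrative output):

# grep initdefault /etc/inittab
id:5:initdefault: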

Sunday, July 5, 2009

Steps to increase the Margin time/DIAGWAIT

The following are the steps to increase the margin time.

The modification process needs downtime, and you need to stop the cluster services on all member nodes.

1. Stop the CRS process:
# crsctl stop crs

2. Ensure that the Clusterware stack is down and not running:
# ps -ef | egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.

3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the following command as root:
# crsctl set css diagwait 13 -force

4. Check if diagwait is successfully set:
# crsctl get css diagwait

5. Restart the Oracle Clusterware on all the nodes by executing:
# crsctl start crs
(Note: If you face any problem restarting the CRS services, ASM or the database, you can reboot the nodes. The cluster and database will come up automatically due to the init startup scripts.)

6. The oprocd daemon process will now show with -m 10000:
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f

The value m = 10000 shows that the change has taken effect.

Rollback procedure: if you need to unset the diagwait value for any reason, run:
# crsctl unset css diagwait

Useful Metalink Notes for Clusterware
========================================
239989.1 10g RAC: Stopping Reboot Loops When CRS Problems Occur
259301.1 CRS and 10g RAC
This note contains a useful awk script to improve the output of crs_stat -ls
436067.1 Windows CRS_STAT script to display long names correctly
309541.1 How to start/stop the 10g CRS Clusterware
263897.1 How to stop Cluster Ready Services (CRS)
298073.1 How to remove CRS auto start and restart for a RAC instance
295871.1 How to verify if CRS install is valid
316583.1 VIPCA fails complaining that interface is not public
341214.1 How to cleanup after a failed (or successful) Oracle Clusterware installation
280589.1 How to install Oracle 10g CRS on a cluster where one or more nodes are not to be configured with CRS immediately
357808.1 CRS Diagnostics
272331.1 CRS 10g Diagnostic Guide
330358.1 CRS 10g R2 Diagnostic Collection Guide
331168.1 Oracle Clusterware consolidated logging in 10gR2
342590.1 CRS logs not being written
357808.1 Diagnosability for CRS/EVM/RACG
459694.1 Procwatcher: Script to Monitor and Examine Oracle and CRS Processes
289690.1 Data Gathering for Troubleshooting RAC and CRS issues
265769.1 Troubleshooting CRS Reboots
240001.1 Troubleshooting CRS root.sh problems (10g RAC)
239989.1 10g RAC - Stopping Reboot Loops when CRS problems occur
294430.1 CSS Timeout Computation in 10g RAC
284752.1 10gRAC: Steps to Increase CSS Misscount, Reboottime and Disktimeout
462616.1 Reconfiguring the CSS disktimeout of 10gR2 Clusterware for proper LUN failover
293819.1 Placement of voting and OCR disk file in 10g RAC
317628.1 How to replace a corrupt OCR mirror file
452486.1 Moving OCR and Voting Disk to another location
399482.1 How to recreate OCR/Voting disk accidentally deleted
358620.1 How to recreate OCR/Voting disk in 10gR1/R2 RAC
279793.1 How to Restore a Lost Voting Disk in 10g
264847.1 How to Configure Virtual IPs for 10g RAC
283684.1 How to change interconnect/public interface IP subnet in a 10g cluster
276434.1 Modifying the VIP or VIP Hostname of an Oracle 10g Clusterware Node
294336.1 Changing the check interval for the Oracle 10g VIP
219361.1 Troubleshooting Instance Evictions (ORA-29740)
297498.1 Resolving Instance Evictions on Windows platforms
315125.1 What to check if the Cluster Synchronization Services daemon (OCSSD) does not start
270512.1 Adding a node to a 10g RAC Cluster
269320.1 Removing a node from a 10g RAC Cluster
338706.1 Cluster Ready Services (CRS) rolling upgrade
399031.1 Step-by-step installation of Oracle Clusterware one-off and bundle patches for Oracle 10g
401783.1 Changes in Oracle Clusterware after applying 10.2.0.3 Patchset
405820.1 Known Issues After Applying 10.2 CRS bundle patches
316817.1 Cluster Verification Utility (CLUVFY) FAQ
372358.1 Shared disk check with the Cluster Verification Utility
338924.1 CLUVFY Fails with error - could not find a suitable set of interfaces for VIPs

How to check which version the cluster is at?

crsctl query crs softwareversion
crsctl query crs activeversion

The output of both should match.
softwareversion shows the version of the software on the particular node of the cluster where the command is run.

activeversion shows the version of the entire cluster.
So once you have upgraded the entire cluster, i.e. all nodes, run 'crsctl query crs activeversion' to check whether the cluster has been upgraded on all nodes.

1)crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.4.0]
2)crsctl query crs softwareversion
CRS software version on node [mynode1] is [10.2.0.4.0]

CLUSTER VERIFICATION UTILITY FAQ

The Oracle Clusterware utility for Oracle RAC comes in two forms: the cluvfy utility which is available after installation of the clusterware software and the runcluvfy.sh shell script which can be used before installation.

In this example we are using the cluvfy utility.
export CV_HOME=/oracrs/oradata (i.e. where cvupack.zip was unzipped)
export CV_JDKHOME=/usr/java/jdk1.4.2 (i.e. the location of JDK 1.4 on the box)
Usage:
./cluvfy stage -pre crsinst -n node1,node2 -verbose

Reference: Metalink note 316817.1

CLUSTER VERIFICATION UTILITY FAQ
=======================================
Concept
What is CVU? What are its objectives and features?
What is a stage?
What is a component?
What is nodelist?
Do I have to be root to use CVU?
What about discovery? Does CVU discover installed components?
What about locale? Does CVU support other languages?
How do I report a bug?

Installation
What are the requirements for CVU?
How do I manually install CVU?
From where can I download CVU?
What Linux versions are supported?
How do I make Cluvfy work with Suse 9 ES?
What Windows versions are supported?
What Solaris versions are supported?
What AIX versions are supported?
What HP-UX versions are supported?

Usage
How do I know about cluvfy commands? The usage text of cluvfy does not show individual commands.
What are the default values for the command line arguments?
Do I have to type the nodelist every time for the CVU commands? Is there any shortcut?
How do I get detailed output of a check?
How do I check network or node connectivity related issues?
How do I check whether OCFS is properly configured?
How do I check the CRS stack and other sub-components of it?
How do I check user accounts and administrative permissions related issues?
How do I check minimal system requirements on the nodes?
Can I check if the storage is shared among the nodes?
Is there a way to compare nodes?
Why does the peer comparison with -refnode say “passed” when the group or user does not exist?
Is there a way to verify that the CRS is working properly before proceeding with RAC install?
At what point is cluvfy usable? Can I use cluvfy before installing CRS?
How do I turn on tracing?
Where can I find the CVU trace files?
Why does cluvfy report “unknown” on a particular node?
What does cluvfy error “Could not find a suitable set of interfaces for VIPs” mean?
Where can I find the disk rpm?

General Questions:
How do I check that user equivalence through SSH is setup properly?
How can I check the requirements for installing Oracle Clusterware or RAC from Oracle Database 10g Release 1 (10.1)?
What is CVU`s configuration file? How do I use it?
How do I run CVU from installation media?
What database versions are supported by CVU?

Limitations:
What are the known issues with this release?
What kinds of storage does cluvfy check for shared-ness?

What is CVU? What are its objectives and features?
CVU brings ease to RAC users by verifying all the important components that need to be verified at different stages in a RAC environment. The wide domain of deployment of CVU ranges from initial hardware setup through fully operational cluster for RAC deployment and covers all the intermediate stages of installation and configuration of various components. The command line tool is cluvfy. Cluvfy is a non-intrusive utility and will not adversely affect the system or operations stack.

What is a stage?
CVU supports the notion of Stage verification. It identifies all the important stages in RAC deployment and provides each stage with its own entry and exit criteria. The entry criteria for a stage define a specific set of verification tasks to be performed before initiating that stage. This pre-check saves the user from entering into a stage unless its pre-requisite conditions are met. The exit criteria for a stage define another specific set of verification tasks to be performed after completion of the stage. The post-check ensures that the activities for that stage have been completed successfully. It identifies any stage-specific problem before it propagates to subsequent stages, where it would be more difficult to find its root cause. An example of a stage is “pre-check of database installation”, which checks whether the system meets the criteria for RAC install.

What is a component?
CVU supports the notion of Component verification. The verifications in this category are not associated with any specific stage. The user can verify the correctness of a specific cluster component. A component can range from a basic one, like free disk space to a complex one like CRS Stack. The integrity check for CRS stack will transparently span over verification of multiple sub-components associated with CRS stack. This encapsulation of a set of tasks within specific component verification should be of a great ease to the user.

What is nodelist?
Nodelist is a comma separated list of hostnames without domain. Cluvfy will ignore any domain while processing the nodelist. If duplicate entities exist after removing the domain, cluvfy will eliminate the duplicate names while processing. Wherever supported, you can use ‘-n all’ to check on all the cluster nodes. See the nodelist shortcut question in the Usage section for more information on nodelist and shortcuts.
[ go to the top ]

Do I have to be root to use CVU?
No. CVU is intended for database and system administrators. CVU assumes the current user is the oracle user.

What about discovery? Does CVU discover installed components?
At present, CVU discovery is limited to the following components: CVU discovers available network interfaces if you do not specify any interface or IP address on its command line. For storage-related verification, CVU discovers all the supported storage types if you do not specify a particular storage. CVU discovers the CRS HOME if one is available.

What about locale? Does CVU support other languages?
CVU supports all the languages that are supported by other Oracle products.

How do I report a (or tons of) bug?
If the problem is not covered in the documentation, file a bug against product# 5, component: OPSM and sub-component: CLUVFY. Please provide the relevant log file while filing a bug.
[ go to the top ]

What are the requirements for CVU?

CVU requires:
1. An area with at least 30MB for containing the software bits on the invocation node.
2. A Java 1.4.1 location on the invocation node.
3. A work directory with at least 25MB on all the nodes. CVU will attempt to copy the necessary bits as required to this location. Make sure the location exists on all nodes and that it has write permission for the CVU user. This directory is set through the CV_DESTLOC environment variable. If this variable does not exist, CVU will use “/tmp” as the work directory.

How do I manually install CVU?
Here is how one can install CVU from a zip file (cvupack.zip).
1.) Create a cvhome (say /home/mycvhome) directory. It should have at least 30M of free disk space.
2.) cd /home/mycvhome
3.) copy the cvupack.zip file to /home/mycvhome
4.) unzip the file:
Example : unzip cvupack.zip
5.) set these environmental variables:
CV_HOME: This should point to the cvhome.
Example: setenv CV_HOME /home/mycvhome
CV_JDKHOME: This should point to a valid jdk1.4 home with hybrid support. By default the installation points to the right JDK.
Example: setenv CV_JDKHOME /usr/local/packages/jdk14
CV_DESTLOC (optional): This should point to a writable area on *all* nodes. The tool will attempt to copy the necessary bits as required to this location. Make sure the location exists on all nodes and that it has write permission for the CVU user. It is strongly recommended that you set this variable. If this variable has not been set, CVU will use “/tmp” as the default.
Example: setenv CV_DESTLOC /tmp/cvu_temp

To verify, run /home/mycvhome/bin/cluvfy. This should show the usage.

From where do I download CVU?
http://www.oracle.com/technology/products/database/clustering/cvu/cvu_download_homepage.html

What Linux distributions are supported?
This release supports
RedHat 2.1AS (Note that the CVU for 2.1 and other versions are not binary compatible)
RedHat 3 (Update 2 or higher)
RedHat 4
Suse 9.

How do I make Cluvfy work with Suse 9 ES?
For this you will have to edit the configuration file called cvu_config under the CV_HOME/cv/admin directory. Modify the property CV_ASSUME_DISTID=Taroon to CV_ASSUME_DISTID=Pensacola.
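After editing, you can verify the change with, for example (illustrative, assuming CV_HOME is set as described above):

$ grep CV_ASSUME_DISTID $CV_HOME/cv/admin/cvu_config
CV_ASSUME_DISTID=Pensacola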

What Windows versions are supported?
This release supports Windows 2000 and Windows 2003

What Solaris versions are supported?
This release supports Solaris 8, Solaris 9 and Solaris 10

What AIX versions are supported?
This release supports AIX 5L (5.1,5.2,5.3)

What HP-UX versions are supported?
This release supports 11.11 and 11.23

[ go to the top ]

How do I know about cluvfy commands? The usage text of cluvfy does not show individual commands.
Cluvfy has context sensitive help built into it. Cluvfy shows the most appropriate usage text based on the cluvfy command line arguments.

If you type ‘cluvfy’ on the command prompt, cluvfy displays the high level generic usage text, which talks about valid stage and component syntax.

If you type ‘cluvfy comp -list’, cluvfy will show valid components with brief description on each of them. If you type ‘cluvfy comp -help’, cluvfy will show detail syntax for each of the valid components. Similarly, ‘cluvfy stage -list’ and ‘cluvfy stage -help’ will list valid stages and their syntax respectively.

If you type an invalid command, cluvfy will show the appropriate usage for that particular command. For example, if you type ‘cluvfy stage -pre dbinst’, cluvfy will show the syntax for pre-check of dbinst stage.
[ go to the top ]

What are the default values for the command line arguments?
Here are the default values and behavior for different stage and component commands:

For component nodecon:
If no -i is provided, then cluvfy will get into the discovery mode.
For component nodereach:
If no -srcnode is provided, then the local node (the node of invocation) will be used as the source node.
For components cfs, ocr, crs, space, clumgr:
If no -n argument is provided, then the local node will be used.
For components sys and admprv:
If no -n argument is provided, then the local node will be used.
If no -osdba argument is provided, then ‘dba’ will be used.
If no -orainv argument is provided, then ‘oinstall’ will be used.
For component peer:
If no -osdba argument is provided, then ‘dba’ will be used.
If no -orainv argument is provided, then ‘oinstall’ will be used.

For stage -post hwos:
If no -s argument is provided, then cluvfy will get into the discovery mode for shared storage verification.
For stage -pre crsinst:
If no -c argument is provided, then cluvfy will skip OCR related checks.
If no -q argument is provided, then cluvfy will skip voting disk related checks.
If no -osdba argument is provided, then ‘dba’ will be used.
If no -orainv argument is provided, then ‘oinstall’ will be used.
For stage -pre dbinst:
If no -osdba argument is provided, then ‘dba’ will be used.
If no -orainv argument is provided, then ‘oinstall’ will be used.
[ go to the top ]

Do I have to type the nodelist every time for the CVU commands? Is there any shortcut?
You do not have to type the nodelist every time for the CVU commands. Typing the nodelist for a large cluster is painful and error prone. Here are a few shortcuts.

To provide all the nodes of the cluster, type ‘-n all’. Cluvfy will attempt to get the nodelist in the following order:
1. If a vendor clusterware is available, it will pick all the configured nodes from the vendor clusterware using lsnodes utility.
2. If CRS is installed, it will pick all the configured nodes from Oracle clusterware using olsnodes utility.
3. If neither the vendor clusterware nor Oracle clusterware is installed, it searches for a value of CV_NODE_ALL in the configuration file.
4. If none of the above, it will look for the CV_NODE_ALL environmental variable. If this variable is not defined, it will complain.

To provide a partial list(some of the nodes of the cluster) of nodes, you can set an environmental variable and use it in the CVU command. For example:
setenv MYNODES node1,node3,node5
cluvfy comp nodecon -n $MYNODES
[ go to the top ]

How do I get detailed output of a check?
Cluvfy supports a verbose feature. By default, cluvfy runs in non-verbose mode and just reports the summary of a test. To get detailed output of a check, use the flag ‘-verbose’ in the command line. This will produce detailed output of individual checks and, where applicable, will show per-node results in a tabular fashion.

How do I check network or node connectivity related issues?
Use component verification commands like ‘nodereach’ or ‘nodecon’ for this purpose. For the detailed syntax of these commands, type cluvfy comp -help at the command prompt.

If the ‘cluvfy comp nodecon’ command is invoked without -i argument, cluvfy will attempt to discover all the available interfaces and the corresponding IP address & subnet. Then cluvfy will try to verify the node connectivity per subnet. It would also obtain the list of interfaces that are suitable for use as VIPs and the list of interfaces to private interconnects. You can run this command in verbose mode to find out the mappings between the interfaces, IP addresses and subnets.

You can check the connectivity among the nodes by specifying the interface name(s) through -i argument.
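For example (node and interface names are hypothetical):

./cluvfy comp nodecon -n node1,node2 -i eth0 -verbose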
[ go to the top ]

Can I check if the storage is shared among the nodes?
Yes, you can use the ‘comp ssa’ command to check the sharedness of the storage. Please refer to the known issues section for the types of storage supported by cluvfy.
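For example (node names and the storage device are hypothetical; this assumes the -s argument is used to name the storage to check):

./cluvfy comp ssa -n node1,node2 -s /dev/sdb -verbose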

How do I check whether OCFS is properly configured?
You can use the component command ‘cfs’ to check this. Provide the OCFS file system you want to check through the -f argument. Note that the sharedness check for the file system is supported for OCFS version 1.0.14 or higher.
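For example (node names and the OCFS mount point are hypothetical):

./cluvfy comp cfs -n node1,node2 -f /ocfs -verbose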

How do I check the CRS stack and other sub-components of it?
Cluvfy provides commands to check a particular sub-component of the CRS stack as well as the whole CRS stack. You can use the ‘comp ocr’ command to check the integrity of OCR. Similarly, you can use ‘comp crs’ and ‘comp clumgr’ commands to check integrity of crs and clustermanager sub-components. You can use the `comp nodeapp` command to check whether the node applications, namely VIP, GSD and ONS, have been configured properly.

To check whether the Oracle Clusterware has been installed properly, run the stage command ’stage -post crsinst’.
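For example, to post-check the Oracle Clusterware installation on all configured nodes:

./cluvfy stage -post crsinst -n all -verbose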

How do I check user accounts and administrative permissions related issues?
Use the admprv component verification command. Refer to the usage text for detailed instructions and the types of supported operations. To check whether the privilege is sufficient for user equivalence, use the ‘-o user_equiv’ argument. Similarly, the ‘-o crs_inst’ will verify whether the user has the correct permissions for installing CRS. The ‘-o db_inst’ will check for permissions required for installing RAC and ‘-o db_config’ will check for permissions required for creating a RAC database or modifying a RAC database configuration.
[ go to the top ]

How do I check minimal system requirements on the nodes?
The component verification command sys is meant for that. To check the system requirement for RAC, use ‘-p database’ argument. To check the system requirement for CRS, use ‘-p crs’ argument. To check the system requirements for installing the Oracle Clusterware or RAC from Oracle Database 10g release 1 (10.1), use the -r 10gR1 argument.
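For example, to check the CRS system requirements on two hypothetical nodes:

./cluvfy comp sys -n node1,node2 -p crs -verbose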

Is there a way to compare nodes?
You can use the peer comparison feature of cluvfy for this purpose. The command ‘comp peer’ will list the values of different nodes for several pre-selected properties. You can use the peer command with the -refnode argument to compare those properties of other nodes against the reference node. To compare the properties pertaining to Oracle Database 10g release 1 (10.1), use the -r 10gR1 argument.
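For example, to compare two hypothetical nodes against a reference node:

./cluvfy comp peer -refnode node1 -n node2,node3 -verbose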

Why does the peer comparison with -refnode say “passed” when the group or user does not exist?
Peer comparison with the -refnode feature acts like a baseline feature. It compares the system properties of other nodes against the reference node. If a value does not match (i.e. is not equal to the reference node value), then it flags that as a deviation from the reference node. If a group or user does not exist on the reference node as well as on the other node, it will report this as ‘passed’ since there is no deviation from the reference node. Similarly, it will report ‘failed’ for a node with higher total memory than the reference node, for the same reason.
[ go to the top ]

Is there a way to verify that the CRS is working properly before proceeding with RAC install?
Yes. You can use the post-check command for cluster services setup (-post crsinst) to verify CRS status. A more appropriate test would be to use the pre-check command for database installation (-pre dbinst). This will check whether the current state of the system is suitable for RAC install.

At what point is cluvfy usable? Can I use cluvfy before installing CRS?
You can run cluvfy at any time, even before CRS installation. In fact, cluvfy is designed to assist the user as soon as the hardware and OS is up. If you invoke a command which requires CRS or RAC on local node, cluvfy will report an error if those required products are not yet installed.

How do I turn on tracing?
Set the environmental variable SRVM_TRACE to true. For example, in tcsh “setenv SRVM_TRACE true” will turn on tracing.
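The equivalent in bash or ksh (sh-style shells) would be:

$ export SRVM_TRACE=true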

Where can I find the CVU trace files?
CVU log files can be found under the $CV_HOME/cv/log directory. The log files are automatically rotated and the latest log file has the name cvutrace.log.0. It is a good idea to clean up unwanted log files or archive them to reclaim disk space.
Note that no trace files will be generated if tracing has not been turned on.

Why does cluvfy report “unknown” on a particular node?
Cluvfy reports unknown when it cannot conclude for sure whether the check passed or failed. Please refer to the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide for details on this.

What does the cluvfy error “Could not find a suitable set of interfaces for VIPs” mean?
Cluvfy reports this error when it cannot discover at least one subnet that connects all the nodes using the same interface name and that is not a non-routable subnet (10.*, 172.16.*-172.31.* and 192.168.*). Related Note 316583.1.

Where can I find the disk rpm?
The disk rpm (cvuqdisk-1.0.1-1.rpm) can be found under the “Disk1/rpm” directory of the installation media.

[ go to the top ]

How do I check that user equivalence through SSH is set up properly?
To verify user accounts and administrative permissions-related issues, use the component verification command admprv as follows:
cluvfy comp admprv [ -n node_list ] [-verbose]
| -o user_equiv [-sshonly]
| -o crs_inst [-orainv orainventory_group ]
| -o db_inst [-orainv orainventory_group ] [-osdba osdba_group ]
| -o db_config -d oracle_home
For example cluvfy comp admprv -n all -o user_equiv -verbose. More details are in the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide

How can I check the requirements for installing Oracle Clusterware or RAC from Oracle Database 10g Release 1 (10.1)?
runcluvfy.sh stage -pre crsinst -r 10gR1 -n node1,node2. More details are in the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide

What is CVU`s configuration file? How do I use it?
Please review the documentation at Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide

How do I run CVU from installation media?
After mounting the CRS DVD, cd to the Disk1/cluvfy directory and execute runcluvfy with the same arguments as cluvfy.
For example (node names are placeholders):
./runcluvfy stage -pre crsinst -n node1,node2

What database versions are supported by CVU?
Current CVU release supports only 10g RAC and CRS and is not backward compatible. In other words, CVU can not check or verify pre-10g products.

What are the known issues with this release?
Shared storage accessibility (ssa) check reports:
1.) The current release of cluvfy has the following limitations on Linux regarding the shared storage accessibility check.
a. Currently NAS storage (r/w, no attribute caching) and OCFS (version 1.0.14 or higher) are supported.
b. For the sharedness check on NAS, cluvfy requires the user to have write permission on the specified path. If the cluvfy user does not have write permission, cluvfy reports the path as not shared.

2.) CVU complains about missing packages on Suse
The preinstallation stage verification for Oracle Clusterware and Oracle Real Application Clusters reports missing packages. Ignore the following missing packages and continue with the installation:
compat-gcc-7.3-2.96.128
compat-gcc-c++-7.3-2.96.128
compat-libstdc++-7.3-2.96.128
compat-libstdc++-devel-7.3-2.96.1

3.) Cluvfy complains about missing vendor clusterware packages (e.g. Sun Cluster, ORCLudlm) when the deployment is planned with Oracle Clusterware without any vendor clusterware. This is a known issue and is documented in all the release notes.

What kinds of storage does cluvfy check for shared-ness?
Cluvfy can currently only check SCSI disks and may error out for special devices like EMC PowerPath.