Thursday, December 17, 2009

Oracle Cluster Health Monitor (IPD/OS)
This tool (formerly known as Instantaneous Problem Detector for Clusters) is designed to detect and analyze operating system (OS) and cluster resource related degradation and failures in order to bring more explanatory power to many issues that occur in clusters where Oracle Clusterware and Oracle RAC are running such as node eviction... (README)

After reading the README, it's easy to install and test. I have only one node (Oracle RAC 11gR2), but I'd like to test it anyway (just to learn it), so:
On Cluster:
- Create the "crfuser" user and unzip the crfpack-linux.zip file.
# useradd -d /opt/crfuser -s /bin/sh -g oinstall crfuser
# su - crfuser
$ unzip /tmp/crfpack-linux.zip
- Set up passwordless SSH so the "crfuser" user can reach all nodes (I have only one node, but see the multi-node note after the ssh test below).
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/crfuser/.ssh/id_rsa):
Created directory '/opt/crfuser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/crfuser/.ssh/id_rsa.
Your public key has been saved in /opt/crfuser/.ssh/id_rsa.pub.
The key fingerprint is:
28:dc:bd:69:d2:23:8f:44:0d:65:05:83:e3:10:50:6a crfuser@RHEL5-TEST

$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys

$ ssh rhel5-test hostname
RHEL5-TEST
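With more than one node, the public key would also have to be distributed to every other node so "crfuser" can log in without a password. A minimal sketch, assuming ssh-copy-id is available and using hypothetical node names:
$ for node in rhel5-node1 rhel5-node2; do ssh-copy-id crfuser@$node; done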
- Install and then enable the tool on all nodes
$ cd install
$ ./crfinst.pl -i rhel5-test -b /oracle/oracrfdb -m rhel5-test

Performing checks on nodes: "rhel5-test" ...

Generating cluster wide configuration file...

Creating a bundle for remote nodes...

Installing on nodes "rhel5-test" ...

Configuration complete on nodes "rhel5-test" ...

Please run "/opt/crfuser/install/crfinst.pl -f", optionally specifying the BDB location with -b, as root on each node to complete the install process.

# /opt/crfuser/install/crfinst.pl -f

Installation completed successfully at /usr/lib/oracrf...
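For reference: -i takes the comma-separated list of nodes to install on, -b the directory for the Berkeley DB files, and -m the node that should host the master ologgerd. On a real two-node cluster the call would look something like this (node names are hypothetical):
$ ./crfinst.pl -i node1,node2 -b /oracle/oracrfdb -m node1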


# /etc/init.d/init.crfd enable


# /etc/init.d/init.crfd status

OSysmond running with PID=17346.
OLoggerd running with PID=17455.

oproxyd running with PID=17462.
Check:
# ps -aef | grep oracrf
root 17345 17236 0 16:12 ? 00:00:00 /bin/sh /usr/lib/oracrf/bin/crfcheck
root 17346 17236 1 16:12 ? 00:00:00 /usr/lib/oracrf/bin/osysmond
root 17455 1 0 16:13 ? 00:00:00 /usr/lib/oracrf/bin/ologgerd -M -d /oracle/oracrfdb
crfuser 17462 1 0 16:13 ? 00:00:00 /usr/lib/oracrf/bin/oproxyd
# ls /oracle/oracrfdb/
crfalert.bdb crfclust.bdb crfconn.bdb crfcpu.bdb crfhosts.bdb crfloclts.bdb crfts.bdb __db.001 __db.002 __db.003 __db.004 __db.005 __db.006 log.0000000001 rhel5-test.ldb
The OS Tool consists of three daemons: ologgerd, oproxyd, and osysmond. There is one ologgerd master daemon on only one node in the installed set of nodes, and one osysmond on every node. If there is more than one node in the installed set, another node is chosen to host the standby for the master ologgerd. If the master daemon dies (because the daemon cannot come up after a fixed number of retries, or because the node where the master was running goes down), the standby takes over as master and selects a new standby. The master manages the OS metric database in Berkeley DB and interacts with the standby to maintain a replica of the master OS metrics database.
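To check which node currently hosts the master ologgerd, the "manage" verb of oclumon can be used; a minimal sketch, assuming this oclumon version supports the -get master query:
$ /usr/lib/oracrf/bin/oclumon manage -get master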

osysmond
is the monitoring and OS metric collection daemon that sends the data to ologgerd. ologgerd receives the information from all the nodes and persists it in a Berkeley DB-based database.

oproxyd
is the proxy daemon that handles connections on the public interface. If the tool is configured with private node names, only oproxyd listens on the public interface for external clients (like oclumon and crfgui). This serves as a security measure against attacks on the ologgerd master daemon. It runs on all the nodes and is highly available.
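A quick way to confirm that it is really oproxyd (and not ologgerd) listening for external clients is to look at the listening sockets; a sketch, assuming netstat with the -p option is available on the node:
# netstat -lntp | grep oproxyd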

Test:

$ /usr/lib/oracrf/bin/oclumon -h

For help from command line : oclumon -h
For help in interactive mode : -h
Currently supported verbs are :
showtrail, showobjects, dumpnodeview, manage, version,
debug, quit and help

$ /usr/lib/oracrf/bin/oclumon dumpnodeview


dumpnodeview: Node name not given. Querying for the local host

----------------------------------------
Node: rhel5-test Clock: '12-17-09 09.20.28 UTC' SerialNo: 434
----------------------------------------


SYSTEM:
#cpus: 2 cpu: 2.84 cpuq: 2 physmemfree: 18236 mcache: 596736 swapfree: 2458264 ior: 0 iow: 121 ios: 7 netr: 27.21 netw: 26.87 procs: 252 rtprocs: 13 #fds: 4059 #sysfdlimit: 6815744 #disks: 1 #nics: 2 nicErrors: 0

TOP CONSUMERS:

topcpu: 'osysmond(17346) 1.89' topprivmem: 'ocssd.bin(8077) 220036' topshm: 'ora_smon_orcl_1(20233) 68896' topfd: 'ocssd.bin(8077) 90' topthread: 'crsd.bin(2905) 58'

$ /usr/lib/oracrf/bin/oclumon dumpnodeview -v -n rhel5-test -last "00:00:10"
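The -last option replays the most recent window (here the last 10 seconds) in verbose mode for the named node. To pull a specific historical window instead, dumpnodeview also accepts explicit start and end timestamps; the -s/-e syntax below is my assumption from the tool's help output, so check oclumon -h on your version:
$ /usr/lib/oracrf/bin/oclumon dumpnodeview -n rhel5-test -s "2009-12-17 09:00:00" -e "2009-12-17 09:10:00"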
On Client:
- Install
# ./crfinst.pl -g

Installation completed successfully at /usr/lib/oracrf...
Test:
# /usr/lib/oracrf/bin/crfgui -m rhel5-test
Cluster Health Analyzer V1.10
Look for Loggerd via node rhel5-test

...Connected to Loggerd on rhel5-test
Note: Node rhel5-test is now up
Cluster 'MyCluster', 1 nodes. Ext time=2009-12-17 09:22:33
Making Window: IPD Cluster Monitor V1.10 on test01, Logger V1.03.20090322, Cluster "MyCluster" (View 0), Refresh rate: 1 sec
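The GUI can also be tuned from the command line; the README mentions a refresh-rate option and a time-delta option for looking at a window in the past. The flag syntax below is my assumption, so check crfgui's help output:
# /usr/lib/oracrf/bin/crfgui -r 5 -m rhel5-test
# /usr/lib/oracrf/bin/crfgui -d "00:05:00" -m rhel5-test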
That's just a small example, and it was fun...
