Thursday, May 26, 2011

exachk - WARNING!!! sm_priority is not set to recommended value of 5 on infiniband switch

I ran exachk and found this warning:
WARNING => sm_priority is not set to recommended value of 5 on infiniband switch exasw-ib1
So I checked the switches with the CheckSWProfile.sh script:
# ./CheckSWProfile.sh -I exasw-ib1,exasw-ib2,exasw-ib3
Checking if switch exasw-ib1 is pingable...
Checking if switch exasw-ib2 is pingable...
Checking if switch exasw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: y
[ERROR] OpenSM configurations mismatch for switch exasw-ib1
Found: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8
Required: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
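The [ERROR] block prints the Found and Required settings on single long lines, so the one differing value is easy to miss. The sketch below compares the two key=value lists in bash and prints only the mismatches. The function name diff_opensm_settings is hypothetical and is not part of CheckSWProfile.sh or exachk.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CheckSWProfile.sh or exachk.
# Compares a "Found:" key=value list against a "Required:" list and prints
# each required setting whose found value differs (or is missing).
# Returns 1 if any setting differs, 0 otherwise.
diff_opensm_settings() {
  local found="$1" required="$2"
  local pair p key want have rc=0
  for pair in $required; do          # word-split on spaces: one key=value per word
    key="${pair%%=*}"
    want="${pair#*=}"
    have=""
    for p in $found; do
      if [ "${p%%=*}" = "$key" ]; then
        have="${p#*=}"
      fi
    done
    if [ "$have" != "$want" ]; then
      printf '%s: found=%s required=%s\n' "$key" "${have:-<missing>}" "$want"
      rc=1
    fi
  done
  return "$rc"
}
```

When given the Found and Required strings reported for exasw-ib1, it prints a single line: sm_priority: found=8 required=5.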
Then I changed sm_priority from 8 to 5 in /etc/opensm/opensm.conf on the InfiniBand switch exasw-ib1:
# vi /etc/opensm/opensm.conf
#Begin /etc/opensm/opensm.conf
.
.
# SM priority used for deciding who is the master
# Range goes from 0 (lowest priority) to 15 (highest).
sm_priority 5
.
.
.
# End /etc/opensm/opensm.conf
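You can also make the same edit without an interactive editor. This is a sketch with three assumptions: GNU sed (for -i) is available on the switch, the file uses the "sm_priority <value>" line format shown above, and set_sm_priority is a hypothetical name. The function keeps a .bak copy and enforces the 0 to 15 range stated in the config comment. The priority is a parameter because, as the comments on this post show, the right value depends on the switch's role.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set sm_priority in an opensm.conf-style file.
# Assumes GNU sed (-i) and lines of the form "sm_priority <value>".
set_sm_priority() {
  local conf="$1" prio="$2"
  # Accept only an integer in the documented range 0 (lowest) to 15 (highest).
  case "$prio" in
    ''|*[!0-9]*) echo "invalid priority: $prio" >&2; return 1 ;;
  esac
  if [ "$prio" -gt 15 ]; then
    echo "priority out of range: $prio" >&2
    return 1
  fi
  cp -p "$conf" "$conf.bak" || return 1    # keep a backup of the original file
  sed -i "s/^sm_priority[[:space:]].*/sm_priority $prio/" "$conf"
  grep '^sm_priority' "$conf"              # show the resulting setting
}
```

For example, run set_sm_priority /etc/opensm/opensm.conf 5 as root on the switch, then restart the Subnet Manager as shown next.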

After the change, I restarted the Subnet Manager on the InfiniBand switch:
# /etc/init.d/opensmd

Usage: opensmd {start|stop|restart|status}

# /etc/init.d/opensmd restart
Stopping IB Subnet Manager.. [ OK ]
Starting IB Subnet Manager. [ OK ]
Back on the database server, I ran the CheckSWProfile.sh script again:
# ./CheckSWProfile.sh -I exasw-ib1,exasw-ib2,exasw-ib3
Checking if switch exasw-ib1 is pingable...
Checking if switch exasw-ib2 is pingable...
Checking if switch exasw-ib3 is pingable...
Use the default password for all switches? (y/n) [n]: y
[INFO] SUCCESS All switches have correct software and firmware version:
SWVer: 1.1.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5

8 comments:

Klaas-Jan Jongsma said...

I think this is some sort of mistake in the exachk file; you had better check it with Oracle Support.

IB1 is your spine switch. You need to set it to an SM priority of 8 in case you need to hook it up to another Exadata rack (or Exalogic). Setting it to 5 might cause problems in multirack environments.

Klaas-Jan Jongsma said...

I think this is a mistake in the exachk script; you had better double-check this with Oracle Support.

IB1 is your spine switch, used to hook up multiple Exadata or Exalogic machines. It needs to be set to an SM priority of 8. Your leaf switches should be at SM priority 5. Setting your spine switch to 5 as well might cause problems in multirack environments.

Surachart Opun said...

Thank you so much. I'll check with Oracle Support.

Surachart Opun said...

Still looking into it.
From the documentation:
By default, the Subnet Manager within the management controller is set to 0 priority. If there is more than one Subnet Manager in your InfiniBand fabric, you must set the priority of each Subnet Manager appropriately. The Subnet Manager with the highest priority is the primary (or Master) Subnet Manager.
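The rule quoted above says the highest priority becomes master. It can be modeled in a few lines. This sketch assumes the InfiniBand election tie-break, in which the SM with the lowest port GUID wins among SMs of equal priority. It models only a fresh election, because a running master is not necessarily replaced by another SM of the same priority. The name expected_master and its input format (one "switch priority guid" line per switch) are hypothetical and were made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical model of Subnet Manager master election.
# Input on stdin, one line per switch: <switch-name> <sm_priority> <port-guid>
# Assumption: highest sm_priority wins; ties go to the lowest GUID.
# GUIDs are compared as text, so they must share one format
# (same width, same case, e.g. 0x0021286ccca9a0a0 as getmaster prints them).
expected_master() {
  LC_ALL=C sort -b -k2,2nr -k3,3 | awk 'NR == 1 { print $1 }'
}
```

This example uses the two GUIDs that appear in the getmaster output later in this thread. The input lines "exasw-ib1 5 0x0021286cd635a0a0" and "exasw-ib2 5 0x0021286ccca9a0a0" produce exasw-ib2. Raising exasw-ib1 to 8 produces exasw-ib1. When priorities are equal, the result depends on the GUIDs rather than on which switch is the spine.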

Surachart Opun said...

Someone suggested that I review the Exadata documentation first.

I use an X2-2 Half Rack.

Setting the Subnet Manager Master on Exadata Database Machine Full Rack and Exadata Database Machine Half Rack

Exadata Database Machine Full Racks and Oracle Exadata Database Machine X2-2 Half Racks have three Sun Datacenter InfiniBand Switch 36 switches. The switch at rack unit 1 (U1) is referred to as the spine switch. The switches at rack unit 20 (U20) and rack unit 24 (U24) are referred to as leaf switches. The spine switch is the Subnet Manager Master for the InfiniBand subnet. It has priority 5.

Vern Wagman said...

Hi,

This is Bug 12600905 - exachk fails to account for spine switch.

With all IB switches at the same sm_priority, the subnet manager master may migrate off the spine switch.

You need to identify which switch the subnet manager has migrated to (if it migrated off the designated spine), migrate it back onto the designated spine switch if it is not already there, and set sm_priority back to 8.
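The first step Vern describes is finding which switch currently holds the master, and that step can be scripted against getmaster output. This is a dry-run sketch only. It assumes the getmaster line format shown later in this thread ("... QDR switch <name> enhanced port ..."), and plan_master_migration is a hypothetical name. It prints commands rather than running them. The disablesm/enablesm pair mirrors what was done on exasw-ib2 further down. Follow the Owner's Guide section for the supported procedure.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: reads one line of `getmaster` output on stdin
# and prints (does not execute) the steps to move the SM master to the spine.
# Assumes the line contains "... switch <name> ..." as in the output in this thread.
plan_master_migration() {
  local spine="$1" line master
  IFS= read -r line || return 1
  # Extract the switch name that follows the lowercase word "switch".
  master="$(printf '%s\n' "$line" | sed -n 's/.* switch \([^ ][^ ]*\) .*/\1/p')"
  if [ -z "$master" ]; then
    echo "could not parse getmaster output" >&2
    return 1
  fi
  if [ "$master" = "$spine" ]; then
    echo "master already on spine $spine"
  else
    # Stopping the SM on the current master forces a new election among the others.
    echo "ssh $master disablesm"
    echo "ssh $master enablesm"
  fi
  echo "ensure sm_priority is 8 on $spine"
  echo "ssh $spine getmaster"
}
```

If you pipe the first getmaster line below (master on exasw-ib2) into plan_master_migration exasw-ib1, it prints the disablesm/enablesm steps for exasw-ib2, followed by the priority and verification steps for exasw-ib1.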

Please review the "Setting the Subnet Manager Master on Exadata Database Machine Full Rack and Exadata Database Machine Half Rack" section in Chapter 6 of the "Oracle Exadata Database Machine Owner's Guide". The doc set is typically located at /opt/oracle/cell/doc/doc on a storage server. Copy the contents of that directory over to a PC and open the index.html file.

If you have any questions after review of the material, please open a Service Request with Oracle Support for assistance.

Thanks.

Surachart Opun said...

Thank you for your comment.

Surachart Opun said...

Thank you, everyone, for your comments.
I decided to ignore this warning:
[ERROR] OpenSM configurations mismatch for switch exasw-ib1
Found: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8
Required: controlled_handover=TRUE log_max_size=8 polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5

Why? Because the new Exadata documentation says:

Exadata Database Machine Full Racks and Oracle Exadata Database Machine X2-2 Half Racks have three Sun Datacenter InfiniBand Switch 36 switches. The switch at rack unit 1 (U1) is referred to as the spine switch. The switches at rack unit 20 (U20) and rack unit 24 (U24) in Oracle Exadata Database Machine X2-2 racks, or unit 21 (U21) and rack unit 23 (U23) in Oracle Exadata Database Machine X2-8 Full Racks are referred to as leaf switches. The spine switch is the Subnet Manager Master for the InfiniBand subnet. It has priority 8.

But when I checked the Subnet Manager Master, I found this:
# getmaster
20110225 16:11:52 OpenSM Master on Switch : 0x0021286ccca9a0a0 ports 36 Sun DCS 36 QDR switch exasw-ib2 enhanced port 0 lid 4 lmc 0

The master was on exasw-ib2, so I restarted the Subnet Manager there:

# ssh exasw-ib2

# disablesm
Stopping IB Subnet Manager.. [ OK ]
# enablesm
Starting IB Subnet Manager. [ OK ]
# getmaster
20110530 11:08:31 OpenSM Master on Switch : 0x0021286cd635a0a0 ports 36 Sun DCS 36 QDR switch exasw-ib1 enhanced port 0 lid 1 lmc 0

I think it's OK now. I also reviewed the Exadata documentation and the InfiniBand switch documentation about the Subnet Manager Master for the InfiniBand subnet.