Grid Infrastructure network issue on ODA X5-2

I ran into a situation where a client I supported was seeing network timeouts and a large number of dropped packets when connecting to their ODA from the application stack.  After two full days of troubleshooting, we found a mismatch between the Grid Infrastructure (GI) network configuration and the way the client's own network was configured.

The client's default gateway was set to 10.10.10.1.

However, the public network registered in GI was 10.10.0.0, because that is the network address Oracle derives from the public IP and netmask, as ipcalc shows:

ipcalc -bnm 10.10.40.42 255.255.0.0
NETMASK=255.255.0.0
BROADCAST=10.10.255.255
NETWORK=10.10.0.0
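
The NETWORK value is simply the bitwise AND of the address and the netmask. As a rough illustration, here is a small bash sketch that reproduces it for the host address above and for the default gateway. The helper names (ip_to_int, int_to_ip, network_of) are made up for this example and are not part of ipcalc or any Oracle tool.

```shell
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Network address = address AND netmask.
network_of() {
  local ip mask
  ip=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  int_to_ip $(( ip & mask ))
}

network_of 10.10.40.42 255.255.0.0   # host address from the ipcalc call: prints 10.10.0.0
network_of 10.10.10.1  255.255.0.0   # default gateway: prints 10.10.0.0
```

Both addresses land in the same 10.10.0.0 network under this mask, which matches the ipcalc output.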

################


[root@oraprd1 .ssh]# oifcfg getif
ibbond0  192.168.16.0  global  cluster_interconnect,asm
bond0  10.10.0.0  global  public

################

# srvctl config network
Network 1 exists
Subnet IPv4: 10.10.0.0/255.255.0.0/bond0, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

##############


[root@oraprd2 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.10.10.1      0.0.0.0         UG    0      0        0 bond0
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 bond0
192.168.16.0    0.0.0.0         255.255.255.0   U     0      0        0 ibbond0

###################



Could this be the cause of the problems the client had been having ever since the ODA was installed in their data center?  I investigated further.

I tried to update the subnet by following MOS note ID 1504734.1:

srvctl modify network -netnum 1 -subnet 10.10.10.0/255.255.0.0/bond0

This did not work: when I checked the network in GI after making the change, it still showed 10.10.0.0.  Starting nodeapps on oraprd1 failed as well:

[root@oraprd1 network-scripts]# srvctl start nodeapps -n oraprd1

PRCR-1013 : Failed to start resource ora.net1.network
PRCR-1064 : Failed to start resource ora.net1.network on node oraprd1
CRS-5017: The resource action "ora.net1.network start" encountered the following error:
CRS-5008: Invalid attribute value: bond0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/oraprd1/crs/trace/crsd_orarootagent_root.trc".

CRS-2674: Start of 'ora.net1.network' on 'oraprd1' failed
PRCR-1079 : Failed to start resource ora.oraprd1.vip
CRS-5017: The resource action "ora.net1.network start" encountered the following error:
CRS-5008: Invalid attribute value: bond0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/oraprd1/crs/trace/crsd_orarootagent_root.trc".

CRS-2674: Start of 'ora.net1.network' on 'oraprd1' failed
CRS-5017: The resource action "ora.net1.network start" encountered the following error:
CRS-5008: Invalid attribute value: bond0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/oraprd2/crs/trace/crsd_orarootagent_root.trc".

CRS-2674: Start of 'ora.net1.network' on 'oraprd2' failed
CRS-2632: There are no more servers to try to place resource 'ora.oraprd1.vip' on that would satisfy its placement policy
PRCR-1013 : Failed to start resource ora.ons
PRCR-1064 : Failed to start resource ora.ons on node oraprd1
CRS-5017: The resource action "ora.net1.network start" encountered the following error:
CRS-5008: Invalid attribute value: bond0 for the network interface
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/oraprd1/crs/trace/crsd_orarootagent_root.trc".

CRS-2674: Start of 'ora.net1.network' on 'oraprd1' failed
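
One thing worth checking about the value passed to srvctl modify network: under a 255.255.0.0 mask, 10.10.10.0 has host bits set, so it is not itself a network address, and the network that contains it is still 10.10.0.0. Whether that explains why GI kept showing 10.10.0.0 is an assumption, not something the trace file confirmed. A minimal bash sketch of the check, with hypothetical helper names:

```shell
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Exit status 0 only if the subnet has no bits set outside the netmask,
# i.e. it is a proper network address for that mask.
is_network_address() {
  local subnet mask
  subnet=$(ip_to_int "$1")
  mask=$(ip_to_int "$2")
  (( (subnet & mask) == subnet ))
}

for subnet in 10.10.0.0 10.10.10.0; do
  if is_network_address "$subnet" 255.255.0.0; then
    echo "$subnet is a network address under 255.255.0.0"
  else
    echo "$subnet is NOT a network address under 255.255.0.0"
  fi
done
```

Only 10.10.0.0 passes the check. A subnet of 10.10.10.0 would need a longer mask such as 255.255.255.0 to be a network address.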


So I suggested to the client that we abandon this subnet altogether and move the ODA to an entirely different VLAN.

After the client allocated IPs on the new VLAN, I went to work changing every address on the ODA (VIPs, public IPs, SCAN IPs, ILOM IPs) to the new VLAN.

Once that was done, the packet drops and the timeouts both stopped.  Problem solved.




