Problems bringing up a two-node cluster

Submitted by aditya.sinha on Sat, 2008-02-23 05:20.

Hi,
I have set up a two-node cluster with Node1 and Node2, and created a separate SG (service group) for each node: SG1 (named Component1SG) runs on Node1, and SG2 (named Component2SG) runs on Node2.

I have also configured Component1 to provide an RMD facility: "clCompAppMain.c" of Component1 provides one function that any RMD client can call.

Now, whenever I try to bring up either node, the following errors appear in the
log file:

***********************************************************
The following log was captured for the second node (node ID 3), but the same
problem also occurs on the first node.
******************************************************************

**********************************************

Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00265 : CRITIC) CPM/G active got IOC/TIPC notification for node [3] --
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00266 : CRITIC) - Possible reasons for this are on node [3] :
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00267 : CRITIC) - 1. AMF crashed.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00268 : CRITIC) - 2. AMF was killed.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00269 : CRITIC) - 3. Critical component failed.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00270 : CRITIC) - 4. Kernel panicked.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00271 : CRITIC) - 5. Communication was lost.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.AMS.00272 : CRITIC) - 6. AMF was shutdown.
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.---.00273 : WARN) Not able to find node having node ID [3], error [0xf0013]
Sat Feb 23 16:40:27 2008 (Node1I0.15933 : AMF.CPM.---.00279 : WARN) Not able to find node having node ID [3], error [0xf0013]

***************************************************************

This problem occurs on both nodes: sometimes the node comes up and sometimes it doesn't. (Please note that the second node's node ID is 3.)

The problem is more frequent on the second node.

I suspected TIPC, so before starting the node again I ran the following TIPC commands:

>>rmmod tipc
>>modprobe tipc

and then ran:
>>./tipc-config -v start --enforce-tipc-settings

Even so, the problem persisted.

Submitted by amitg on Mon, 2008-02-25 22:07.

Hi Aditya,

The above logs say that ASP on one of the nodes has gone down. Since they report that the ASP with ASP_ADDRESS 3 went down, these messages would have been logged on the other node, where ASP is still running.

Please send me the output of "dmesg" and of "tipc-config -n".
What are the TIPC settings (i.e. netid and addr) on both nodes?

I suspect that there is another node on the network (besides the two you are using) whose TIPC netid clashes with the one configured on your nodes.
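The netid clash suspected here can be checked mechanically once the netid and TIPC address of every machine on the LAN have been collected (for example by running "tipc-config -netid" and "tipc-config -addr" on each one). The sketch below is a hypothetical helper, not part of ASP or TIPC: the struct, the function names and the `in_cluster` flag are invented for illustration. It flags two conditions: a machine outside the cluster that shares the cluster's netid (TIPC would let it join the same network), and two machines on the same netid that use the same <Z.C.N> address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one machine's TIPC settings, filled in by hand
 * from the tipc-config output on that machine. */
struct tipc_node_cfg {
    const char *host;  /* hostname, used only in messages */
    uint32_t netid;    /* TIPC network identity */
    unsigned zone;     /* TIPC address <Z.C.N>: zone part */
    unsigned cluster;  /* TIPC address <Z.C.N>: cluster part */
    unsigned node;     /* TIPC address <Z.C.N>: node part */
    bool in_cluster;   /* hypothetical flag: true for machines meant to run ASP */
};

/* Pack <Z.C.N> into TIPC's 32-bit address: zone in the top 8 bits,
 * cluster in the next 12 bits, node in the low 12 bits. */
uint32_t tipc_pack_addr(unsigned zone, unsigned cluster, unsigned node)
{
    return ((uint32_t)zone << 24) | ((uint32_t)cluster << 12) | (uint32_t)node;
}

/* Count machines that are not part of the cluster but use its netid. */
size_t count_foreign_on_netid(const struct tipc_node_cfg *cfg, size_t count,
                              uint32_t cluster_netid)
{
    assert(cfg != NULL || count == 0);
    size_t foreign = 0;
    for (size_t i = 0; i < count; i++) {
        if (!cfg[i].in_cluster && cfg[i].netid == cluster_netid) {
            printf("%s is outside the cluster but uses netid %u\n",
                   cfg[i].host, (unsigned)cluster_netid);
            foreign++;
        }
    }
    return foreign;
}

/* Count pairs of machines on the same netid that also share a TIPC address. */
size_t count_duplicate_addrs(const struct tipc_node_cfg *cfg, size_t count)
{
    assert(cfg != NULL || count == 0);
    size_t dups = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (cfg[i].netid != cfg[j].netid)
                continue; /* different netids: separate TIPC networks */
            uint32_t a = tipc_pack_addr(cfg[i].zone, cfg[i].cluster, cfg[i].node);
            uint32_t b = tipc_pack_addr(cfg[j].zone, cfg[j].cluster, cfg[j].node);
            if (a == b) {
                printf("%s and %s both use <%u.%u.%u> on netid %u\n",
                       cfg[i].host, cfg[j].host, cfg[i].zone, cfg[i].cluster,
                       cfg[i].node, (unsigned)cfg[i].netid);
                dups++;
            }
        }
    }
    return dups;
}
```

With Node1 and Node2 plus a stray lab machine configured on the same netid, count_foreign_on_netid() would report the stray machine, and count_duplicate_addrs() would also report it if it reuses one of the cluster's addresses. A nonzero result from either function points at the kind of clash described above.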

Also, if possible, please attach the complete logs generated by both nodes.

Regards,
Amit