Problem with installation on target platform

Submitted by jianghaiying on Thu, 2006-07-13 21:46.

I'm installing OpenClovis ASP on the target platform.
When I ran the command below, several error messages appeared.
Could anyone give me a hand?
====
#./hpiSubagent -x localhost:3456
plugin.c:581:new_handler: Attempt to create handler for unknown plugin libsnmp_bc
config.c:693:oh_process_config: Couldn't load handler for plugin libsnmp_bc
init.c:103:_init: Error: Handlers were defined, but none loaded.
No log handling enabled - turning on stderr logging
registered debug token hpiSubagent, 1
Starting $Id: hpiSubagent.c,v 1.130 2005/11/14 16:18:37 ddearauj Exp $
Sending EVENTS during startup.
/usr/local/etc/snmp/hpiSubagent.conf: line 8: Warning: Unknown token: check_hpi_interval.
Max Event rows 1024.
Warning: Failed to connect to the agentx master agent (localhost:3456): Unknown host (localhost:3456) (Resource temporarily unavailable)
hpiSubagent: Hpi Version 131329 Implemented.
hpiSubagent: saHpiSessionOpen returns with SessionId 1
hpiSubagent: subcsribe_all_sessions() SUCCEEDED!!!!!!!
plugin.c:527:oh_getnext_handler_id: Warning - no handlers
safhpi.c:106:saHpiDiscover: Error attempting to discover resources in Domain 1
hpiSubagent: saHpiDiscover Error: returns ERROR
====

Thanks.

Submitted by honglu on Fri, 2006-07-14 12:29.

Hi jianghaiying,
The problem seems to be a missing library, "libsnmp_bc".
This library is needed for BCT machines.
Please refer to http://openhpi.sourceforge.net/manual/x348.html
If your target is not BCT, disable the building of this plugin by adding the flag --disable-snmp_bc during the configure process.

If your target is BCT, try upgrading the firmware as mentioned in the
above link.
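
Before rebuilding, it may help to confirm whether the plugin library is actually present on the target. The following is only a rough sketch: the function name plugin_installed is made up for illustration, and the plugin directory shown in the usage comment is an assumption (it depends on the prefix OpenHPI was configured with).

```shell
# Hypothetical helper: succeeds if a shared object for the named
# plugin exists in the given directory.
plugin_installed() {
    plugin_dir=$1
    plugin_name=$2
    for f in "$plugin_dir/$plugin_name".so*; do
        # If the glob matched nothing, $f is the literal pattern
        # and the -e test fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage (the directory is an assumed install location, adjust as needed):
#   plugin_installed /usr/local/lib/openhpi libsnmp_bc || echo "libsnmp_bc not found"
```

If the library is not found and the target is not BCT, the --disable-snmp_bc configure flag mentioned above is the route to take.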

Hope this helps,
gp

Submitted by jianghaiying on Tue, 2006-07-18 03:06.

Hi honglu,

Thanks for your reply.

Now I'm trying to use csa102, the redundancy and failover test application included in the Evaluation Kit.
# I'm using three rack-mount servers: one Management Station, one System Controller, and one Payload.
When I killed the csa102 process on the System Controller node, nothing happened.

These are the detailed steps I followed:
1. Install the target machines (System Controller node and Payload node).
2. Start SISP on the System Controller node and the Payload node.
3. Start csa102 on the System Controller node and the Payload node.
./csa_console -c ../images/evalplatform/target.conf sc0 csa102CompI0
4. Kill the csa102 process on the System Controller node and watch what happens.

I have three questions:

1. Are there any mistakes in my test steps?
How can I check whether the failover happened?

2. What is the relationship between the System Controller and the Payload?
I regard the System Controller as active and the Payload as standby. Is that right?

3. What is the role of the Management Station? I used it only for installing the target machines and starting csa102.

Thank you.

jianghaiying.

Submitted by rbbeatie on Wed, 2006-07-19 05:58.

1) The only possible mistakes I can see are:
a: You didn't say which csa102 process you killed. There should have been two: csa102CompI0 and csa102CompI1. You want to kill the active one, but you didn't say that you did. If you killed the standby, the only indication would be that the process is missing the next time you run ps.
b: When you ran csa_console, you watched only the csa102CompI0 log, not the csa102CompI1 log. If I1 was the active component and you killed I0, you wouldn't have seen anything in the console output.
2) No, that's not quite right, at least not with the eval model. The SCNode is the System Controller node: it is the ring leader and runs the master sisp_amf process, which controls which components are brought up on the various nodes. The standby may be on the payload node or on the SC node. In your case both active and standby are running on the SC node. As for which is the standby and which is the active, I confess that I don't know. The only way I know to tell is to see which one is printing the "hello world" lines in its log file. If both processes are running on the same node, I just run strace on the first process and check whether it is making write(2) calls with the "hello world" message. Then, if I want to kill the active process, I kill that one.
3) Yes, basically that's what it's for at this point.
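
The "which log is printing hello world" check from 2) can be roughly automated. This is only a sketch under assumptions: the helper names hello_count and growing_logs are made up here, the exact message text in the logs may differ from "hello world", and the log path in the usage comment is the one quoted later in this thread.

```shell
# Count lines containing "hello world" (case-insensitive) in one log file.
hello_count() {
    if [ -r "$1" ]; then
        # grep -c prints 0 and exits non-zero when nothing matches.
        grep -ci "hello world" "$1" || true
    else
        echo 0
    fi
}

# Print the log files whose "hello world" count grew during the interval.
# Usage: growing_logs SECONDS FILE...
growing_logs() {
    interval=$1
    shift
    before=""
    for f in "$@"; do
        before="$before$(hello_count "$f") "
    done
    sleep "$interval"
    for f in "$@"; do
        old=${before%% *}      # first saved count
        before=${before#* }    # drop it from the saved list
        new=$(hello_count "$f")
        if [ "$new" -gt "$old" ]; then
            echo "$f"
        fi
    done
}

# Usage (log path as quoted elsewhere in this thread):
#   growing_logs 5 /var/log/csa102CompI*.log
```

The log that keeps growing should belong to the active component. The strace approach is the alternative when you want to look at a process directly, for example strace -p PID -e trace=write, where PID is the process you suspect is active.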

This brings us to "the bug". There is a bug in the install-to-target script that you have. If you look at line 60 of that script you will see the line:
if [ "${FORCE_SINGLE}" = "" ]
That line actually needs to read:
if [ "${FORCE_SINGLE}" != "" ]

The problem is that there are two sets of config files: one for a distributed installation, where there is an SCNode and a Payload node, and one for a single-node installation, where everything runs on the SCNode.
The bug causes the install-to-target script to install the single-node set of configuration files, which makes sisp run as if it were a single-node installation. Because of that, both the active and the standby csa102 processes would have been running on your SCNode. Make the change above and try installing again. You should see csa102 running on both your SCNode and your Payload node. You should be able to see that one is active while the other is standby. If you kill the active, you should see the standby become the active.
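
To see the effect of the one-character change in isolation, here is a small sketch. It is not the real install-to-target code: the function names are made up, and it assumes that the branch guarded by the test is the one that selects the single-node config set, which matches the behaviour described above.

```shell
# Corrected test: only a non-empty FORCE_SINGLE selects single-node.
choose_config_set() {
    FORCE_SINGLE=$1
    if [ "${FORCE_SINGLE}" != "" ]
    then
        echo "single-node"
    else
        echo "distributed"
    fi
}

# Original test, kept for contrast: an empty (unset) FORCE_SINGLE
# wrongly selects single-node.
choose_config_set_buggy() {
    FORCE_SINGLE=$1
    if [ "${FORCE_SINGLE}" = "" ]
    then
        echo "single-node"
    else
        echo "distributed"
    fi
}

# Usage:
#   choose_config_set ""        # FORCE_SINGLE left empty
#   choose_config_set_buggy ""  # same input, original comparison
```

With FORCE_SINGLE left empty, the corrected version picks the distributed set while the original comparison picks the single-node set.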

Submitted by jianghaiying on Wed, 2006-07-19 21:00.

Thanks for your reply.
I just succeeded in the failover of csa102,
with both active and standby on the SCnode.
But I failed when I tried it again.
These are the detailed steps:
1. Killed the active, and failover happened.
2. Killed the other one, the ex-standby.
3. Stopped csa102
# ./lockutil.sh la 102
# ./lockutil.sh li 102
4. Ran the command again
# ./lockutil.sh la 102
5. Checked the log files
# tail -f /var/log/csa102CompI*.log

No new lines were output in them,
and no csa102 process was started.

Is there any mistake in the steps?
Could you help me?

Thanks and regards.

Submitted by jianghaiying on Wed, 2006-07-19 21:19.

I succeeded in retrying the failover after restarting sisp.
I also succeeded in the failover between the SCnode and the PLnode
after I modified the install-to-target file.
Thank you very much.

Submitted by jianghaiying on Wed, 2006-07-19 23:14.

I'm trying csa103, the checkpoint test application.
I expected the csa103 instances to be started on both the SCnode and the PLnode.
But both the active and the standby were started only on the SCnode.

I have modified install-to-target:
========
if [ "${FORCE_SINGLE}" != "" ]
========

target.conf
=====
CL_IP_SC0=10.144.133.77
#CL_IP_SC1=10.144.133.76
CL_IP_PAYLOAD=10.144.133.78
export CL_IP_SC0 CL_IP_SC1 CL_IP_PAYLOAD
=====
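
A quick way to double-check which node addresses a file like this actually defines is sketched below. It assumes target.conf is read as plain shell (the export line suggests so, but that is a guess), and the function name show_target_nodes is made up for illustration.

```shell
# Hypothetical helper: source a target.conf in a subshell and report
# which of the three node variables end up set.
show_target_nodes() {
    conf=$1
    [ -r "$conf" ] || return 1
    (
        unset CL_IP_SC0 CL_IP_SC1 CL_IP_PAYLOAD
        . "$conf" || exit 1
        for name in CL_IP_SC0 CL_IP_SC1 CL_IP_PAYLOAD; do
            # Indirect lookup of the variable named by $name.
            eval "value=\${$name:-}"
            if [ -n "$value" ]; then
                echo "$name=$value"
            else
                echo "$name is not set"
            fi
        done
    )
}

# Usage (path relative to where csa_console was run earlier in this thread):
#   show_target_nodes ../images/evalplatform/target.conf
```

With CL_IP_SC1 commented out, it is reported as not set, while the other two show their addresses.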

Is there anything wrong?

Thanks.

Submitted by rbbeatie on Thu, 2006-07-20 13:41.

The problem is that in that version of the eval kit csa103 runs both instances on the SCNode.

The next version of the eval kit (due real soon now) should support running csa103 instances on separate nodes.

Submitted by jianghaiying on Thu, 2006-07-20 19:04.

Thanks for your reply.

I have one more question.

Is it only the eval kit's csa103 that cannot run on separate nodes?
Or is the checkpoint function across separate nodes not supported in this version of OpenClovis?

Thanks.

Submitted by honglu on Wed, 2006-07-26 23:12.

The current OpenClovis release does support checkpointing on both single-node and multi-node configurations.