Gossip Based Failure Detection Service
1. INTRODUCTION
2. FILES
3.(a) STARTING GOSSIP/startgossip options
(b) ALTERNATIVE RAPID GOSSIP START
(c) RESTARTING GOSSIP IN A DEAD NODE
4. STOPPING GOSSIP
5. REPORTING CONSENSUS AND NODE LIVELINESS
6. INSERTING A NEW NODE INTO THE SYSTEM
7. CONFIGURING GOSSIP
8. INTERFACING TO GOSSIP
9. OUTPUT MESSAGES
10. ADDITIONAL FEATURES
11. COMMENTS
1. INTRODUCTION
Gossip-based failure detection exchanges liveliness information between nodes
to detect failures and reach consensus on the state of a particular node. The
gossip service can be run as a background service that provides failure
detection to individual applications. This package includes an interface
library that lets individual applications check the status of any node at any
time. The package also comes with commands to start and stop gossip on any
number of nodes. The behavior of the gossip service can be changed by modifying
the config file, which specifies all the parameters required by gossip.
GEMS_GOSSIP also provides a gossip-based monitoring service along with failure
detection. The monitoring service is described in a separate section following
the description of the failure detection service. For further details on the
monitoring service, please read the MONITOR-README file in the DOCS directory.
2. FILES
This package contains the following files:
INSTALL
DOCS:
    interface
    GOSSIP-README
    sampleflatconfigfile
    samplemultilayerconfigfile
Makefile:
    To compile startgossip and stopgossip
C-code files:
    startall.c
    start-gossip.c
    stop-gossip.c
    gossip-main.c
    send-thread.c
    recv-thread.c
    info-thread.c
    insert-thread.c
    update-thread.c
    packet.c
    destination.c
    comm.c
    time.c
    sock_init.c
    interface-test.c
C-include files:
    defs.h
    fault-detect.h
    local.h
Scripts:
    gossip.restart
    rmexec
    gossipser
    startall
3.(a) STARTING GOSSIP/startgossip options
The command "startgossip" is used to start the gossip service on a given number
of nodes.
    startgossip [OPTION]... [CONFIGURATION FILE]
    -c    Compile gossip for failure detection alone
    -v    Version of GEMS_GOSSIP
    -h    Manual details, help
    -m    Compile gossip for failure detection along with monitoring
    -x    Execute GEMS_GOSSIP
This command reads the configuration file and executes GEMS_GOSSIP on the
specified nodes. GEMS_GOSSIP is started on each of the nodes using the "rsh"
command.
As soon as gossip starts on a node, it sends a packet to node zero. Node zero
is the first node in the config file. Node zero counts the number of nodes that
have reported. When the count reaches the number of nodes specified in the
config file, node zero sends a broadcast telling every node to start gossiping.
This is in essence a simple implementation of a barrier.
If a node specified in the list is already dead when the gossip service is
started, gossip is still started on all the live nodes, provided the number of
live nodes is greater than half the number of nodes specified in the config
file. Once gossip is started, the dead nodes are reported.
The startgossip script assumes the existence of NFS, so that the executable
files are visible on all the nodes where the service is started. Functional
remote shelling is also assumed (i.e., a properly configured ".rhosts" file).
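The start-up rule described above (start when every node has reported, or when
more than half of the configured nodes are alive) can be sketched as a small
decision function. This is only an illustration, not code from the package:
the name barrier_may_start and the timed_out flag are assumptions, since this
document does not say how node zero decides to stop waiting for dead nodes.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from the GEMS_GOSSIP sources.
 * reported:   nodes that have sent their barrier packet to node zero
 * configured: number of nodes listed in the config file
 * timed_out:  assumed flag, true once node zero stops waiting */
bool barrier_may_start(int reported, int configured, bool timed_out)
{
    if (configured <= 0 || reported < 0 || reported > configured)
        return false;              /* inconsistent counts */
    if (reported == configured)
        return true;               /* every node has checked in */
    /* Some nodes never reported: start only with a strict majority. */
    return timed_out && 2 * reported > configured;
}
```

With 8 configured nodes, 5 reporting nodes are enough to start once the wait
is over, while 4 are not, because 4 is not greater than half of 8.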
3.(b) ALTERNATIVE RAPID GOSSIP START
As an alternative to the startgossip method of starting gossip on all the
nodes, gossip can also be started using a gossip server. The 'startall' command
is used first to remote login into all the nodes listed in a hostlist file and
start the gossip server on each of them. The gossip server then waits in a
blocking receive for a command to execute.
    startall [hostlistfile] [1]
A '0' can be used instead of the '1' to stop the gossip server.
The 'rmexec' command can then be used to send the gossip command to execute,
such as 'gossip configfile 1&', to the gossip server on the nodes specified by
a hostlist file. Gossip is then started locally on each of the nodes.
    rmexec [hostlistfile] [numberofnodes] [configfile name]
Note: This rapid mode of starting saves time while testing gossip, but may not
be suitable for proper interfacing with applications.
3.(c) RESTARTING GOSSIP IN A DEAD NODE
When gossip has to be restarted on a node (after the node has been fixed),
gossip is executed with a flag that skips the barrier and joins the running
gossip system.
    gossip [configfile] [2]
The 2 in the above command executes gossip without calling the barrier
function. However, this option is useful only when restarting gossip. If the
node is new to the system, it has to be inserted using the insertion commands
described later.
4. STOPPING GOSSIP
The command "stopgossip" should be used to stop the gossip service. This is the
format of the command:
    stopgossip host-name|all [config-file]
"stopgossip" can stop the gossip daemon on any one host, or, with the "all"
identifier and "config-file" specified, it can stop the daemon on all the
nodes. This is done by sending a stop gossip message to the information thread
of the gossip service. On receipt of this message the information thread stops
immediately. This is a safe and controlled way to stop the gossip service.
5. REPORTING CONSENSUS AND NODE LIVELINESS
Whenever consensus about a dead node is reached, the node broadcasts the
consensus information to all the nodes. Likewise, when a dead node resumes
normal operation, that information is also broadcast to all the nodes. An
application using gossip can treat these broadcasts as a kind of interrupt
signal.
In addition, the live lists of the groups are sent with the gossip messages, so
that nodes in other groups can learn the status of the nodes in all the groups.
This was added to avoid inconsistencies caused by lost UDP packets.
6. INSERTING A NEW NODE INTO THE SYSTEM
In a Flat system, a new node is always inserted into the first group (as there
exists only one) as the last node of the group. In a multi layered system, a
new node can be inserted as the last node of any layer one group. The group can
be specified while inserting the node. The 'testsender' command is used to
insert the node.
    testsender [config filename] [groupnum]
The command sends a request to the first node in the group (groupnum), called
the sponsor node. If the sponsor does not reply, a message is sent to the next
node in the group after a timeout. The testsender program exits once an ack is
received. The sponsor node then changes the configuration file, broadcasts the
new node details to all the nodes in the system, starts gossip on the new node
and finally broadcasts again to all the nodes so that they start with the
updated data structure.
Note 1: It is always better to insert a node into the group which has the least
number of nodes, as it avoids extra time and memory. If the sponsor node is not
able to start gossip at the new node, it times out, consensus is reached on the
new node, and the node is declared dead. If the node has to join the system
again, gossip only has to be restarted using the method described in 3.(c).
Note 2: Make sure that enough blank spaces (not tabs) are left between the
group sizes and their cleanup values to facilitate node insertion. If the group
size grows from, say, a single digit number to a double digit number, the
function might otherwise run out of space to overwrite the data in the config
file.
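The sponsor fallback described in this section (try the first node of the
group, then move on to the next one after a timeout) can be sketched as
follows. This is a minimal illustration under stated assumptions: pick_sponsor
and the responded[] array are hypothetical and stand in for the real
request/ack exchange over the network, which is not shown here.

```c
/* Hypothetical helper, not part of the testsender sources.
 * responded[i] is nonzero when node i of the chosen group acknowledged the
 * insertion request before the timeout expired.
 * Returns the index of the node that acts as sponsor, or -1 if no node in
 * the group answered. */
int pick_sponsor(const int responded[], int group_size)
{
    for (int i = 0; i < group_size; i++) {
        if (responded[i])
            return i;      /* first node that replies becomes the sponsor */
        /* no ack from node i: fall through to the next node in the group */
    }
    return -1;
}
```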
7. CONFIGURING GOSSIP
The important parameters for gossiping are 'number of layers', 'Tgossip',
'number of groups in each layer' and 'Tcleanup'. These parameters need to be
tuned to make the failure detection time as small as possible. In a flat
system, 'number of groups' is automatically set to one.
Tgossip
- Failure detection time is proportional to Tgossip
- A small value of Tgossip increases bandwidth and CPU utilization
- For efficient CPU usage, the minimum value of Tgossip should be higher than
the OS time slice (10ms)
Number of layers and groups
- For a given system size there exists an optimum number of layers, and hence
groups, that minimizes resource utilization
- It is possible to set Tcleanup for each group and each layer.
- Groups can have different sizes and appropriate Tcleanup values.
Tcleanup
- The smaller the value of Tcleanup, the smaller the consensus time
- Too small a value of Tcleanup will cause false consensus
- For a given number of nodes there exists a minimum value of Tcleanup that
minimizes the consensus time
These values can be tuned using the config file. The default value of Tgossip
is 10ms.
For more information on tuning these values and partitioning the system into
groups, see [2][3]. These references are available on the HCS web site,
http://www.hcs.ufl.edu/Publications.htm, or see the contact information at the
end of this document.
CONFIG FILE:
This file specifies the configuration parameters used in starting gossip.
| Keyword | Value | Description |
| TYPE | Flat/Multi | Whether the Flat or Multi Layered gossip protocol is to be used |
| AUTO | false/true | Whether 'Tcleanup' is to be chosen automatically. AUTO is not yet supported |
| NUM | N1 | Total number of nodes (N1) participating |
| NUMG | N | Number of groups (N); N=1 for Flat Gossiping |
|  | N11 N12, N21 N22, ..., NN1 NN2 (one pair per line) | One line per group: Ni1 specifies the number of nodes in group i, followed by the 'Tcleanup' used in group i (Ni2) |
| TGOSSIPS | TG | Seconds value of 'Tgossip' (TG); TG=0 for 'Tgossip' values less than 1 second |
| TGOSSIPUS | TGM | Microseconds value of 'Tgossip' (TGM); e.g., for a 'Tgossip' of 100ms: TG = 0 and TGM = 100000 |
| CLEANUP | NUM | Cleanup (NUM) used among the groups in the topmost layer |
| LGOSSIPS | TG2..TGN | Gives the Tgossip to be used in each layer among the groups. TG2...TGN specify the numbers by which TG (TGOSSIPS) should be multiplied to find the Tgossip of each layer |
| HOST_LIST | HL | List specifying the names of the nodes participating in Gossip. NOTE: Lines starting with the '#' symbol are ignored, so host names starting with '#' are also ignored! |
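As a small worked example of the TGOSSIPS/TGOSSIPUS split described in the
table, the sketch below converts a Tgossip given in milliseconds into the two
config values. The struct and function names are hypothetical and only serve
the illustration; a non-negative input is assumed.

```c
/* Hypothetical helper, not part of the package: holds the two values that
 * go after the TGOSSIPS and TGOSSIPUS keywords in the config file. */
struct tgossip {
    long seconds;       /* value for TGOSSIPS  */
    long microseconds;  /* value for TGOSSIPUS */
};

/* Split a Tgossip given in milliseconds (assumed >= 0) into whole seconds
 * and the remaining microseconds. */
struct tgossip tgossip_from_ms(long ms)
{
    struct tgossip t;
    t.seconds = ms / 1000;
    t.microseconds = (ms % 1000) * 1000;
    return t;
}
```

For a Tgossip of 100ms this gives TGOSSIPS 0 and TGOSSIPUS 100000, matching
the example in the table; the default of 10ms gives TGOSSIPS 0 and
TGOSSIPUS 10000.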
Sample Config file 1
Configuration used: Flat, 8 nodes, Tgossip of 10ms, Tcleanup of 20
Gossip is started on hosts named carrier-73, carrier-74, carrier-78, carrier-88,
carrier-89, carrier-1, carrier-2, and carrier-3.
Gossip is NOT started on carrier-77.
The initial node responsible for the "rsh" command and barrier is carrier-73
--------------------- Begin sample config file ---------------------
#This file contains the host names on which gossip needs to be started.
#host lines beginning with # are ignored
#Type of gossip architecture
TYPE Flat
#Auto : If Tcleanup needs to be chosen automatically or not. AUTO is not yet supported
AUTO false
#Total Num of nodes followed by Tcleanup
NUM 8
NUMLAYERS 1
1
8 20
#Tgossip seconds Value
TGOSSIPS 0
#Tgossip microseconds Value
TGOSSIPUS 10000
HOST_LIST
carrier-73
carrier-74
#carrier-77
carrier-78
carrier-88
carrier-89
carrier-1
carrier-2
carrier-3
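The HOST_LIST rule used in the sample above (lines starting with '#' are
ignored, which is why carrier-77 is skipped) can be sketched as a small
filter. This is not the parser used by gossip: the function names are
hypothetical, and treating blank lines as "no host" is an added assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does this HOST_LIST line name a host to start
 * gossip on? Lines starting with '#' are ignored, as stated in the
 * config file description. */
bool is_host_line(const char *line)
{
    if (line == NULL || line[0] == '\0' || line[0] == '\n')
        return false;   /* assumption: blank lines carry no host */
    return line[0] != '#';
}

/* Count the hosts that would actually take part in gossip. */
int count_hosts(const char *const lines[], int n)
{
    int hosts = 0;
    for (int i = 0; i < n; i++)
        if (is_host_line(lines[i]))
            hosts++;
    return hosts;
}
```

Applied to the nine HOST_LIST lines of the sample, the count is 8, which
agrees with the NUM value in the same file.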
Sample Config file 2
Configuration used: Layered, 3 layers, 48 nodes. The second layer has 8 groups
with 6 nodes in each group. The third layer has 2 groups, each containing 4
second layer groups.
Tgossip of 10ms, Tcleanup of 20 for all layers and groups
The groups in layer 2 consist of these hosts:
group 1
carrier-1....carrier-6
group 2
carrier-7 ...carrier-12
group 3
carrier-13 ...carrier-18
group 4
carrier-19 ...carrier-24
group 5
carrier-25 ...carrier-30
group 6
carrier-31 ...carrier-36
group 7
carrier-37 ...carrier-42
group 8
carrier-43 ...carrier-48
The groups in layer 3 contain the following groups:
GROUP I
group 1 .....group 4 (So GROUP I will have carrier-1..carrier-24)
GROUP II
group 5 .....group 8 (So GROUP II will have carrier-25..carrier-48)
-------------------- Begin sample config file-----------------------------
#This file contains the host names on which gossip needs to be started.
#host lines beginning with # are ignored
#Type of gossip architecture
TYPE Multi
#Auto : If Tcleanup needs to be chosen automatically or not. AUTO is not yet supported
AUTO false
#Total Num of nodes followed by Tcleanup
NUM 48
#Number of groups, Tcleanup needs to be specified for each group
NUMLAYERS 3
8
6 20
6 20
6 20
6 20
6 20
6 20
6 20
6 20
2
4 20
4 20
#Tgossip seconds Value
TGOSSIPS 0
#Tgossip microseconds Value
TGOSSIPUS 10000
CLEANUP 20
LGOSSIPS 1 1
HOST_LIST
carrier-1
carrier-2
carrier-3
carrier-4
:
:
:
:
:
:
:
carrier-48
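The group layout in this sample can be sanity-checked before starting gossip.
The sketch below is an illustration only, with hypothetical names. It checks
two things: that the number of groups decreases from one listed layer to the
next (as the "Wrong assignment of groups to layers" message requires), and
that the group sizes of each listed layer add up to the number of members of
the layer below. The second check is an assumption inferred from the sample
(8 x 6 = 48 nodes, 2 x 4 = 8 groups), not a rule stated in this document.

```c
#include <stdbool.h>

/* Hypothetical validator, not part of the package.
 * num_nodes:        value of NUM
 * layers:           number of group blocks listed in the config file
 * groups_per_layer: number of groups in each listed block, lowest first
 * sizes:            group sizes of all blocks, concatenated in file order */
bool layout_is_consistent(int num_nodes, int layers,
                          const int groups_per_layer[], const int sizes[])
{
    int expected = num_nodes;  /* members the current layer must cover */
    int offset = 0;            /* start of this layer's sizes in sizes[] */

    for (int k = 0; k < layers; k++) {
        int groups = groups_per_layer[k];
        if (groups <= 0)
            return false;
        /* The number of groups must decrease as we move up the layers. */
        if (k > 0 && groups >= groups_per_layer[k - 1])
            return false;
        int sum = 0;
        for (int g = 0; g < groups; g++) {
            if (sizes[offset + g] <= 0)
                return false;
            sum += sizes[offset + g];
        }
        if (sum != expected)   /* assumed rule, see the note above */
            return false;
        expected = groups;     /* the next layer groups these groups */
        offset += groups;
    }
    return true;
}
```

For the sample above, the call would use num_nodes 48, two listed blocks with
8 and 2 groups, and the sizes 6 (eight times) followed by 4 and 4.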
8. INTERFACING TO GOSSIP
The file "fault-detect.h" contains the interface to the gossip service. This
include file implements functions that communicate with the gossip service
using a socket interface. This allows any node to query any other node as long
as gossip is running there. The following functions are available in the
interface:
void gossip_init(host-file-name);
    gossip_init() is the initialization function for the interface library.
    'host-file-name' is the file containing the list of hosts where the gossip
    service is to be checked.
    Host-file sample
    ----------------
    carrier-1
    carrier-10
    carrier-3
    This file can be used to check the status of these 3 hosts even though
    gossip may be running on many others.
void check_status(int target, int *ptr);
    check_status() can be called at any time to determine which nodes are
    alive or dead. 'target' is the id of the host that is asked to supply the
    information. The host id is the position in the list in the config file.
    'ptr' points to the integer array through which check_status() returns its
    results. A value of 0 shows that the node corresponding to that position
    is dead. A 1 shows it is alive, and a 2 shows that no information is
    available about this node (since the gossip service is not running there).
    Taken from the sample host-file above, ptr[0], ptr[1], and ptr[2] would
    correspond to the information about carrier-1, carrier-10, and carrier-3
    respectively.
void print_status();
    print_status() prints out a verbose output about the status of each of the
    nodes.
A more complete description of the functions can be found in the DOCS section.
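As an illustration of how an application might interpret the array filled in
through check_status(), the sketch below maps the documented status values
(0 dead, 1 alive, 2 no information) to readable names and counts the live
nodes. The helper names are hypothetical and are not part of
"fault-detect.h"; the functions work on a plain integer array so that they do
not depend on a running gossip service.

```c
/* Hypothetical helpers, not part of the interface library. */

/* Translate a status value as documented for check_status(). */
const char *status_name(int code)
{
    switch (code) {
    case 0:  return "dead";
    case 1:  return "alive";
    case 2:  return "unknown";   /* gossip service not running there */
    default: return "invalid";   /* value not described in this README */
    }
}

/* Count how many of the n hosts in the host-file are reported alive. */
int count_alive(const int *status, int n)
{
    int alive = 0;
    for (int i = 0; i < n; i++)
        if (status[i] == 1)
            alive++;
    return alive;
}
```

With the sample host-file, an application would presumably call gossip_init()
once, then pass a 3-element array to check_status() and read ptr[0..2] with
helpers like these.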
9. OUTPUT MESSAGES
The following messages are printed to the standard output during the operation
of gossip:
i. Barrier Messages
================
Sent barrier to 0 : myid : N
    Node N has sent the barrier information to node 0
Received barrier info from N num in N1
    Node 0 has received the barrier info from node N, and info from N1 nodes
    has been received in total
My name is Name
    Message printed by node 'Name' after barrier info is sent to Node 0
Send barrier broadcast ...
    Node 0 has sent the start signal to all the nodes
ii. Initialization/Re-initialization messages
=================================
Gossip type should either be Flat or Multi
    Error message implying that the Type specified in the config file is
    something other than Flat or Multi
Illegal Number of nodes in the system
    Gossip should be started with at least 3 nodes in the system to have
    consensus on the failure of a node
Ooooops : too many layers unwantedly
    The maximum number of layers has been fixed to 5, which by itself can
    support 10000-20000 nodes. If required, this can be changed. The error
    message indicates that the number of layers specified in the config file
    is more than five.
Inconsistent config-file -- you cannot have more than 1 layer for non Multi architectures
    Self explanatory
Wrong assignment of groups to layers..No difference in number of groups in Layer%d and Layer%d
    You cannot have the same number of groups in two layers. The number of
    groups decreases as we move up the layers.
gossip_script [run_file]
    Wrong number of arguments while starting gossip
Node N1 is back alive
    Previously dead node N1 is back alive now
iii. Consensus Messages
====================
sending broadcast on N1 myid:N2
    Node N2 has sent the consensus broadcast on N1
Node 'n' is back alive
    Node n has come back alive
iv. Node Insertion Messages
=======================
Acknowledgement got back. Info was gwwait
    Acknowledgment received by the newly joining node from the sponsor
The group number entered was illegal.There are only 'n' groups in layer two
    Message printed when the new node tries to join a non-existent group
Received
Sending broadcast for restart
    Messages printed by the sponsor node when the new node acknowledges that
    it has received the broadcast. The sponsor then broadcasts to all the
    nodes to restart gossiping
Done -- Node inserted - Myid is 'n'
    Printed by the sponsor node with id 'n' after the completion of the node
    insertion
v. Stop Gossip Messages
====================
Wrong number of arguments...Input config file missing
    Error message printed if the config file is not specified while stopping
    gossip
Sending stop signal to ... *name*
    Node 0 sends the stop signal to the node 'name'
Received stop gossip on info thread
Gossip Stopped on N1
    Messages printed by Node N1 indicating that the stop signal was received
    and gossip was stopped
10. ADDITIONAL FEATURES
Multicast: If a broadcast facility is not available, the multicast option can
be used instead to propagate consensus information. This option is not
advisable for large system sizes. The default mode is set to 'broadcast'.
Gossip Interval (Tgossip): The minimum Tgossip interval has been set to 10ms
(OS time slice). Provision has also been made for a Tgossip of less than 10ms
using busy waiting. Since busy waiting increases CPU utilization, it is
advisable to keep to the default minimum value of 10ms.
11. COMMENTS
gcc is assumed to be the default compiler. C++ style comments are used in the
code.