GEMS : Gossip-Enabled Monitoring Service
Table Of Contents
4. (a) Starting GEMS
(b) Restarting GEMS in a dead node
(c) Inserting a new node into the service
5. Stopping GEMS
6. Retrieving the resource monitoring data
7. Configuring GEMS
8. Additional features
A thorough understanding of the gossip-style failure detection protocol [1-2] is required to understand the instructions
provided in this manual. Knowledge of the various configurations of Gossip when run in both the flat and layered scheme is
also required. For a detailed description of the gossip failure detection protocol refer to [3-5] while for a more
functional description of the implementation consult the GOSSIP-README in the DOCS directory provided.
Gossip-based failure detection exchanges liveness information between nodes to detect failures and to reach consensus on
the status of a particular node. GEMS extends the gossip-style failure detection service to support resource monitoring for
heterogeneous clusters. GEMS uses sensors to monitor resources, and the monitored data are piggybacked on Gossip messages.
In GEMS, the entire system is divided into groups and the distribution of monitored data takes place at two levels. The
system parameters are measured at the lowest layer and distributed within the group while they are aggregated at upper
layers and exchanged between groups. This file describes the various features provided by GEMS. A detailed discussion on
the architecture and features of GEMS is provided in .
This package contains the following GEMS files
Sample API files:
4. (a) Starting GEMS
GEMS can be started on any node in one of two ways: together with the gossip-style failure detection service, or at any
later time when GEMS services are needed. In the first method, GEMS is started by passing the command-line option '-m' to
the gossip executable. For a detailed account of how to start the gossip service itself, please consult the GOSSIP-README.
This method is preferred when GEMS is used for system administration or when GEMS is to be started on a large number of
nodes. In the second method, GEMS is started through the GEMS API, which requires the gossip service to be already running
on the target nodes. Scheduling and load-balancing services use this method to start GEMS on a limited number of nodes that
also run gossip. It is done with the gems_init function, which sends a message to the RMA on the corresponding nodes
instructing it to monitor the resources.
4. (b) Restarting GEMS in a dead node
Either of the two methods described above can be used to restart GEMS on a node that comes back alive. However, care should
be taken to start the gossip service as instructed in the GOSSIP-README.
4. (c) Inserting a new node into GEMS
A new node is inserted as specified in the GOSSIP-README, except that the '-m' command-line option must be provided to
enable the monitoring service. Issues such as the destination group of the node and the insertion itself are handled by the
gossip service. Once the insertion is complete, the monitoring service reconfigures itself to transparently accommodate the
new node.
5. Stopping GEMS
GEMS can be stopped either by stopping the entire gossip service or by using the GEMS API. When the gossip service is
stopped, GEMS stops automatically because there is no longer a carrier for the monitored data. GEMS can also be stopped
through the API without affecting the Gossip service, using the API control functions gems_end and gems_stopall. The former
stops the dissemination of both the built-in sensor data and the userdata (for a description of the built-in sensor data
and the userdata, please see the attached rough draft titled "GEMS: Gossip-Enabled Monitoring Service for heterogeneous
distributed systems"), while the latter stops only the dissemination of the userdata.
6. Disseminating and retrieving the resource monitoring data
The application programming interface (API) provided with GEMS is used for the dissemination and retrieval of data. The API
implementation is provided with the GEMS package in the file api.c and also as a dynamically loadable library named
libgems_api.so. When a client requests monitored data, the API contacts GEMS and retrieves the data, which are stored in
global data structures. Applications can access these global data structures by including the header file api_header.h in
their source files. The global data structures used for the different types of data are detailed below.
GEMS monitored data consist principally of the built-in sensor data and the user data. The built-in sensor data, as the name
implies, are measured by sensors built into GEMS. Applications can only retrieve the built-in sensor data; they cannot
change the values. The API function gems_update is used for retrieval. It communicates with GEMS and stores the received
monitored data in the data structure agg_loadinfo.
The structure agg_loadinfo is composed of an array of load_info structures. The load_info structure is shown below
struct load_info loadinfo[num_hosts];
The num_hosts refers to the number of hosts/groups in each layer and the num_layers refers to the number of layers. These
structures are dynamically allocated and the values num_hosts and num_layers are retrieved from GEMS at runtime. To access
the monitored data of a particular group (group j) of hosts in a specific layer (layer i) the format to be followed is
To access the monitored data of individual nodes within the same group as the node being contacted
Any number > 0 given for the num_layers will refer to a higher layer and in higher layers the num_hosts will in turn refer
to a group of hosts instead of a single host. The load_info contains the values that were compressed to reduce the resource
utilization and to ensure the portability of the service. The decompressed values which are of integer length can be
accessed through data structure
The other type of data disseminated by GEMS is the user data, that is, data provided by the applications for dissemination
through GEMS. GEMS does not change these values; it simply receives the data from the applications and disseminates them
through the system. User data are sent and received through the API functions gems_senduserdata and gems_recvuserdata.
Pseudo-code for dissemination using the various API functions is given in the file pseudo-code.c:
1.  gems_aggfn_init( &aggfn_id, filename, strlen(filename));
2.  gems_userdata_init(&userdata_id, size, aggfn_id );
3.  while (1)
4.  {
5.      data = sensor ( );
6.      gems_senduserdata (userdata_id, data, size);
7.      gems_recvuserdata (userdata_id);
8.      sleep (1);
9.  }
10. gems_userdata_stop (userdata_id);
The dissemination of user data involves the following steps: procuring an ID for the user data, procuring an ID for the
aggregation function if the data employs a user-defined aggregation function, and the selection of an aggregation function.
1. gems_aggfn_init( &aggfn_id, FILE, strlen(FILE));
If a new user-defined aggregation function is required, then a request for a unique ID is sent to the RMA, along with the
filename containing the aggregation function. Here the variable aggfn_id holds the Id assigned for the aggregation function,
when the call returns. FILE is the name of the file containing the aggregation function. All user-defined aggregation
functions should be of the format
char * aggfn(char * data);
because GEMS uses the keyword aggfn to dynamically load the function. Here data refers to the monitored data supplied by
GEMS, which the newly loaded function aggregates and returns as a character array. An example aggregation function is
provided in the file testfn.c. The filename passed to gems_aggfn_init should include the entire path of the file; otherwise
GEMS assumes that the file is in the GEMS source directory. After dynamically loading the new function into the service,
the RMA assigns it a unique ID and propagates it throughout the system.
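
The required format can be illustrated with a minimal sketch of an aggregation function. Only the name aggfn and the
signature come from this manual. The encoding of data (a NUL-terminated string of space-separated decimal integers), the
choice of "maximum" as the aggregate, and the use of a static result buffer are assumptions made for illustration; consult
testfn.c for the conventions GEMS actually expects.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-defined aggregation function.
 * The name and signature follow the format required by GEMS. The input
 * encoding (space-separated decimal integers) and the static result
 * buffer are assumptions made only for this sketch.
 */
char *aggfn(char *data)
{
    static char result[32];    /* static so it outlives the call */
    long max = LONG_MIN;
    int seen = 0;
    char *p = data;
    char *end;

    while (p != NULL && *p != '\0') {
        long v = strtol(p, &end, 10);
        if (end == p)          /* no further number could be parsed */
            break;
        if (!seen || v > max)
            max = v;
        seen = 1;
        p = end;
    }

    if (!seen)
        result[0] = '\0';      /* empty input yields an empty result */
    else
        snprintf(result, sizeof result, "%ld", max);
    return result;
}
```

Because the result lives in a static buffer, this sketch is not reentrant; each call overwrites the previous result.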
2. gems_userdata_init(&userdata_id, datasize, aggfn_id );
The application then requests an ID for the new data to be disseminated. This is done on line 2 of the pseudo-code. Here the
size of the data to be disseminated is given in the datasize variable and the aggregation function that will be used for
this data is given in the aggfn_id variable. When the call returns, the userdata ID is returned in the variable userdata_id.
These two functions should be called by only one of the processes of a distributed application. This process should then
broadcast the IDs obtained to all the other processes.
5. data = sensor ( );
6. gems_senduserdata (userdata_id, data, datasize);
Line 5 shows an example function of the application that returns the data to be disseminated through the GEMS. In line 6
this data is sent to GEMS through the API function gems_senduserdata. Here the Id for the data is provided in userdata_id,
the data is supplied in the character array data whose size is specified by the datasize.
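
Since GEMS passes user data through unchanged and targets heterogeneous clusters, an application may want to fix the byte
layout of the character array itself. The following is a minimal sketch of one way to do that, assuming a single 32-bit
reading per update. The names DATASIZE, pack_reading and unpack_reading are hypothetical and are not part of the GEMS API.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATASIZE 4       /* hypothetical: one 32-bit reading per update */

/* Serialize a reading into a character array in network byte order,
 * so that nodes of different endianness decode the same value. */
void pack_reading(uint32_t reading, char *data)
{
    uint32_t wire = htonl(reading);

    assert(data != NULL);
    memcpy(data, &wire, sizeof wire);
}

/* Inverse of pack_reading, for use on the receiving side. */
uint32_t unpack_reading(const char *data)
{
    uint32_t wire;

    assert(data != NULL);
    memcpy(&wire, data, sizeof wire);
    return ntohl(wire);
}
```

In this sketch the packed buffer and DATASIZE would play the roles of data and datasize in the call on line 6.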
7. gems_recvuserdata (userdata_id);
Line 7 shows the function call for receiving the user data. This function gives GEMS the ID of the userdata in the variable
userdata_id, whose current disseminated values are required. The user data is returned in the datastructure
The declaration of each structure is shown below
struct userdata datanum[NUM_DATA];
struct all_userdata hostnum[num_host];
Here, as already explained, num_layer and num_host refer to the number of layers and the number of hosts/groups
respectively. These structures are dynamically allocated using values obtained at run time. NUM_DATA is the maximum number
of data items that can be disseminated. To retrieve the userdata of an individual node in the same group as the contacted
node, the format to be followed is
Here 0 refers to the lowest layer and hence an individual node. To retrieve the userdata (userdata_id k) of a particular
group (group j) of nodes in a specific layer (layer i), the format to be followed is
The gems_recvuserdata() function retrieves only the disseminated information of single user data identified by the
userdata_id. These values are retrieved by giving zero for the userdata_id in the datastructures. The format is shown below
Finally, to retrieve all the userdata currently disseminated by GEMS, the API functions gems_update_w_userdata and
gems_update_w_nuserdata are used. The gems_update_w_userdata function takes no arguments; it stores all the retrieved
sensor data in the agg_loadinfo data structure and the userdata in the layernum data structure.
int gems_update_w_nuserdata(u_int8_t num_data,u_int8_t *ID_array)
This function is used when only specific userdata, identified by their IDs, are required. The num_data variable specifies
the number of userdata items required, and their IDs are given in the ID_array variable.
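
The shape of such a call can be sketched as follows. So that the example compiles without the GEMS library, it calls a
stand-in named mock_gems_update_w_nuserdata that has the same signature as gems_update_w_nuserdata but only records its
arguments. The stand-in, the helper request_userdata_pair, and the convention that 0 means success are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* u_int8_t */

/* Arguments recorded by the stand-in below (illustration only). */
static u_int8_t last_ids[255];
static size_t last_count;

/*
 * Stand-in with the same signature as gems_update_w_nuserdata.
 * The real function contacts GEMS; this one only copies its arguments.
 * Returning 0 for success is an assumption of this sketch.
 */
int mock_gems_update_w_nuserdata(u_int8_t num_data, u_int8_t *ID_array)
{
    size_t i;

    assert(ID_array != NULL || num_data == 0);
    for (i = 0; i < num_data; i++)
        last_ids[i] = ID_array[i];
    last_count = num_data;
    return 0;
}

/* Hypothetical helper: request exactly two userdata items by ID. */
int request_userdata_pair(u_int8_t first_id, u_int8_t second_id)
{
    u_int8_t ids[2];

    ids[0] = first_id;
    ids[1] = second_id;
    /* num_data is the element count of the ID array, not its byte size. */
    return mock_gems_update_w_nuserdata(
        (u_int8_t)(sizeof ids / sizeof ids[0]), ids);
}
```

In a real client the call would go to gems_update_w_nuserdata itself, with the IDs previously returned by
gems_userdata_init.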
To use the API functions provided in the file api.c, the client process should either be compiled together with this file
or use the dynamically loadable library libgems_api.so.
7. Configuring GEMS
There is no separate configuration for GEMS; it follows the same configuration as Gossip. Thus, to monitor specific groups
of nodes or to group nodes in a specific fashion, follow the configuration instructions in the GOSSIP-README file.
Currently the load update interval for the built-in sensors is fixed to the same value as Tgossip. Since the proc file
system is opened and read every Tgossip, it is good practice to use a minimum Tgossip of 1 second for minimizing
- Failure detection time is proportional to Tgossip
- Small value of Tgossip would increase bandwidth and CPU Utilization
- Minimum value of Tgossip for efficient CPU usage should be higher than
the OS time slice (10ms). 1sec recommended for GEMS
8. Additional features
Gossip Interval (Tgossip): The minimum Tgossip interval has been set to 10ms (OS time slice). Provision has also been made
for Tgossip less than 10ms using busy waiting. But since busy waiting increases CPU utilization, it is advised to have the
default minimum value of 10ms. Also note that for using GEMS the recommended minimum Tgossip value is 1sec.
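
The trade-off between sleeping and busy waiting can be sketched as follows. This is not the GEMS implementation; it is a
minimal illustration, assuming a POSIX system, of waiting for one interval by spinning when the interval is below the 10ms
time slice and by sleeping otherwise. The names OS_TIME_SLICE_US, elapsed_us and wait_tgossip are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define OS_TIME_SLICE_US 10000L  /* 10ms, the time slice quoted above */

/* Microseconds elapsed on the monotonic clock since *start. */
static long elapsed_us(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000000L
         + (now.tv_nsec - start->tv_nsec) / 1000L;
}

/* Wait for one interval; returns 1 if busy waiting was used, else 0. */
int wait_tgossip(long interval_us)
{
    struct timespec start;
    struct timespec req;

    assert(interval_us >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    if (interval_us < OS_TIME_SLICE_US) {
        while (elapsed_us(&start) < interval_us)
            ;   /* spin: fine-grained, but keeps the CPU busy */
        return 1;
    }

    req.tv_sec = interval_us / 1000000L;
    req.tv_nsec = (interval_us % 1000000L) * 1000L;
    /* Sleep; if a signal interrupts, continue with the remaining time. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    return 0;
}
```

The spinning branch consumes CPU for the whole interval, which is the reason the 10ms minimum is advised.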
gcc is assumed to be the default compiler. C++ style comments are used in the code.