GEMS Network Sensor (GNS)

Large-scale applications that operate on huge volumes of data and require immense processing capability are generally served by multiple computers working in tandem as a single system. Such a system requires an information gathering and distribution service to manage the available resources efficiently and to operate consistently. Many such services were traditionally based on broadcasting and multicasting. More recently, the sheer size of both problems and systems has emphasized the need for distributed, decentralized information services.

GEMS (Gossip-Enabled Monitoring Service) is a distributed information service based on epidemic-style information dissemination. The information monitored at each node is exchanged with other nodes in a distributed fashion. GEMS features a consensus-based failure detection service that reliably and consistently detects node failures. For scalability, the system is divided into logical groups, and per-node information is shared only among the nodes within a group. The information of all nodes in a group is then aggregated and exchanged with the nodes in other groups.
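
The gossip-based failure detection described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the GEMS implementation: each node keeps a heartbeat counter for every group member, pushes its table to one random peer per round, and suspects any member whose counter has not increased for a hypothetical SUSPECT_AFTER number of rounds. A node is declared failed only when a majority of live nodes suspect it, which stands in here for the GEMS consensus step. The names GossipNode, gossip_round and consensus_failed are illustrative only.

```python
import random

SUSPECT_AFTER = 10  # hypothetical: rounds without a fresher heartbeat before suspicion


class GossipNode:
    """Simplified gossip-style heartbeat node (illustrative, not the GEMS code)."""

    def __init__(self, node_id, group):
        self.id = node_id
        self.alive = True
        self.heartbeat = {n: 0 for n in group}  # latest heartbeat counter known per member
        self.stale = {n: 0 for n in group}      # rounds since that counter last increased

    def tick(self):
        # Each round: bump our own heartbeat and age every other member's entry.
        self.heartbeat[self.id] += 1
        for n in self.stale:
            self.stale[n] = 0 if n == self.id else self.stale[n] + 1

    def merge(self, other_heartbeats):
        # Epidemic merge: keep the larger (newer) counter for every member.
        for n, hb in other_heartbeats.items():
            if hb > self.heartbeat[n]:
                self.heartbeat[n] = hb
                self.stale[n] = 0

    def suspects(self):
        return {n for n, age in self.stale.items() if age >= SUSPECT_AFTER}


def gossip_round(nodes, rng):
    live = [n for n in nodes.values() if n.alive]
    for node in live:
        node.tick()
    for node in live:
        # Push this node's heartbeat table to one randomly chosen peer.
        peer = nodes[rng.choice([p for p in nodes if p != node.id])]
        if peer.alive:
            peer.merge(node.heartbeat)


def consensus_failed(nodes):
    # A member is declared failed only when a majority of live nodes suspect it.
    live = [n for n in nodes.values() if n.alive]
    votes = {}
    for node in live:
        for s in node.suspects():
            votes[s] = votes.get(s, 0) + 1
    return {s for s, v in votes.items() if v > len(live) // 2}
```

For example, running 20 rounds on an 8-node group built as `{i: GossipNode(i, range(8)) for i in range(8)}`, setting `nodes[3].alive = False`, and running 40 more rounds should leave node 3 in the set returned by `consensus_failed(nodes)`. Requiring a majority vote, rather than trusting a single node's timeout, keeps one slow or unlucky node from triggering a false failure.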

Figure: Hierarchical organization of clusters for scalable resource monitoring. (a) Logical cluster organization. (b) Distributed information collection and control.

On its own, GEMS monitors and shares the health and resource information of individual nodes, along with aggregated information for groups of nodes. For effective resource management and efficient application execution, however, network information is equally vital, because applications exchange data between nodes over the network and modern computing paradigms depend heavily on communication. GNS adds network monitoring to GEMS for this purpose: it provides scalable, lightweight, distributed network sensors that measure available throughput and other key metrics on whatever network is in use. The key strengths of this monitoring capability are its responsiveness and its low consumption of system resources, even in very large systems.
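
The throughput and latency measurements mentioned above can be illustrated with a minimal probe. This sketch is an assumption, not the GNS sensor: probe_throughput times a fixed-size transfer at the receiver and converts it to megabits per second, and probe_latency halves the mean round-trip time of 1-byte ping-pong messages. A `socket.socketpair()` connection stands in for a real inter-node link, so running it locally measures in-host socket overhead rather than network performance; in a deployment the peer side would run on another node. The function names and the 1 MiB default probe size are hypothetical.

```python
import socket
import threading
import time


def probe_throughput(sender, receiver, probe_bytes=1 << 20):
    """Time a fixed-size transfer; return (bytes_received, megabits_per_second)."""
    payload = b"\x00" * probe_bytes
    # The sender runs in a thread so the main thread can drain the receiver concurrently.
    t = threading.Thread(target=sender.sendall, args=(payload,))
    start = time.perf_counter()
    t.start()
    received = 0
    while received < probe_bytes:
        chunk = receiver.recv(min(65536, probe_bytes - received))
        if not chunk:  # peer closed early
            break
        received += len(chunk)
    elapsed = time.perf_counter() - start
    t.join()
    return received, (received * 8) / (elapsed * 1e6)


def probe_latency(a, b, rounds=10):
    """Estimate one-way latency as half the mean round-trip time of 1-byte messages."""

    def echo():
        for _ in range(rounds):
            b.sendall(b.recv(1))  # bounce each byte straight back

    t = threading.Thread(target=echo)
    t.start()
    total = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        a.sendall(b"x")
        a.recv(1)
        total += time.perf_counter() - start
    t.join()
    return total / rounds / 2
```

For example, `a, b = socket.socketpair()` followed by `probe_throughput(a, b)` and `probe_latency(a, b)` exercises both paths. A smaller probe payload shortens probing time and reduces the load the sensor itself places on the network, at the cost of noisier estimates.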

Network Sensors for GEMS

The network sensor module measures actual network performance (e.g., throughput and latency) between nodes in the system. The report module delivers the measured values to applications, which use them to optimize their execution. The probing plan module determines the control flow, that is, when and how measurements are taken. Measured values are also kept in memory in a circular queue of adjustable size, so they can be used to predict future performance. The network sensors are designed to support very short inter-test times (the interval between successive probes) and short probing times, which lead to better utilization of the network.
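
The circular measurement queue and probing plan described above can be sketched as follows, assuming readings are plain numbers (for example, throughput in Mb/s). MeasurementHistory wraps `collections.deque(maxlen=...)`, which discards the oldest reading once full, and resize rebuilds the deque so the newest readings survive a shrink. The exponential-smoothing predict method and the run_probing_plan loop are hypothetical illustrations; the text does not say which prediction method GNS uses.

```python
import time
from collections import deque


class MeasurementHistory:
    """Circular buffer of sensor readings with adjustable capacity (hypothetical sketch)."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def add(self, value):
        # Once the buffer is full, deque(maxlen=...) drops the oldest reading.
        self._buf.append(value)

    def resize(self, capacity):
        # Rebuild with the new capacity; when shrinking, only the newest readings are kept.
        self._buf = deque(self._buf, maxlen=capacity)

    def values(self):
        return list(self._buf)

    def predict(self, alpha=0.5):
        """Forecast the next reading by exponential smoothing (an assumed predictor)."""
        if not self._buf:
            raise ValueError("no measurements recorded")
        readings = list(self._buf)
        estimate = readings[0]
        for x in readings[1:]:
            estimate = alpha * x + (1 - alpha) * estimate
        return estimate


def run_probing_plan(probe, history, inter_test_s, count):
    """Take `count` readings from `probe`, waiting `inter_test_s` seconds between probes."""
    for i in range(count):
        history.add(probe())
        if i < count - 1:
            time.sleep(inter_test_s)  # the inter-test interval
    return history.predict()
```

For example, with capacity 3, adding 10, 20, 30 and 40 keeps `[20, 30, 40]`, and `predict()` with the default `alpha=0.5` returns 32.5. A shorter `inter_test_s` keeps the history fresher but issues more probes.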

LIVE DEMO: Network performance in an 8-node cluster at the HCS Research Lab, University of Florida.