FT Group / Gossip with GEMS V1.0 (released 10/15/02 for Linux, 02/25/03 for Tru64 and Solaris)
Fault Tolerance and Resource Management in Heterogeneous Distributed Networks and Systems
Gossip protocols provide a scalable, asynchronous means of failure
detection and resource monitoring in heterogeneous distributed systems,
without the limitations associated with group communication. In addition
to supporting all the features of the earlier Gossip v2.0 release, Gossip
service with GEMS v1.0 supports resource monitoring as an extension of the
gossip-style failure detection protocol. The gossip-enabled monitoring
service (GEMS) is implemented by piggybacking monitored data on the
failure detection messages, which makes the combined service scalable,
distributed, and fault-tolerant. The new version thus addresses two major
challenges of clustering: failure detection and resource monitoring. The
service can serve as middleware for system administration, scheduling,
and load-balancing services.
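The piggybacking idea above can be illustrated with a minimal, single-process Python sketch of gossip-style failure detection in which each heartbeat entry also carries monitored values. It is not the GEMS implementation or API: the class `GossipNode`, the `Entry` fields, the `t_fail` timeout, and the helper `gossip_round` are hypothetical names chosen for illustration. Each node increments its own heartbeat, sends its whole table to a randomly chosen peer, and keeps whichever entry carries the higher heartbeat, so a node's monitored data travels with, and is versioned by, its heartbeat.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    heartbeat: int      # highest heartbeat counter seen for this node
    last_update: float  # local time at which that heartbeat was first seen
    data: dict          # monitored values piggybacked on the heartbeat


class GossipNode:
    """Hypothetical gossip-style failure detector carrying monitored data."""

    def __init__(self, node_id, t_fail):
        self.node_id = node_id
        self.t_fail = t_fail  # seconds without a newer heartbeat before suspicion
        self.table = {}       # node_id -> Entry

    def tick(self, now, local_data):
        # Increment our own heartbeat and attach freshly sensed values to it.
        own = self.table.get(self.node_id)
        hb = own.heartbeat + 1 if own else 1
        self.table[self.node_id] = Entry(hb, now, dict(local_data))

    def gossip_message(self):
        # The outgoing message is the whole table: (heartbeat, data) per node.
        return {nid: (e.heartbeat, dict(e.data)) for nid, e in self.table.items()}

    def merge(self, message, now):
        # A higher heartbeat means newer data, so the heartbeat doubles as a
        # version number for the piggybacked values; older entries are ignored.
        for nid, (hb, data) in message.items():
            mine = self.table.get(nid)
            if mine is None or hb > mine.heartbeat:
                self.table[nid] = Entry(hb, now, dict(data))

    def suspected(self, now):
        # Peers whose heartbeat has not advanced within t_fail are suspected failed.
        return sorted(nid for nid, e in self.table.items()
                      if nid != self.node_id and now - e.last_update > self.t_fail)


def gossip_round(nodes, now, rng):
    # Each node sends its table to one peer chosen at random.
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(node.gossip_message(), now)
```

Because stale monitoring values always arrive with an older heartbeat, the same merge rule that drives failure detection also discards outdated data, without a separate consistency protocol in this sketch. In use, each node would call `tick` with its local sensor readings and then take part in `gossip_round` on a fixed period.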
New features supported by Gossip service with GEMS v1.0 include:
- Failure detection of groups through group consensus, in addition to
failure detection of nodes
- Fully distributed resource monitoring with an array of built-in
sensors for monitoring load average, network utilization, etc.
- Data consistency maintained through the heartbeat protocol of the
gossip-style failure detection service
- Aggregation of monitored parameters to present an aggregate view of
groups of nodes, which also improves the scalability of the service by
reducing resource utilization
- Support for the dissemination and aggregation of application data
- Dynamic inclusion of user-defined aggregation functions
- Simple API for retrieval and dissemination of monitored data
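The aggregation and user-defined-function features listed above can be sketched as a small registry of aggregators. This is a hypothetical Python illustration, not the GEMS API: `register_aggregator`, `aggregate`, `merge_summaries`, `finalize`, and the split of each aggregator into lift, combine, and finish steps are assumptions. Keeping an associative combine step lets per-group summaries be combined further, so one summary can stand in for many per-node values.

```python
from functools import reduce

# Registry of aggregation functions, keyed by name. Each aggregator is a
# triple (lift, combine, finish): lift turns one node's value into a partial
# summary, combine merges two partial summaries, and finish turns a summary
# into the reported value.
_registry = {}


def register_aggregator(name, lift, combine, finish):
    """Add or replace an aggregation function at run time."""
    _registry[name] = (lift, combine, finish)


def aggregate(name, values):
    """Summarize the values reported by the nodes of one group."""
    lift, combine, _ = _registry[name]
    return reduce(combine, map(lift, values))


def merge_summaries(name, summaries):
    """Combine per-group summaries into a summary of a larger group."""
    _, combine, _ = _registry[name]
    return reduce(combine, summaries)


def finalize(name, summary):
    """Turn a summary into the value presented to the user."""
    return _registry[name][2](summary)


# Built-in style aggregators. "mean" carries (sum, count) rather than the mean
# itself so that summaries from groups of different sizes combine exactly.
register_aggregator("max", lambda v: v, max, lambda s: s)
register_aggregator("mean", lambda v: (v, 1),
                    lambda a, b: (a[0] + b[0], a[1] + b[1]),
                    lambda s: s[0] / s[1])
```

A user-defined function is just another registration; for example, a hypothetical `"overloaded"` aggregator with `lift=lambda v: 1 if v > 1.0 else 0` and addition as its combine step counts the nodes in a group whose load exceeds a threshold.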