HPC Group
Scalable and Dependable Applications and Infrastructure for High-Performance Computing
The high-performance computing (HPC) group focuses primarily on critical services for efficient, scalable, and dependable
high-performance computing. The group is responsible for the GEMS (Gossip-Enabled Monitoring/Management System) project
and related HPC projects in the HCS Lab at the University of Florida. The GEMS project develops key concepts and
mechanisms for scalable failure detection, consensus, and resource performance monitoring and management in
heterogeneous, distributed networks and systems and related HPC applications. The group addresses these issues for
large-scale, heterogeneous clusters and grids and their key applications, and works with the international iVDGL group,
led by the Department of Physics at UF, to adapt its methods for scalable resource health and performance monitoring to
the needs of data-intensive scientific grids. Recently, the group has focused on two areas for iVDGL: hybrid forms of
network monitoring, and new computational grids based on multiparadigm resources, including reconfigurable hardware.
The HPC group also works to provide high-performance parallel solutions to complex problems, including simulation of
joint mechanics, where it works closely with the Computational Biomechanics Laboratory in the Department of Mechanical
and Aerospace Engineering.
Sponsor: NSF/iVDGL, NSF/UltraLight
Principal Investigator: Dr. Alan D. George
Spring 2006 meetings: 3pm (8th period) Mondays and 12:50pm (6th period) Thursdays, HCS conference room (LAR335)
Group Members
Raj Subramaniyan, PhD student, group leader
Ajit Apte, MS student
Adam Jacobs, PhD student, Alumni Fellow
Kyu Sang Park, PhD student
Sachin Sanap, MS student
Rahul Singh, BS student
Related Links
GEMS web page
Group materials maintained by Byung Il (password protected)
MonALISA - note: a MonALISA+GEMS version is under construction for grids of large sites and clusters
2nd Sandia Workshop on Scalable Fault Tolerance for Distributed Computing, Albuquerque, NM, April 2002
1st Sandia Workshop on Scalable Fault Tolerance for Distributed Computing, Livermore, CA, April 2001