The University of Florida 
High-performance Computing & Simulation Research Lab

MA Group / Project Overview
Mission Assurance in High-Performance Computing and Information Technology

The HCS Research Lab conducts research on mission assurance in high-performance computing. Researchers in this group are investigating the key issues in achieving mission assurance for mission-critical dynamic applications in high-performance computing, in order to identify and elucidate the trade-offs and determine the most appropriate balance between performance and assurance.

High-performance computing leads the way in processing capability and speed, and is often tasked with handling a variety of critical applications. The use of increasingly complex technology in high-performance computing, together with the need for dynamically changing applications, presents a unique challenge in contingency planning for disaster recovery. Example scenarios range from power outages and disk failures to malicious, intentional interruption of the information infrastructure.

The distinction between static and dynamic applications is meant to convey differences in how such applications and their related data are developed, used, and stored. Static applications make up the majority of applications, and they benefit from lengthy development periods, large development teams, formal software engineering practice, a long software half-life, and "static" use (e.g., code does not change often and data remains useful for longer periods of time). By contrast, dynamic applications feature a higher rate of code turnover, small and flexible development teams, and dynamically changing objects and/or data.

To date, there has been significant planning and success in minimizing or even eliminating the loss of resources due to unpredictable failures of static applications. Off-site backup, critical component redundancy, documentation of system configurations, and even standardized hardware and software all help to mitigate the negative impact of disasters on static applications. For dynamic applications, however, many of these solutions pose new problems or trade-offs in terms of system performance and reliability.

In summary, the objective of this project is to investigate key issues in providing mission assurance for mission-critical, dynamic, high-performance computing applications in order to identify and balance the trade-offs between performance and assurance. Methods used to achieve this objective will include the collection of technical literature from government, industry, and academia on continuity of operations planning and management for software systems, as well as the use of modeling and simulation to investigate more deeply the trade-offs involved in finding the optimal balance of mission performance versus assurance.