Our project is divided into two phases.
Phase I relates to the development of RAPIDS 4.0 into a minimally-intrusive
In Phase II, RAPIDS will be shaped into a complete performance evaluation
Each milestone below represents the completion of certain key modules.
- The MPI Wrapper: We assume that applications use MPI for
communication. The design of the LMM will require either
modifications to MPI itself or a "wrapper" layered
on top of MPI.
- The Monitoring Module: The main monitoring module collects
information while the application runs and displays it in real time.
This will mark the completion of Phase I.
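
The wrapper idea behind the two Phase I milestones can be illustrated with a minimal sketch. It is not the RAPIDS or LMM design: FakeComm is a hypothetical stand-in for a communication library (it is not MPI), and MonitoredComm and its stats layout are names invented for this illustration. With a real MPI library, the standard's profiling interface (the name-shifted PMPI_ entry points) is the usual way to interpose such a layer without modifying the application.

```python
import time
from collections import defaultdict


class FakeComm:
    """Hypothetical stand-in for a communication library (not real MPI)."""

    def __init__(self):
        self.mailbox = []

    def send(self, payload, dest):
        # Queue the message; a real library would transmit it to rank `dest`.
        self.mailbox.append((dest, payload))

    def recv(self):
        # Return the oldest queued payload.
        return self.mailbox.pop(0)[1]


class MonitoredComm:
    """Wrapper that forwards each call and records its count and duration."""

    def __init__(self, comm, clock=time.perf_counter):
        self._comm = comm
        self._clock = clock
        self.stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        target = getattr(self._comm, name)
        if not callable(target):
            return target

        def monitored(*args, **kwargs):
            start = self._clock()
            try:
                return target(*args, **kwargs)
            finally:
                # Record the call even if the wrapped function raised.
                entry = self.stats[name]
                entry["calls"] += 1
                entry["seconds"] += self._clock() - start

        return monitored


# The application code is unchanged apart from using the wrapped object.
comm = MonitoredComm(FakeComm())
comm.send("hello", dest=1)
message = comm.recv()
```

After this runs, comm.stats holds one entry per intercepted call name, which a monitoring module could poll and display while the application is still running.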
- Fault-tolerant Synthetic Workloads: These synthetic applications
will be generic and tunable through user input.
- Fault Injection and Recovery Monitoring: User-specified faults
will be injected via a fault injector, and the system's response will be
monitored. For applications that do not have a built-in recovery scheme,
recovery techniques such as checkpointing or ALFT will be investigated.
- Allocation and Scheduling Algorithms: Within the limits of the
available resources, these system-level algorithms will be
user-configurable.
- Integration: All of the above modules will be integrated into
one complete, user-friendly system. This will mark the end of Phase