"Application Level RunTime Load
Management:
A Bayesian Approach"
Luís Paulo Peixoto dos Santos
June, 2001
Complete Thesis: (PDF,
1899 KBytes)
Abstract
"Affordable parallel computing on distributed shared systems requires novel approaches to manage the runtime load distribution, since current algorithms fall below expectations. The efficient execution of irregular parallel applications, on dynamically shared computing clusters, has an unpredictable dynamic behaviour, due both to the application requirements and to the available system's resources. This thesis addresses the explicit inclusion of the uncertainty an application level scheduling agent has about the environment, on its internal model of the world and on its decision making mechanism.
Bayesian decision networks are introduced and a generic framework is proposed for application level scheduling, where a probabilistic inference algorithm helps the scheduler to efficiently make decisions with improved predictions, based on available incomplete and aged measured data. An application level performance model and associated metrics (performance, environment and overheads) are proposed to obtain application and system behaviour estimates, to include in the scheduling agent's model and to help the evaluation.
To verify that this novel approach improves the overall application
execution time and the scheduling efficiency, a parallel ray tracer was developed as a message passing irregular data parallel
application, and an execution model prototype was built to run on a seven time--shared nodes computing cluster, with dynamically
variable synthetic workloads. To assess the effectiveness of the
load management, the stochastic scheduler was evaluated rendering several complex scenes, and compared with three reference
scheduling strategies: a uniform work distribution, a demand driven work allocation and a sensor based deterministic scheduling
strategy.
The evaluation results show considerable performance improvements over blind strategies, and stress the decision network based scheduler improvements over the sensor based deterministic approach of identical complexity."