Schedule

European Grid Conference, February 14-16, 2005, Science Park Amsterdam, The Netherlands

Aligning data and computation within a grid computing environment
Time: 14:00 - 16:00
Chair: John Easton

Summary

A typical grid environment is likely to consist of a number of computational elements. In a grid comprising systems from multiple organisations, these elements will typically be located at different, geographically dispersed sites and interconnected, for the most part, by wide-area network links. Users have a varied workload that they wish to run, generally aiming to process as much of it as quickly as possible. By intelligently farming the workload out over the grid, the users can expedite its processing. The problem arises when some of the executables and data items are large compared to the available network bandwidths and latencies, at which point it is necessary first to understand and then to find the trade-offs between the time to process the work and the time to transmit the work to a remote location for processing. To maximise the utilisation of the available systems within the grid, we need to find a way to distribute workloads across those systems, taking the relative localities of executables and data into account and ensuring that both data and computation are co-located on the same physical system(s). At the highest level, we can consider this problem as making a decision between sending the computation to the data, sending the data to the computation, or sending both computation and data to a third location for processing.
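
The three-way decision described above can be illustrated with a rough cost model. The following Python sketch is not taken from the session material: the names (Site, transfer_seconds, placement_costs), the "work units" measure, and the assumption that transfers happen one after another over links with a fixed bandwidth and latency are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    compute_rate: float  # hypothetical measure: work units processed per second


def transfer_seconds(size_mb, bandwidth_mb_s, latency_s):
    """Estimated time to move size_mb over one link (zero if nothing moves)."""
    if size_mb == 0:
        return 0.0
    return latency_s + size_mb / bandwidth_mb_s


def placement_costs(exe_mb, data_mb, work_units, exe_site, data_site, sites, links):
    """Estimated completion time (seconds) for running the job at each site.

    links maps (source, destination) site names to (bandwidth_mb_s, latency_s).
    Assumption: the executable and the data are transferred sequentially,
    so their transfer times are summed.
    """
    costs = {}
    for site in sites:
        move = 0.0
        for size, src in ((exe_mb, exe_site), (data_mb, data_site)):
            if src != site.name:  # only pay for items not already at this site
                bandwidth, latency = links[(src, site.name)]
                move += transfer_seconds(size, bandwidth, latency)
        costs[site.name] = move + work_units / site.compute_rate
    return costs


# Hypothetical example: executable lives at A, data at B, C is a faster third site.
sites = [Site("A", 10.0), Site("B", 10.0), Site("C", 40.0)]
links = {(s, d): (10.0, 0.5) for s in "ABC" for d in "ABC" if s != d}
costs = placement_costs(50, 2000, 1000, "A", "B", sites, links)
best = min(costs, key=costs.get)
print(costs, best)
```

With these made-up figures, moving the small executable to the data (site B) is cheapest, because shipping 2000 MB of data outweighs the faster processor at site C. A real grid would also need to account for queueing, parallel transfers and changing link conditions, none of which this sketch models.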

In environments where there is no central view of the entire grid, there is a need for some entity to help make these high-level, policy-based decisions. Today, the lack of a common view used by the different grid control mechanisms (job schedulers, data movers etc.) means that they often make different, potentially conflicting decisions, based upon their own individual perspectives of the grid, over where jobs should be submitted to be run. In the worst case, data and executables are sent to different locations within the grid, resulting in a failure of the job due to a lack of valid data. Too many occurrences of this problem can severely impact the effectiveness of the grid implementation and mean poor utilisation of both computer resources and the network links between the various locations.

This session will discuss the general issues in aligning data and computation within a grid computing environment as well as some of the solutions in use today which attempt to solve them. The pros and cons of these different solutions when used in anger will be highlighted. A novel data distribution mechanism for grid infrastructures will also be described. This solution uses performance criteria and business rules to control the provisioning of data within a heterogeneous grid environment. Rules that take into account criteria as diverse as network bandwidths, intellectual property restrictions and product development schedules are used to make decisions over the scheduling of jobs and the associated placement of data to service these jobs. The performance of real customer applications that are currently being implemented in a grid computing environment spanning sites in five countries will be used to illustrate both the challenges with existing data distribution mechanisms as well as the before and after performance of applications when the new policy-based distribution engine has been implemented. Finally, a set of learning points from this implementation will be presented to aid practitioners in future implementations of multiple applications in both locally and widely-distributed grid infrastructures.
