**Principal Investigator**
David Harrar II **Project **r09

School of Mathematical Sciences, **Machine **VP

The Faculties

**Parallel/Vector Implementations of Algorithms
for the Solution of PDEs **

Work continued on the development, analysis, and efficient implementation of iterative methods for the solution of partial differential equations (PDEs) on vector and parallel supercomputers. The primary interest has been in the use of preconditioned conjugate gradient (PCG) methods to solve the large, sparse, linear systems of equations arising from the finite difference discretization of elliptic PDEs. The preconditioning is effected through the use of other iterative methods, in particular symmetric successive over-relaxation (SSOR). The preconditioner then consists of a matrix polynomial in the SSOR iteration matrix.

It was observed that the majority of the time spent in these polynomial PCG methods was due to the communication inherent in the execution of the SSOR preconditioning. Therefore novel alternative data assignment schemes were introduced so as to accelerate the execution of SSOR, in particular, and local iterative updates, in general; these so-called "lattice assignment schemes" may be used to accelerate stencil operations in any data-parallel PDE application. All that these schemes entail is the assignment of multiple points of the computational domain to each of the (virtual) processors. When done in the prescribed manner, these assignment schemes result in a considerable reduction in the number of shift operations necessary to implement a typical stencil iteration. The choice of the parameters of the lattice assignment scheme is determined based on the geometry of the stencil and on the multicolouring scheme used to decouple local iterative updates under this stencil.

When incorporated into the SSOR-based preconditioners these schemes resulted in a considerable improvement to the efficiency of the parallel performance of the methods, especially for higher-degree polynomials. It is hoped that these schemes may be used to accelerate data-parallel updates implemented not only using CM-Fortran as on the CM-5, but also implemented on other machines which support data-parallel programming in languages such as High Performance Fortran.

**What are the basic questions addressed?**

As described above, the basic question was that of how to accelerate the data-parallel implementation of local iterative updates (stencil operations).

**What are the results to date and future of the
work?**

For the usual 7-point grid stencil which is often used in the discretisation of problems in three spatial dimensions, the proposed lattice scheme results in a reduction in nearest-neighbor communication volume of one-half, while for the 9-point stencil used for two-dimensional problems the corresponding reduction is by a factor of four. For the former stencil this results in an acceleration of a standard SOR update by a factor of over four from the standard implementation using masked array assignment (on a 128 x 256 x 256 grid); on a 256 x256 x 256 grid communication-intensive SOR updates execute at nearly a gflop using this lattice scheme on the 32-node CM-5. With the 9-point stencil the results are even more impressive: On a 4096 x 2048 grid (over 8 million grid points) the lattice scheme is TEN times as fast as the usual implementation! An SOR update on this grid executes at over 1 gflop. These performance improvements carry over to the implementation of the SSOR-based preconditioner for the CG method. Here even the matrix-vector multiplication required in the CG method is accelerated since it uses update-like operations.

**What computational techniques are used and why
is a supercomputer required?**

The major computational technique is the use of the lattice itself in the assignment of grid points to each processor. Since the whole point of the investigation was to accelerate these operations on a supercomputer, it is obvious why a supercomputer was required!

**Publications**

*Efficient SSOR-based polynomial preconditioning
of the conjugate gradient method on vector and parallel computers*.
David L. Harrar II Mathematics Research Report MRR 024-95, Mathematics
Department, Australian National University, (1995).

*On the acceleration of stencil operations in the
data-parallel solution of PDEs, *David
L. Harrar II* *In V.L. Narasimhan, editor, Proceedings of
the First International Conference on Algorithms and Architectures
for Parallel Processing, Brisbane, Australia, IEEE pages 246-255,
(1995)

*On increasing efficiency in multicolour iterative
methods on massively parallel computers.*
David L. Harrar II* *In A. Easton and R. May, editors, Proceedings
of CTAC '95, Melbourne, Australia,. World Scientific. (1996) To
appear.