Department of Mathematics, **Machine** CM

The Faculties

**Parallel/Vector Implementations of Algorithms for the Solution of PDEs
**

We are concerned with the development, analysis, and efficient implementation of iterative methods for the solution of partial differential equations (PDEs) on today's supercomputers, both vector and parallel. Specifically we use preconditioned conjugate gradient (PCG) methods to solve the large, sparse, linear systems of equations arising from the finite difference discretization of elliptic PDEs. The preconditioners we are interested in are matrix polynomials in the symmetric successive over-relaxation (SSOR) iteration matrix. We are also interested in the efficient data-parallel implementation of local iterative update kernels, such as that which is entailed by the SSOR-based preconditioners.

**What are the basic questions addressed?**

The primary question is that of developing efficient SSOR-based least-squares polynomial PCG methods for the solution of the linear systems of equations which arise from the finite-difference (or finite-element) discretization of PDEs. In order to parallelize/vectorize the SSOR-based preconditioners a 2-colour (red/black) ordering of the equations of the discretization is used. Considerable computational simplifications to the PCG methods have been developed in the case that the SSOR relaxation parameter is set to unity (as we have shown to be analytically optimal for these methods). Partial performance results for this method implemented on both the VP2200 and the CM-5 can be found in the publications.

Another portion of the research is concerned with the efficient data-parallel implementation of local iterative updates on the CM-5. We have proposed a class of "lattice'' assignment schemes; one of these eliminates one-half of all nearest-neighbour communication required for a standard SOR-type local iterative update. A successful practical implementation of these ideas has been tested in detail, along with the case that these schemes are incorporated into the SSOR polynomial PCG method.

**What are the results to date and the future of the work?**

Thus far an SSOR-based least-squares polynomial PCG method has been implemented on both the VP2200 and the CM-5 with performance exceeding 0.400 Gflops on both machines. Incorporation of the lattice assignment scheme into this PCG method increased the performance of the code to over 0.500 Gflops for 1st- to 6th-degree polynomial preconditioning on a 128x128x128 grid with the 1-step PCG method executing at over 0.550 Gflops. A comparison of CM-5 performance in data-parallel shift operations necessary for local iterative updates was also performed.

**What computational techniques are used and why is a supercomputer
required?**

This project is aimed at developing efficient computational techniques for use on supercomputers, specifically for the use in applications involving the solution of elliptic PDEs in three dimensional domains. These sorts of applications lead to very large linear systems, typically with millions of equations.

**Publications**

*On the Use of Serial Array Axes to Increase Efficiency in the
Data-Parallel Implementation of Various Iterative Methods*, D L Harrar II,
Mathematics Research Report MRR 069-94, Australian National University, 1994.
Also: Submitted to the 1st IEEE International Conference on Algorithms and
Architectures for Parallel Processing (ICA3PP-95), Brisbane, Australia, 19-21
April 1995 under the title *On Increased Efficiency in the Data-Parallel
Implementation of Various Iterative Methods.*

*Data-Parallel Nearest-Neighbor Communication: Relative Performance
Differences between Shift Operations on the CM-5 in the Implementation of Local
Iterative Updates*, D L Harrar II, Mathematics Research Report MRR 066-94,
Australian National University (1994).

*Alternative Data Assignment Schemes for the Acceleration of Data-Parallel
Implementations of Local Iterative Updates*, D L Harrar II, Mathematics
Research Report MRR 053-94, Australian National University, (1994).

*Analytically and Implementationally Optimal 2-Color SSOR Polynomial
Preconditioning on Vector and Parallel Supercomputers*, D L Harrar II, in J
G Lewis, editor, Proceedings of the Fifth SIAM Conference on Applied Linear
Algebra, Snowbird, Utah, SIAM, pages 556-561, (1994). Also: Mathematics
Research Report MRR 022-94, Australian National University, (1994).