Fault Line and Boundary Estimation from Spatial Data


In many applications, spatial data may be viewed as observations on an underlying regression surface, and estimating the jump curves or "fault lines" in such a surface is then a task of primary interest. The methodology developed for estimating fault lines accommodates features that are desirable in both theory and practice, such as irregularly spaced design points. The approach taken is intuitively appealing but computationally demanding: the raw data are transformed into a smooth surface in which the fault line appears as a ridge, and this ridge is defined to be the estimator. Asymptotic confidence envelopes for the estimator, valid as the sample size tends to infinity, were derived theoretically.


Principal Investigator

Peter Hall
Mathematics Research Section
SMS
ANU

Project

x21, d33

Facilities Used

PC, SC

Co-Investigator

Christian Rau
Mathematics Research Section
SMS
ANU

RFCD Codes

230203


Significant Achievements, Anticipated Outcomes and Future Work

The projects explore the statistical and numerical aspects of the estimator, mainly as they depend on two factors: the sample size (more precisely, the intensity of the random point process that generates the design points) and the local geometric properties of the fault line. In particular, the radius of the confidence envelope is computed from estimates of quantities such as the curvature at each point of the fault line estimate, that is, of the ridge.
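To make the role of curvature concrete, the following is a minimal sketch of how curvature might be estimated at each interior point of a discretised ridge. It is not the Fortran routine used in the projects; it assumes the ridge is available as an ordered list of planar points and uses the circle through three consecutive points (the Menger curvature). The function names and the test arc are hypothetical.

```python
import math


def menger_curvature(p, q, r):
    """Signed curvature of the circle through three planar points.

    Positive when p -> q -> r turns counterclockwise.
    """
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    # Twice the signed area of the triangle p, q, r.
    cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
    a = math.hypot(x2 - x1, y2 - y1)
    b = math.hypot(x3 - x2, y3 - y2)
    c = math.hypot(x3 - x1, y3 - y1)
    if a * b * c == 0.0:
        raise ValueError("the three points must be distinct")
    # Curvature = 4 * area / (a * b * c) = 2 * cross / (a * b * c).
    return 2.0 * cross / (a * b * c)


def curvature_along_curve(points):
    """Curvature estimates at the interior points of an ordered polyline."""
    return [
        menger_curvature(points[i - 1], points[i], points[i + 1])
        for i in range(1, len(points) - 1)
    ]


# Hypothetical check: points on a circle of radius 2, traversed counterclockwise.
arc = [(2.0 * math.cos(0.1 * k), 2.0 * math.sin(0.1 * k)) for k in range(10)]
kappa = curvature_along_curve(arc)
```

For points lying exactly on a circle the three-point formula returns the reciprocal of the radius. On a noisy ridge estimate the raw values would be unstable, so some smoothing of the curve would be needed first; that step is left out of this sketch.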

The distribution that yields the percentiles for asymptotically correct coverage of the envelopes was studied via a Karhunen-Loève expansion of a planar Gaussian field. Although the methodology was established a considerable while ago, there is notably little existing work dealing with the practical aspects investigated in the projects.
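As an illustration of the kind of calculation involved, the sketch below builds a discrete Karhunen-Loève expansion of a planar Gaussian field on a small grid and draws one realisation from the leading modes. The covariance of the field studied in the projects is not given here, so a squared-exponential kernel is assumed purely for illustration; the grid size, `length_scale`, `n_modes` and all function names are likewise hypothetical.

```python
import numpy as np


def kl_modes(coords, length_scale=0.3, n_modes=10):
    """Leading eigenpairs of an assumed squared-exponential covariance matrix."""
    diff = coords[:, None, :] - coords[None, :, :]
    cov = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * length_scale ** 2))
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]  # keep the largest n_modes
    # Clip tiny negative values caused by rounding error.
    return np.clip(eigvals[order], 0.0, None), eigvecs[:, order], cov


def sample_field(eigvals, eigvecs, rng):
    """One draw of the truncated expansion: sum_k sqrt(lambda_k) * xi_k * phi_k."""
    xi = rng.standard_normal(eigvals.shape[0])
    return eigvecs @ (np.sqrt(eigvals) * xi)


side = 12  # 12 x 12 grid on the unit square (illustrative size)
g = np.linspace(0.0, 1.0, side)
xx, yy = np.meshgrid(g, g)
coords = np.column_stack([xx.ravel(), yy.ravel()])

eigvals, eigvecs, cov = kl_modes(coords, n_modes=30)
rng = np.random.default_rng(0)
field = sample_field(eigvals, eigvecs, rng).reshape(side, side)

# Fraction of the total variance captured by the retained modes.
explained = eigvals.sum() / np.trace(cov)
```

The dense covariance matrix has one row and one column per grid point, so its storage grows with the square of the number of points. This is why even moderately fine planar grids lead to large eigensystem problems.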

Future work includes the methodologically similar problems of estimating fault lines in bivariate densities and estimating density support boundaries. In both cases, methods for adaptive bandwidth selection are of interest. For regression surfaces, it is also planned to investigate the case where the error distribution is heavy-tailed rather than normal. Some questions that arise in simulating the distribution mentioned in the previous paragraph appear to be of independent interest, for example the usefulness of sparse-grid techniques in calculating a four-dimensional Fourier transform.

 

Computational Techniques Used

The code used in the projects was developed mainly in C++ and Matlab. Generally, the C++ algorithms generated the datasets used in the simulations, while the Matlab routines found the likelihood maximisers via the generalised Nelder-Mead method (or variants thereof) and performed the geometrical construction of the confidence envelopes. For the latter task, estimates of the curvature of the fault line were produced using Fortran code from other authors. The computation of the likelihood, though involving only elementary operations, was strongly affected by the sparseness difficulties that are common in spatial problems, and these difficulties were compounded in estimating curvature. Karhunen-Loève calculations involve large matrix eigensystem problems, which were severely limited, again notably by memory, on smaller-scale facilities.
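The derivative-free search used to locate likelihood maximisers can be sketched as follows. This is the standard Nelder-Mead simplex method, not the generalised variant used in the projects, and it is written in Python rather than Matlab. The likelihood is a toy normal model with hypothetical data, standing in for the projects' likelihood, which is not reproduced here. Maximisation is done by minimising the negative log-likelihood.

```python
import math


def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimise f by the standard Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflect)
        if f_r < values[0]:
            # Reflection beat the best vertex: try expanding further.
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expand)
            if f_e < f_r:
                simplex[-1], values[-1] = expand, f_e
            else:
                simplex[-1], values[-1] = reflect, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflect, f_r
        else:
            # Contract towards the centroid; shrink the simplex if that fails.
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contract)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contract, f_c
            else:
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


# Hypothetical sample; the parameters are (mu, log sigma) of a normal model.
data = [1.2, 0.7, 1.9, 1.4, 0.8]


def neg_log_likelihood(theta):
    mu, log_sigma = theta
    ss = sum((x - mu) ** 2 for x in data)
    # Additive constants that do not depend on theta are dropped.
    return len(data) * log_sigma + ss / (2.0 * math.exp(2.0 * log_sigma))


estimate, value = nelder_mead(neg_log_likelihood, [0.0, 0.0])
mu_hat, sigma_hat = estimate[0], math.exp(estimate[1])
```

Parametrising by the logarithm of sigma keeps the search unconstrained, which suits a simplex method that has no notion of parameter bounds.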