Automatic Feature Learning for Optical Character Recognition and Speech Recognition


Principal Investigator

Classical approaches to statistical pattern recognition problems require that the statistician first selects a set of appropriate features for the problem. This feature-selection process is informal, heuristic, and largely considered to be a "black art". Furthermore, the quality of the chosen set of features fundamentally constrains the performance of any pattern recognition device constructed using those features. Therefore, an obvious goal is to find ways of automatically learning the features.

It can be shown that if a learner is embedded within an environment of related learning tasks, then it is possible to learn features that are appropriate for the entire environment. Furthermore, the larger the number of tasks, the more accurate the learnt features will be. In addition to these theoretical results, an algorithm for learning features using artificial neural networks has also been developed.
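
The idea of learning one set of features for a whole environment of related tasks can be illustrated with a small sketch. It is not the project's algorithm: it assumes a toy regression setting, a single tanh hidden layer shared by every task, one linear output per task, and plain stochastic gradient descent. The names train_shared_features and make_tasks, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
import random


def make_tasks(n_tasks=4, n_points=20, seed=1):
    """Build toy related tasks: each task is a differently scaled copy of the
    same underlying function tanh(x0 + x1), so one shared feature suffices."""
    rng = random.Random(seed)
    tasks = []
    for t in range(n_tasks):
        scale = 1.0 + 0.5 * t  # the only task-specific quantity
        data = []
        for _ in range(n_points):
            x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            data.append((x, scale * math.tanh(x[0] + x[1])))
        tasks.append(data)
    return tasks


def train_shared_features(tasks, n_inputs, n_features,
                          epochs=200, lr=0.05, seed=0):
    """Train a shared feature layer W and one linear head per task.

    Returns (W, heads, history), where history[e] is the mean squared error
    accumulated over epoch e (measured while the weights are being updated).
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
         for _ in range(n_features)]
    heads = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
             for _ in tasks]
    history = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for t, data in enumerate(tasks):
            for x, y in data:
                # Shared features: the same W is used by every task.
                h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                     for row in W]
                pred = sum(a * hj for a, hj in zip(heads[t], h))
                err = pred - y
                total += err * err
                count += 1
                for j in range(n_features):
                    # Gradients of 0.5 * err**2, computed before any update.
                    grad_head = err * h[j]
                    grad_pre = err * heads[t][j] * (1.0 - h[j] * h[j])
                    heads[t][j] -= lr * grad_head  # task-specific update
                    for i in range(n_inputs):
                        W[j][i] -= lr * grad_pre * x[i]  # shared update
        history.append(total / count)
    return W, heads, history
```

Calling train_shared_features(make_tasks(), n_inputs=2, n_features=2) returns the shared layer, the per-task heads and the per-epoch error. Every task contributes gradient to the same W, which is the sense in which more tasks constrain the learnt features more tightly.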

The purpose of this project is to experimentally verify the theoretical results in two domains: Japanese optical character recognition and spoken word recognition. These domains are ideal tests for the theory because they consist of a large number of related tasks. In Kanji OCR each character can be regarded as an individual learning problem, and there are thousands of different characters, while each spoken word can be viewed as an individual learning problem in the case of speech recognition.


Jonathon Baxter

Systems Engineering,

Research School of Physical Sciences and Engineering



v62 - VPP, PC



What are the results to date and the future of this work?

A classifier for all 3018 Japanese characters in the CEDAR database has been developed and trained to an error rate of 6%, which is comparable to the error achieved by the CEDAR group, but in our case features were automatically learnt rather than selected by hand. In achieving this, a new loss function for very large class problems was developed, and an adaptive reweighting scheme was used to overcome the problems associated with under-represented characters. In addition, it was shown how optimal nearest-neighbour classification could be achieved using Euclidean distance in feature space. Experimental results verified the theoretical conclusions.
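
The mechanics of nearest-neighbour classification with Euclidean distance in feature space can be sketched as follows. This is a minimal illustration, not the project's classifier: the function names and the stand-in feature map in the usage note are hypothetical, and in the project the feature map would be the learnt neural network features.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbour(query, prototypes, feature_map):
    """Classify `query` by its nearest prototype in feature space.

    prototypes:  list of (input, label) pairs.
    feature_map: callable mapping a raw input to a feature vector
                 (assumed here; stands in for the learnt features).
    Returns (label, distance) of the closest prototype.
    """
    q = feature_map(query)
    best_label, best_dist = None, float("inf")
    for x, label in prototypes:
        d = euclidean(q, feature_map(x))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

For example, with prototypes [([0.0, 10.0], "A"), ([1.0, 0.0], "B")], the query [0.1, 0.0] and a feature map that keeps only the first coordinate, the query is assigned to "A", whereas Euclidean distance on the raw inputs would pick "B". The distance is only as meaningful as the features it is computed in.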

- Appendix A



Most of the experiments were run on the Power Challenge (PC) across several processors using the Power Challenge multi-processing C-directives. More recently, the code was ported to the VPP and vectorised, with a significant improvement in performance.

The next phase of the project will be to investigate the technique's application in speech recognition, and to continue to investigate other areas of application such as face recognition.

What computational techniques are used?

The algorithms are conjugate-gradient-based neural network optimization procedures, with a line search method developed specifically for this type of problem. They have been coded in C with a near-linear speedup on the PC, and recently ported to the VPP, where the code achieves 1 GFlop/s, or around 50% of the maximum possible.
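
A generic nonlinear conjugate-gradient loop gives a rough picture of this kind of optimizer. The sketch below is an assumption-laden stand-in, not the project's code: it uses the Polak-Ribiere update with a restart safeguard and a simple backtracking (Armijo) line search in place of the purpose-built line search mentioned above, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def backtracking_line_search(f, x, fx, g, d,
                             step=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """Shrink the step until the Armijo sufficient-decrease condition holds.
    Returns (x_new, f_new); returns the original x object if no step works."""
    slope = dot(g, d)  # directional derivative along d (negative for descent)
    for _ in range(max_halvings):
        x_new = [xi + step * di for xi, di in zip(x, d)]
        f_new = f(x_new)
        if f_new <= fx + c * step * slope:
            return x_new, f_new
        step *= shrink
    return x, fx


def conjugate_gradient(f, grad, x0, max_iter=200, tol=1e-8):
    """Minimise f from x0 with Polak-Ribiere conjugate gradient."""
    x = list(x0)
    fx = f(x)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        if dot(g, d) >= 0.0:
            # Not a descent direction: restart with steepest descent.
            d = [-gi for gi in g]
        x_new, f_new = backtracking_line_search(f, x, fx, g, d)
        if x_new is x:
            break  # line search could not make progress
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero (the "PR+" variant).
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, fx, g = x_new, f_new, g_new
    return x, fx
```

As a usage example, minimising f(x) = (x[0] - 1)**2 + 10 * (x[1] + 2)**2 from [0.0, 0.0] with its analytic gradient drives x towards [1, -2]. For a neural network, f would be the training loss over the weights and grad its backpropagated gradient.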


J. Baxter and P. Bartlett, The Canonical Distortion Measure in Feature Space and 1-NN Classification, Proceedings of the 10th International Workshop on Neural Information Processing Systems, 1997.

Appendix A -