ANU Supercomputer Facility - Annual Report 1997

Mass Data Storage System


Mass Data Storage System and Related Projects

The Mass Data Storage System (MDSS) was acquired in 1993 to support supercomputer users and other researchers with data intensive projects.

In 1997 the MDSS moved to its second phase, which significantly improved its capacity and speed of access. Two very fast, high-density Redwood drives and two Timberline drives were added to the original four tape drives. This increased the MDSS capacity by a factor of 100, to a potential 300 TBytes, and greatly increased the speed of data storage and retrieval.

The original Epoch file migration software was replaced by Storage and Archive Manager File System (SAM-FS) from LSCI. The transfer of one terabyte of Epoch managed files to SAM-FS was successfully completed in the surprisingly short period of three months using locally developed code which bypassed the notoriously slow Epoch data manager. A comprehensive transition guide was provided to users to minimize problems resulting from the software change.

The new SAM-FS system has proved to be robust, reliable, and easy to manage and use. Many users have voiced their appreciation of the new system. Others are quietly demonstrating their approval by using it regularly.

Details of the current configuration of the fileserver, tape systems and network connections can be found elsewhere in this report.

Access to the MDSS is either via an NFS mounted partition or via `ftp'. Retrieval is instantaneous if the file is located on the magnetic disk cache, or takes less than three minutes if the file has already been staged out to tape. Tools are provided to stage files in from tape and out to tape in bulk, to determine where a file physically resides, and to copy files efficiently with the minimum number of tape mounts. Standard Unix `cp' can be used, but the specialized utilities are more efficient.
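
The idea behind copying with the minimum number of tape mounts can be illustrated with a short sketch. This is not the ANUSF or SAM-FS utility itself; it is a minimal Python illustration, assuming the physical location of each file is already known. The function name `plan_bulk_copy', the volume labels and the layout of the `locations' mapping are all hypothetical. Files still on the disk cache are handled first, and the remainder are grouped so that each tape is mounted once and read in position order.

```python
from collections import defaultdict


def plan_bulk_copy(locations):
    """Order a bulk copy so that each tape volume is mounted only once.

    `locations` (a hypothetical input format) maps a file path to a
    (volume, position) pair, or to None if the file is on the disk cache.
    Returns a list of (source, [paths]) steps, disk cache first.
    """
    on_disk = []
    by_volume = defaultdict(list)
    for path, where in locations.items():
        if where is None:
            on_disk.append(path)
        else:
            volume, position = where
            by_volume[volume].append((position, path))

    plan = [("disk", sorted(on_disk))] if on_disk else []
    for volume in sorted(by_volume):
        # Read each tape front to back to avoid repositioning.
        ordered = [path for _, path in sorted(by_volume[volume])]
        plan.append((volume, ordered))
    return plan


# Example with made-up file names and volume labels.
example = {
    "a.dat": ("T002", 5),
    "b.dat": None,
    "c.dat": ("T001", 9),
    "d.dat": ("T002", 1),
}
for source, paths in plan_bulk_copy(example):
    print(source, paths)
```

In this example four files are retrieved with two tape mounts. Copying them in alphabetical order would instead alternate between volumes and could require three.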

Management Policy

The local configuration has been tailored to provide archive tape redundancy to avoid lost files caused by a damaged tape, robustness in the face of disk failure, optimization according to file characteristics, and transparent operation with respect to naive users.

Access is restricted to supercomputer users and others on a case by case basis. The MDSS is not a replacement for group or departmental disks. Details of the access policy are published on the ANUSF WWW site.

Usage and Consultancy Activities in 1997

The majority of MDSS users are also supercomputer users, and their projects have provided Project Summaries under the relevant supercomputer in the Appendix to this report. Some MDSS users are not affiliated with the supercomputers, and their Project Summaries appear under the `S' category in the Appendix. One project - the Massive Compact Halo Object (MACHO) search - provided separate supercomputer and MDSS summaries in order to highlight the exceptional MDSS support required for its 1.1 TByte astronomy database.

Following the mid-year transition from Epoch, a further 2.1 TBytes of data was archived using SAM-FS, bringing the total to over 3.2 TBytes. The number of users also increased by 25 per cent, to 113.

Projects dealing with massive amounts of data need to consider carefully all aspects of data acquisition, storage, retrieval, navigation, and interpretation before using the MDSS. In 1997, a programming consultant was dedicated to helping researchers plan and implement their data intensive projects so as to use the MDSS more effectively, with efficient data modelling, storage and retrieval methods.

The programmer's role is to help the MDSS user community in three ways. As a first step in 1997, the programmer began contacting existing users to ensure they were aware of the full capabilities of the system and, where necessary, to recommend more efficient methods. This was followed by contact with prospective and new users to ensure they understood the capabilities of the MDSS. The third step, still planned, is to develop software demonstrating more sophisticated methods of access to massive data archives than flat file `puts' and `gets'.

In 1997, 26 of the largest data holders among the original 86 Epoch users were individually contacted. Assistance was provided on a variety of topics including efficient access to project data, database organization, data interpretation tools and source configuration management.

As a result of discussions with a group regarding source management on the MDSS and the implications of rapid source file change, the MDSS programmer hosted a course on "CVS Source Configuration Management".

To advertise the new services to potential users at ANU, ANUSF staff hosted an introductory course on efficient use of the MDSS. They were joined by current users from the John Curtin School of Medical Research and the Mt Stromlo Observatory, who provided users' perspectives on how the MDSS facilitated their data management tasks. These two groups account for more than 1 TByte of SAM-FS data.

During 1998, the programmer will select an active MDSS project for development of a demonstration tool enabling moderated WWW access to project data residing on the mass data archive.

The Australian National University