ANU Supercomputer Facility - Annual Report 1998

Mass Data Storage System

 

The Mass Data Storage System (MDSS) was acquired in 1993 to support data-intensive projects. The MDSS was upgraded to its current configuration in 1997, when its capacity was increased using Redwood tape technology. Most importantly, the upgrade also included the installation of the HSM (Hierarchical Storage Management) software SAM-FS from LSCI, which has enabled the full functionality of the system to be utilised. These upgrades have brought the Mass Data Storage System to world-class standards. Additional details of the current configuration of the fileserver, tape systems and network connections are found elsewhere in this report.

Access to the MDSS is provided across the university via NFS as well as through standard transfer tools such as 'ftp' and 'scp'. Special utilities were also created for handling large transfers of data from the supercomputers to the MDSS. These tools, 'netmv' and 'netcp', have enabled several projects to transfer large quantities of data to the MDSS in a failsafe way from their batch jobs. An additional tool, Bulkstage, optimizes throughput during the bulk retrieval of files from the MDSS.
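As an illustration of what a "failsafe" transfer can mean in practice, the following is a minimal Python sketch of a copy-verify-then-delete pattern. It is not the implementation of 'netmv' or 'netcp', whose internals are not described in this report; the function names, the '.partial' suffix and the use of SHA-256 checksums are all assumptions made for the example.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failsafe_copy(source, destination):
    """Copy source to destination, publishing it only after verification.

    Hypothetical helper: the data is first written to a temporary name,
    so an interrupted transfer never leaves a truncated file under the
    final name.
    """
    partial = destination + ".partial"  # assumed naming convention
    shutil.copyfile(source, partial)
    if sha256_of(partial) != sha256_of(source):
        os.remove(partial)
        raise IOError("checksum mismatch while copying %s" % source)
    os.replace(partial, destination)  # atomic within one filesystem
    return destination


def failsafe_move(source, destination):
    """Remove the source only after the copy has been verified."""
    failsafe_copy(source, destination)
    os.remove(source)
    return destination
```

In a batch job, a call such as failsafe_move("run42.out", "/mdss/project/run42.out") (both paths hypothetical) would leave the original in place if the copy failed for any reason.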

Management Policy

The local configuration has been tailored to provide archive tape redundancy (so that a damaged tape does not result in lost files), robustness in the face of system problems, optimization according to file characteristics, and transparent operation for inexperienced users.

Access is restricted to appropriate supercomputer projects and other ANU projects outside ANUSF. However, the MDSS is not intended to replace group or departmental disks, because of the different I/O characteristics required. Details of the access policy are published on the ANUSF WWW site.

Usage and Consultancy Activities in 1998

Projects dealing with massive amounts of data need to carefully consider all aspects of data acquisition, storage, retrieval, navigation, and interpretation before using the MDSS. To assist in this work, a programming consultant is dedicated to helping such projects plan and implement their data-intensive work so as to use the MDSS more effectively, with efficient data modelling, storage and analysis methods.

The MDSS consultant contacts prospective and new users to ensure they understand the rich capabilities of the MDSS and the best methods of utilizing those capabilities. The consultant is also developing access methods for massive data archives that are more sophisticated than flat-file 'puts' and 'gets'.

In collaboration with the MSSO Multibeam Survey Optical Follow-up Program, the MDSS consultant developed a demonstration tool enabling authenticated Web access to MDSS data. The Follow-up Program supplied the data and the user interface requirements. The MDSS consultant provided an integrated software package comprising a web server, a database server and a file server residing on distinct but networked systems. Web users are not provided with MDSS accounts; rather, a specialized authentication protocol was developed to support direct (and hence faster) communications between the global user's system and the MDSS for file retrieval. The protocol prevents counterfeit access.
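The report does not describe the authentication protocol itself. Purely as an illustration of how account-less, direct file retrieval can be protected against counterfeit requests, the Python sketch below has a web server issue a signed, expiring ticket that a file server can verify on its own. The shared key, the ticket layout and the use of HMAC-SHA256 are assumptions for this example and should not be read as the protocol actually deployed.

```python
import hashlib
import hmac
import time

# Hypothetical secret shared by the web server and the file server.
SHARED_KEY = b"example-key-known-only-to-the-servers"


def _sign(path, expires_at, key):
    """Compute the hex HMAC-SHA256 signature over path and expiry time."""
    message = ("%s|%d" % (path, expires_at)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_ticket(path, expires_at, key=SHARED_KEY):
    """Web-server side: grant access to one file until expires_at."""
    return "%s|%d|%s" % (path, expires_at, _sign(path, expires_at, key))


def verify_ticket(ticket, now=None, key=SHARED_KEY):
    """File-server side: return the granted path, or None if invalid."""
    try:
        # Split from the right so the path itself may contain '|'.
        path, expires_text, signature = ticket.rsplit("|", 2)
        expires_at = int(expires_text)
    except ValueError:
        return None  # malformed ticket
    expected = _sign(path, expires_at, key)
    # Constant-time comparison of the two signatures.
    if not hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8")):
        return None  # forged or altered ticket
    current = time.time() if now is None else now
    if current > expires_at:
        return None  # ticket has expired
    return path
```

Because the file server can check a ticket without consulting the web server or any account database, the user's system can fetch the file directly, which is the property the paragraph above attributes to the real protocol.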

In the coming year, the MDSS consultant will assist additional large-scale MDSS projects in developing Web front-ends to their MDSS archives.

The majority of MDSS users are also supercomputer users. To simplify the access mechanism and to reflect the cross-computational needs of these projects, the MDSS accounting system was changed from a user-based access mechanism to a project-based mechanism. As such, these projects now have a single project code for access to the MDSS as well as to the VPP and PC. The scheme was extended to non-supercomputer users in a similar way. MDSS projects not affiliated with the supercomputers have been designated as 'd' projects in the Appendix.

Since the Mass Data Store is available to users of the supercomputers, there is a consequent de facto external usage scheme. Currently 75% of the data store is used by internal projects and a further 24% by external projects. Less than 1% of the system is used for administrative overhead.

There are now 64 projects on the Mass Data Store with approximately 20 TBytes of user data archived, an increase of 17 TBytes over the previous 12 months.

The increased demand on the system uncovered several bottlenecks in data access and retrieval. The ANUSF systems engineers performed significant testing and tuning of the system in order to optimize operation of the current hardware and software. Additionally, it is envisaged that the tape drives and data server will be upgraded in 1999 to handle the projected data storage increase.

During 1998 there were several difficulties with the Redwood tape drives. A significant proportion of tape access problems were resolved when the ANUSF systems engineers determined that tape performance problems were related to a vendor firmware upgrade installed mid-year.


The Australian National University