Optimized Architectural Approaches in Hardware and Software Enabling Very High Performance Shared Storage Systems
Main author: | |
Language: | eng |
Published: | 2004 |
Subjects: | |
Online access: | http://cds.cern.ch/record/973360 |
Summary: | There are issues encountered in high performance storage systems that normally lead to compromises in architecture. Compute clusters tend to have compute phases followed by an I/O phase that must move data from the entire cluster in one operation. That data may then be shared by a large number of clients, creating unpredictable read and write patterns. In some cases the aggregate performance of a server cluster must exceed 100 GB/s to minimize the time required for the I/O cycle, thus maximizing compute availability. Accessing the same content from multiple points in a shared file system leads to the classical problems of data "hot spots" on the disk drive side and access collisions on the data connectivity side. The traditional method for increasing apparent bandwidth usually includes data replication, which is costly in both storage and management. Scaling a model that includes replicated data presents additional management challenges, as capacity and bandwidth expand asymmetrically while the system is scaled. In some cases, systems must also be designed for minimum acceptable bandwidths in the case of a subsystem failure, such as a disk drive or data channel, leading to increased hardware and management costs. An architectural model will be presented that is in use today in demanding high performance shared access environments. This architecture allows simultaneous data access requiring high bandwidths without the use of data replication. Data can be shared in dynamic multiple path environments, allowing for multiple point simultaneous data analysis without a concern for data "hot spots". Further benefits include online scalability, where bandwidth and capacity scale linearly as the system is expanded. Additionally, planning for subsystem failures is greatly simplified, since the system throughput is maintained regardless of system status.
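The relationship between aggregate bandwidth and compute availability sketched above can be made concrete with a small model. This is an illustrative calculation, not from the talk itself; the checkpoint size, compute-phase length, and bandwidth figures are hypothetical examples.

```python
# Illustrative model: how aggregate I/O bandwidth affects compute
# availability for a compute-then-checkpoint workload, as described
# in the abstract. All concrete figures here are hypothetical.

def io_time_s(data_bytes: float, aggregate_bw_bytes_per_s: float) -> float:
    """Time for one cluster-wide I/O phase at the given aggregate bandwidth."""
    return data_bytes / aggregate_bw_bytes_per_s

def compute_availability(compute_s: float, io_s: float) -> float:
    """Fraction of wall-clock time spent computing rather than waiting on I/O."""
    return compute_s / (compute_s + io_s)

TB = 1e12
GB = 1e9

# Hypothetical cycle: one hour of compute, then a 50 TB checkpoint.
compute_s = 3600.0
data = 50 * TB

for bw_gbs in (10, 50, 100, 200):
    io_s = io_time_s(data, bw_gbs * GB)
    print(f"{bw_gbs:4d} GB/s: I/O phase {io_s:7.1f} s, "
          f"compute availability {compute_availability(compute_s, io_s):.1%}")
```

At 10 GB/s the cluster would idle for over an hour per checkpoint, while at 100 GB/s the same I/O phase shrinks to a few hundred seconds, which is why the abstract cites aggregate rates above 100 GB/s as the threshold for keeping compute availability high.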
An object-based file system will also be presented that is highly scalable and can take full advantage of the hardware architecture. The future roadmap of both the hardware and the software will be discussed, including block-level interfaces that will improve data movement efficiency through the server layer.

Biography: Dave Fellinger is the Chief Technical Officer at DataDirect Networks, Inc., the world's leading provider of networked storage and cluster solutions for High Performance Computing. Mr. Fellinger has positioned DataDirect Networks solutions in 15 of the world's top 20 HPC systems, including Lawrence Livermore National Labs, Sandia National Labs, NCSA, NASA Goddard, Lawrence Berkeley National Labs, Argonne National Laboratory, US Army Research Lab, NOAA Forecast Systems Lab, NCAR, and DKRZ. Mr. Fellinger was a member of the "Distributed Lustre File System Demonstration" team that won the "Both Direction Award" at the fourth annual High Performance Bandwidth Challenge held at SC2003 in Phoenix, Arizona. Mr. Fellinger attended Carnegie-Mellon University (Electrical Engineering and Physics) and holds numerous electrical engineering patents. |