Comparison Between NetCDF and HDF5
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 00:29
The netCDF file uses a linear data layout, in which the data arrays are either stored in contiguous space in a predefined order or interleaved in a regular pattern. This regular and highly predictable data layout enables the PnetCDF data I/O implementation to simply pass the data buffer, metadata (file view, MPI datatype, etc.), and other optimization information to MPI-IO, and all parallel I/O operations are carried out in the same manner as when MPI-IO alone is used. Thus, there is very little overhead, and PnetCDF performance should be nearly the same as MPI-IO if only raw data I/O performance is compared.
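To illustrate why a linear layout is so predictable, here is a minimal sketch of how a byte offset could be computed for an element of a contiguous array and of an interleaved (record) array. The function names and the parameters `begin`, `record_size`, and `elem_size` are hypothetical; this is not the PnetCDF code, only the arithmetic that a regular layout makes possible.

```python
def fixed_offset(begin, shape, index, elem_size):
    """Byte offset of one element of a contiguous, row-major array.

    begin     -- byte offset where the array starts in the file (assumed known)
    shape     -- dimension sizes of the array
    index     -- element index, one entry per dimension
    elem_size -- size of one element in bytes
    """
    linear = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError("index out of bounds")
        linear = linear * dim + i  # row-major linearization
    return begin + linear * elem_size


def record_offset(begin, record_size, rec, shape, index, elem_size):
    """Byte offset for an array whose records are interleaved with others.

    record_size -- combined size in bytes of one record across all
                   interleaved arrays (assumed known from the header)
    rec         -- record number
    shape/index -- describe one record, excluding the record dimension
    """
    return fixed_offset(begin + rec * record_size, shape, index, elem_size)


print(fixed_offset(1024, (3, 4), (1, 2), 8))
print(record_offset(2000, 100, 3, (5,), (2,), 4))
```

Because every offset follows from a few header values, an access pattern can be described to MPI-IO as a regular file view without consulting the file again.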
On the other hand, parallel HDF5 uses a tree-like file structure that is similar to the UNIX file system: the data is irregularly laid out using a super block, header blocks, data blocks, extended header blocks, and extended data blocks. This is a very flexible system and might have advantages for some applications and access patterns. However, this irregular layout pattern can make it difficult to pass user access patterns directly to MPI-IO, especially for variable-sized arrays.
Instead, parallel HDF5 uses dataspaces and hyperslabs to define the data organization, to map and transfer data between the memory space and the file space, and to pack and unpack buffers recursively. MPI-IO is used underneath, but this additional overhead can result in significant performance loss.
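The kind of work involved can be sketched in a few lines. The example below enumerates an HDF5-style hyperslab selection (start, stride, count, block per dimension) recursively and packs the selected elements into a contiguous buffer. It is an illustration only, operating on nested Python lists; the function names are hypothetical and it does not reproduce the HDF5 library's actual packing code.

```python
def hyperslab_indices(start, stride, count, block):
    """Yield index tuples selected by a hyperslab, in row-major order.

    In each dimension the selection consists of `count` blocks of
    `block` consecutive elements, the first block beginning at `start`
    and successive blocks beginning `stride` elements apart.
    """
    if not start:  # no dimensions left: one empty index
        yield ()
        return
    for c in range(count[0]):
        for b in range(block[0]):
            i = start[0] + c * stride[0] + b
            # Recurse into the remaining (faster-varying) dimensions.
            for rest in hyperslab_indices(start[1:], stride[1:],
                                          count[1:], block[1:]):
                yield (i,) + rest


def pack(array, start, stride, count, block):
    """Copy the selected elements of a nested-list array into a flat list."""
    out = []
    for idx in hyperslab_indices(start, stride, count, block):
        item = array
        for i in idx:
            item = item[i]
        out.append(item)
    return out


# A 4 x 6 grid whose element at (r, c) holds r * 10 + c.
grid = [[r * 10 + c for c in range(6)] for r in range(4)]
print(pack(grid, start=(0, 1), stride=(2, 3), count=(2, 2), block=(1, 2)))
```

Every selected element is visited individually here, which hints at why this step adds overhead compared with handing a regular file view straight to MPI-IO.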
Second, the PnetCDF implementation manages to keep the overhead involved in header I/O as low as possible. In the netCDF file, a single header contains all the information necessary for direct access to each data array, and each array is associated with a predefined numerical ID that can be looked up efficiently whenever the array needs to be accessed. By maintaining a local copy of the header on each process, our implementation avoids a great deal of interprocess synchronization as well as repeated access to the file header each time header information is needed to access a single array. All header information can be accessed directly in local memory, and interprocess synchronization is needed only during the definition of the dataset. Once the definition of the dataset is created, each array can be identified by its permanent ID and accessed at any time by any process, without any collective open/close operation.
On the other hand, in HDF5 the header metadata is dispersed in separate header blocks for each object, and, in order to operate on an object, the library has to iterate through the entire namespace to get the header information of that object before accessing it. This kind of access method may be inefficient for parallel access, particularly because parallel HDF5 defines the open/close of each object to be a collective operation, which forces all participating processes to communicate when accessing a single object, not to mention the cost of file access to locate and fetch the header information of that object. Further, HDF5 metadata is updated during data writes in some cases. Thus, additional synchronization is necessary at write time in order to maintain synchronized views of file metadata.
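The contrast between the two lookup styles can be shown with a simplified model. In the sketch below, a flat header held in local memory is indexed by a numerical ID, while a tree-structured namespace is walked one component at a time, counting one header visit per level. The data structures, names, and the "one visit per level" cost model are assumptions made for illustration; neither library is implemented this way in detail.

```python
# Flat header: one table in local memory, arrays addressed by position (ID).
flat_header = [
    {"name": "temperature", "begin": 1024},
    {"name": "pressure", "begin": 9216},
]


def lookup_by_id(header, var_id):
    """Direct access: no traversal and no file access after the header is cached."""
    return header[var_id]


# Tree namespace: each group has its own header listing its children.
tree = {"model": {"run1": {"temperature": {"begin": 1024}}}}


def resolve(root, path):
    """Walk a path through nested groups.

    Returns (object, visits), where visits counts the root header plus
    one header per path component (a simplified cost model).
    """
    node, visits = root, 1
    for name in (p for p in path.split("/") if p):
        node = node[name]
        visits += 1
    return node, visits


print(lookup_by_id(flat_header, 0)["begin"])
print(resolve(tree, "/model/run1/temperature"))
```

In the parallel setting described above, each of those header visits may also imply file access and, for collective opens, communication among all participating processes.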
However, PnetCDF also has limitations. Unlike HDF5, netCDF does not support hierarchical, group-based organization of data objects. Since it lays out the data in a linear order, adding a fixed-sized array or extending the file header may be very costly once the file is created and has existing data stored, though moving the existing data to the extended area is performed in parallel. Also, PnetCDF does not provide functionality to combine two or more files in memory through software mounting, as HDF5 does. Nor does netCDF support data compression within its file format (although compressed writes must be serialized in HDF5, limiting their usefulness). Fortunately, these features can all be achieved by external software such as netCDF Operators [8], with some sacrifice of manageability of the files.
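The cost of extending the header in a linear layout can be made concrete with a small in-memory model. The sketch below grows a header by a few bytes and shifts all existing data forward, copying chunks from the tail first so that no chunk overwrites data that has not yet been moved. It runs serially on a `bytearray`; the function names and chunking scheme are hypothetical and are not taken from PnetCDF, which performs the move in parallel.

```python
def shift_plan(data_begin, data_size, growth, chunk):
    """Return (src, dst, length) moves that shift the data region forward.

    Moves are ordered last chunk first, so a destination never covers
    source bytes that still have to be copied.
    """
    moves = []
    end = data_begin + data_size
    while end > data_begin:
        length = min(chunk, end - data_begin)
        src = end - length
        moves.append((src, src + growth, length))
        end = src
    return moves


def grow_header(buf, data_begin, growth, chunk=4):
    """Make room for `growth` extra header bytes in front of the data."""
    data_size = len(buf) - data_begin
    buf.extend(b"\0" * growth)  # enlarge the "file"
    for src, dst, length in shift_plan(data_begin, data_size, growth, chunk):
        buf[dst:dst + length] = buf[src:src + length]
    buf[data_begin:data_begin + growth] = b"\0" * growth  # new header space
    return buf


buf = bytearray(b"HDR" + b"0123456789")
print(grow_header(buf, data_begin=3, growth=2))
print(sum(length for _, _, length in shift_plan(3, 10, 2, 4)))
```

Note that the total number of bytes moved equals the size of the existing data, regardless of how small the header growth is.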