Array DB Notes


1. An array DB needs to support traditional DBMS operations along with powerful user-defined operations, including linear algebra (as provided by BLAS or LAPACK).

BLAS:

The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high-quality linear algebra software, LAPACK for example.
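
To make the three levels concrete, here is a minimal pure-Python sketch with one operation per level. The function names borrow the BLAS routine names `axpy`, `gemv` and `gemm`, but the signatures are simplified assumptions for illustration (the real `gemv` and `gemm` also take scaling factors and an accumulator argument), and nothing here reflects how an optimized BLAS is implemented.

```python
# Pure-Python illustrations of one operation from each BLAS level.
# These mirror the mathematical definitions only; real BLAS libraries
# are heavily optimized and are normally called through a binding.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(a, x):
    """Level 2 (matrix-vector): return A @ x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def gemm(a, b):
    """Level 3 (matrix-matrix): return A @ B, both given as lists of rows."""
    cols_b = list(zip(*b))  # columns of B
    return [[sum(aik * bkj for aik, bkj in zip(row, col)) for col in cols_b]
            for row in a]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [1, 1]
y = [10, 20]

level1 = axpy(2, x, y)   # [12, 22]
level2 = gemv(A, x)      # [3, 7]
level3 = gemm(A, B)      # [[2, 1], [4, 3]]
```

The work per call grows from O(n) at Level 1 to O(n^2) at Level 2 and O(n^3) at Level 3, which is why libraries such as LAPACK try to express as much of their computation as possible in Level 3 calls.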

2. Users typically prefer an interface that is closer to the application than to the data management system.

3. Use append-style updates rather than overwrite-style updates; provenance is also required to keep track of how values were derived or corrected (i.e., time travel).

4. Most science data is uncertain and comes with “error bars”. Built-in support for uncertainty is preferred.

5. Overlapping partitioning is required to perform neighborhood-related operations such as KNN and NL-means.

6. Accept multiple formats to support on-demand loading and in-situ querying. Avoid the prohibitively expensive overhead of reloading the original data into a specific format that is used exclusively by one system. “One-shot” queries should be performed directly on the raw data, “in situ”.

7. Multiple kinds of null are needed, each meaning something different (data not collected, data unknown, etc.).
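
As an illustration of requirement 5, the sketch below splits a one-dimensional array into fixed-size chunks and pads each chunk with a "halo" of neighboring cells, so that a neighborhood operation can run on a chunk without fetching data from the adjacent ones. The function name `overlapping_chunks` and its parameters are hypothetical, and a real array DB would do this per dimension on disk-resident chunks; this is only a minimal in-memory sketch.

```python
def overlapping_chunks(data, chunk_size, halo):
    """Split a 1-D list into chunks of `chunk_size` core cells, each
    extended by up to `halo` neighboring cells on both sides.

    Returns a list of (core_start, core_end, padded_slice) tuples, where
    core_start/core_end are half-open indices into `data`.
    """
    if chunk_size <= 0 or halo < 0:
        raise ValueError("chunk_size must be positive and halo non-negative")
    chunks = []
    for start in range(0, len(data), chunk_size):
        end = min(start + chunk_size, len(data))
        lo = max(0, start - halo)          # clip the halo at the array edge
        hi = min(len(data), end + halo)
        chunks.append((start, end, data[lo:hi]))
    return chunks

cells = list(range(10))
parts = overlapping_chunks(cells, chunk_size=4, halo=1)
# parts[1] covers core cells 4..7 but also stores cells 3 and 8
```

The trade-off is extra storage for the duplicated border cells in exchange for neighborhood queries that touch only one chunk, provided the neighborhood radius does not exceed the halo width.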

An overwhelming amount of raw science data is naturally specified in arrays. Simulating arrays on top of a traditional tabular DBMS is likely to give up between one and two orders of magnitude in performance. At scale, this is likely to be a big problem. Also, essentially all complex aggregation (correlations, curve fitting, data mining, clustering) is defined in terms of arrays and not tables. As such, a native array-based system is likely to have a huge advantage and allow the integration of the cooking process into the DBMS.

A traditional DBMS is better suited to managing metadata (documentation). The structure of metadata is frequently more complex than that of data and conforms better to the model of business data (relatively few types of data, standard reports are useful). Most data is located by searching metadata rather than the data itself, so the query capabilities of a DBMS are useful. Similarly, metadata is changed more often than data, so the updating capabilities of a DBMS are more useful for metadata.

Core operations:

1. Array constructor ---- build array & initialize from cell expression

2. Condenser ---- summarize over array, delivering a scalar (using some commutative & associative summarization op)

3. Sorter ---- slice array along a dimension, sort slices
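
The three core operations can be sketched on small two-dimensional arrays held as nested Python lists. The function names `construct`, `condense` and `sort_slices` are hypothetical, and reading the sorter as "reorder the slices along the first dimension by a key" is one interpretation of the short description above, not a definitive specification.

```python
from functools import reduce
import operator

def construct(shape, cell_expr):
    """Array constructor: build a rows x cols array whose cell (i, j)
    is initialized from cell_expr(i, j)."""
    rows, cols = shape
    return [[cell_expr(i, j) for j in range(cols)] for i in range(rows)]

def condense(array, op, identity):
    """Condenser: fold all cells into one scalar with `op`, which is
    assumed to be commutative and associative."""
    return reduce(op, (cell for row in array for cell in row), identity)

def sort_slices(array, key=sum):
    """Sorter: slice along the first dimension (rows) and reorder the
    slices by `key` (assumed interpretation)."""
    return sorted(array, key=key)

grid = construct((3, 2), lambda i, j: (2 - i) * 10 + j)
total = condense(grid, operator.add, 0)
largest = condense(grid, max, float("-inf"))
ordered = sort_slices(grid)
# grid    is [[20, 21], [10, 11], [0, 1]]
# ordered is [[0, 1], [10, 11], [20, 21]]
```

Requiring the condenser's operation to be commutative and associative means the cells can be combined in any order, so partial results can be computed per chunk and merged afterwards.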