Calcite-[4]-JSON Model

Source: Internet · Editor: 程序博客网 · Time: 2024/06/11 21:19

Original: https://calcite.apache.org/docs/model.html

To be continued...

Calcite models can be represented as JSON files. Models can also be built programmatically using the Schema SPI.

Elements

Root

{ version: '1.0', defaultSchema: 'mongo', schemas: [ Schema... ] }

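version (required string) must have the value 1.0.

defaultSchema (optional string). If specified, it is the name (case-sensitive) of a schema defined in this model, and becomes the default schema for connections to Calcite that use this model.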

schemas (optional list of Schema elements).
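For illustration, a complete root element written as strict JSON (double-quoted keys and strings) might look like the sketch below. The mongo schema is copied from the Custom Schema example on this page; the second schema name, adhoc, is a made-up placeholder.

```json
{
  "version": "1.0",
  "defaultSchema": "mongo",
  "schemas": [
    {
      "name": "mongo",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.mongodb.MongoSchemaFactory",
      "operand": {
        "host": "localhost",
        "database": "test"
      }
    },
    {
      "name": "adhoc",
      "type": "map",
      "tables": []
    }
  ]
}
```

Because defaultSchema is case-sensitive, it has to match the name of one of the listed schemas exactly.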

Schema

Occurs within root.schemas.

{ name: 'foodmart', path: ['lib'], cache: true, materializations: [ Materialization... ] }

name (required string) is the name of the schema.

type (optional string, default map) indicates the subtype:

  • map for Map Schema
  • custom for Custom Schema
  • jdbc for JDBC Schema

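path (optional list) is the SQL path that is used to resolve functions used in this schema. If specified, it must be a list, and each element of the list must be either a string or a list of strings. For example,

 path: [ ['usr', 'lib'], 'lib' ]

declares a path with two elements: the schema ‘/usr/lib’ and the schema ‘/lib’.

materializations (optional list of Materialization) defines the tables in this schema that are materializations of queries.

cache (optional boolean, default true) tells Calcite whether to cache the metadata (tables, functions and sub-schemas) generated by this schema.

  • If false, Calcite goes back to the schema each time it needs metadata.

  • If true, Calcite caches the metadata the first time it reads it and reuses the cached result afterwards.

Note: caching can lead to stale metadata when the contents of the schema change. If you are writing a schema implementation, you can override the Schema.contentsHaveChangedSince method to tell Calcite when it should consider its cache to be out of date. Tables, functions and sub-schemas explicitly defined in a schema are not affected by this caching mechanism. They always appear in the schema immediately, and are never flushed.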
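As a sketch of the cache attribute, the schema below turns caching off so that Calcite asks the schema for metadata on every use. It reuses the mongo schema from this page; whether disabling the cache is a sensible trade-off for a given adapter is an assumption you would need to check against your workload.

```json
{
  "name": "mongo",
  "type": "custom",
  "factory": "org.apache.calcite.adapter.mongodb.MongoSchemaFactory",
  "operand": {
    "host": "localhost",
    "database": "test"
  },
  "cache": false
}
```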

Map Schema

Like base class Schema, occurs within root.schemas.

{ name: 'foodmart', type: 'map', tables: [ Table... ], functions: [ Function... ] }

name, type, path, cache, materializations inherited from Schema.

tables (optional list of Table elements) defines the tables in this schema.

functions (optional list of Function elements) defines the functions in this schema.
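A map schema that fills in both lists could look like the following sketch. The view and the function are the ones used as examples elsewhere on this page; the view assumes that a table named emps can be resolved from this schema, which is not shown here.

```json
{
  "name": "foodmart",
  "type": "map",
  "tables": [
    {
      "name": "female_emps",
      "type": "view",
      "sql": "select * from emps where gender = 'F'"
    }
  ],
  "functions": [
    {
      "name": "MY_PLUS",
      "className": "com.example.functions.MyPlusFunction",
      "methodName": "apply"
    }
  ]
}
```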

Custom Schema

{ name: 'mongo', type: 'custom', factory: 'org.apache.calcite.adapter.mongodb.MongoSchemaFactory', operand: { host: 'localhost', database: 'test' } }

factory (required string) is the name of the factory class for this schema. It must implement interface org.apache.calcite.schema.SchemaFactory and have a public default constructor.

operand (optional map) contains attributes to be passed to the factory.

JDBC Schema

{ name: 'foodmart', type: 'jdbc', jdbcDriver: TODO, jdbcUrl: TODO, jdbcUser: TODO, jdbcPassword: TODO, jdbcCatalog: TODO, jdbcSchema: TODO }

jdbcDriver (optional string) is the name of the JDBC driver class. If not specified, uses whichever class the JDBC DriverManager chooses.

jdbcUrl (optional string) is the JDBC connect string, for example “jdbc:mysql://localhost/foodmart”.

jdbcUser (optional string) is the JDBC user name.

jdbcPassword (optional string) is the JDBC password.

jdbcCatalog (optional string) is the name of the initial catalog in the JDBC data source.

jdbcSchema (optional string) is the name of the initial schema in the JDBC data source.
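The template above leaves every value as TODO. A filled-in sketch for a MySQL database might look like this; the driver class name, user name and password are illustrative assumptions, and jdbcCatalog and jdbcSchema are simply omitted because both are optional.

```json
{
  "name": "foodmart",
  "type": "jdbc",
  "jdbcDriver": "com.mysql.jdbc.Driver",
  "jdbcUrl": "jdbc:mysql://localhost/foodmart",
  "jdbcUser": "foodmart",
  "jdbcPassword": "foodmart"
}
```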

Materialization

Occurs within root.schemas.materializations.

{ view: 'V', table: 'T', sql: 'select deptno, count(*) as c, sum(sal) as s from emp group by deptno' }

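view (optional string) is the name of the view; null means that the table already exists and is populated with the correct data.

table (required string) is the name of the table that materializes the data in the query. If view is not null, the table might not exist, and if it does not, Calcite will create and populate an in-memory table.

sql (optional string, or list of strings that will be concatenated as a multi-line string) is the SQL definition of the materialization.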
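Since sql also accepts a list of strings, a long query can be split across lines. The sketch below is the same materialization as the example above, only with the query broken into three strings.

```json
{
  "view": "V",
  "table": "T",
  "sql": [
    "select deptno, count(*) as c, sum(sal) as s",
    "from emp",
    "group by deptno"
  ]
}
```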

Table

Occurs within root.schemas.tables.

{ name: 'sales_fact', columns: [ Column... ] }

name (required string) is the name of this table.

type (optional string, default custom) indicates the subtype:

  • custom for Custom Table
  • view for View

columns (list of Column elements, required for some kinds of table, optional for others such as View)

View

Like base class Table, occurs within root.schemas.tables.

{ name: 'female_emps', type: 'view', sql: "select * from emps where gender = 'F'", modifiable: true }

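name, type, columns inherited from Table.

sql (required string, or list of strings that will be concatenated as a multi-line string) is the SQL definition of the view.

path (optional list) is the SQL path to resolve the query. If not specified, defaults to the current schema.

modifiable (optional boolean) is whether the view is modifiable. If not specified, Calcite deduces whether the view is modifiable.

A view is modifiable if it contains only SELECT, FROM, WHERE (no JOIN, aggregation or sub-queries) and every column: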

  • is specified once in the SELECT clause; or
  • occurs in the WHERE clause with a column = literal predicate; or
  • is nullable.

The second clause allows Calcite to automatically provide the correct value for hidden columns. It is useful in multi-tenant environments, where the tenantId column is hidden, mandatory (NOT NULL), and has a constant value for a particular view.

Errors regarding modifiable views:

  • If a view is marked modifiable: true and is not modifiable, Calcite throws an error while reading the schema.
  • If you submit an INSERT, UPDATE or UPSERT command to a non-modifiable view, Calcite throws an error when validating the statement.
  • If a DML statement creates a row that would not appear in the view (for example, a row in female_emps, above, with gender = 'M'), Calcite throws an error when executing the statement.
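As a sketch of the multi-tenant case, the view below hides the tenant column behind a constant predicate. The table name, column names and tenant value are made up for illustration; the view would only count as modifiable if every column of the underlying table meets one of the three conditions listed above.

```json
{
  "name": "tenant42_orders",
  "type": "view",
  "sql": [
    "select \"order_id\", \"amount\"",
    "from \"orders\"",
    "where \"tenantId\" = 42"
  ],
  "modifiable": true
}
```

Here tenantId is not selected, but it appears in the WHERE clause with a column = literal predicate, so under the second condition Calcite could supply the value 42 for rows inserted through the view.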

Custom Table

Like base class Table, occurs within root.schemas.tables.

{ name: 'female_emps', type: 'custom', factory: 'TODO', operand: { todo: 'TODO' } }

name, type, columns inherited from Table.

factory (required string) is the name of the factory class for this table. It must implement interface org.apache.calcite.schema.TableFactory and have a public default constructor.

operand (optional map) contains attributes to be passed to the factory.
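A filled-in custom table might look like the sketch below. It assumes the CSV example adapter from the Calcite source tree is on the classpath; operand keys are specific to each factory, and both the file key and the path shown are assumptions to verify against the adapter you actually use.

```json
{
  "name": "EMPS",
  "type": "custom",
  "factory": "org.apache.calcite.adapter.csv.CsvTableFactory",
  "operand": {
    "file": "sales/EMPS.csv.gz"
  }
}
```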

Stream

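Information about whether a table allows streaming.

Occurs within root.schemas.tables.stream.

{ stream: true, history: false }

stream (optional; default true) is whether the table allows streaming.

history (optional; default false) is whether the history of the stream is available.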
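In context, the stream element sits inside a table definition. The sketch below attaches it to a custom table; the table name and the factory class com.example.OrdersStreamTableFactory are hypothetical placeholders.

```json
{
  "name": "ORDERS",
  "type": "custom",
  "factory": "com.example.OrdersStreamTableFactory",
  "stream": {
    "stream": true,
    "history": false
  }
}
```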

Column

Occurs within root.schemas.tables.columns.

{ name: 'empno' }

name (required string) is the name of this column.

Function

Occurs within root.schemas.functions.

{ name: 'MY_PLUS', className: 'com.example.functions.MyPlusFunction', methodName: 'apply', path: [] }

name (required string) is the name of this function.

className (required string) is the name of the class that implements this function.

methodName (optional string) is the name of the method that implements this function.

If methodName is specified, the method must exist (case-sensitive) and Calcite will create a scalar function. The method may be static or non-static, but if non-static, the class must have a public constructor with no parameters.

If methodName is “*”, Calcite creates a function for every method in the class.

If methodName is not specified, Calcite looks for a method called “eval”, and if found, creates a table macro or scalar function. It also looks for methods “init”, “add”, “merge”, “result”, and if found, creates an aggregate function.

path (optional list of string) is the path for resolving this function.
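The two most common shapes of a function entry can be sketched as a functions list. The first entry names the method explicitly, so Calcite would create a scalar function from apply. The second omits methodName, so Calcite would look for eval, or for init, add, merge and result. The class com.example.functions.MySumFunction is a hypothetical placeholder.

```json
[
  {
    "name": "MY_PLUS",
    "className": "com.example.functions.MyPlusFunction",
    "methodName": "apply"
  },
  {
    "name": "MY_SUM",
    "className": "com.example.functions.MySumFunction"
  }
]
```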

Lattice

Occurs within root.schemas.lattices.

{
  name: 'star',
  sql: [
    'select 1 from "foodmart"."sales_fact_1997" as "s"',
    'join "foodmart"."product" as "p" using ("product_id")',
    'join "foodmart"."time_by_day" as "t" using ("time_id")',
    'join "foodmart"."product_class" as "pc" on "p"."product_class_id" = "pc"."product_class_id"'
  ],
  auto: false,
  algorithm: true,
  algorithmMaxMillis: 10000,
  rowCountEstimate: 86837,
  defaultMeasures: [ {
    agg: 'count'
  } ],
  tiles: [ {
    dimensions: [ 'the_year', ['t', 'quarter'] ],
    measures: [ {
      agg: 'sum',
      args: 'unit_sales'
    }, {
      agg: 'sum',
      args: 'store_sales'
    }, {
      agg: 'count'
    } ]
  } ]
}


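name (required string) is the name of this lattice.

sql (required string, or list of strings that will be concatenated as a multi-line string) is the SQL statement that defines the fact table, dimension tables, and join paths for this lattice.

auto (optional boolean, default true) is whether to materialize tiles on need as queries are executed.

algorithm (optional boolean, default false) is whether to use an optimization algorithm to suggest and populate an initial set of tiles.

algorithmMaxMillis (optional long, default -1, meaning no limit) is the maximum number of milliseconds for which to run the algorithm. After this point, Calcite takes the best result the algorithm has come up with so far.

rowCountEstimate (optional double, default 1000.0) is the estimated number of rows in the lattice.

tiles (optional list of Tile elements) is a list of materialized aggregates to create up front.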

defaultMeasures (optional list of Measure elements): is a list of measures that a tile should have by default. Any tile defined in tiles can still define its own measures, including measures not on this list. If not specified, the default list of measures is just ‘count(*)’:

[ { agg: 'count' } ]

statisticProvider (optional name of a class that implements org.apache.calcite.materialize.LatticeStatisticProvider) provides estimates of the number of distinct values in each column.

You can use a class name, or a class plus a static field. Example:

 "statisticProvider": "org.apache.calcite.materialize.Lattices#CACHING_SQL_STATISTIC_PROVIDER"

If not set, Calcite will generate and execute a SQL query to find the real value, and cache the results.

See also: Lattices.

Tile

Occurs within root.schemas.lattices.tiles.

{ dimensions: [ 'the_year', ['t', 'quarter'] ], measures: [ { agg: 'sum', args: 'unit_sales' }, { agg: 'sum', args: 'store_sales' }, { agg: 'count' } ] }

dimensions (list of strings or string lists, required, but may be empty) defines the dimensionality of this tile. Each dimension is a column from the lattice, like a GROUP BY clause. Each element can be either a string (the unique label of the column within the lattice) or a string list (a pair consisting of a table alias and a column name).

measures (optional list of Measure elements) is a list of aggregate functions applied to arguments. If not specified, uses the lattice’s default measure list.

Measure

Occurs within root.schemas.lattices.defaultMeasures and root.schemas.lattices.tiles.measures.

{ agg: 'sum', args: [ 'unit_sales' ] }

agg is the name of an aggregate function (usually ‘count’, ‘sum’, ‘min’, ‘max’).

args (optional) is a column label (string), or list of zero or more column labels

Valid values are:

  • Not specified: no arguments
  • null: no arguments
  • Empty list: no arguments
  • String: single argument, the name of a lattice column
  • List: multiple arguments, each a column label
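The forms listed above can be sketched as a measures list. The first three entries are equivalent ways of writing a measure with no arguments, and the last two are equivalent ways of passing the single column unit_sales. A multi-argument entry is left out because none of the aggregate functions named on this page takes more than one column.

```json
[
  { "agg": "count" },
  { "agg": "count", "args": null },
  { "agg": "count", "args": [] },
  { "agg": "sum", "args": "unit_sales" },
  { "agg": "sum", "args": [ "unit_sales" ] }
]
```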

Unlike lattice dimensions, measures cannot be specified in qualified format, ["table", "column"]. When you define a lattice, make sure that each column you intend to use as a measure has a unique label within the lattice (using “AS label” if necessary), and use that label when you want to pass the column as a measure argument.

