Greenplum merge insert 用法与性能 (insert on conflict)

来源：互联网发布：单片机模拟键盘输入编辑：程序博客网时间：2024/06/10 12:15

背景

PostgreSQL insert on conflict语法非常强大，支持合并写入（当违反某唯一约束时，冲突则更新，不冲突则写入），同时支持流式计算。

流计算例子链接：

《PostgreSQL 流式统计 - insert on conflict 实现流式 UV(distinct), min, max, avg, sum, count ...》

《HTAP数据库 PostgreSQL 场景与性能测试之 22 - (OLTP) merge insert|upsert|insert on conflict|合并写入》

《PostgreSQL upsert功能(insert on conflict do)的用法》

《PostgreSQL 10.0 preview 功能增强 - 支持分区表ON CONFLICT .. DO NOTHING》

PostgreSQL insert on conflict语法如下：

Command:     INSERT    Description: create new rows in a table    Syntax:    [ WITH [ RECURSIVE ] with_query [, ...] ]    INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]        [ OVERRIDING { SYSTEM | USER} VALUE ]        { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }        [ ON CONFLICT [ conflict_target ] conflict_action ]        [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]        where conflict_target can be one of:            ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]        ON CONSTRAINT constraint_name        and conflict_action is one of:            DO NOTHING        DO UPDATE SET { column_name = { expression | DEFAULT } |                        ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |                        ( column_name [, ...] ) = ( sub-SELECT )                      } [, ...]                  [ WHERE condition ]

Greenplum的版本较低，还不支持insert on conflict的语法。

如果需要在Greenplum中实现类似的功能该如何操作？

DEMO

ID为PK，以它为合并列，举例。

1、目标表，也就是需要合并写入的目标：

create table t( id int primary key, c1 int , c2 int, c3 int, c4 int, c5 int, crt_time timestamp);

2、中间表，也就是用户只管插入的表：

create table t_tmp(like t);

写入一些中间记录。

insert into t_tmp values(1,1,2,3,null,null,now());    insert into t_tmp values(1,1,2,4,null,null,now());    insert into t_tmp values(1,1,2,3,null,7,now());    insert into t_tmp values(1,1,null,3,5,6,now());

postgres=# select * from t_tmp;     id | c1 | c2 | c3 | c4 | c5 |          crt_time              ----+----+----+----+----+----+----------------------------      1 |  1 |  2 |  3 |    |    | 2017-12-13 17:03:16.28482      1 |  1 |  2 |  4 |    |    | 2017-12-13 17:03:16.286302      1 |  1 |  2 |  3 |    |  7 | 2017-12-13 17:03:16.635121      1 |  1 |    |  3 |  5 |  6 | 2017-12-13 17:03:25.434191    (4 rows)

3、窗口合并，按唯一值约束，仅提取一条（可能存在窗口内合并的需求，例如按时间取最新，比如以最后一条为准，又或者以有值，且最新的为准）。

我有几张阿里云幸运券分享给你，用券购买或者升级阿里云相应产品会有特惠惊喜哦！把想要买的产品的幸运券都领走吧！快下手，马上就要抢光了。

以有值切最新为准例子：

select distinct on (id)       id,      first_value(c1) over (partition by id order by (case when c1 is null then null else crt_time end) desc nulls last) as c1,      first_value(c2) over (partition by id order by (case when c2 is null then null else crt_time end) desc nulls last) as c2,      first_value(c3) over (partition by id order by (case when c3 is null then null else crt_time end) desc nulls last) as c3,      first_value(c4) over (partition by id order by (case when c4 is null then null else crt_time end) desc nulls last) as c4,      first_value(c5) over (partition by id order by (case when c5 is null then null else crt_time end) desc nulls last) as c5,      first_value(crt_time) over (partition by id order by crt_time desc) as crt_time      from t_tmp ;         id | c1 | c2 | c3 | c4 | c5 |          crt_time              ----+----+----+----+----+----+----------------------------      1 |  1 |  2 |  3 |  5 |  6 | 2017-12-13 17:03:25.434191    (1 row)

存储中间结果：

create table t_tmp1 (like t) ;        insert into t_tmp1     select distinct on (id)       id,      first_value(c1) over (partition by id order by (case when c1 is null then null else crt_time end) desc nulls last) as c1,      first_value(c2) over (partition by id order by (case when c2 is null then null else crt_time end) desc nulls last) as c2,      first_value(c3) over (partition by id order by (case when c3 is null then null else crt_time end) desc nulls last) as c3,      first_value(c4) over (partition by id order by (case when c4 is null then null else crt_time end) desc nulls last) as c4,      first_value(c5) over (partition by id order by (case when c5 is null then null else crt_time end) desc nulls last) as c5,      first_value(crt_time) over (partition by id order by crt_time desc) as crt_time      from t_tmp ;

4、合并写入：

将窗口提取的结果，合并写入目标表。

4.1、INNER JOIN，覆盖旧记录，同时补齐旧的字段（以NULL为判断条件。如果新的记录没有值，则取旧记录的值。）提取。

create table t_tmp2 (like t);        insert into t_tmp2    select     t_tmp.id,     coalesce(t_tmp.c1, t.c1),     coalesce(t_tmp.c2, t.c2),     coalesce(t_tmp.c3, t.c3),     coalesce(t_tmp.c4, t.c4),     coalesce(t_tmp.c5, t.c5),     coalesce(t_tmp.crt_time, t.crt_time)     from    t_tmp1 as t_tmp    inner join     t    using (id);

4.2、DELETE USING，删除全量表的符合条件的记录。

delete from t using t_tmp2 where t.id=t_tmp2.id;

4.3、INSERT

insert into t    select t_tmp1.* from t_tmp1 left join t_tmp2 using (id) where t_tmp2.* is null    union all    select * from t_tmp2;

原文链接

阅读全文

0 0

Greenplum merge insert 用法与性能 (insert on conflict)

标签

背景

DEMO