维度模型数据仓库(十) —— 快照

来源:互联网 发布:设计餐厅菜单软件 编辑:程序博客网 时间:2024/06/06 00:05
(五)进阶技术
        5. 快照
        前面实验说明了处理维度的扩展。本篇讨论两种事实表的扩展技术。
        有些用户,尤其是管理者,经常会要看某个特定时间点的数据。也就是说,他们需要数据的快照。周期快照和累积快照是两种处理事实表扩展的技术。
        周期快照是在一个给定的时间对事实表进行一段时期的总计。例如,一个月销售订单周期快照是每个月底时总的销售订单金额。
        累积快照用于跟踪事实表的变化。例如,数据仓库可能需要累积(存储)销售订单从下订单的时间开始,到订单中的商品被出库、运输和到达的各阶段的时间点数据来跟踪订单生命周期的进展情况。用户可能要取得在某个给定时间点,销售订单处理状态的累积快照。
        下面说明周期快照和累积快照的细节问题。

        周期快照
        本节以销售订单的月底汇总为例说明如何实现一个周期快照。
        首先需要添加一个新的事实表。图(五)- 5-1中的模式显示了一个名为month_end_sales_order_fact的新事实表。该表中有两个度量值,month_order_amount和month_order_quantity,这两个值是不能加到sales_order_fact表中的。不能加到sales_order_fact表中的原因是,sales_order_fact表和新的度量值有不同的时间属性(数据的粒度不同)。sales_order_fact表包含的是每天一条记录。新的度量值要的是每月的数据。使用清单(五)- 5-1里的脚本建立month_end_sales_order_fact表

图(五)- 5-1
USE dw; CREATE TABLE month_end_sales_order_fact (    order_month_sk INT,    product_sk INT,    month_order_amount DEC(10 , 2 ),    month_order_quantity INT,    foreign key (order_month_sk)        references month_dim (month_sk)        on delete cascade on update cascade,    foreign key (product_sk)        references product_dim (product_sk)        on delete cascade on update cascade); 
清单(五)- 5-1

        建立了month_end_sales_order_fact表后,现在需要向表中装载数据。月底销售订单事实表的数据源是已有的销售订单事实表。清单(五)- 5-2里的脚本装载月底销售订单事实表。每个月第一天,在每天销售订单定期装载执行完后,执行此脚本,装载上个月的销售订单数据。
USE dw;SET @pre_date = SUBDATE(CURRENT_DATE,1) ;INSERT INTO month_end_sales_order_factSELECT  b.month_sk, a.product_sk, SUM(order_amount), SUM(order_quantity)FROM  sales_order_fact a, month_dim b, order_date_dim dWHERE  a.order_date_sk = d.order_date_skAND b.month = d.monthAND b.year = d.yearAND b.month = MONTH(@pre_date)AND b.year = YEAR(@pre_date)GROUP BY b.month_sk, a.product_sk ;COMMIT ;
清单(五)- 5-2

        使用Kettle转换装载月底销售订单事实表的步骤如图(五)- 5-2到(五)- 5-5所示。
图(五)- 5-2

图(五)- 5-3

图(五)- 5-4

图(五)- 5-5

        执行清单(五)- 5-2里的脚本或相应的Kettle转换之前,设置系统日期为2015年3月1日(装载2015年2月最后一天的数据库),因为该脚本只能在每月1日执行。
        执行完清单(五)- 5-2里的脚本或相应的Kettle转换后,使用下面的命令查询month_end_sales_order_fact表,表中只有两条记录,都来自于2015年2月的销售订单,显示如下:
mysql> select * from month_end_sales_order_fact;
+----------------+------------+--------------------+----------------------+
| order_month_sk | product_sk | month_order_amount | month_order_quantity |
+----------------+------------+--------------------+----------------------+
|            182 |          2 |            4000.00 |                 NULL |
|            182 |          3 |            4000.00 |                 NULL |
+----------------+------------+--------------------+----------------------+
2 rows in set (0.00 sec)

mysql> select * from month_dim where month_sk = 182;
+----------+------------+-------+---------+------+----------------+-------------+
| month_sk | month_name | month | quarter | year | effective_date | expiry_date |
+----------+------------+-------+---------+------+----------------+-------------+
|      182 | February   |     2 |       1 | 2015 | 0000-00-00     | 9999-12-31  |
+----------+------------+-------+---------+------+----------------+-------------+
1 row in set (0.00 sec)
        
        累积快照
        本节说明如何在销售订单上实现累积快照。
        该累加快照跟踪五个销售订单的里程碑:下订单、分配库房、出库、配送和收货。这五个里程碑的日期及其各自的数量来自源数据库的销售订单表。一个订单完整的生命周期由五行描述:下订单的时间一行,订单商品被分配到库房的时间一行,产品出库的时间一行,订单配送的时间一行,订单客户收货的时间一行。每个里程碑各有一个状态:N为新订单,A为已分配库房,P为已出库,S为已配送,R为已收货。sales_order表的结构必须做相应的改变,以处理五种不同的状态。执行清单(五)- 5-3里的脚本修改数据库模式。对源数据库的修改如下:把order_date列改为status_date,添加了名为order_status的列,并把order_quantity列改为quantity。正如名字所表示的,order_status列用于存储N,A,P,S或R之一。它描述了status_date列对应的状态值。如果一条记录的状态为N,则status_date列是下订单的日期。如果状态是R,status_date列是收货日期。对数据仓库的修改如下:给现有的sales_order_fact表添加四个数量和四个日期代理键,要加的新列是allocate_date_sk、allocate_quantity、packing_date_sk、packing_quantity、ship_date_sk、ship_quantity、receive_date_sk、receive_quantity。还要在日期维度上使用数据库视图角色扮演生成四个新的日期代理键。
USE source;-- 修改销售订单事务表ALTER TABLE sales_order  CHANGE order_date status_date DATE, ADD order_status VARCHAR(1) AFTER status_date, CHANGE order_quantity quantity INT;-- 删除sales_order表的主键alter table sales_order change order_number order_number int(10) not null;alter table sales_order drop primary key;USE dw;-- 事实表增加八列ALTER TABLE sales_order_fact  ADD allocate_date_sk INT AFTER order_date_sk, ADD allocate_quantity INT, ADD packing_date_sk INT AFTER allocate_date_sk, ADD packing_quantity INT, ADD ship_date_sk INT AFTER packing_date_sk, ADD ship_quantity INT, ADD receive_date_sk INT AFTER ship_date_sk, ADD receive_quantity INT;-- 建立四个日期维度视图CREATE VIEW allocate_date_dim (allocate_date_sk , allocate_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date) AS    SELECT         date_sk,        date,        month_name,        month,        quarter,        year,        promo_ind,        effective_date,        expiry_date    FROM        date_dim;CREATE VIEW packing_date_dim (packing_date_sk , packing_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date) AS    SELECT         date_sk,        date,        month_name,        month,        quarter,        year,        promo_ind,        effective_date,        expiry_date    FROM        date_dim;CREATE VIEW ship_date_dim (ship_date_sk , ship_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date) AS    SELECT         date_sk,        date,        month_name,        month,        quarter,        year,        promo_ind,        effective_date,        expiry_date    FROM        date_dim;CREATE VIEW receive_date_dim (receive_date_sk , receive_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date) AS    SELECT         date_sk,        date,        month_name,        month,        quarter,        year,        promo_ind,        effective_date,        expiry_date    FROM        date_dim;
清单(五)- 5-3

        修改后的数据仓库模式如图(五)- 5-6所示。
图(五)- 5-6

        因为事实表的结构已经改变了,所以需要修改定期装载脚本。清单清单(五)- 5-4里是新的定期装载脚本。处理五个里程碑状态和日期的五条语句加了注释。同一个日期可能有销售订单的多个事务被记录,在这种情况下对应的里程碑日期同时修改。图(五)- 5-7到图(五)- 5-24显示了相应的Kettle转换所修改的步骤(“装载订单维度”、“装载事实表”转换)。
USE dw;-- 设置SCD的截止时间和生效时间SET @pre_date = SUBDATE(CURRENT_DATE,1) ;-- 设置CDC的上限时间UPDATE cdc_time SET current_load = CURRENT_DATE ;-- 装载客户维度TRUNCATE TABLE customer_stg;INSERT INTO customer_stgSELECT   customer_number, customer_name, customer_street_address, customer_zip_code, customer_city, customer_state, shipping_address, shipping_zip_code, shipping_city, shipping_stateFROM source.customer ;/* 在所有地址列上 SCD2                           *//* 置过期                          */UPDATE customer_dim a,    customer_stg b SET     expiry_date = @pre_dateWHERE    a.customer_number = b.customer_number        AND (a.customer_street_address <> b.customer_street_address        OR a.customer_city <> b.customer_city        OR a.customer_zip_code <> b.customer_zip_code        OR a.customer_state <> b.customer_state        OR a.shipping_address <> b.shipping_address        OR a.shipping_city <> b.shipping_city        OR a.shipping_zip_code <> b.shipping_zip_code        OR a.shipping_state <> b.shipping_state        OR a.shipping_address IS NULL        OR a.shipping_city IS NULL        OR a.shipping_zip_code IS NULL        OR a.shipping_state IS NULL)        AND expiry_date = '2200-01-01';/* 加新行                          */INSERT INTO customer_dimSELECT  NULL, b.customer_number, b.customer_name, b.customer_street_address, b.customer_zip_code, b.customer_city, b.customer_state, b.shipping_address, b.shipping_zip_code, b.shipping_city, b.shipping_state, a.version + 1, @pre_date, '2200-01-01'FROM  customer_dim a, customer_stg bWHERE    a.customer_number = b.customer_numberAND ( a.customer_street_address <> b.customer_street_address     OR a.customer_city <> b.customer_city     OR a.customer_zip_code <> b.customer_zip_code     OR a.customer_state <> b.customer_state     OR a.shipping_address <> b.shipping_address     OR a.shipping_city <> b.shipping_city     OR a.shipping_zip_code <> b.shipping_zip_code     OR a.shipping_state <> b.shipping_state     OR a.shipping_address IS NULL     OR a.shipping_city IS NULL     OR a.shipping_zip_code IS NULL     OR a.shipping_state IS NULL)AND EXISTS(SELECT *FROM customer_dim xWHERE    b.customer_number=x.customer_numberAND a.expiry_date = @pre_date )AND NOT EXISTS (SELECT *FROM customer_dim yWHERE    b.customer_number = y.customer_numberAND y.expiry_date = '2200-01-01') ;/* 在 customer_name 列上 SCD1                                             */UPDATE customer_dim a, customer_stg bSET a.customer_name = b.customer_nameWHERE a.customer_number = b.customer_number      AND a.customer_name <> b.customer_name ;/* 新增的客户                                                   */INSERT INTO customer_dimSELECT  NULL, customer_number, customer_name, customer_street_address, customer_zip_code, customer_city, customer_state, shipping_address, shipping_zip_code, shipping_city, shipping_state, 1, @pre_date,'2200-01-01'FROM customer_stgWHERE customer_number NOT IN(SELECT y.customer_numberFROM customer_dim x, customer_stg yWHERE x.customer_number = y.customer_number) ;/* 重建PA客户维度                               */TRUNCATE pa_customer_dim;INSERT INTO pa_customer_dimSELECT  customer_sk, customer_number, customer_name, customer_street_address, customer_zip_code, customer_city, customer_state, shipping_address, shipping_zip_code, shipping_city, shipping_state, version, effective_date, expiry_dateFROM customer_dimWHERE customer_state = 'PA' ;/* 装载产品维度                                           */TRUNCATE TABLE product_stg ;INSERT INTO product_stgSELECT   product_code, product_name, product_categoryFROM source.product ;/* 在 product_name 和 product_category 列上 SCD2                                    *//* 置过期                                 */UPDATE  product_dim a, product_stg bSET  expiry_date = @pre_dateWHERE    a.product_code = b.product_codeAND (   a.product_name <> b.product_name     OR a.product_category <> b.product_category)AND expiry_date = '2200-01-01';/* 加新行                                */INSERT INTO product_dimSELECT  NULL, b.product_code, b.product_name, b.product_category, a.version + 1, @pre_date,'2200-01-01'FROM  product_dim a, product_stg bWHERE    a.product_code = b.product_codeAND (   a.product_name <> b.product_name     OR a.product_category <> b.product_category)AND EXISTS(SELECT *FROM product_dim xWHERE     b.product_code = x.product_code      AND a.expiry_date = @pre_date)AND NOT EXISTS (SELECT *FROM product_dim yWHERE     b.product_code = y.product_code      AND y.expiry_date = '2200-01-01') ;/* 新增的产品                                                    */INSERT INTO product_dimSELECT  NULL, product_code, product_name, product_category, 1, @pre_date, '2200-01-01'FROM product_stgWHERE product_code NOT IN(SELECT y.product_codeFROM product_dim x, product_stg yWHERE x.product_code = y.product_code) ;-- 装载订单维度,新增前一天的订单号INSERT INTO order_dim (  order_number, effective_date, expiry_date)SELECT  order_number, status_date, '2200-01-01'FROM source.sales_order, cdc_timeWHERE order_status = 'N' AND entry_date >= last_load AND entry_date < current_load ;-- 装载事实表,新增前一天的订单INSERT INTO sales_order_factSELECT  order_sk, customer_sk, product_sk, e.order_date_sk, NULL, NULL, NULL, NULL, f.request_delivery_date_sk, order_amount, quantity, NULL, NULL, NULL, NULLFROM  source.sales_order a, order_dim b, customer_dim c, product_dim d, order_date_dim e, request_delivery_date_dim f, cdc_time gWHEREa.order_status = 'N'AND a.order_number = b.order_numberAND a.customer_number = c.customer_numberAND a.status_date >= c.effective_dateAND a.status_date < c.expiry_dateAND a.product_code = d.product_codeAND a.status_date >= d.effective_dateAND a.status_date < d.expiry_dateAND a.status_date = e.order_dateAND a.request_delivery_date = f.request_delivery_dateAND a.entry_date >= g.last_load AND a.entry_date < g.current_load ;/* UPDATING the new sales order to Allocated status           */UPDATE sales_order_fact a,    source.sales_order b,    allocate_date_dim c,    order_dim g, cdc_time hSET     a.allocate_date_sk = c.allocate_date_sk,    a.allocate_quantity = b.quantityWHERE    order_status = 'A'        AND b.entry_date >= h.last_load AND b.entry_date < h.current_load        AND b.order_number = g.order_number        AND a.order_sk = g.order_sk        AND c.allocate_date = b.status_date ;/* UPDATING the allocated order to Packed status              */UPDATE sales_order_fact a,    source.sales_order b,    packing_date_dim d,    order_dim g, cdc_time hSET     a.packing_date_sk = d.packing_date_sk,    a.packing_quantity = b.quantityWHERE    order_status = 'P'        AND b.entry_date >= h.last_load AND b.entry_date < h.current_load        AND b.order_number = g.order_number        AND a.order_sk = g.order_sk        AND d.packing_date = b.status_date ;/* UPDATING the packed order to Shipped status                */UPDATE sales_order_fact a,    source.sales_order b,    ship_date_dim e,    order_dim g, cdc_time hSET     a.ship_date_sk = e.ship_date_sk,    a.ship_quantity = b.quantityWHERE    order_status = 'S'        AND b.entry_date >= h.last_load AND b.entry_date < h.current_load        AND b.order_number = g.order_number        AND a.order_sk = g.order_sk        AND e.ship_date = b.status_date ;/* UPDATING the shipped order to Received status              */UPDATE sales_order_fact a,    source.sales_order b,    receive_date_dim f,    order_dim g, cdc_time hSET     a.receive_date_sk = f.receive_date_sk,    a.receive_quantity = b.quantityWHERE    order_status = 'R'        AND b.entry_date >= h.last_load AND b.entry_date < h.current_load        AND b.order_number = g.order_number        AND a.order_sk = g.order_sk        AND f.receive_date = b.status_date ;-- 更新时间戳表的last_load字段UPDATE cdc_time SET last_load = current_load ;COMMIT ;
清单(五)- 5-4

图(五)- 5-7

图(五)- 5-8

图(五)- 5-9

图(五)- 5-10

图(五)- 5-11

图(五)- 5-12

图(五)- 5-13

图(五)- 5-14

图(五)- 5-15

图(五)- 5-16

图(五)- 5-17

图(五)- 5-18

图(五)- 5-19

图(五)- 5-20

图(五)- 5-21

图(五)- 5-22

图(五)- 5-23

图(五)- 5-24

        测试步骤
        在执行清单(五)- 5-3里的定期装载脚本或相应的Kettle作业之前,需要准备一些数据。本示例用六步来跟踪两个销售订单的生命周期:
        1. 使用下面的命令新增两个销售订单
USE source;
INSERT INTO sales_order VALUES
  (50, 1, 1, '2015-03-05', 'N', '2015-03-10', '2015-03-05', 7500,
       75)
, (51, 2, 2, '2015-03-05', 'N', '2015-03-10', '2015-03-05', 1000,
       10) ;
COMMIT ;

        2. 设置系统日期为2015年3月6日,执行定期装载脚本或Kettle作业并确认数据正确装载。可以使用下面的语句查询sales_order_fact表里的两个销售订单,确认定期装载成功。
mysql> select
    ->     order_number,
    ->     a.order_date_sk,
    ->     allocate_date_sk,
    ->     packing_date_sk,
    ->     ship_date_sk,
    ->     receive_date_sk
    -> from
    ->     sales_order_fact a,
    ->     order_dim b,
    ->     order_date_dim c
    -> where
    ->     order_number IN (50 , 51)
    ->         and a.order_sk = b.order_sk
    ->         and a.order_date_sk = c.order_date_sk;
+--------------+---------------+------------------+-----------------+--------------+-----------------+
| order_number | order_date_sk | allocate_date_sk | packing_date_sk | ship_date_sk | receive_date_sk |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
|           50 |          5543 |             NULL |            NULL |         NULL |            NULL |
|           51 |          5543 |             NULL |            NULL |         NULL |            NULL |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
2 rows in set (0.00 sec)
注意 只有order_date_sk列有值,其它日期都是NULL,因为这两个订单是新增的,并且还没有分配库房、出库、配送或收货。

        3. 使用下面的命令添加销售订单作为这两个订单的分配库房和/或出库的里程碑
USE source;
INSERT INTO sales_order VALUES
  (50, 1, 1, '2015-03-06', 'A', '2015-03-10', '2015-03-06', 7500,
       75)
, (50, 1, 1, '2015-03-06', 'P', '2015-03-10', '2015-03-06', 7500,
       75)
, (51, 2, 2, '2015-03-06', 'A', '2015-03-10', '2015-03-06', 1000,
       10) ;
COMMIT ;

        4. 设置系统日期为2015年3月7日,执行定期装载脚本或Kettle作业并确认数据正确装载。使用下面的SQL语句查询sales_order_fact表里的两个销售订单,确认正确装载了数据。
mysql>  select
    ->     order_number,
    ->     a.order_date_sk,
    ->     allocate_date_sk,
    ->     packing_date_sk,
    ->     ship_date_sk,
    ->     receive_date_sk
    -> from
    ->     sales_order_fact a,
    ->     order_dim b,
    ->     order_date_dim c
    -> where
    ->     order_number IN (50 , 51)
    ->         and a.order_sk = b.order_sk
    ->         and a.order_date_sk = c.order_date_sk;
+--------------+---------------+------------------+-----------------+--------------+-----------------+
| order_number | order_date_sk | allocate_date_sk | packing_date_sk | ship_date_sk | receive_date_sk |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
|           50 |          5543 |             5544 |            5544 |         NULL |            NULL |
|           51 |          5543 |             5544 |            NULL |         NULL |            NULL |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
2 rows in set (0.00 sec)
注意 第一个订单具有了allocate_date_sk和packing_date_sk,第二个只具有allocate_date_sk。

        5. 使用下面的命令添加销售订单作为这两个订单后面的里程碑:出库、配送和/或收货。注意四个日期可能相同。
USE source;
INSERT INTO sales_order VALUES
  (50, 1, 1, '2015-03-07', 'S', '2015-03-10', '2015-03-07', 7500,
       75)
, (50, 1, 1, '2015-03-07', 'R', '2015-03-10', '2015-03-07', 7500,
       75)
, (51, 2, 2, '2015-03-07', 'P', '2015-03-10', '2015-03-07', 1000,
       10) ;
COMMIT ;

        6. 设置系统日期为2015年3月8日,执行定期装载脚本或Kettle作业并确认数据正确装载。使用下面的SQL语句查询sales_order_fact表里的两个销售订单,确认正确装载了数据。
mysql> select
    ->     order_number,
    ->     a.order_date_sk,
    ->     allocate_date_sk,
    ->     packing_date_sk,
    ->     ship_date_sk,
    ->     receive_date_sk
    -> from
    ->     sales_order_fact a,
    ->     order_dim b,
    ->     order_date_dim c
    -> where
    ->     order_number IN (50 , 51)
    ->         and a.order_sk = b.order_sk
    ->         and a.order_date_sk = c.order_date_sk;
+--------------+---------------+------------------+-----------------+--------------+-----------------+
| order_number | order_date_sk | allocate_date_sk | packing_date_sk | ship_date_sk | receive_date_sk |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
|           50 |          5543 |             5544 |            5544 |         5545 |            5545 |
|           51 |          5543 |             5544 |            5545 |         NULL |            NULL |
+--------------+---------------+------------------+-----------------+--------------+-----------------+
2 rows in set (0.00 sec)
注意 第一个订单号为50的订单,具有了全部日期代理键,这意味着订单已完成(客户已经收货)。第二个订单已经出库,但是还没有配送。
1 0
原创粉丝点击