笔记:Oracle SQL 高级查询简介 (2) 分析函数

来源:互联网 发布:linux setfacl 编辑:程序博客网 时间:2024/04/30 08:56

1、评级函数

(1). 排序rank()、dense_rank()

遇到重复的,rank()下一个加2,dense_rank() 下一个加1  

select s.prod_id, sum(s.amount_sold),rank() over (order by sum(s.amount_sold) desc) as rank,dense_rank() over (order by sum(s.amount_sold) desc) as dense_rankfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 

结果:

   PROD_ID SUM(S.AMOUNT_SOLD)       RANK DENSE_RANK---------- ------------------ ---------- ----------        13          936197.53          7          7        14         2733887.43          2          2        15         1368317.88          5          5        16              11.99         60         60        17         2239127.88          3          3        18         5477218.04          1          1        19          182670.35         20         20        20          990525.95          6          6        21         1535187.44          4          4        22           31853.11         54         54        23           85211.28         36         36        24          163929.27         22         22        25          522713.71         13         13        26          567533.83         12         12        27          107968.24         30         30        28          644480.02          9          9        29          578374.62         11         11        30            59391.8         48         48        31           64464.83         45         45        32           124081.8         26         26


还可以通过 NULLS  LAST 或 NULLS  FIRST  控制null放在首位或末位

select s.prod_id, sum(s.amount_sold),rank() over (order by sum(s.amount_sold) desc nulls last) as rank,dense_rank() over (order by sum(s.amount_sold) desc nulls last) as dense_rankfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 


与PARTITION BY 子句结合使用 

select s.prod_id, s.month_id, sum(s.amount_sold),rank() over (partition by s.month_id order by sum(s.amount_sold) desc) as rank,dense_rank() over (partition by s.month_id order by sum(s.amount_sold) desc) as dense_rankfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_id, month_idorder by s.prod_id, month_id; 

结果:

   PROD_IDMONTH_IDSUM(S.AMOUNT_SOLD)RANKDENSE_RANK11301125575.646621302122325.21553130361649.5664130417404.2616165130561649.5        666130620004        161671307164719.384481308125010.335591310112205.2766101312125654.4455111401239773.2433121402278879.9722

rank()、dense_rank() 等同样可以与rollup、cube、grouping sets 等函数合用


(2). CUME_DIST 和 PERCENT_RANK 函数

cume_dist 计算某个值相对于一组值中的位置,即 cumulative distribution (累积分布)。

percent_rank 某个值相对于一组值的百分比

select s.prod_id, sum(s.amount_sold),cume_dist() over (  order by sum(s.amount_sold) desc) as cume_dist,percent_rank() over (  order by sum(s.amount_sold) desc) as percent_rankfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 


结果:
   PROD_IDSUM(S.AMOUNT_SOLD)CUME_DISTPERCENT_RANK113936197.530.1166666666666670.1016949152542372142733887.430.03333333333333330.01694915254237293151368317.880.08333333333333330.067796610169491541611.99        1                15172239127.880.05                0.03389830508474586185477218.040.01666666666666670719182670.350.3333333333333330.322033898305085820990525.950.1                0.08474576271186449211535187.440.06666666666666670.0508474576271186102231853.110.9                0.898305084745763112385211.280.6                0.5932203389830511224163929.270.3666666666666670.3559322033898311325522713.710.2166666666666670.2033898305084751426567533.830.2                0.1864406779661021527107968.240.5                0.4915254237288141628644480.020.15                0.1355932203389831729578374.620.1833333333333330.169491525423729183059391.8        0.8                0.796610169491525193164464.830.75                0.7457627118644072032124081.80.4333333333333330.4237288135593222133110987.480.4833333333333330.4745762711864412234106525.010.5166666666666670.5084745762711862335269009.610.2833333333333330.2711864406779662436131170.120.4166666666666670.4067796610169492537293152.280.2666666666666670.254237288135593263861209.1        0.7833333333333330.7796610169491522739149108.820.4                0.3898305084745762840412274.430.25               0.23728813559322 2941101928.660.5333333333333330.525423728813559304296450.490.5666666666666670.559322033898305314363514.580.7666666666666670.76271186440678324454659.470.8333333333333330.8305084745762713345122265.540.45                0.440677966101695344673544.8    0.65                0.644067796610169354770288.370.7166666666666670.711864406779661364869656.880.7333333333333330.7288135593220343711391540.880.5833333333333330.5762711864406783811471070.250.6833333333333330.6779661016949153911510505.430.9333333333333330.9322033898305084011656622.640.8166666666666670.813559322033898            4111740338.170.85            0.84745762711864442118226875.980.3                0.2881355932203394311939241.120.866666666666667 0.8644067796610174412027333.370.9166666666666670.91525423728813545123163865.430.3833333333333330.37288135593220346124624.82          0.9833333333333330.98305084745762747125117214.730.4666666666666670.45762711864406848126172907.760.35            0.33898305084745849127611329.860.1666666666666670.15254237288135650128221494.360.3166666666666670.30508474576271251129496483.760.2333333333333330.22033898305084752130691798.970.1333333333333330.118644067796615313170645.280.7                0.6949152542372885413283353.360.6166666666666670.6101694915254245513376010        0.6333333333333330.627118644067797561368989.83         0.95            0.9491525423728815714097259.840.55                0.5423728813559325814635771.660.8833333333333330.88135593220339591471685.89        0.9666666666666670.9661016949152546014872058.920.6666666666666670.661016949152542


(3) 分片NTILE

ntile(n), 划分为n个分片,查找属于哪个分片

select s.prod_id, sum(s.amount_sold),ntile(4) over (  order by sum(s.amount_sold) desc) as ntilefrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 

结果:

   PROD_IDSUM(S.AMOUNT_SOLD)NTILE113936197.5312142733887.4313151368317.88141611.99        45172239127.8816185477218.041719182670.352820990525.9519211535187.441102231853.114112385211.2831224163929.2721325522713.7111426567533.8311527107968.2421628644480.0211729578374.621183059391.8        4193164464.8332032124081.822133110987.4822234106525.0132335269009.6122436131170.1222537293152.282263861209.1        42739149108.8222840412274.4312941101928.663304296450.493314363514.584324454659.4743345122265.542344673544.8        3354770288.373364869656.8833711391540.8833811471070.2533911510505.4344011656622.6444111740338.17442118226875.9824311939241.1244412027333.37445123163865.43246124624.82        447125117214.73248126172907.76249127611329.86150128221494.36251129496483.76152130691798.9715313170645.2835413283353.3635513376010        3561368989.83        45714097259.8435814635771.664591471685.89 46014872058.923


(4). ROW_NUMBER
row_number 从1开始,每个分组返回一个数字。

select s.prod_id, sum(s.amount_sold),row_number()  over (  order by sum(s.amount_sold) desc) as row_numberfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 

结果:

   PROD_IDSUM(S.AMOUNT_SOLD)ROW_NUMBER113936197.5372142733887.4323151368317.88541611.99        605172239127.8836185477218.041719182670.3520820990525.9569211535187.444102231853.1154112385211.28361224163929.27221325522713.71131426567533.83121527107968.24301628644480.0291729578374.6211183059391.8  48193164464.83452032124081.8262133110987.48292234106525.01312335269009.61172436131170.12252537293152.2816263861209.1 472739149108.82242840412274.43152941101928.6632304296450.4934314363514.5846324454659.47503345122265.5427344673544.8  39354770288.3743364869656.88443711391540.88353811471070.25413911510505.43564011656622.64494111740338.175142118226875.98184311939241.12524412027333.375545123163865.432346124624.82         5947125117214.732848126172907.762149127611329.861050128221494.361951129496483.761452130691798.9785313170645.28425413283353.36375513376010           38561368989.83        575714097259.84335814635771.6653591471685.89        586014872058.9240


2、反百分点函数

使用反百分比函数可以获得对应某个百分点的值。

percenttile_disc、percentile_cont  与 cume_dist、percent_rank 作用相反。

select percentile_cont(0.6) within group (  order by sum(s.amount_sold) desc) as percentile_cont,percentile_disc(0.6) within group (  order by sum(s.amount_sold) desc) as percentile_discfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 

结果:

   PERCENTILE_CONTPERCENTILE_DISC184468.11285211.28


3、窗口函数

(1)、累积和: rows between unbounded preceding and current row

计算某年1月到12月累计销量:

select s.month_id, sum(s.amount_sold), sum(sum(s.amount_sold)) over            (order by s.month_id  rows between unbounded preceding and current row)             as cumulative_amountfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 

结果

   MONTH_IDSUM(S.AMOUNT_SOLD)CUMULATIVE_AMOUNT1012277420.492277420.492022372690.874650111.363031830572.6464806844041975978.38456662.35051748287.2310204949.536061869728.6112074678.147071932282.2814006960.428081972532.6315979493.059092167008.1918146501.2410102236464.5320382965.7711111959664.2222342629.9912121741284.9624083914.95

(2)、移动平均值 rows between N preceding and current row

某年当月与前三个月之间的移动平均值:

select s.month_id, sum(s.amount_sold), avg(sum(s.amount_sold)) over            (order by s.month_id  rows between 3 preceding and current row)             as moving_avgfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 

结果:

1012277420.492277420.492022372690.872325055.683031830572.6421602284041975978.32114165.5755051748287.231981882.266061869728.611856141.6957071932282.281881569.1058081972532.631880707.68759092167008.191985387.927510102236464.532077071.907511111959664.222083917.392512121741284.962026105.475


(3)、中心平均值: between N preceding and N following

计算当月与前后1月的中心平均值

select s.month_id, sum(s.amount_sold), avg(sum(s.amount_sold)) over            (order by s.month_id  rows between 1 preceding and 1 following )             as moving_avgfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 


结果:
   MONTH_IDSUM(S.AMOUNT_SOLD)MOVING_AVG1012277420.492325055.682022372690.8721602283031830572.642059747.274041975978.31851612.723333335051748287.231864664.713333336061869728.611850099.373333337071932282.281924847.848081972532.632023941.033333339092167008.192125335.1166666710102236464.532121045.6466666711111959664.221979137.9033333312121741284.961850474.59


(4)、FIRST_VALUE 、 LAST_VALUE

FIRST_VALUE 、 LAST_VALUE 获取窗口的首行、末行。

select s.month_id, sum(s.amount_sold), first_value(sum(s.amount_sold)) over            (order by s.month_id  rows between 1 preceding and 1 following )             as last_month_amount,last_value(sum(s.amount_sold)) over            (order by s.month_id  rows between 1 preceding and 1 following )             as next_month_amount           from all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 

结果:

   MONTH_IDSUM(S.AMOUNT_SOLD)LAST_MONTH_AMOUNTNEXT_MONTH_AMOUNT1012277420.492277420.492372690.872022372690.872277420.491830572.643031830572.642372690.871975978.34041975978.31830572.641748287.235051748287.231975978.31869728.616061869728.611748287.231932282.287071932282.281869728.611972532.638081972532.631932282.282167008.199092167008.191972532.632236464.5310102236464.532167008.191959664.2211111959664.222236464.531741284.9612121741284.961959664.221741284.96


4、报表函数

包括 sum、avg、max、min、count、variance、stddev、ratio_to_report等。


(1)、总计报表

计算每月销量总和以及所有产品销量总和

select s.prod_id, s.month_id, sum(sum(s.amount_sold)) over (partition by s.month_id) as month_total,sum(sum(s.amount_sold)) over (partition by s.prod_id) as prod_totalfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_id, s.month_idorder by s.prod_id, s.month_id; 


结果:

   , PROD_ID, MONTH_ID, MONTH_TOTAL, PROD_TOTAL1  13  01  2277420.49  936197.532  13  02  2372690.87  936197.533  13  03  1830572.64  936197.534  13  04  1975978.3    936197.535  13  05  1748287.23  936197.536  13  06  1869728.61  936197.537  13  07  1932282.28  936197.538  13  08  1972532.63  936197.539  13  10  2236464.53  936197.5310  13  12  1741284.96  936197.5311  14  01  2277420.49  2733887.4312  14  02  2372690.87  2733887.4313  14  03  1830572.64  2733887.4314  14  04  1975978.3  2733887.4315  14  05  1748287.23  2733887.4316  14  06  1869728.61  2733887.4317  14  07  1932282.28  2733887.4318  14  08  1972532.63  2733887.4319  14  09  2167008.19  2733887.4320  14  10  2236464.53  2733887.4321  14  11  1959664.22  2733887.4322  14  12  1741284.96  2733887.4323  15  01  2277420.49  1368317.8824  15  02  2372690.87  1368317.8825  15  03  1830572.64  1368317.8826  15  04  1975978.3  1368317.8827  15  05  1748287.23  1368317.8828  15  06  1869728.61  1368317.8829  15  07  1932282.28  1368317.8830  15  08  1972532.63  1368317.8831  15  09  2167008.19  1368317.8832  15  10  2236464.53  1368317.8833  15  11  1959664.22  1368317.8834  15  12  1741284.96  1368317.8835  16  03  1830572.64  11.9936  17  01  2277420.49  2239127.8837  17  02  2372690.87  2239127.8838  17  03  1830572.64  2239127.8839  17  04  1975978.3  2239127.8840  17  05  1748287.23  2239127.8841  17  06  1869728.61  2239127.8842  17  07  1932282.28  2239127.8843  17  08  1972532.63  2239127.8844  17  09  2167008.19  2239127.8845  17  10  2236464.53  2239127.8846  17  11  1959664.22  2239127.8847  17  12  1741284.96  2239127.8848  18  01  2277420.49  5477218.0449  18  02  2372690.87  5477218.0450  18  03  1830572.64  5477218.0451  18  04  1975978.3  5477218.0452  18  05  1748287.23  5477218.0453  18  06  1869728.61  5477218.0454  18  07  1932282.28  5477218.0455  18  08  1972532.63  5477218.0456  18  09  2167008.19  5477218.0457  18  10  2236464.53  5477218.0458  18  11  1959664.22  5477218.0459  18  12  1741284.96  5477218.0460  19  01  2277420.49  182670.35


(2)、RATIO_TO_REPORT 函数

RATIO_TO_REPORT 可用来计算某个值在一组值的总和中所占的比例。

计算每个产品每个月销量总和,以及该类型在整月中的比例。

select s.prod_id, s.month_id, sum(s.amount_sold) as prod_month_total,ratio_to_report(sum(s.amount_sold)) over (partition by s.month_id) as prod_ratiofrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_id, s.month_idorder by s.prod_id, s.month_id;  
结果:


   PROD_IDMONTH_IDPROD_MONTH_TOTALPROD_RATIO11301125575.640.055139417841981421302122325.210.05155547717853363130361649.5        0.03367771300241874130417404.260.008807920613298235130561649.5 0.03526279832176096130620004        0.010698878913769271307164719.380.08524602316386281308125010.330.063375544768554791310112205.270.0501708247525839101312125654.440.0721619050795684111401239773.240.105282814944727121402278879.970.117537422816483

5、LAG与LEAD函数

LAG 与 LEAD 函数获得位于距当前指定距离处那条记录中的数据。

如获得前一个月 与后一个月的数据:

select s.month_id, sum(s.amount_sold), lag(sum(s.amount_sold)) over (  order by s.month_id) as last_month_sum,lead(sum(s.amount_sold)) over ( order by s.month_id) as next_month_sumfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 

结果:

   MONTH_IDSUM(S.AMOUNT_SOLD) LAST_MONTH_SUM  NEXT_MONTH_SUM1012277420.49        2372690.872022372690.872277420.491830572.643031830572.642372690.871975978.34041975978.31830572.641748287.235051748287.231975978.31869728.616061869728.611748287.231932282.287071932282.281869728.611972532.638081972532.631932282.282167008.199092167008.191972532.632236464.5310102236464.532167008.191959664.2211111959664.222236464.531741284.9612121741284.961959664.22


6、FIRST函数与LAST函数

first 和 last 返回排序分组中的第一个或者最后一个值。

如查询某年销量最高的、最低的月份。

select min(s.month_id) keep (dense_rank first order by sum(s.amount_sold)) as min_month,max(s.month_id) keep (dense_rank last order by sum(s.amount_sold)) as max_monthfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.month_idorder by s.month_id; 


结果:

     MIN_MONTHMAX_MONTH
1 12                 02


7、线性回归函数



8、假设评级及分布函数

假象评级与分布函数可以计算一条新记录在表中的排名和百分比,与 rank(),dense_rank(), percent_rank(),  cume_dist() 等连用。

例如,查找 sum(amount) 为20004 的假想排名和百分比排名

 

select rank(20004) within group (  order by sum(s.amount_sold) desc) as rank,percent_rank(20004) within group (  order by sum(s.amount_sold) desc) as pencent_rankfrom all_sales s where s.year=1998 and s.amount_sold is not nullgroup by s.prod_idorder by s.prod_id; 

结果:

    RANK PENCENT_RANK
1 56 0.916666666666667



0 0