Time Series (Part 2): Data Resampling

Source: Internet  Editor: 程序博客网  Time: 2024/05/20 05:30

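Data resampling
Resampling converts time series data from one frequency to another.
Downsampling: converting to a lower frequency (for example, daily to monthly), which requires aggregating values.
Upsampling: converting to a higher frequency (for example, every 3 days to daily), which creates new time points with no data.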
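The two directions can be sketched on a tiny series whose values are easy to check by hand. The data here is made up for illustration and is not the random series used in the rest of this post.

```python
import pandas as pd

# Six daily observations with hand-checkable values (illustrative data only).
idx = pd.date_range('2011-01-01', periods=6, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsampling: daily -> 3-day bins. Several points fall into each bin,
# so an aggregation (here, sum) is needed.
down = s.resample('3D').sum()
print(down.tolist())   # [6.0, 15.0]

# Upsampling: 3-day -> daily. New timestamps appear with no data (NaN).
up = down.resample('D').asfreq()
print(up)
```

The upsampled series runs from 2011-01-01 to 2011-01-04 and contains two NaN values, for the two days that fall between the 3-day bin labels.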

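Generate a time series with random values

import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2011', periods=90, freq='D')  # 90 consecutive days
ts = pd.Series(np.random.randn(len(rng)), index=rng)   # standard normal random values
print(ts.head())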

2011-01-01 -1.025562
2011-01-02 0.410895
2011-01-03 0.660311
2011-01-04 0.710293
2011-01-05 0.444985
Freq: D, dtype: float64

Sum by month (note: pandas 2.2 and later deprecate the 'M' alias in favor of 'ME' for month end)

ts.resample('M').sum()

2011-01-31 2.510102
2011-02-28 0.583209
2011-03-31 2.749411
Freq: M, dtype: float64
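One way to sanity-check what "sum by month" means is to group the same dates by calendar month directly. The sketch below assumes a series of all ones instead of random values, so that each monthly total simply equals the number of days in that month. It uses `groupby` on the period of each timestamp rather than `resample`, so the result is labeled by monthly period rather than by month-end date.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=90, freq='D')
# All ones (an assumption for illustration), so each sum counts the days.
ones = pd.Series(np.ones(len(rng)), index=rng)

# Group by the calendar month each timestamp belongs to, then sum.
monthly = ones.groupby(ones.index.to_period('M')).sum()
print(monthly.tolist())   # [31.0, 28.0, 31.0]
```

The 90 days from 2011-01-01 cover January, February, and March 2011 exactly, which is why the output above has three rows.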

Sum over every three days

ts.resample('3D').sum()

2011-01-01 0.045643
2011-01-04 -2.255206
2011-01-07 0.571142
2011-01-10 0.835032
2011-01-13 -0.396766
2011-01-16 -1.156253
2011-01-19 -1.286884
2011-01-22 2.883952
2011-01-25 1.566908
2011-01-28 1.435563
2011-01-31 0.311565
2011-02-03 -2.541235
2011-02-06 0.317075
2011-02-09 1.598877
2011-02-12 -1.950509
2011-02-15 2.928312
2011-02-18 -0.733715
2011-02-21 1.674817
2011-02-24 -2.078872
2011-02-27 2.172320
2011-03-02 -2.022104
2011-03-05 -0.070356
2011-03-08 1.276671
2011-03-11 -2.835132
2011-03-14 -1.384113
2011-03-17 1.517565
2011-03-20 -0.550406
2011-03-23 0.773430
2011-03-26 2.244319
2011-03-29 2.951082
Freq: 3D, dtype: float64

Mean over every three days

day3Ts = ts.resample('3D').mean()
day3Ts

2011-01-01 0.015214
2011-01-04 -0.751735
2011-01-07 0.190381
2011-01-10 0.278344
2011-01-13 -0.132255
2011-01-16 -0.385418
2011-01-19 -0.428961
2011-01-22 0.961317
2011-01-25 0.522303
2011-01-28 0.478521
2011-01-31 0.103855
2011-02-03 -0.847078
2011-02-06 0.105692
2011-02-09 0.532959
2011-02-12 -0.650170
2011-02-15 0.976104
2011-02-18 -0.244572
2011-02-21 0.558272
2011-02-24 -0.692957
2011-02-27 0.724107
2011-03-02 -0.674035
2011-03-05 -0.023452
2011-03-08 0.425557
2011-03-11 -0.945044
2011-03-14 -0.461371
2011-03-17 0.505855
2011-03-20 -0.183469
2011-03-23 0.257810
2011-03-26 0.748106
2011-03-29 0.983694
Freq: 3D, dtype: float64
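The 3-day sums and means above are related: each bin starts at the first timestamp and holds three daily values, so the mean is the sum divided by 3. A minimal sketch with made-up integer-valued data shows this.

```python
import pandas as pd

# Nine daily values 1..9 (illustrative data only).
idx = pd.date_range('2011-01-01', periods=9, freq='D')
s = pd.Series(range(1, 10), index=idx, dtype='float64')

sums = s.resample('3D').sum()     # bins: Jan 1-3, Jan 4-6, Jan 7-9
means = s.resample('3D').mean()

print(sums.tolist())    # [6.0, 15.0, 24.0]
print(means.tolist())   # [2.0, 5.0, 8.0]
```

Each bin is labeled with its first day, which matches the 2011-01-01, 2011-01-04, 2011-01-07 labels in the output above.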

Note: upsampling the series (converting it to a higher frequency) produces missing values

print(day3Ts.resample('D').asfreq())

2011-01-01 0.015214
2011-01-02 NaN
2011-01-03 NaN
2011-01-04 -0.751735
2011-01-05 NaN
2011-01-06 NaN
2011-01-07 0.190381
2011-01-08 NaN
2011-01-09 NaN
2011-01-10 0.278344
2011-01-11 NaN
2011-01-12 NaN
2011-01-13 -0.132255
2011-01-14 NaN
2011-01-15 NaN
2011-01-16 -0.385418
2011-01-17 NaN
2011-01-18 NaN
2011-01-19 -0.428961
2011-01-20 NaN
2011-01-21 NaN
2011-01-22 0.961317
2011-01-23 NaN
2011-01-24 NaN
2011-01-25 0.522303
2011-01-26 NaN
2011-01-27 NaN
2011-01-28 0.478521
2011-01-29 NaN
2011-01-30 NaN

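There are three ways to fill these gaps: copy the previous value, copy the next value, or interpolate linearly.
Fill methods:
ffill fills a missing value with the previous value
bfill fills a missing value with the next value
interpolate fills missing values by linear interpolation

day3Ts.resample('D').ffill(limit=1)   # forward fill, at most 1 consecutive NaN per gap
day3Ts.resample('D').bfill(limit=1)   # backward fill, at most 1 consecutive NaN per gap
day3Ts.resample('D').interpolate('linear')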
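The difference between the fill methods is easier to see on a two-point series. The sketch below uses made-up values 0 and 3, three days apart, so the upsampled series has exactly two gaps. Note that with `limit=1` only one of the two missing days in each gap is filled and the other stays NaN.

```python
import pandas as pd

# Two observations three days apart (illustrative data only).
idx = pd.date_range('2011-01-01', periods=2, freq='3D')
s = pd.Series([0.0, 3.0], index=idx)

forward = s.resample('D').ffill().tolist()             # [0.0, 0.0, 0.0, 3.0]
forward_1 = s.resample('D').ffill(limit=1).tolist()    # [0.0, 0.0, nan, 3.0]
backward = s.resample('D').bfill().tolist()            # [0.0, 3.0, 3.0, 3.0]
backward_1 = s.resample('D').bfill(limit=1).tolist()   # [0.0, nan, 3.0, 3.0]
linear = s.resample('D').interpolate('linear').tolist()  # [0.0, 1.0, 2.0, 3.0]

for name, values in [('ffill', forward), ('ffill(limit=1)', forward_1),
                     ('bfill', backward), ('bfill(limit=1)', backward_1),
                     ('interpolate', linear)]:
    print(name, values)
```

Omitting `limit` fills every gap completely, which is usually what is wanted when upsampling from 3-day to daily data.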