Data processing with the Python pandas module

Source: Internet  Editor: 程序博客网  Time: 2024/05/19 03:42

1. Common data structures

(1) A Series is like a one-dimensional array, except that every element also carries an index label.

>import pandas as pd
>s=pd.Series([1,2,3,4],index=['a','b','c','d'])
>s
a    1
b    2
c    3
d    4
dtype: int64
>s['a']
1

(2) A DataFrame is like a two-dimensional array and can be indexed by row and by column.

>import numpy as np
>df=pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randn(5),'data2':np.random.randn(5)})
>df
      data1     data2 key1 key2
0  0.298132 -0.889997    a  one
1 -1.610528  0.735897    a  two
2  1.229059 -0.922434    b  one
3 -0.419731  1.611932    b  two
4 -0.485703 -1.041524    a  one
>df.loc[1,:]#.ix was removed in pandas 1.0; .loc selects by label
data1    -1.61053
data2    0.735897
key1            a
key2          two
Name: 1, dtype: object
>df['data1']
0    0.298132
1   -1.610528
2    1.229059
3   -0.419731
4   -0.485703
Name: data1, dtype: float64
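In current pandas releases, row and column selection is usually written with `.loc` (by label) and `.iloc` (by position). A minimal sketch, assuming a small hand-made frame (`demo` and its values are illustrative, not the random data above):

```python
import pandas as pd

# Illustrative frame; the values are made up for this sketch.
demo = pd.DataFrame(
    {
        "key1": ["a", "a", "b"],
        "key2": ["one", "two", "one"],
        "data1": [1.0, 2.0, 3.0],
    }
)

row_by_label = demo.loc[1, :]       # row whose index label is 1
row_by_position = demo.iloc[1, :]   # second row, counted by position
column = demo["data1"]              # one column, returned as a Series
cell = demo.loc[1, "data1"]         # single value: row label 1, column 'data1'

print(row_by_label)
print(column)
print(cell)
```

With the default integer index, the label 1 and the position 1 point at the same row, so `row_by_label` and `row_by_position` coincide here. They would differ once the index is something other than 0, 1, 2, ...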

2. Common operations

[Updated irregularly, as I learn and practice]
(1) groupby
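A minimal sketch of what `groupby` does, assuming a frame with the same column names as `df` but fixed, made-up values (`demo` is illustrative): the rows are split by the values of one or more key columns, and an aggregate is then computed for each group.

```python
import pandas as pd

# Illustrative data; fixed values so the result is reproducible.
demo = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b", "a"],
        "key2": ["one", "two", "one", "two", "one"],
        "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Mean of data1 within each key1 group.
mean_by_key1 = demo.groupby("key1")["data1"].mean()

# Sum of data1 for every (key1, key2) combination; the result
# is indexed by a two-level MultiIndex.
sum_by_both = demo.groupby(["key1", "key2"])["data1"].sum()

print(mean_by_key1)
print(sum_by_both)
```

Grouping by a list of columns gives one result per combination of key values that actually occurs in the data.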
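(2) rolling, rolling_mean
DataFrame.rolling(window, min_periods=None, freq=None, center=False, win_type=None, on=None, axis=0, closed=None)
Processes the data with a sliding window.
window: the size of the window
min_periods: the minimum number of elements a window must contain for a result to be computed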

>df['data1'].rolling(2).sum()#the first window holds only one element, so no result can be computed
0         NaN
1   -1.312396
2   -0.381468
3    0.809328
4   -0.905435
Name: data1, dtype: float64
>df['data1'].rolling(2,min_periods=1).sum()#with min_periods=1 the first window also produces a result
0    0.298132
1   -1.312396
2   -0.381468
3    0.809328
4   -0.905435
Name: data1, dtype: float64
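To make the window semantics concrete: a window of size 2 at row i covers rows i-1 and i. A small sketch with fixed values (the Series `x` is illustrative) compares `rolling(2).sum()` with adding the series to a copy of itself shifted down by one row:

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default min_periods equals the window size, so row 0 is NaN.
rolled = x.rolling(2).sum()
by_hand = x + x.shift(1)            # x[i] + x[i-1]; row 0 is NaN as well

# With min_periods=1 the incomplete first window is summed anyway.
rolled_min1 = x.rolling(2, min_periods=1).sum()
by_hand_min1 = x + x.shift(1).fillna(0.0)

print(rolled)
print(rolled_min1)
```

The shifted-sum version is shown only to illustrate what the window contains; `rolling` is the more general tool because it handles any window size and aggregation.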

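The function below has been superseded by the rolling method above: pd.rolling_mean was deprecated in pandas 0.18 and removed in later releases, so it only runs on old versions.
pandas.rolling_mean(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)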

>pd.rolling_mean(df[['data1','data2']],2)
      data1     data2
0       NaN       NaN
1 -0.656198 -0.077050
2 -0.190734 -0.093268
3  0.404664  0.344749
4 -0.452717  0.285204
>pd.rolling_mean(df[['data1','data2']],2,min_periods=1)
      data1     data2
0  0.298132 -0.889997
1 -0.656198 -0.077050
2 -0.190734 -0.093268
3  0.404664  0.344749
4 -0.452717  0.285204
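On pandas versions where `pd.rolling_mean` is not available, the same computation can be written with the `.rolling(...).mean()` method. A minimal sketch, assuming a frame with fixed illustrative values (`demo`) rather than the random data above:

```python
import pandas as pd

# Illustrative values chosen so the means are easy to check by hand.
demo = pd.DataFrame(
    {
        "data1": [1.0, 3.0, 5.0, 7.0],
        "data2": [2.0, 2.0, 4.0, 8.0],
    }
)

# Method-style counterpart of pd.rolling_mean(demo[['data1','data2']], 2)
mean2 = demo[["data1", "data2"]].rolling(2).mean()

# Counterpart of pd.rolling_mean(demo[['data1','data2']], 2, min_periods=1)
mean2_min1 = demo[["data1", "data2"]].rolling(2, min_periods=1).mean()

print(mean2)
print(mean2_min1)
```

As in the calls above, the window is applied to each column independently, and the first row is NaN unless `min_periods=1` is passed.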