python pandas 聚合与分组函数

来源:互联网 发布:session生命周期 php 编辑:程序博客网 时间:2024/06/15 09:31

1 主要内容

  1. DataFrame.groupby().sum()
  2. DataFrame.groupby().agg()
  3. pandas.concat([DataFrame1,DataFrame2])
  4. pandas.merge(DataFrame1,DataFrame2,parameters….)
  5. DataFrame1.join(DataFrame2,lsuffix=’列名 on DataFrame1’,rsuffix=’列名 on DataFrame2’)
  6. 帮助文档的获取

2 实例

  1. 构造dataframe如下所示:
  food  food_id  number     price  user_id weather0       soup        4       6  1.818250        3    cold1       soup        8       6  1.834045        4     hot2    iceream        8       7  3.042422        2    cold3  chocolate        3       6  5.247564        4     hot4    iceream        6       3  4.319450        4    cold5    iceream        5       4  2.912291        1    cold6    iceream        2       7  6.118529        2    cold7       soup        8       4  1.394939        2     hot8       soup        6       8  2.921446        2     hot9  chocolate        2       1  3.663618        4     hot

实现程序如下所示:

import pandas as pdfrom numpy import randomfrom numpy.random import randimport numpy as nprandom.seed(42)df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10),'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'],'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'],'price':10 * rand(10),'number':random.randint(1,9,10)}) print df

2 groupby函数应用
代码

groupby1 = df.groupby(['user_id'])i = 0for user_id,group in groupby1:    i = i + 1    print "group", i , user_id    print group

结果

group 1 1      food  food_id  number     price  user_id weather5  iceream        5       4  2.912291        1    coldgroup 2 2      food  food_id  number     price  user_id weather2  iceream        8       7  3.042422        2    cold6  iceream        2       7  6.118529        2    cold7     soup        8       4  1.394939        2     hot8     soup        6       8  2.921446        2     hotgroup 3 3   food  food_id  number    price  user_id weather0  soup        4       6  1.81825        3    coldgroup 4 4        food  food_id  number     price  user_id weather1       soup        8       6  1.834045        4     hot3  chocolate        3       6  5.247564        4     hot4    iceream        6       3  4.319450        4    cold9  chocolate        2       1  3.663618        4     hot

3 groupby和sum等函数结合使用
代码

print groupby1.sum()#对除了groupby索引以外的每个数值列进行求和 print groupby1['food_id','number'].sum()#对除了groupby索引以外的特定数值列进行求和 print df.groupby(['user_id'],as_index=False).sum()#默认as_index=True#当然除了sum,还有mean,min,max,median,mode,std,mad等等,操作方法同理#groupby()中的形参可用help(df.groupby)来查看#常用的参数axis=0,表示对行进行操作,即指定列中不同值进行分组;axis=1,表示对列进行分组

结果

output[1]:  food_id  number      priceuser_id                            1              5       4   2.9122912             24      26  13.4773363              4       6   1.8182504             19      16  15.064678output[2]:         food_id  numberuser_id                 1              5       42             24      263              4       64             19      16output[3]:   user_id  food_id  number      price0        1        5       4   2.9122911        2       24      26  13.4773362        3        4       6   1.8182503        4       19      16  15.064678

4 agg函数
代码

print df.groupby(['weather','food']).agg([np.mean,np.median])

结果

 output[4]:                    food_id        number            price            \                       mean median   mean median      mean    median   weather food                                                           cold    iceream    5.250000    5.5   5.25    5.5  4.098173  3.680936           soup       4.000000    4.0   6.00    6.0  1.818250  1.818250   hot     chocolate  2.500000    2.5   3.50    3.5  4.455591  4.455591           soup       7.333333    8.0   6.00    6.0  2.050143  1.834045                       user_id                                mean median  weather food                        cold    iceream    2.250000      2          soup       3.000000      3  hot     chocolate  4.000000      4          soup       2.666667      2  

5 concat()
代码

print "df :3\n",df[:3]print "df :4\n",df[6:]print pd.concat([df[:3],df[6:]],axis=0)

结果

df :3      food  food_id  number     price  user_id weather0     soup        4       6  1.818250        3    cold1     soup        8       6  1.834045        4     hot2  iceream        8       7  3.042422        2    colddf :4        food  food_id  number     price  user_id weather6    iceream        2       7  6.118529        2    cold7       soup        8       4  1.394939        2     hot8       soup        6       8  2.921446        2     hot9  chocolate        2       1  3.663618        4     hotdf.concat        food  food_id  number     price  user_id weather0       soup        4       6  1.818250        3    cold1       soup        8       6  1.834045        4     hot2    iceream        8       7  3.042422        2    cold6    iceream        2       7  6.118529        2    cold7       soup        8       4  1.394939        2     hot8       soup        6       8  2.921446        2     hot9  chocolate        2       1  3.663618        4     hot

6 merge()和join()
代码

df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})print "df1\n",df1print "df2\n",df2print "Merge() on Key\n",pd.merge(df1,df2,on='EmpNr')print "inner join with Merge()\n",pd.merge(df1,df2,how='inner')print "Dests join tips\n",df1.join(df2,lsuffix='Dest',rsuffix='Tips')

结果

df1        Dest  EmpNr0  The Hague      51  Amsterdam      32  Rotterdam      9df2   Amount  EmpNr0    10.0      51     5.0      92     2.5      7Merge() on Key        Dest  EmpNr  Amount0  The Hague      5    10.01  Rotterdam      9     5.0inner join with Merge()        Dest  EmpNr  Amount0  The Hague      5    10.01  Rotterdam      9     5.0Dests join tips        Dest  EmpNrDest  Amount  EmpNrTips0  The Hague          5    10.0          51  Amsterdam          3     5.0          92  Rotterdam          9     2.5          7

6帮助文档获取方式

1.help(pd.concat)2.dir(pd.concat)3.pd.concat?...

7 参考文献
利用python进行数据分析笔记
python数据分析,Ivan Idris著

原创粉丝点击