python pandas 聚合与分组函数
来源:互联网 发布:session生命周期 php 编辑:程序博客网 时间:2024/06/15 09:31
1 主要内容
- DataFrame.groupby().sum()
- DataFrame.groupby().agg()
- pandas.concat([DataFrame1,DataFrame2])
- pandas.merge(DataFrame1,DataFrame2,parameters….)
- DataFrame1.join(DataFrame2,lsuffix=’列名 on DataFrame1’,rsuffix=’列名 on DataFrame2’)
- 帮助文档的获取
2 实例
- 构造dataframe如下所示:
food food_id number price user_id weather0 soup 4 6 1.818250 3 cold1 soup 8 6 1.834045 4 hot2 iceream 8 7 3.042422 2 cold3 chocolate 3 6 5.247564 4 hot4 iceream 6 3 4.319450 4 cold5 iceream 5 4 2.912291 1 cold6 iceream 2 7 6.118529 2 cold7 soup 8 4 1.394939 2 hot8 soup 6 8 2.921446 2 hot9 chocolate 2 1 3.663618 4 hot
实现程序如下所示:
import pandas as pdfrom numpy import randomfrom numpy.random import randimport numpy as nprandom.seed(42)df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10),'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'],'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'],'price':10 * rand(10),'number':random.randint(1,9,10)}) print df
2 groupby函数应用
代码
groupby1 = df.groupby(['user_id'])i = 0for user_id,group in groupby1: i = i + 1 print "group", i , user_id print group
结果
group 1 1 food food_id number price user_id weather5 iceream 5 4 2.912291 1 coldgroup 2 2 food food_id number price user_id weather2 iceream 8 7 3.042422 2 cold6 iceream 2 7 6.118529 2 cold7 soup 8 4 1.394939 2 hot8 soup 6 8 2.921446 2 hotgroup 3 3 food food_id number price user_id weather0 soup 4 6 1.81825 3 coldgroup 4 4 food food_id number price user_id weather1 soup 8 6 1.834045 4 hot3 chocolate 3 6 5.247564 4 hot4 iceream 6 3 4.319450 4 cold9 chocolate 2 1 3.663618 4 hot
3 groupby和sum等函数结合使用
代码
print groupby1.sum()#对除了groupby索引以外的每个数值列进行求和 print groupby1['food_id','number'].sum()#对除了groupby索引以外的特定数值列进行求和 print df.groupby(['user_id'],as_index=False).sum()#默认as_index=True#当然除了sum,还有mean,min,max,median,mode,std,mad等等,操作方法同理#groupby()中的形参可用help(df.groupby)来查看#常用的参数axis=0,表示对行进行操作,即指定列中不同值进行分组;axis=1,表示对列进行分组
结果
output[1]: food_id number priceuser_id 1 5 4 2.9122912 24 26 13.4773363 4 6 1.8182504 19 16 15.064678output[2]: food_id numberuser_id 1 5 42 24 263 4 64 19 16output[3]: user_id food_id number price0 1 5 4 2.9122911 2 24 26 13.4773362 3 4 6 1.8182503 4 19 16 15.064678
4 agg函数
代码
print df.groupby(['weather','food']).agg([np.mean,np.median])
结果
output[4]: food_id number price \ mean median mean median mean median weather food cold iceream 5.250000 5.5 5.25 5.5 4.098173 3.680936 soup 4.000000 4.0 6.00 6.0 1.818250 1.818250 hot chocolate 2.500000 2.5 3.50 3.5 4.455591 4.455591 soup 7.333333 8.0 6.00 6.0 2.050143 1.834045 user_id mean median weather food cold iceream 2.250000 2 soup 3.000000 3 hot chocolate 4.000000 4 soup 2.666667 2
5 concat()
代码
print "df :3\n",df[:3]print "df :4\n",df[6:]print pd.concat([df[:3],df[6:]],axis=0)
结果
df :3 food food_id number price user_id weather0 soup 4 6 1.818250 3 cold1 soup 8 6 1.834045 4 hot2 iceream 8 7 3.042422 2 colddf :4 food food_id number price user_id weather6 iceream 2 7 6.118529 2 cold7 soup 8 4 1.394939 2 hot8 soup 6 8 2.921446 2 hot9 chocolate 2 1 3.663618 4 hotdf.concat food food_id number price user_id weather0 soup 4 6 1.818250 3 cold1 soup 8 6 1.834045 4 hot2 iceream 8 7 3.042422 2 cold6 iceream 2 7 6.118529 2 cold7 soup 8 4 1.394939 2 hot8 soup 6 8 2.921446 2 hot9 chocolate 2 1 3.663618 4 hot
6 merge()和join()
代码
df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})print "df1\n",df1print "df2\n",df2print "Merge() on Key\n",pd.merge(df1,df2,on='EmpNr')print "inner join with Merge()\n",pd.merge(df1,df2,how='inner')print "Dests join tips\n",df1.join(df2,lsuffix='Dest',rsuffix='Tips')
结果
df1 Dest EmpNr0 The Hague 51 Amsterdam 32 Rotterdam 9df2 Amount EmpNr0 10.0 51 5.0 92 2.5 7Merge() on Key Dest EmpNr Amount0 The Hague 5 10.01 Rotterdam 9 5.0inner join with Merge() Dest EmpNr Amount0 The Hague 5 10.01 Rotterdam 9 5.0Dests join tips Dest EmpNrDest Amount EmpNrTips0 The Hague 5 10.0 51 Amsterdam 3 5.0 92 Rotterdam 9 2.5 7
6帮助文档获取方式
1.help(pd.concat)2.dir(pd.concat)3.pd.concat?...
7 参考文献
利用python进行数据分析笔记
python数据分析,Ivan Idris著
阅读全文
1 0
- python pandas 聚合与分组函数
- 聚合函数与分组
- 聚合函数与分组
- 聚合函数与分组
- 聚合函数与分组
- 聚合函数与分组
- 聚合函数与分组
- pandas—数据聚合与分组运算
- pandas 分组聚合
- python/pandas数据分析(十五)-聚合与分组运算实例
- 6.聚合函数与分组
- 数据分组与聚合函数
- 聚合函数与分组1
- pandas 数据分组和聚合
- Python之数据聚合与分组运算
- python中数据聚合与分组运算
- 第六章 聚合函数与分组
- 第六章.聚合函数与分组.总结
- 2014年第五届蓝桥杯C/C++程序设计本科B组决赛 Log大侠(编程大题)
- 解决maven与eclipse中@override出现must override a superclass method错误
- 机器学习清单
- git回滚工作区本地库文件对比
- 构建qt项目后使用vs打开
- python pandas 聚合与分组函数
- Spring 手动装配和自动装配
- Android瀑布流照片墙实现,体验不规则排列的美感
- Python图像处理库PIL的ImageFilter模块介绍
- 关于#include<bits/stdc++.h>
- Java对象的销毁
- python技巧——自己做验证码
- text-stroke实现文本描边效果
- MFC下打开选择文件夹并获取文件夹的绝对路径