python
来源:互联网 发布:多表创建视图的sql语句 编辑:程序博客网 时间:2024/06/11 16:35
import pandas as pdclos = ['Time', 'Action', 'User', 'Product', 'Quantity', 'Price']orders = pd.read_csv('Data/purchase_order.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8')orders.info()
<class 'pandas.core.frame.DataFrame'>RangeIndex: 54772 entries, 0 to 54771Data columns (total 6 columns):Dates 54772 non-null datetime64[ns]Action 54772 non-null objectUser 54772 non-null objectProduct 54772 non-null objectQuantity 54772 non-null int64Price 54721 non-null float64dtypes: datetime64[ns](1), float64(1), int64(1), object(3)memory usage: 2.5+ MB
orders.head()
# 求某个商品的总售出量orders[orders['Product'] == 'P0002267974']['Price'].describe()
count 10.0mean 285.0std 0.0min 285.025% 285.050% 285.075% 285.0max 285.0Name: Price, dtype: float64
# 商品去重orders['Product'].unique()
array(['P0006944501', 'P0006018073', 'P0002267974', ..., 'P0022884606', 'P0013911085', 'P0025123755'], dtype=object)
# 商品单价orders.groupby('Product')['Price'].mean().sort_values(ascending=False).head()
ProductP0000143511 438888.0P0000143500 438888.0P0006584093 320000.0P0025280275 183900.0P0000150006 111375.0Name: Price, dtype: float64
# 增加总价格栏位orders['Total_Price'] = orders['Quantity'] * orders['Price']orders.head()
# 查看哪个用户今天消费最高orders.groupby('User')['Total_Price'].sum().sort_values(ascending=False).head()
UserU166708333 2942744.0U10120098943 1451117.0U142809250 747550.0U1006283751 515688.0U10114715330 456782.0Name: Total_Price, dtype: float64
import pandas as pdclos = ['Time', 'Action', 'User', 'Product']views = pd.read_csv('Data/purchase_view.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8')views.info()
# 查看用户购买记录clos = ['Time', 'Action', 'User', 'Product', 'Quantity', 'Price']orders = pd.read_csv('Data/purchase_order.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8') # reset_index name column,DataFrameorders_cnt = orders.groupby(['User', 'Product'])['Product'].count().reset_index(name='buys')orders_cnt.head()
# 查看用户浏览记录views_cnt = views.groupby(['User', 'Product'])['Product'].count().reset_index(name='views')views_cnt.head()
# 合并购买与浏览记录merge_df = pd.merge(orders_cnt, views_cnt, on=['User', 'Product'], how='right')merge_df.head()
%pylab inlineviews_cnt_by_date.plot(kind='line', figsize=[10, 5])
# 根据小时计算用户的浏览记录总和views_cnt_by_hour = views.groupby(views['Dates'].dt.hour)['Action'].count()views_cnt_by_hour.plot(kind='line', figsize=[10, 5])
# 查看消费大户orders['Total_Price'] = orders['Quantity'] * orders['Price']g = orders.groupby('User')['Total_Price'].sum().sort_values(ascending=False)[0:10]g.plot(kind='bar', figsize=[10, 5])
# merge 用户每日浏览量和购买量views_daily_cnt = views.groupby(by=views['Dates'].dt.date)['Action'].count()orders_daily_cnt = orders.groupby(by=orders['Dates'].dt.date)['Action'].count()df = pd.concat([views_daily_cnt, orders_daily_cnt], axis=1) #用index 做merge keydf.dropna(inplace=True)df.columns = ['views', 'orders']df.plot(kind='line', figsize=[10, 5], rot=45)
阅读全文
0 0
- Python
- Python
- Python
- python
- Python
- PYTHON
- Python
- Python
- Python
- Python
- Python
- Python
- Python
- Python
- Python
- Python
- python
- Python
- Mysql-5.7 基于GTID主从复制
- 编辑从字节码和 JVM 的角度解析 Java 核心类 String 的不可变特性
- C++设计模式——单例模式(总结比较全面)
- 算法题--两种排序方法
- JVM内存回收策略
- python
- 004_Java实现百度网盘爬虫
- mysql主从同步注意事项
- 如何实现一个分布式 RPC 框架
- 构造方法
- 前端初学学习进程X
- leetcode 142. Linked List Cycle II
- 深入了解 Java 之虚拟机内存
- css3 五角星圆形导航