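Python Basics: Pandas in Practice (Part 2)
Source: Internet | Editor: 程序博客网 | Time: 2024/04/25 09:48
OK! Next we'll work with a dataset of bike path counts. I live in Montreal, a port city in southeastern Canada, and I'm curious whether people here prefer public transit or cycling. And if they cycle, do they ride more on weekends or on weekdays?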
Loading the data
First, we need to load the data.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
# Make the graphs a bit prettier ('display.mpl_style' no longer exists in recent pandas)
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 5)
plt.rcParams['font.family'] = 'sans-serif'
# Show lots of columns (required in pandas 0.12, harmless in later versions)
pd.set_option('display.width', 5000)
pd.set_option('display.max_columns', 60)
bikes = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1',
                    parse_dates=['Date'], dayfirst=True, index_col='Date')
Let's take a look at the data:
bikes[:5]
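For each day, the data records how many people rode on each of 7 bike paths in the city.
From here on we'll focus on the Berri path. Berri is a pretty nice street for biking in Montreal. I often take it to the library now, and back when I lived in Old Montreal I also used it to get to work.
Adding a new column to the DataFrame
We'll create a DataFrame that contains only the Berri path.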
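The read_csv call above depends on a few parameters: sep=';' because the file is semicolon-separated, parse_dates plus dayfirst=True so that a date such as 02/01/2012 is read as 2 January rather than 1 February, and index_col='Date' so the dates become the row index. Here is a minimal sketch of the same call on a tiny in-memory CSV; the column values are made-up numbers in the same layout, not the real bikes.csv contents, and load_counts is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up sample in the same layout as bikes.csv:
# semicolon-separated, dates written as day/month/year.
SAMPLE_CSV = """Date;Berri 1;Rachel1
01/01/2012;10;5
02/01/2012;20;6
03/01/2012;30;7
"""


def load_counts(text: str) -> pd.DataFrame:
    """Hypothetical helper: parse a semicolon-separated count table indexed by date."""
    return pd.read_csv(
        io.StringIO(text),
        sep=';',               # fields are separated by semicolons
        parse_dates=['Date'],  # convert the Date column to datetimes
        dayfirst=True,         # '02/01/2012' means 2 January 2012
        index_col='Date',      # use the parsed dates as the row index
    )


counts = load_counts(SAMPLE_CSV)
print(counts)
print(counts.index[1])  # 2012-01-02 00:00:00
```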
berri_bikes = bikes[['Berri 1']].copy()
berri_bikes[:5]
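The double brackets and the .copy() in that line both matter. Selecting with a list of column names returns a one-column DataFrame (single brackets would return a Series), and .copy() gives us an independent table, so adding columns to it later does not touch bikes and does not trigger pandas' SettingWithCopyWarning. A minimal sketch, using a small made-up stand-in for the bikes table:

```python
import pandas as pd

# Small stand-in for the bikes table (made-up numbers).
bikes = pd.DataFrame(
    {'Berri 1': [10, 20, 30], 'Rachel1': [5, 6, 7]},
    index=pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03']),
)

series = bikes['Berri 1']                # single brackets -> Series
frame = bikes[['Berri 1']]               # list of names -> one-column DataFrame
berri_bikes = bikes[['Berri 1']].copy()  # independent copy we can modify freely

print(type(series).__name__)  # Series
print(type(frame).__name__)   # DataFrame

# Adding a column to the copy leaves the original table unchanged.
berri_bikes['note'] = 'copy'
print(list(bikes.columns))  # ['Berri 1', 'Rachel1']
```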
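Next, we want to add a "weekday" column, which we can derive from the index. The index is the "Date" column on the left-hand side of the DataFrame; it holds the date of each row. You can inspect it with: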
berri_bikes.index
Output:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01, ..., 2012-11-05]
Length: 310, Freq: None, Timezone: None
As you can see, the data covers 310 days in total. Pandas has a lot of time-series functionality, so if we want the day of the month for each row, we can do this:
berri_bikes.index.day
Output:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 1, 2, 3, 4, 5], dtype=int32)
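The count of 310 equals the number of calendar days from 2012-01-01 through 2012-11-05 (2012 is a leap year), which suggests, though on its own does not prove, that no day is missing. The sketch below checks that arithmetic with pd.date_range and shows a hypothetical helper, is_complete_daily, that compares an index against the full daily range; on the real data one could call it on berri_bikes.index. It also pulls date components off a DatetimeIndex the same way .day does above.

```python
import pandas as pd


def is_complete_daily(index: pd.DatetimeIndex) -> bool:
    """Hypothetical helper: True if the index holds every day between its min and max, once, in order."""
    expected = pd.date_range(index.min(), index.max(), freq='D')
    return index.equals(expected)


# Every calendar day from 2012-01-01 to 2012-11-05, inclusive.
days = pd.date_range('2012-01-01', '2012-11-05', freq='D')
print(len(days))  # 310

# Date components come straight off the DatetimeIndex.
print(days.day[:3].tolist())     # [1, 2, 3]
print(days.month[-3:].tolist())  # [11, 11, 11]

print(is_complete_daily(days))             # True
print(is_complete_daily(days.delete(5)))   # False: one day removed
```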
What we actually want, though, is the day of the week:
berri_bikes.index.weekday
Output:
array([6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0], dtype=int32)
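This gives us the weekday, where 0 stands for Monday and 6 for Sunday (2012-01-01 was a Sunday, which is why the array starts with 6). Now that we know how to get the weekday, we can add it to the DataFrame like this: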
berri_bikes.loc[:, 'weekday'] = berri_bikes.index.weekday
berri_bikes[:5]
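For creating a new column, plain bracket assignment does the same job as the .loc form. If readable labels are preferred over numbers, DatetimeIndex.day_name() returns the English day names directly. A minimal sketch on three made-up rows; the day_name column is an illustrative addition, not part of the original walkthrough:

```python
import pandas as pd

# Made-up counts for the first three days of 2012.
berri_bikes = pd.DataFrame(
    {'Berri 1': [10, 20, 30]},
    index=pd.DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], name='Date'),
)

# .weekday numbers the days Monday=0 ... Sunday=6.
berri_bikes['weekday'] = berri_bikes.index.weekday
# .day_name() gives the English name of each day instead.
berri_bikes['day_name'] = berri_bikes.index.day_name()
print(berri_bikes)
```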
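Adding up the cyclists by weekday
This turns out to be really easy!
DataFrames have a .groupby() method that works much like GROUP BY in SQL, if you're familiar with that. I won't explain it in detail here; if you want to learn more, the pandas documentation on groupby is a good place to start.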
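To make the SQL comparison concrete, here is a minimal sketch of the split-apply-combine pattern behind groupby: rows are split by the key column, each group is summed, and the results are combined into one table indexed by the key. The five rows are made up for illustration.

```python
import pandas as pd

# Made-up rows: one count per day, tagged with its weekday number.
df = pd.DataFrame({
    'weekday': [6, 0, 1, 6, 0],
    'Berri 1': [10, 20, 30, 40, 50],
})

# Roughly: SELECT weekday, SUM("Berri 1") FROM df GROUP BY weekday
totals = df.groupby('weekday').sum()
print(totals)
#          Berri 1
# weekday
# 0             70
# 1             30
# 6             50
```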
"berri_bikes.groupby('weekday').aggregate('sum')" means: group all the rows by weekday, then add up the counts of all rows that share the same weekday:
weekday_counts = berri_bikes.groupby('weekday').aggregate('sum')
weekday_counts
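As an alternative to storing a helper column first, groupby also accepts an array-like of keys with one entry per row, so one can group directly on index.weekday. A minimal sketch on two full made-up weeks (counts 1 to 14, starting on Sunday 2012-01-01):

```python
import pandas as pd

days = pd.date_range('2012-01-01', periods=14, freq='D')  # two full weeks
berri_bikes = pd.DataFrame({'Berri 1': list(range(1, 15))}, index=days)  # made-up counts

# Group rows by the weekday of their index entry; no extra column needed.
weekday_counts = berri_bikes.groupby(berri_bikes.index.weekday).sum()
print(weekday_counts)
```

Each weekday appears exactly twice here, so each total is the sum of two counts one week apart.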
The numbers 0 to 6 are hard to read as days of the week, so let's replace them with names:
weekday_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekday_counts
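Assigning a seven-item list works here because the grouped table has exactly seven rows, ordered 0 to 6. If some weekday were absent, that assignment would raise a length-mismatch ValueError. A slightly more defensive sketch maps each number to its name with rename; DAY_NAMES is a hypothetical constant, and the totals, with Wednesday deliberately missing, are made up:

```python
import pandas as pd

# Position in this list = pandas weekday number (Monday=0 ... Sunday=6).
DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Made-up totals where Wednesday (2) happens to be missing.
weekday_counts = pd.DataFrame({'Berri 1': [11, 13, 17, 19, 21, 9]},
                              index=[0, 1, 3, 4, 5, 6])

# Map each number to its name instead of assuming all seven rows exist.
named = weekday_counts.rename(index=dict(enumerate(DAY_NAMES)))
print(named)
```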
Let's visualize it:
weekday_counts.plot(kind='bar')
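%matplotlib inline only takes effect in Jupyter/IPython. In a plain Python script, one option is to select matplotlib's non-interactive Agg backend and save the chart to a file. A minimal sketch; the totals are made up, and the file name weekday_counts.png and the y-axis label are hypothetical choices:

```python
import matplotlib

matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up weekday totals, already labelled with day names.
weekday_counts = pd.DataFrame(
    {'Berri 1': [11, 13, 15, 17, 19, 21, 9]},
    index=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
           'Friday', 'Saturday', 'Sunday'],
)

ax = weekday_counts.plot(kind='bar', legend=False)
ax.set_ylabel('Cyclists (total)')
plt.tight_layout()
plt.savefig('weekday_counts.png')  # hypothetical output path
```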