Python: Getting Started with Pandas - Python Data Analysis Library
Source: Internet  Editor: 程序博客网  Time: 2024/05/16 07:55
The learning material comes from:
1. Coursera course 《用Python 玩转数据》: https://www.coursera.org/learn/hipython
2. Website: http://pandas.pydata.org/
Pandas Series
>>> from pandas import Series
>>> sa = Series(['a', 'b', 'c'], index = [0, 1, 2])
>>> sb = Series(['a', 'b', 'c'])
>>> sc = Series(['a', 'c', 'b'])
>>> sa.equals(sc)
False
>>> sb.equals(sa)
True
>>> sa*3 + sc*2
0 aaaaa
1 bbbcc
2 cccbb
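The sketch below separates `equals()` from `==` on the Series above. `equals()` returns a single bool for the whole object. `==` compares element by element and returns a boolean Series. It uses only the three Series from the example.

```python
from pandas import Series

sa = Series(['a', 'b', 'c'], index=[0, 1, 2])
sb = Series(['a', 'b', 'c'])   # default RangeIndex 0, 1, 2
sc = Series(['a', 'c', 'b'])

# equals(): same shape and same elements in the same positions -> one bool
print(sa.equals(sb))   # True (the index types differ, but the labels match)
print(sa.equals(sc))   # False ('b' and 'c' are swapped)

# ==: element-by-element comparison of identically labelled Series -> boolean Series
print((sa == sc).tolist())   # [True, False, False]
```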
>>> from pandas import Series, DataFrame
>>> data = {'language': ['Java', 'PHP', 'Python', 'R', 'C#'],
...         'year': [1995, 1995, 1991, 1993, 2000]}
>>> frame = DataFrame(data)
>>> frame['IDE'] = Series(['Intellij', 'Notepad', 'IPython', 'R studio', 'VS'])
>>> 'VS' in frame['IDE']
False
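The `False` above is easy to misread. Using `in` on a Series checks the index labels (here 0 to 4), not the values. The sketch below rebuilds the same frame and shows two value-based alternatives. `ide` is only a local name chosen for the example.

```python
from pandas import Series, DataFrame

data = {'language': ['Java', 'PHP', 'Python', 'R', 'C#'],
        'year': [1995, 1995, 1991, 1993, 2000]}
frame = DataFrame(data)
frame['IDE'] = Series(['Intellij', 'Notepad', 'IPython', 'R studio', 'VS'])

ide = frame['IDE']
print('VS' in ide)               # False: `in` tests the index labels 0..4
print(4 in ide)                  # True: 4 is one of the index labels
print('VS' in ide.values)        # True: membership test on the values
print(ide.isin(['VS']).any())    # True: vectorised membership test
```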
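>>> frame.iloc[2]
language Python
year 1991
IDE IPython
Name: 2, dtype: object
frame.iloc[2] returns the third row of frame, and in general frame.iloc[i] returns row i+1, because positions start at 0.
Older tutorials write frame.ix[2] instead. .ix was removed in pandas 1.0, so use .iloc for positions and .loc for index labels.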
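Position and label can differ. The sketch below gives a frame a hypothetical index of 10, 20, 30. `iloc` still picks the third row by position, while `loc` needs the label.

```python
from pandas import DataFrame

# Hypothetical index labels 10, 20, 30 so that label and position differ
frame = DataFrame({'language': ['Java', 'PHP', 'Python'],
                   'year': [1995, 1995, 1991]},
                  index=[10, 20, 30])

print(frame.iloc[2]['language'])    # Python: iloc counts positions from 0
print(frame.loc[30, 'language'])    # Python: loc looks up the label 30
try:
    frame.loc[2]                    # 2 is a position here, not a label
except KeyError:
    print('KeyError: 2 is not an index label')
```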
Pandas DataFrame
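1. Fetch a listed company's historical stock data from Yahoo Finance: Microsoft's daily quotes from this date two years ago up to today.
Microsoft's ticker symbol can be looked up at http://finance.yahoo.com/q/cp?s=^DJI.
# Note: matplotlib.finance was deprecated and later removed from matplotlib
# (it moved to the separate mpl_finance package), and the Yahoo service it
# queried no longer answers these requests, so this step needs an old environment.
from matplotlib.finance import quotes_historical_yahoo
from datetime import date
import pandas as pd  # needed for pd.DataFrame in the following steps

today = date.today()
start = (today.year - 2, today.month, today.day)  # (year, month, day) tuple
quotes = quotes_historical_yahoo('MSFT', start, today)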
2. Add column names to the quotes data
attributes = ['date','open','close','high','low','volume']
quotesdf = pd.DataFrame(quotes, columns = attributes)
Sample of quotesdf:
>>> quotesdf
date open close high low volume
0 735141 31.243013 31.508104 31.536509 30.958986 39839500
1 735142 31.574376 31.792134 31.820536 31.527039 36718700
2 735143 31.583847 32.114029 32.218173 31.517574 46946800
3 735144 32.076161 32.057226 32.189771 31.640650 38703800
4 735145 31.896275 32.076161 32.180305 31.830002 33008100
5 735148 31.811066 31.527040 31.915211 31.432366 35069300
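The Yahoo service used in step 1 no longer answers these requests. As a workaround, this minimal offline sketch rebuilds quotesdf from the six sample rows printed above. It assumes each quote is a (date ordinal, open, close, high, low, volume) tuple, which is the layout the attributes list in step 2 implies.

```python
import pandas as pd

# Offline stand-in for `quotes`: the six sample rows shown above, assumed to be
# (date ordinal, open, close, high, low, volume) tuples as returned in step 1
quotes = [
    (735141, 31.243013, 31.508104, 31.536509, 30.958986, 39839500),
    (735142, 31.574376, 31.792134, 31.820536, 31.527039, 36718700),
    (735143, 31.583847, 32.114029, 32.218173, 31.517574, 46946800),
    (735144, 32.076161, 32.057226, 32.189771, 31.640650, 38703800),
    (735145, 31.896275, 32.076161, 32.180305, 31.830002, 33008100),
    (735148, 31.811066, 31.527040, 31.915211, 31.432366, 35069300),
]
attributes = ['date', 'open', 'close', 'high', 'low', 'volume']
quotesdf = pd.DataFrame(quotes, columns=attributes)  # each tuple becomes one row
print(quotesdf.shape)   # (6, 6)
```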
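3. Replace the index with the dates and drop the original date column. Format each date so that Friday, January 30, 2015 is displayed as '15/01/30,Fri' (note the exact spacing and punctuation).
Usage of date.fromordinal and date.strftime: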
>>> from datetime import date
>>> d=date.fromordinal(735866)
>>> d
datetime.date(2015, 9, 25)
>>> y=date.strftime(d,"%y/%m/%d,%a")
>>> y
'15/09/25,Fri'
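This quick check of the round trip uses the first ordinal in the sample data (735141). `toordinal()` is the inverse of `fromordinal()`. `%a` produces the abbreviated weekday name for the current locale, so the 'Mon' below assumes Python's default C locale.

```python
from datetime import date

d = date.fromordinal(735141)            # first date ordinal in the sample data
print(d)                                # 2013-09-30
print(d.strftime("%y/%m/%d,%a"))        # 13/09/30,Mon
print(d.toordinal())                    # 735141: toordinal() undoes fromordinal()
```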
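The DataFrame drop method removes the specified rows or columns.
Continuing the code above:
list1 = []
for i in range(0, len(quotes)):
    # quotes[i][0] is a date ordinal such as 735141
    x = date.fromordinal(int(quotes[i][0]))
    y = date.strftime(x, "%y/%m/%d,%a")  # e.g. '13/09/30,Mon'
    list1.append(y)
quotesdf.index = list1
# axis=1 drops a column; recent pandas no longer accepts the axis positionally
quotesdf = quotesdf.drop(['date'], axis=1)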
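The index loop can also be written as a list comprehension, together with the keyword form `drop(columns=...)`. This sketch runs on a trimmed two-row, three-column version of the sample data.

```python
from datetime import date
import pandas as pd

# Trimmed sample: first two rows, only the date, open and close columns
quotesdf = pd.DataFrame(
    [(735141, 31.243013, 31.508104), (735142, 31.574376, 31.792134)],
    columns=['date', 'open', 'close'])

# Build the 'yy/mm/dd,Www' labels in one comprehension instead of an index loop
quotesdf.index = [date.fromordinal(int(d)).strftime("%y/%m/%d,%a")
                  for d in quotesdf['date']]
# drop(columns=...) is the keyword form of drop(['date'], axis=1)
quotesdf = quotesdf.drop(columns=['date'])
print(quotesdf)
```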
Sample of quotesdf:
>>> quotesdf
open close high low volume
13/09/30,Mon 31.243013 31.508104 31.536509 30.958986 39839500
13/10/01,Tue 31.574376 31.792134 31.820536 31.527039 36718700
13/10/02,Wed 31.583847 32.114029 32.218173 31.517574 46946800
13/10/03,Thu 32.076161 32.057226 32.189771 31.640650 38703800
13/10/04,Fri 31.896275 32.076161 32.180305 31.830002 33008100
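4. To get Microsoft's opening and closing prices from January 30 to February 10, 2014 (the period in which Microsoft changed its CEO), run the command below. The index labels are strings such as '14/02/10,Mon', so the slice bounds are compared as strings. Because '14/02/10' sorts before '14/02/10,Mon', February 10 itself is left out, and the result ends at 14/02/07.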
>>> quotesdf.loc['14/01/30':'14/02/10', ['open', 'close']]
open close
14/01/30,Thu 35.095384 35.162160
14/01/31,Fri 35.248015 36.097019
14/02/03,Mon 36.001626 34.799662
14/02/04,Tue 35.267094 34.675649
14/02/05,Wed 34.618415 34.170063
14/02/06,Thu 34.150985 34.513482
14/02/07,Fri 34.647033 34.875979
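Keeping real dates in a `DatetimeIndex` and formatting them only for display is one way to avoid the string-comparison pitfall in step 4. The sketch below uses made-up business-day prices (not the Microsoft data). It shows that a partial-date end bound then covers the whole end day.

```python
import numpy as np
import pandas as pd

# Illustrative prices on business days (made-up numbers, not real MSFT data)
days = pd.bdate_range('2014-01-27', '2014-02-14')
df = pd.DataFrame({'open': np.arange(len(days), dtype=float),
                   'close': np.arange(len(days), dtype=float) + 0.5},
                  index=days)

# With a DatetimeIndex, the end bound '2014-02-10' includes that whole day
window = df.loc['2014-01-30':'2014-02-10', ['open', 'close']]
print(window.index[0].date(), window.index[-1].date())  # 2014-01-30 2014-02-10

# The 'yy/mm/dd,Www' display format can still be produced on demand
print(list(window.index.strftime('%y/%m/%d,%a'))[:2])   # ['14/01/30,Thu', '14/01/31,Fri']

# With string labels, by contrast, the bare date sorts before the full label
print('14/02/10' < '14/02/10,Mon')                      # True
```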
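5. Find the records between June 1 and December 31, 2014 on which Microsoft's closing price was above $45 (only the first five matching rows are shown below).
>>> tmp = quotesdf.loc['14/06/01':'14/12/31']
>>> tmp[tmp.close > 45]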
open close high low volume
14/09/08,Mon 44.819649 45.257913 45.579304 44.790434 45736700
14/09/09,Tue 45.257913 45.540346 45.744871 45.209214 40302400
14/09/10,Wed 45.598784 45.618262 45.715653 45.072868 27302400
14/09/11,Thu 45.520872 45.774088 45.774088 45.257913 29216400
14/09/12,Fri 45.686436 45.481914 45.793566 45.384519 38244700
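When you filter a slice, build the mask from that same slice so the indexes line up. A mask built from the full frame makes pandas reindex it and emit a warning. The sketch below applies this to the five rows printed above, with a hypothetical 45.5 threshold and a second condition joined by `&`.

```python
import pandas as pd

# The five rows printed above (open and close columns only)
q = pd.DataFrame({'open':  [44.819649, 45.257913, 45.598784, 45.520872, 45.686436],
                  'close': [45.257913, 45.540346, 45.618262, 45.774088, 45.481914]},
                 index=['14/09/08,Mon', '14/09/09,Tue', '14/09/10,Wed',
                        '14/09/11,Thu', '14/09/12,Fri'])

# Mask built from q itself, so it is aligned with q's index
print(q[q.close > 45.5].index.tolist())
# ['14/09/09,Tue', '14/09/10,Wed', '14/09/11,Thu']

# Combine conditions with & (and) / | (or); each comparison needs parentheses
print(q[(q.close > 45.5) & (q.open > 45.5)].index.tolist())
# ['14/09/10,Wed', '14/09/11,Thu']
```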
6. Find the 5 days in 2014 (January 1 to December 31) with Microsoft's highest closing prices.
>>> quotesdf.loc['14/01/01':'14/12/31'].sort_values('close', ascending=False)[:5]
open close high low volume
14/11/13,Thu 47.536879 48.316012 48.354970 47.439485 26208800
14/11/14,Fri 48.442622 48.286795 48.744533 48.101748 29081700
14/11/17,Mon 48.121228 48.169923 48.413402 47.858270 30315500
14/12/04,Thu 47.425075 47.866103 48.081717 47.238866 30320400
14/11/18,Tue 48.150321 47.768099 48.346334 47.728896 23995500
ascending=False (or 0) sorts in descending order;
ascending=True (or 1) sorts in ascending order;
the default is ascending.
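`nlargest()` selects the same rows as a descending sort followed by a slice. The sketch below shows both on the five closing prices printed above.

```python
import pandas as pd

# The five closing prices printed above
df = pd.Series([48.316012, 48.286795, 48.169923, 47.866103, 47.768099],
               index=['14/11/13,Thu', '14/11/14,Fri', '14/11/17,Mon',
                      '14/12/04,Thu', '14/11/18,Tue']).to_frame('close')

top_sorted = df.sort_values('close', ascending=False).head(3)  # sort, then slice
top_n = df.nlargest(3, 'close')                                # direct selection
print(top_n.index.tolist())   # ['14/11/13,Thu', '14/11/14,Fri', '14/11/17,Mon']
print(top_sorted.equals(top_n))            # True
print(df.sort_values('close').index[0])    # 14/11/18,Tue (ascending: lowest first)
```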
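7. Count, for each month of 2014 (January 1 to December 31), the number of days on which Microsoft's stock rose (closing price above opening price).
list1 = []
# .copy() makes tmpdf independent, so adding a column does not raise SettingWithCopyWarning
tmpdf = quotesdf.loc['14/01/01':'14/12/31'].copy()
for i in range(0, len(tmpdf)):
    # labels look like '14/01/02,Thu'; characters 3:5 hold the month
    list1.append(int(tmpdf.index[i][3:5]))
tmpdf['month'] = list1
print(tmpdf[tmpdf.close > tmpdf.open]['month'].value_counts())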
Result:
9 14
10 12
2 12
11 11
8 11
6 11
3 11
12 10
7 10
4 10
5 9
1 9
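value_counts() returns a Series holding the number of occurrences of each distinct value, sorted in descending order. Here it works like a GROUP BY on 'month' that counts the rows in each group.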
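`groupby('month').size()` makes the GROUP BY analogy explicit. It gives the same counts, ordered by month instead of by frequency. The sketch below uses a small made-up frame (not the Microsoft data).

```python
import pandas as pd

# Small made-up frame: month plus open/close prices (not real data)
tmpdf = pd.DataFrame({'month': [1, 1, 2, 2, 2, 3],
                      'open':  [10.0, 11.0, 10.0, 11.0, 12.0, 9.0],
                      'close': [10.5, 10.8, 10.2, 11.5, 12.4, 9.3]})

up = tmpdf[tmpdf.close > tmpdf.open]     # days on which the price rose
counts = up['month'].value_counts()      # up days per month, most frequent first
grouped = up.groupby('month').size()     # same counts, ordered by month
print(grouped.tolist())                  # [1, 3, 1]
print(counts.index[0])                   # 2 (the month with the most up days)
```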
Using the index:
>>> quotesdf.index[0]
'13/09/30,Mon'
>>> quotesdf.index[0][3:5]
'09'
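Every label has the same 'yy/mm/dd,Www' layout, so the `.str` accessor can extract the month for all rows at once instead of the loop in step 7. The sketch below runs on three labels and closing prices taken from the tables in this article.

```python
import pandas as pd

# Three labels and closing prices taken from the tables above
df = pd.DataFrame({'close': [31.508104, 31.792134, 36.097019]},
                  index=['13/09/30,Mon', '13/10/01,Tue', '14/01/31,Fri'])

# .str[3:5] slices every label at once; astype(int) turns '09' into 9
df['month'] = df.index.str[3:5].astype(int)
print(df['month'].tolist())   # [9, 10, 1]
```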
8. Compute the total trading volume of Microsoft stock for each month of 2014.
>>> tmpdf.groupby('month')['volume'].sum()
month
1 930226200
2 705304500
3 778425700
4 746113500
5 574362900
6 555779700
7 731616500
8 513919700
9 860827300
10 853235700
11 522988700
12 605188200
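`agg()` computes several statistics per group in one call. The sketch below uses made-up volumes.

```python
import pandas as pd

# Made-up volumes for two months (not real data)
tmpdf = pd.DataFrame({'month':  [1, 1, 2, 2, 2],
                      'volume': [100, 300, 50, 70, 80]})

totals = tmpdf.groupby('month')['volume'].sum()          # as in step 8
stats = tmpdf.groupby('month')['volume'].agg(['sum', 'mean', 'count'])
print(totals.tolist())   # [400, 200]
print(stats)
```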
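9. List the 5 days with the lowest and the 5 days with the highest closing prices over the whole period covered by quotesdf.
>>> s = quotesdf.sort_values('close')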
>>> pd.concat([s[:5],s[-5:]])
open close high low volume
13/10/08,Tue 31.536509 31.252479 31.555445 31.053661 41017600
13/10/09,Wed 31.309286 31.309286 31.574376 31.205142 35878600
13/09/30,Mon 31.243013 31.508104 31.536509 30.958986 39839500
13/10/07,Mon 31.811066 31.527040 31.915211 31.432366 35069300
13/10/01,Tue 31.574376 31.792134 31.820536 31.527039 36718700
14/11/17,Mon 48.121228 48.169923 48.413402 47.858270 30315500
14/11/14,Fri 48.442622 48.286795 48.744533 48.101748 29081700
14/11/13,Thu 47.536879 48.316012 48.354970 47.439485 26208800
15/04/29,Wed 48.088304 48.423896 48.670655 47.871156 47804600
15/04/28,Tue 47.160490 48.522598 48.571949 47.081529 60730800
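This variant of step 9 uses `nsmallest()` and `nlargest()` with `concat(keys=...)`, which labels each half of the result. Note that `nlargest()` lists the highest price first, whereas `s[-5:]` above stays in ascending order. The sketch takes the two lowest and two highest of five closing prices from the output above.

```python
import pandas as pd

# Five closing prices from the output above
df = pd.DataFrame({'close': [31.252479, 31.309286, 48.169923, 48.423896, 48.522598]},
                  index=['13/10/08,Tue', '13/10/09,Wed', '14/11/17,Mon',
                         '15/04/29,Wed', '15/04/28,Tue'])

# keys= adds an outer index level naming where each row came from
extremes = pd.concat([df.nsmallest(2, 'close'), df.nlargest(2, 'close')],
                     keys=['lowest', 'highest'])
print(extremes)
print(extremes.loc['highest'].index.tolist())   # ['15/04/28,Tue', '15/04/29,Wed']
```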