pandas data structures: DataFrame

Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 09:14


Documentation:
http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe

```python
import pandas as pd

# "男" = male, "女" = female
user1 = pd.Series(["jack", "男", 22], index=["name", "sex", "age"])
user2 = pd.Series(["lily", "女", 21], index=["name", "sex", "age"])
users = pd.DataFrame({101: user1, 102: user2})
print(users)
"""
       101   102
name  jack  lily
sex      男     女
age     22    21
"""
print(users.index)    # Index(['name', 'sex', 'age'], dtype='object')
print(users.columns)  # Int64Index([101, 102], dtype='int64') in older pandas;
                      # Index([101, 102], dtype='int64') in pandas 2.x

# You can also build it from a list of tuples
user_list = [("jack", "男", 22), ("lily", "女", 21)]
users = pd.DataFrame(user_list)
print(users)
"""
      0  1   2
0  jack  男  22
1  lily  女  21
"""
```
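When a DataFrame is built from a dict of Series, as in the first example, each dict key becomes a column and rows are matched by index label rather than by position. The sketch below assumes a hypothetical second user with no `age` entry to show that alignment, and uses `.T` to flip the table so that each user becomes a row.

```python
import pandas as pd

# Each Series becomes one column; its index labels become the row labels.
user1 = pd.Series(["jack", "男", 22], index=["name", "sex", "age"])
# Hypothetical second user whose "age" is unknown, to show label alignment
user2 = pd.Series(["lily", "女"], index=["name", "sex"])

users = pd.DataFrame({101: user1, 102: user2})
# Rows are matched by label, so the missing "age" of user 102 becomes NaN
print(users.loc["age", 102])  # nan

# Transpose: one row per user, one column per attribute
by_user = users.T
print(by_user.loc[101, "name"])  # jack
```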

Setting column names:

```python
user_list = [("jack", "男", 22), ("lily", "女", 21)]
users = pd.DataFrame(user_list, columns=["name", "sex", "age"])  # set the column names
print(users)
"""
   name sex  age
0  jack   男   22
1  lily   女   21
"""
```
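Passing `columns=` is one way to name the columns. As a sketch, the same table can also be built from a list of dicts, where the keys become the column names, or the default integer columns can be renamed afterwards with `rename`. The names `records`, `users_from_dicts` and `renamed` are illustrative only.

```python
import pandas as pd

# A list of dicts: the dict keys become the column names
records = [
    {"name": "jack", "sex": "男", "age": 22},
    {"name": "lily", "sex": "女", "age": 21},
]
users_from_dicts = pd.DataFrame(records)

# Same table built from tuples plus columns=[...]
user_list = [("jack", "男", 22), ("lily", "女", 21)]
users = pd.DataFrame(user_list, columns=["name", "sex", "age"])
print(users_from_dicts.equals(users))  # True

# Columns can also be renamed after construction
renamed = pd.DataFrame(user_list).rename(columns={0: "name", 1: "sex", 2: "age"})
print(list(renamed.columns))  # ['name', 'sex', 'age']
```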

Checking the type:

```python
print(type(users))  # <class 'pandas.core.frame.DataFrame'>
```
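Besides `type()`, `isinstance` and a couple of attributes give a quick overview of a DataFrame. A minimal sketch reusing the table from the column-name example:

```python
import pandas as pd

user_list = [("jack", "男", 22), ("lily", "女", 21)]
users = pd.DataFrame(user_list, columns=["name", "sex", "age"])

print(isinstance(users, pd.DataFrame))  # True
print(users.shape)                      # (2, 3): 2 rows, 3 columns
print(users.dtypes)                     # one dtype per column
```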

Case study: comparing a fund's data for the same month year over year

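Using the fund data we scraped earlier (fund is indexed by the date column fdate, as in the full code below), take two slices of it, for example:
June 2016: print(fund.loc['2016-06'])
June 2017: print(fund.loc['2017-06'])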
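Selecting a whole month works because `fund` has a DatetimeIndex, so a `"YYYY-MM"` string passed to `.loc` selects every row in that month (partial string indexing). A self-contained sketch, with a few hypothetical NAV values standing in for the CSV:

```python
import pandas as pd

# Hypothetical NAV values; the real data comes from the fund CSV
fund = pd.DataFrame(
    {"NAV": [0.991, 0.987, 1.002, 0.991, 0.995]},
    index=pd.to_datetime(
        ["2016-06-01", "2016-06-07", "2016-07-01", "2017-06-01", "2017-06-08"]
    ),
)
fund.index.name = "fdate"
fund = fund.sort_index()  # partial string slicing is most reliable on a sorted index

# A "YYYY-MM" string selects every row that falls in that month
june_2016 = fund.loc["2016-06"]
june_2017 = fund.loc["2017-06"]
print(june_2016)
```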

In practice we only need to compare the NAV (net asset value per unit), so the code becomes:

```python
fund.loc["2016-06", ["NAV"]]  # June 2016, NAV column only
fund.loc["2017-06", ["NAV"]]  # June 2017, NAV column only
```
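Passing the column as a list, `["NAV"]`, keeps the result a DataFrame with a single NAV column, while a bare `"NAV"` returns a Series. A sketch with hypothetical values, assuming the same `fdate` index and `fcode`/`NAV` columns as the CSV:

```python
import pandas as pd

# Hypothetical sample shaped like the CSV: "fdate" index, "fcode" and "NAV" columns
fund = pd.DataFrame(
    {"fcode": ["519961"] * 3, "NAV": [0.991, 0.987, 0.991]},
    index=pd.DatetimeIndex(["2016-06-01", "2016-06-07", "2017-06-01"], name="fdate"),
)

nav_frame = fund.loc["2016-06", ["NAV"]]  # list of labels -> DataFrame
nav_series = fund.loc["2016-06", "NAV"]   # single label -> Series
print(type(nav_frame).__name__, type(nav_series).__name__)  # DataFrame Series

# reset_index() turns the date index back into an ordinary "fdate" column
print(nav_frame.reset_index())
```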

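Next, we want to join these two DataFrames into one table, since that makes them easy to compare. The catch is that the dates in the two slices do not line up and are not continuous: the data only has rows for trading days, so weekends and public holidays are missing, and they fall on different days of the month in 2016 and 2017.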
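To see the problem concretely, the sketch below puts the first three trading days of each June side by side with `pd.concat(axis=1)` and no alignment. The NAV values are copied from the output of the full code; the frames themselves are built by hand for illustration. Rows are paired by position, so 2016-06-03 ends up next to 2017-06-05.

```python
import pandas as pd

# First three trading days of June in each year (values from the output below)
f2016 = pd.DataFrame({
    "fdate": pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03"]),
    "NAV": [0.991, 0.991, 0.991],
})
f2017 = pd.DataFrame({
    "fdate": pd.to_datetime(["2017-06-01", "2017-06-02", "2017-06-05"]),
    "NAV": [0.991, 0.990, 0.989],
})

# Side-by-side concat pairs rows by position, not by calendar day
naive = pd.concat([f2016, f2017], axis=1)
print(naive)
```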

Full code:

```python
# coding: utf-8
import pandas as pd

# Read the fund history: keep the fund code as a string and index rows by date.
# (The original used dtype={"fcode": pd.np.str_}; pd.np was removed in pandas 2.0.)
fund = pd.read_csv("./csv/519961.csv", dtype={"fcode": str},
                   index_col="fdate", parse_dates=["fdate"])
fund = fund.sort_index()  # month slicing with .loc is most reliable on a sorted index
# print(fund)

f2016 = fund.loc["2016-06", ["NAV"]].reset_index()
f2017 = fund.loc["2017-06", ["NAV"]].reset_index()

# Build every calendar day of June 2016; inclusive="left" drops 2016-07-01
# (pandas < 1.4 spelled this closed="left")
allDates201606 = pd.DataFrame(pd.date_range("2016-06", "2016-07", inclusive="left"),
                              columns=["fdate"])
# Left-merge each June day with that day's NAV; non-trading days get NaN
f2016_data = pd.merge(allDates201606, f2016, how="left", on=["fdate"])
print(f2016_data)
"""
        fdate    NAV
0  2016-06-01  0.991
1  2016-06-02  0.991
2  2016-06-03  0.991
3  2016-06-04    NaN
4  2016-06-05    NaN
5  2016-06-06  0.991
6  2016-06-07  0.987
7  2016-06-08  0.987
8  2016-06-09    NaN
9  2016-06-10    NaN
10 2016-06-11    NaN
11 2016-06-12    NaN
12 2016-06-13  0.986
13 2016-06-14  0.987
14 2016-06-15  0.988
15 2016-06-16  0.988
16 2016-06-17  0.988
17 2016-06-18    NaN
18 2016-06-19    NaN
19 2016-06-20  0.988
20 2016-06-21  0.989
21 2016-06-22  0.989
22 2016-06-23  0.989
23 2016-06-24  0.989
24 2016-06-25    NaN
25 2016-06-26    NaN
26 2016-06-27  0.990
27 2016-06-28  0.991
28 2016-06-29  0.992
29 2016-06-30  0.992
"""

allDates201706 = pd.DataFrame(pd.date_range("2017-06", "2017-07", inclusive="left"),
                              columns=["fdate"])
# Left-merge each June day with that day's NAV; non-trading days get NaN
f2017_data = pd.merge(allDates201706, f2017, how="left", on=["fdate"])
print(f2017_data)
"""
        fdate    NAV
0  2017-06-01  0.991
1  2017-06-02  0.990
2  2017-06-03    NaN
3  2017-06-04    NaN
4  2017-06-05  0.989
5  2017-06-06  0.990
6  2017-06-07  0.994
7  2017-06-08  0.995
8  2017-06-09  0.995
9  2017-06-10    NaN
10 2017-06-11    NaN
11 2017-06-12  0.995
12 2017-06-13  0.996
13 2017-06-14  0.994
14 2017-06-15  0.994
15 2017-06-16  0.993
16 2017-06-17    NaN
17 2017-06-18    NaN
18 2017-06-19  0.994
19 2017-06-20  0.995
20 2017-06-21  0.996
21 2017-06-22  0.995
22 2017-06-23  0.994
23 2017-06-24    NaN
24 2017-06-25    NaN
25 2017-06-26  0.995
26 2017-06-27  0.993
27 2017-06-28  0.998
28 2017-06-29  0.998
29 2017-06-30  0.999
"""

# Finally, put June 2016 and June 2017 side by side. Both months have 30 days,
# so row i is the same day of the month in both years.
result = pd.concat([f2016_data, f2017_data], axis=1)
print(result)
"""
        fdate    NAV      fdate    NAV
0  2016-06-01  0.991 2017-06-01  0.991
1  2016-06-02  0.991 2017-06-02  0.990
2  2016-06-03  0.991 2017-06-03    NaN
3  2016-06-04    NaN 2017-06-04    NaN
4  2016-06-05    NaN 2017-06-05  0.989
5  2016-06-06  0.991 2017-06-06  0.990
6  2016-06-07  0.987 2017-06-07  0.994
7  2016-06-08  0.987 2017-06-08  0.995
8  2016-06-09    NaN 2017-06-09  0.995
9  2016-06-10    NaN 2017-06-10    NaN
10 2016-06-11    NaN 2017-06-11    NaN
11 2016-06-12    NaN 2017-06-12  0.995
12 2016-06-13  0.986 2017-06-13  0.996
13 2016-06-14  0.987 2017-06-14  0.994
14 2016-06-15  0.988 2017-06-15  0.994
15 2016-06-16  0.988 2017-06-16  0.993
16 2016-06-17  0.988 2017-06-17    NaN
17 2016-06-18    NaN 2017-06-18    NaN
18 2016-06-19    NaN 2017-06-19  0.994
19 2016-06-20  0.988 2017-06-20  0.995
20 2016-06-21  0.989 2017-06-21  0.996
21 2016-06-22  0.989 2017-06-22  0.995
22 2016-06-23  0.989 2017-06-23  0.994
23 2016-06-24  0.989 2017-06-24    NaN
24 2016-06-25    NaN 2017-06-25    NaN
25 2016-06-26    NaN 2017-06-26  0.995
26 2016-06-27  0.990 2017-06-27  0.993
27 2016-06-28  0.991 2017-06-28  0.998
28 2016-06-29  0.992 2017-06-29  0.998
29 2016-06-30  0.992 2017-06-30  0.999
"""
```
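The merge-on-a-calendar approach works here because both Junes have 30 days, so rows pair up by day of the month. A possible alternative, sketched below, swaps the merge for `Series.reindex` over the month's dates and then relabels the rows by day of month, which makes the alignment explicit and leaves room for a year-over-year difference column. The helper `june_by_day` and the `change` column are hypothetical names, and the input is a small hand-built sample using values from the output above.

```python
import pandas as pd


def june_by_day(fund: pd.DataFrame, year: int) -> pd.Series:
    """Hypothetical helper: June NAVs of `year`, indexed by day of month 1..30."""
    start = pd.Timestamp(year=year, month=6, day=1)
    all_days = pd.date_range(start, periods=30, freq="D")  # June has 30 days
    nav = fund["NAV"].reindex(all_days)  # dates with no row become NaN
    nav.index = all_days.day             # relabel by day so both years share labels
    return nav


# Hand-built sample with a DatetimeIndex, like the CSV after index_col="fdate"
fund = pd.DataFrame(
    {"NAV": [0.991, 0.991, 0.991, 0.991, 0.990, 0.989]},
    index=pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03",
                          "2017-06-01", "2017-06-02", "2017-06-05"]),
)

result = pd.DataFrame({"2016": june_by_day(fund, 2016),
                       "2017": june_by_day(fund, 2017)})
result["change"] = result["2017"] - result["2016"]  # NaN if either day is missing
print(result.head())
```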