pandas time series/date functionality

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 15:38
# 72 hours starting with midnight Jan 1st, 2011
In [1]: rng = date_range('1/1/2011', periods=72, freq='H')

In [3]: ts = Series(randn(len(rng)), index=rng)

In [4]: ts.head()
Out[4]:
2011-01-01 00:00:00    0.469112
2011-01-01 01:00:00   -0.282863
2011-01-01 02:00:00   -1.509059
2011-01-01 03:00:00   -1.135632
2011-01-01 04:00:00    1.212112
Freq: H, dtype: float64

Change frequency and fill gaps:

# to 45 minute frequency and forward fill
In [5]: converted = ts.asfreq('45Min', method='pad')

In [6]: converted.head()
Out[6]:
2011-01-01 00:00:00    0.469112
2011-01-01 00:45:00    0.469112
2011-01-01 01:30:00   -0.282863
2011-01-01 02:15:00   -1.509059
2011-01-01 03:00:00   -1.135632
Freq: 45T, dtype: float64
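The forward fill can be checked by hand when the hourly values are deterministic. The following is a minimal sketch, assuming a current pandas imported as `pd`; the `np.arange` values and the lowercase `"h"` / `"45min"` aliases are choices made for this illustration, not part of the original session.

```python
import numpy as np
import pandas as pd

# Hourly index covering 72 hours; values 0.0, 1.0, 2.0, ... so that each
# value equals the number of hours elapsed since midnight on Jan 1st.
rng = pd.date_range("2011-01-01", periods=72, freq="h")
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Move to a 45-minute grid. "ffill" is a synonym of "pad": each new
# timestamp takes the latest hourly value at or before it.
converted = ts.asfreq("45min", method="ffill")

print(converted.head())
```

With these values, 00:45 carries the 00:00 observation, 01:30 carries the 01:00 observation, and so on, which mirrors the pattern in the output above.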

Resample:

# Daily means
In [7]: ts.resample('D', how='mean')
Out[7]:
2011-01-01   -0.319569
2011-01-02   -0.337703
2011-01-03    0.117258
Freq: D, dtype: float64
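The `how=` keyword shown above belongs to older pandas releases. In recent versions `resample` returns a grouping object and the aggregation is called as a method on it. A minimal sketch of the same daily mean, assuming a current pandas and deterministic values chosen for this illustration:

```python
import numpy as np
import pandas as pd

# 72 hourly observations with values 0.0 .. 71.0
rng = pd.date_range("2011-01-01", periods=72, freq="h")
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Group the observations into calendar days, then average each group.
daily = ts.resample("D").mean()

print(daily)
```

Each day holds 24 consecutive integers, so the three means are 11.5, 35.5 and 59.5.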

Converting to Timestamps

In [16]: to_datetime(Series(['Jul 31, 2009', '2010-01-10', None]))
Out[16]:
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [17]: to_datetime(['2005/11/23', '2010.12.31'])
Out[17]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2005-11-23, 2010-12-31]
Length: 2, Freq: None, Timezone: None
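The session above mixes several date formats in one call, which newer pandas releases may reject. A minimal sketch that sidesteps the issue by keeping one format per call, assuming a current pandas; the input strings here are adapted for the illustration:

```python
import pandas as pd

# A Series in, a Series out; missing entries become NaT.
parsed = pd.to_datetime(pd.Series(["2009-07-31", "2010-01-10", None]))

# A list in, a DatetimeIndex out. Passing an explicit format avoids
# any guessing about which field is the year, month or day.
idx = pd.to_datetime(["2005/11/23", "2010/12/31"], format="%Y/%m/%d")

print(parsed)
print(idx)
```

The return type follows the input type: a Series gives a Series, while a list gives a `DatetimeIndex`.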

If you use dates which start with the day first (i.e. European style), you can pass the dayfirst flag:

In [18]: to_datetime(['04-01-2012 10:00'], dayfirst=True)
Out[18]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-04 10:00:00]
Length: 1, Freq: None, Timezone: None

In [19]: to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
Out[19]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-14, 2012-01-14]
Length: 2, Freq: None, Timezone: None
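To make the effect of `dayfirst` concrete, the same ambiguous string can be parsed both ways. This is a small sketch assuming a current pandas; the explicit `format` string in the last call is an addition for illustration and is the stricter alternative when the layout is known in advance.

```python
import pandas as pd

ambiguous = ["04-01-2012 10:00"]

# Default reading: month first, so this is the 1st of April.
month_first = pd.to_datetime(ambiguous)

# European reading: day first, so this is the 4th of January.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# Spelling the layout out removes the ambiguity entirely.
explicit = pd.to_datetime(ambiguous, format="%d-%m-%Y %H:%M")

print(month_first[0], day_first[0], explicit[0])
```

As Out[19] above shows, `dayfirst` is a preference rather than a strict rule, so an explicit format is the safer choice for data that must not be misread.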

Invalid Data

In [20]: to_datetime(['2009-07-31', 'asd'])
Out[20]: array(['2009-07-31', 'asd'], dtype=object)

In [21]: to_datetime(['2009-07-31', 'asd'], coerce=True)
Out[21]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-07-31, NaT]
Length: 2, Freq: None, Timezone: None
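The `coerce=True` flag above is the spelling used by older pandas. In recent releases the same behaviour is requested with `errors="coerce"`, and the default is to raise on unparseable input rather than return the strings unchanged. A minimal sketch under that assumption:

```python
import pandas as pd

raw = ["2009-07-31", "asd"]

# Default behaviour in recent pandas: invalid input raises an exception.
try:
    pd.to_datetime(raw)
    raised = False
except ValueError:
    raised = True

# errors="coerce" turns anything unparseable into NaT instead.
parsed = pd.to_datetime(raw, errors="coerce")

print(raised)
print(parsed)
```

Coercing is convenient for messy input, but it silently hides bad rows, so counting the resulting `NaT` values afterwards is a sensible check.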

Generating Ranges of Timestamps

In [27]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

In [28]: index = DatetimeIndex(dates)

In [29]: index  # Note the frequency information
Out[29]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-05-01, ..., 2012-05-03]
Length: 3, Freq: None, Timezone: None

In [32]: index = date_range('2000-1-1', periods=1000, freq='M')

In [33]: index
Out[33]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-31, ..., 2083-04-30]
Length: 1000, Freq: M, Timezone: None

In [34]: index = bdate_range('2012-1-1', periods=250)

In [35]: index
Out[35]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-02, ..., 2012-12-14]
Length: 250, Freq: B, Timezone: None

In [36]: start = datetime(2011, 1, 1)

In [37]: end = datetime(2012, 1, 1)

In [38]: rng = date_range(start, end)

In [39]: rng
Out[39]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01, ..., 2012-01-01]
Length: 366, Freq: D, Timezone: None

In [40]: rng = bdate_range(start, end)

In [41]: rng
Out[41]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-03, ..., 2011-12-30]
Length: 260, Freq: B, Timezone: None

In [42]: date_range(start, end, freq='BM')
Out[42]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-31, ..., 2011-12-30]
Length: 12, Freq: BM, Timezone: None

In [43]: date_range(start, end, freq='W')
Out[43]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-02, ..., 2012-01-01]
Length: 53, Freq: W-SUN, Timezone: None

In [44]: bdate_range(end=end, periods=20)
Out[44]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-12-05, ..., 2011-12-30]
Length: 20, Freq: B, Timezone: None

In [45]: bdate_range(start=start, periods=20)
Out[45]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-03, ..., 2011-01-28]
Length: 20, Freq: B, Timezone: None
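The lengths reported above can be reproduced with a short script. This sketch assumes a current pandas imported as `pd`. Because the string aliases for month-end frequencies have changed between pandas versions, it passes an offset object (`pd.offsets.BMonthEnd()`) instead of the `'BM'` string; that substitution is a choice made for this illustration.

```python
from datetime import datetime

import pandas as pd

start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

# Both endpoints are included, so a full non-leap year gives 366 days.
daily = pd.date_range(start, end)

# Business days only (Monday to Friday).
business = pd.bdate_range(start, end)

# Last business day of each month, given as an offset object.
month_ends = pd.date_range(start, end, freq=pd.offsets.BMonthEnd())

# Weekly, anchored on Sundays by default.
weekly = pd.date_range(start, end, freq="W")

# A range can also be defined by one endpoint and a number of periods.
tail = pd.bdate_range(end=end, periods=20)

print(len(daily), len(business), len(month_ends), len(weekly), len(tail))
```

Any two of `start`, `end` and `periods` are enough to define a range; the frequency fills in the rest.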

DatetimeIndex

# One observation per minute (freq='T'), as the timestamps in the output show
In [56]: dft = DataFrame(randn(100000,1),columns=['A'],index=date_range('20130101',periods=100000,freq='T'))

In [57]: dft
Out[57]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00  1.252450
2013-03-11 10:36:00  0.046611
2013-03-11 10:37:00  0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00  0.841669
[100000 rows x 1 columns]

In [58]: dft['2013']
Out[58]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-03-11 10:33:00 -0.293083
2013-03-11 10:34:00 -0.059881
2013-03-11 10:35:00  1.252450
2013-03-11 10:36:00  0.046611
2013-03-11 10:37:00  0.059478
2013-03-11 10:38:00 -0.286539
2013-03-11 10:39:00  0.841669
[100000 rows x 1 columns]

In [59]: dft['2013-1':'2013-2']
Out[59]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-28 23:53:00  0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00  0.451943
2013-02-28 23:56:00  0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00  0.093915
2013-02-28 23:59:00 -1.087454
[84960 rows x 1 columns]

In [60]: dft['2013-1':'2013-2-28']
Out[60]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-28 23:53:00  0.103114
2013-02-28 23:54:00 -1.303422
2013-02-28 23:55:00  0.451943
2013-02-28 23:56:00  0.220534
2013-02-28 23:57:00 -1.624220
2013-02-28 23:58:00  0.093915
2013-02-28 23:59:00 -1.087454
[84960 rows x 1 columns]

In [61]: dft['2013-1':'2013-2-28 00:00:00']
Out[61]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-27 23:54:00  0.897051
2013-02-27 23:55:00 -0.309230
2013-02-27 23:56:00  1.944713
2013-02-27 23:57:00  0.369265
2013-02-27 23:58:00  0.053071
2013-02-27 23:59:00 -0.019734
2013-02-28 00:00:00  1.388189
[83521 rows x 1 columns]

In [62]: dft['2013-1-15':'2013-1-15 12:30:00']
Out[62]:
                            A
2013-01-15 00:00:00  0.501288
2013-01-15 00:01:00 -0.605198
2013-01-15 00:02:00  0.215146
2013-01-15 00:03:00  0.924732
2013-01-15 00:04:00 -2.228519
2013-01-15 00:05:00  1.517331
2013-01-15 00:06:00 -1.188774
...                       ...
2013-01-15 12:24:00  1.358314
2013-01-15 12:25:00 -0.737727
2013-01-15 12:26:00  1.838323
2013-01-15 12:27:00 -0.774090
2013-01-15 12:28:00  0.622261
2013-01-15 12:29:00 -0.631649
2013-01-15 12:30:00  0.193284
[751 rows x 1 columns]
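The row counts above follow directly from how partial date strings are expanded: a string such as '2013-2' stands for the whole month, while a full timestamp stands for a single instant. The sketch below reproduces the counts with `.loc`, which is the form recent pandas versions expect for row selection. It assumes a current pandas and uses a column of zeros, since only the index matters here.

```python
import numpy as np
import pandas as pd

# 100000 observations at one-minute spacing, starting 2013-01-01 00:00.
index = pd.date_range("2013-01-01", periods=100000, freq="min")
dft = pd.DataFrame({"A": np.zeros(len(index))}, index=index)

# A bare year selects every row in that year.
whole_year = dft.loc["2013"]

# The end label '2013-02' expands to the last instant of February.
jan_feb = dft.loc["2013-01":"2013-02"]

# A full timestamp as end label stops exactly at that minute (inclusive).
to_feb28_midnight = dft.loc["2013-01":"2013-02-28 00:00:00"]

# A day as start label and a minute as end label can be mixed.
half_day = dft.loc["2013-01-15":"2013-01-15 12:30:00"]

print(len(whole_year), len(jan_feb), len(to_feb28_midnight), len(half_day))
```

The arithmetic matches the output above: 59 days of 1440 minutes give 84960 rows, 58 days plus one minute give 83521, and 12.5 hours plus the inclusive endpoint give 751.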

Datetime Indexing

In [64]: dft[datetime(2013, 1, 1):datetime(2013,2,28)]
Out[64]:
                            A
2013-01-01 00:00:00  0.176444
2013-01-01 00:01:00  0.403310
2013-01-01 00:02:00 -0.154951
2013-01-01 00:03:00  0.301624
2013-01-01 00:04:00 -2.179861
2013-01-01 00:05:00 -1.369849
2013-01-01 00:06:00 -0.954208
...                       ...
2013-02-27 23:54:00  0.897051
2013-02-27 23:55:00 -0.309230
2013-02-27 23:56:00  1.944713
2013-02-27 23:57:00  0.369265
2013-02-27 23:58:00  0.053071
2013-02-27 23:59:00 -0.019734
2013-02-28 00:00:00  1.388189
[83521 rows x 1 columns]

In [65]: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
Out[65]:
                            A
2013-01-01 10:12:00 -0.246733
2013-01-01 10:13:00 -1.429225
2013-01-01 10:14:00 -1.265339
2013-01-01 10:15:00  0.710986
2013-01-01 10:16:00 -0.818200
2013-01-01 10:17:00  0.543542
2013-01-01 10:18:00  1.577713
...                       ...
2013-02-28 10:06:00  0.311249
2013-02-28 10:07:00  2.366080
2013-02-28 10:08:00 -0.490372
2013-02-28 10:09:00  0.373340
2013-02-28 10:10:00  0.638442
2013-02-28 10:11:00  1.330135
2013-02-28 10:12:00 -0.945450
[83521 rows x 1 columns]
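Unlike partial strings, `datetime` objects are exact instants, so `datetime(2013, 2, 28)` means midnight at the start of that day and nothing later. A minimal sketch of the two slices above, assuming a current pandas, `.loc` for row selection, and a placeholder column of zeros:

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same one-minute index as in the session above.
index = pd.date_range("2013-01-01", periods=100000, freq="min")
dft = pd.DataFrame({"A": np.zeros(len(index))}, index=index)

# Both endpoints are exact and inclusive: the slice ends at
# 2013-02-28 00:00:00, not at the end of that day.
by_day = dft.loc[datetime(2013, 1, 1):datetime(2013, 2, 28)]

# Shifting both endpoints by the same amount keeps the length unchanged.
shifted = dft.loc[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]

print(len(by_day), len(shifted))
```

Both slices span 58 days plus the inclusive final minute, which is why each has 83521 rows.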

DateOffset objects

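# d is not defined in this excerpt; d = datetime(2008, 8, 18, 9, 0) is consistent with the output below
In [70]: from pandas.tseries.offsets import *

In [71]: d + DateOffset(months=4, days=5)
Out[71]: Timestamp('2008-12-23 09:00:00')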
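A self-contained version of the offset arithmetic might look like the sketch below. The starting value `d = datetime(2008, 8, 18, 9, 0)` is an assumption, chosen because it reproduces the output shown above; the month-end example is an addition for illustration.

```python
from datetime import datetime

import pandas as pd

# Assumed starting point (not given in the excerpt above).
d = datetime(2008, 8, 18, 9, 0)

# DateOffset moves by calendar units: four months forward, then five days.
shifted = d + pd.DateOffset(months=4, days=5)

# Calendar-aware arithmetic clips to the last valid day of the month.
clipped = pd.Timestamp("2011-01-31") + pd.DateOffset(months=1)

print(shifted)
print(clipped)
```

Adding a `DateOffset` to a plain `datetime` returns a pandas `Timestamp`, which is why the result above prints as `Timestamp(...)`.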