pandas Data Operations

Source: Internet  Editor: 程序博客网  Time: 2024/06/18 13:46


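String methods

A Series object exposes a set of string-processing methods through its str attribute, which makes it easy to apply them to every element of the array. Missing values (NaN) are passed through unchanged.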

import numpy as np
import pandas as pd

t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])
t

0    a_b_c_d
1      c_d_e
2        NaN
3      f_g_h
dtype: object

t.str.cat(['A', 'B', 'C', 'D'], sep=',')  # concatenate strings element-wise

0    a_b_c_d,A
1      c_d_e,B
2          NaN
3      f_g_h,D
dtype: object

t.str.split('_')  # split strings

0    [a, b, c, d]
1       [c, d, e]
2             NaN
3       [f, g, h]
dtype: object

t.str.get(0)  # get the element at the given position

0      a
1      c
2    NaN
3      f
dtype: object

t.str.replace("_", ".")  # replace substrings

0    a.b.c.d
1      c.d.e
2        NaN
3      f.g.h
dtype: object

t.str.pad(10, fillchar="?")  # pad on the left

0    ???a_b_c_d
1    ?????c_d_e
2           NaN
3    ?????f_g_h
dtype: object

t.str.pad(10, side="right", fillchar="?")  # pad on the right

0    a_b_c_d???
1    c_d_e?????
2           NaN
3    f_g_h?????
dtype: object

t.str.center(10, fillchar="?")  # pad on both sides (center)

0    ?a_b_c_d??
1    ??c_d_e???
2           NaN
3    ??f_g_h???
dtype: object

t.str.find('d')  # position of the given substring, searching from the left

0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64

t.str.rfind('d')  # position of the given substring, searching from the right

0    6.0
1    2.0
2    NaN
3   -1.0
dtype: float64
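
The str methods can also be chained, and split can return columns instead of lists. The following is a minimal sketch using the same sample data; the variable names last_piece, pieces and upper are made up for illustration, and the '(missing)' placeholder is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above: three strings and one NaN.
t = pd.Series(['a_b_c_d', 'c_d_e', np.nan, 'f_g_h'])

# Chain two .str calls: split on "_" and keep the last piece of each row.
last_piece = t.str.split('_').str.get(-1)

# expand=True spreads the pieces into DataFrame columns instead of lists.
pieces = t.str.split('_', expand=True)

# NaN stays NaN through .str methods; fill it only if a placeholder is wanted.
upper = t.str.upper().fillna('(missing)')

print(last_piece)
print(pieces)
print(upper)
```

Shorter rows in pieces are padded with missing values, so every row ends up with the same number of columns.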

Transposing data (swapping rows and columns)

dates = pd.date_range('20130101', periods=10)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df.head()

                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -0.953574  2.125147  0.238382 -0.400209
2013-01-03 -0.233966  2.066662  0.331000 -2.802471
2013-01-04  2.038273  0.982127 -1.096000 -1.051818
2013-01-05 -1.438657 -1.208042 -0.375673  0.384522

df.head().T  # swap rows and columns

   2013-01-01 00:00:00  2013-01-02 00:00:00  2013-01-03 00:00:00  2013-01-04 00:00:00  2013-01-05 00:00:00
A            -0.665173            -0.953574            -0.233966             2.038273            -1.438657
B             0.516813             2.125147             2.066662             0.982127            -1.208042
C             0.745156             0.238382             0.331000            -1.096000            -0.375673
D            -0.303295            -0.400209            -2.802471            -1.051818             0.384522
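
Because the frame above is filled with random numbers, the effect of .T is easier to check on fixed data. This is a small sketch with made-up values; the names small and flipped are hypothetical and not part of the original example.

```python
import pandas as pd

# A tiny frame with fixed values so the result is reproducible.
dates = pd.date_range('20130101', periods=3)
small = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]},
    index=dates,
)

flipped = small.T  # equivalent to small.transpose()

# Row labels and column labels trade places.
print(small.shape, flipped.shape)
print(flipped.index.tolist())
print(flipped.columns.equals(small.index))

# Transposing twice gives back the original layout.
print(flipped.T.equals(small))
```

Note that when the columns have different dtypes, the transposed frame generally falls back to the object dtype, since each new column then has to hold mixed values.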

Applying a function to data

df.head().apply(np.cumsum)  # cumsum: running total down each column

                   A         B         C         D
2013-01-01 -0.665173  0.516813  0.745156 -0.303295
2013-01-02 -1.618747  2.641960  0.983537 -0.703504
2013-01-03 -1.852713  4.708622  1.314537 -3.505975
2013-01-04  0.185560  5.690749  0.218537 -4.557793
2013-01-05 -1.253098  4.482707 -0.157135 -4.173271
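
apply works column by column by default and row by row with axis=1. The sketch below uses a small frame with made-up integers (demo and the other variable names are hypothetical) so the results can be checked by hand.

```python
import numpy as np
import pandas as pd

# Fixed integers instead of random numbers, so the sums are easy to verify.
demo = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Default axis=0: the function receives each column in turn.
down_columns = demo.apply(np.cumsum)

# axis=1: the function receives each row in turn.
across_rows = demo.apply(np.cumsum, axis=1)

# For cumulative sums specifically, the built-in method does the same job.
builtin = demo.cumsum()

# Any callable that takes a Series works, e.g. the range of each column.
value_range = demo.apply(lambda col: col.max() - col.min())

print(down_columns)
print(across_rows)
print(value_range)
```

When a built-in method such as cumsum exists, it is usually the simpler choice; apply is most useful for custom functions like the lambda above.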

Frequency

Count how many times each value occurs, similar to a histogram.

s = pd.Series(np.random.randint(0, 7, size=10))
s

0    3
1    3
2    1
3    6
4    3
5    3
6    5
7    2
8    1
9    0
dtype: int32

s.value_counts()

3    4
1    2
6    1
5    1
2    1
0    1
dtype: int64
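
value_counts sorts by count and leaves out values that never occur. The following sketch hard-codes the ten values printed above (the original series was random) and shows a few common variations; the names counts, shares, by_value and full are made up for illustration.

```python
import pandas as pd

# The ten values from the output above, fixed so the counts are reproducible.
s = pd.Series([3, 3, 1, 6, 3, 3, 5, 2, 1, 0])

counts = s.value_counts()                # most frequent value first
shares = s.value_counts(normalize=True)  # relative frequencies instead of counts
by_value = counts.sort_index()           # ordered by value, like histogram bins

# 4 never occurs, so it is missing from counts; reindex to show it as 0.
full = counts.reindex(range(7), fill_value=0)

print(counts)
print(shares)
print(by_value)
print(full)
```

The reindex step assumes the possible values are known in advance (0 through 6 here, matching randint(0, 7)).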