Translation of the pandas Official Documentation Cookbook (4): Arithmetic, Slicing & Sorting

Source: the Internet · Editor: 程序博客网 · Date: 2024/06/11 20:35

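Documentation version: 0.20.3
These examples were written for Python 3.4. Minor adjustments to the code may be needed for earlier Python versions.
pandas (pd) and NumPy (np) are the only two modules imported under abbreviated names; all other modules are imported explicitly so that new users can see where they come from.
If any part of this translation is inaccurate, corrections are welcome.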

The examples in this document are classic questions that people asked on Stack Overflow and GitHub, distilled and summarized by the author.

Arithmetic

Performing arithmetic on a MultiIndex that needs broadcasting

In [61]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])

In [62]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
Out[62]: 
          A                   B                   C          
          O         I         O         I         O         I
n  1.920906 -0.388231 -2.314394  0.665508  0.402562  0.399555
m -1.765956  0.850423  0.388054  0.992312  0.744086 -0.739776

In [63]: df = df.div(df['C'],level=1); df
Out[63]: 
          A                   B              C     
          O         I         O         I    O    I
n  4.771702 -0.971660 -5.749162  1.665625  1.0  1.0
m -2.373321 -1.149568  0.521518 -1.341367  1.0  1.0
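
A minimal reproducible sketch of the same broadcast, assuming fixed values from np.arange in place of np.random.randn so the result can be checked by hand. With level=1, the plain columns O and I of df['C'] are matched against the second column level, so every top-level group (A, B, C) is divided by the matching C column, and the C block itself becomes all 1.0.

```python
import numpy as np
import pandas as pd

# Same column layout as above: top level A/B/C, second level O/I
cols = pd.MultiIndex.from_tuples([(x, y) for x in ['A', 'B', 'C'] for y in ['O', 'I']])

# Assumption: fixed values 1..12 instead of random numbers, for reproducibility
df = pd.DataFrame(np.arange(1.0, 13.0).reshape(2, 6), index=['n', 'm'], columns=cols)

# df['C'] has single-level columns 'O' and 'I'; level=1 aligns them with the
# second column level of df, so each of A, B and C is divided by C
ratio = df.div(df['C'], level=1)
print(ratio)
```

For example, row n holds A/O = 1 and C/O = 5, so ratio.loc['n', ('A', 'O')] is 1 / 5 = 0.2.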

Slicing

Slicing a MultiIndex with xs

In [64]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]

In [65]: index = pd.MultiIndex.from_tuples(coords)

In [66]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
Out[66]: 
        MyData
AA one      11
   six      22
BB one      33
   two      44
   six      55

To take the cross section at the first index level along the first axis (axis=0):

In [67]: df.xs('BB',level=0,axis=0)  # Note: level and axis are optional; axis defaults to 0, and level=None matches the outermost level
Out[67]: 
     MyData
one      33
two      44
six      55
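
A small check, not part of the original cookbook: for the outermost level, xs(..., level=0) and a plain .loc label lookup should return the same frame, with the matched level dropped from the index. The sketch rebuilds the frame from In [64]-In [66].

```python
import pandas as pd

coords = [('AA', 'one'), ('AA', 'six'), ('BB', 'one'), ('BB', 'two'), ('BB', 'six')]
df = pd.DataFrame([11, 22, 33, 44, 55], pd.MultiIndex.from_tuples(coords), ['MyData'])

# Cross section on the outer level; the 'BB' label is dropped from the result's index
by_xs = df.xs('BB', level=0, axis=0)

# For the outermost level, label indexing with .loc yields the same frame
by_loc = df.loc['BB']

print(by_xs.equals(by_loc))  # True
```

xs becomes necessary for inner levels, where .loc['six'] would look for 'six' on the outer level and fail.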

To take the cross section at the second index level along the first axis (axis=0):

In [68]: df.xs('six',level=1,axis=0)
Out[68]: 
    MyData
AA      22
BB      55
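
By default xs drops the level it matched, which is why only AA and BB remain in Out[68]. A sketch using the drop_level parameter of DataFrame.xs to keep that level instead, which can help when the result is later combined with other MultiIndex data:

```python
import pandas as pd

coords = [('AA', 'one'), ('AA', 'six'), ('BB', 'one'), ('BB', 'two'), ('BB', 'six')]
df = pd.DataFrame([11, 22, 33, 44, 55], pd.MultiIndex.from_tuples(coords), ['MyData'])

# Default behaviour: the matched 'six' level is removed from the index
dropped = df.xs('six', level=1)

# drop_level=False keeps both levels, so the result still has a MultiIndex
kept = df.xs('six', level=1, drop_level=False)

print(dropped.index.tolist())  # ['AA', 'BB']
print(kept.index.tolist())     # [('AA', 'six'), ('BB', 'six')]
```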

Slicing a MultiIndex, method 2: using .loc with slice(None) instead of xs (this example also needs import itertools)

In [69]: index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))

In [70]: headr = list(itertools.product(['Exams','Labs'],['I','II']))

In [71]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])

In [72]: cols = pd.MultiIndex.from_tuples(headr) # Notice these are un-named

In [73]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]

In [74]: df = pd.DataFrame(data,indx,cols); df
Out[74]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Comp      70  71   72  73
        Math      71  73   75  74
        Sci       72  75   75  75
Quinn   Comp      73  74   75  76
        Math      74  76   78  77
        Sci       75  78   78  78
Violet  Comp      76  77   78  79
        Math      77  79   81  80
        Sci       78  81   81  81

In [75]: All = slice(None)

In [76]: df.loc['Violet']
Out[76]: 
       Exams     Labs    
           I  II    I  II
Course                   
Comp      76  77   78  79
Math      77  79   81  80
Sci       78  81   81  81

In [77]: df.loc[(All,'Math'),All]
Out[77]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77
Violet  Math      77  79   81  80

In [78]: df.loc[(slice('Ada','Quinn'),'Math'),All]
Out[78]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77

In [79]: df.loc[(All,'Math'),('Exams')]
Out[79]: 
                 I  II
Student Course        
Ada     Math    71  73
Quinn   Math    74  76
Violet  Math    77  79

In [80]: df.loc[(All,'Math'),(All,'II')]
Out[80]: 
               Exams Labs
                  II   II
Student Course           
Ada     Math      73   74
Quinn   Math      76   77
Violet  Math      79   80
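
pandas also provides pd.IndexSlice, which the original example does not use but which builds the same tuple-of-slices keys with colon syntax: idx[:, 'Math'] evaluates to (slice(None), 'Math'). A sketch that rebuilds the frame above and checks the result against the All = slice(None) form of In [80]:

```python
import itertools
import pandas as pd

# Rebuild the student/course frame from In [69]-In [74]
index = list(itertools.product(['Ada', 'Quinn', 'Violet'], ['Comp', 'Math', 'Sci']))
headr = list(itertools.product(['Exams', 'Labs'], ['I', 'II']))
indx = pd.MultiIndex.from_tuples(index, names=['Student', 'Course'])
cols = pd.MultiIndex.from_tuples(headr)
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

# pd.IndexSlice turns ':' into slice(None) inside the tuple keys
idx = pd.IndexSlice
math_ii = df.loc[idx[:, 'Math'], idx[:, 'II']]

# Same selection written with an explicit slice(None), as in In [80]
All = slice(None)
print(math_ii.equals(df.loc[(All, 'Math'), (All, 'II')]))  # True
```

Range slices such as slice('Ada', 'Quinn') in In [78] rely on the index being sorted; that holds here because itertools.product emits the labels in sorted order.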

Sorting

Sorting by a specific column when the columns are a MultiIndex

In [81]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[81]: 
               Exams     Labs    
                   I  II    I  II
Student Course                   
Violet  Sci       78  81   81  81
        Math      77  79   81  80
        Comp      76  77   78  79
Quinn   Sci       75  78   78  78
        Math      74  76   78  77
        Comp      73  74   75  76
Ada     Sci       72  75   75  75
        Math      71  73   75  74
        Comp      70  71   72  73
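
The sort above uses a single column key. A sketch extending it to an ordered list of keys, assuming the same frame: with MultiIndex columns each key is a full (top, sub) tuple, and ascending takes one flag per key. The particular keys here (Labs/I descending, ties broken by Exams/II ascending) are illustrative only.

```python
import itertools
import pandas as pd

# Same student/course frame as in the slicing example
indx = pd.MultiIndex.from_tuples(
    list(itertools.product(['Ada', 'Quinn', 'Violet'], ['Comp', 'Math', 'Sci'])),
    names=['Student', 'Course'])
cols = pd.MultiIndex.from_tuples(list(itertools.product(['Exams', 'Labs'], ['I', 'II'])))
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

# Each sort key is a (top-level, sub-level) tuple; ascending has one flag per key
ranked = df.sort_values(by=[('Labs', 'I'), ('Exams', 'II')], ascending=[False, True])
print(ranked)
```

Several rows share the same Labs/I score (for example 78 for Quinn/Math, Quinn/Sci and Violet/Comp), so the second key decides their order.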