pandas Sorting and Statistics

Source: Internet | Editor: 程序博客网 | Date: 2024/06/14 16:18

From the book "Python for Data Analysis"

Sorting

`sort_index()`

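Sorts by the row or column index labels and returns a new, sorted object. By default it sorts the rows; `axis=1` sorts the columns, and `ascending=False` reverses the order.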

```
In [1]: import pandas as pd

In [2]: from pandas import DataFrame, Series

In [3]: obj = Series(range(4), index=['d','a','b','c'])

In [4]: obj
Out[4]:
d    0
a    1
b    2
c    3
dtype: int64

In [5]: obj.sort_index()
Out[5]:
a    1
b    2
c    3
d    0
dtype: int64

In [6]: import numpy as np

In [8]: frame = DataFrame(np.arange(8).reshape((2,4)), index=['three','one'],
   ...:                   columns=['d','a','b','c'])

In [9]: frame
Out[9]:
       d  a  b  c
three  0  1  2  3
one    4  5  6  7

In [10]: frame.sort_index()
Out[10]:
       d  a  b  c
one    4  5  6  7
three  0  1  2  3

In [11]: frame.sort_index(axis=1)
Out[11]:
       a  b  c  d
three  1  2  3  0
one    5  6  7  4

In [12]: frame.sort_index(axis=1, ascending=False)
Out[12]:
       d  c  b  a
three  0  3  2  1
one    4  7  6  5
```

`sort_values()`

Sorts a Series by its values. By default, any missing values (NaN) are placed at the end of the sorted Series.

```
In [18]: obj = Series([4, np.nan, 6, np.nan, -3, 2])

In [19]: obj
Out[19]:
0    4.0
1    NaN
2    6.0
3    NaN
4   -3.0
5    2.0
dtype: float64

In [21]: obj.sort_values()
Out[21]:
4   -3.0
5    2.0
0    4.0
2    6.0
1    NaN
3    NaN
dtype: float64
```
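A short standalone sketch of two related options, assuming a recent pandas version: `na_position='first'` moves the missing values to the front, and `ascending=False` reverses the order of the non-missing values while NaN still goes last. It calls `pd.Series` directly rather than relying on the `Series` name imported in the session above.

```python
import numpy as np
import pandas as pd

# Same values as in the session above
obj = pd.Series([4, np.nan, 6, np.nan, -3, 2])

# Default: ascending order, missing values placed last
default_sorted = obj.sort_values()

# na_position="first" puts the missing values at the front instead
nan_first = obj.sort_values(na_position="first")

# ascending=False reverses the non-missing values; NaN stays last by default
descending = obj.sort_values(ascending=False)

print(nan_first)
print(descending)
```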

On a DataFrame, you can sort by the values in one or more columns. To do so, pass a column name, or a list of column names, to the `by` option:

```
In [16]: frame.sort_values(by='b')
Out[16]:
       d  a  b  c
three  0  1  2  3
one    4  5  6  7
```
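The transcript above sorts by a single column. As a sketch of the multi-column case, the small frame below (values made up for illustration) is sorted by `a` first, with ties broken by `b`; `ascending` also accepts a list with one flag per column in `by`.

```python
import pandas as pd

# Illustrative data: column "a" contains ties, so the second key matters
frame = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})

# Sort by "a"; rows with equal "a" are then ordered by "b"
by_a_then_b = frame.sort_values(by=["a", "b"])

# One ascending flag per column in `by`: "a" ascending, "b" descending
mixed = frame.sort_values(by=["a", "b"], ascending=[True, False])

print(by_a_then_b)
print(mixed)
```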

Summarizing and computing descriptive statistics

Reductions: sum, mean, max

Options for reduction methods:

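| Option | Description |
| --- | --- |
| `axis` | Axis to reduce over: 0 for a DataFrame's rows (one result per column), 1 for its columns (one result per row) |
| `skipna` | Exclude missing values; `True` by default |
| `level` | If the axis is hierarchically indexed (MultiIndex), reduce grouped by level. Removed from reductions in pandas 2.0; use `groupby(level=...)` instead |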
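A minimal sketch of `axis` and `skipna` on a small made-up DataFrame with missing values (`level` is left out because current pandas no longer accepts it in reductions). One detail worth knowing: with the default `skipna=True`, the sum of an all-NaN row is 0.0; passing `min_count=1` makes it NaN instead.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values; row "c" is entirely NaN
df = pd.DataFrame(
    {"one": [1.0, 7.0, np.nan, 2.0], "two": [np.nan, -4.0, np.nan, -1.0]},
    index=["a", "b", "c", "d"],
)

col_sums = df.sum()          # axis=0: reduce down the rows -> one value per column
row_sums = df.sum(axis=1)    # axis=1: reduce across columns -> one value per row

# skipna=False: any NaN in a row makes that row's result NaN
row_means_strict = df.mean(axis=1, skipna=False)

# min_count=1: require at least one non-NaN value, otherwise return NaN
row_sums_min1 = df.sum(axis=1, min_count=1)

print(col_sums, row_sums, row_means_strict, sep="\n\n")
```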

Indirect statistics

idxmin, idxmax: return the index label at which the minimum or maximum value is attained.
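A small sketch, using made-up data: on a DataFrame, `idxmax()` and `idxmin()` return, for each column, the row label (not the integer position) where that column's maximum or minimum occurs, skipping NaN by default.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values
df = pd.DataFrame(
    {"one": [1.0, 7.0, np.nan, 2.0], "two": [np.nan, -4.0, np.nan, -1.0]},
    index=["a", "b", "c", "d"],
)

max_labels = df.idxmax()  # row label of each column's maximum
min_labels = df.idxmin()  # row label of each column's minimum

print(max_labels)
print(min_labels)
```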

Accumulations

cumsum: cumulative (running) sum along an axis.
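A quick sketch with illustrative values: `cumsum()` keeps the original shape, with each entry holding the running total so far. With the default `skipna=True`, a NaN stays NaN in the output and the running total simply continues past it.

```python
import numpy as np
import pandas as pd

# Simple running total on a Series
totals = pd.Series([1, 2, 3, 4]).cumsum()

# Column-wise running totals on a DataFrame with missing values
df = pd.DataFrame(
    {"one": [1.0, 7.0, np.nan, 2.0], "two": [np.nan, -4.0, np.nan, -1.0]},
    index=["a", "b", "c", "d"],
)
running = df.cumsum()  # default axis=0: accumulate down each column

print(totals)
print(running)
```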

Summary statistics per column

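df.describe(): produces several summary statistics for each column in one call. The output differs for numeric and non-numeric data.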
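A sketch of the difference, on made-up data: for numeric data `describe()` reports count, mean, std, min, quartiles, and max, while for non-numeric (string/object) data it reports count, number of unique values, the most frequent value (`top`), and its frequency (`freq`).

```python
import pandas as pd

# Illustrative numeric and non-numeric data
numeric = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
text = pd.Series(["a", "a", "b", "c"] * 4)

num_summary = numeric.describe()  # count, mean, std, min, 25%, 50%, 75%, max
obj_summary = text.describe()     # count, unique, top, freq

print(num_summary)
print(obj_summary)
```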

Unique values, value counts, and membership

unique: returns an array of the unique values in a Series, in order of first appearance (not sorted).
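A minimal sketch with a made-up Series: the result of `unique()` follows first-appearance order, so sort it explicitly if a sorted list is needed. Python's built-in `sorted()` is used here so the sketch works whether pandas returns a NumPy array or an extension array.

```python
import pandas as pd

# Illustrative data with repeated values
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = obj.unique()         # order of first appearance
sorted_uniques = sorted(uniques)

print(uniques)
print(sorted_uniques)
```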

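isin: performs a vectorized set-membership check, returning a boolean Series that is True wherever a value is in the given collection.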
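A short sketch, using the same kind of made-up Series: the boolean result of `isin()` is typically used as a mask to filter the Series down to the matching values.

```python
import pandas as pd

# Illustrative data
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

mask = obj.isin(["b", "c"])  # True where the value is "b" or "c"
subset = obj[mask]           # keep only the matching rows

print(mask)
print(subset)
```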

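value_counts: counts how many times each value occurs in a Series (sorted by count, descending by default). It returns counts, not probabilities; pass `normalize=True` to get relative frequencies instead.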
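A sketch on a made-up Series contrasting raw counts with `normalize=True`, which divides each count by the total so the results sum to 1. The tests compare through a dict because the relative order of tied counts is not something to rely on.

```python
import pandas as pd

# Illustrative data: "c" and "a" appear 3 times, "b" twice, "d" once
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

counts = obj.value_counts()                # absolute counts, largest first
shares = obj.value_counts(normalize=True)  # relative frequencies

print(counts)
print(shares)
```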

2 0