pandas.DataFrame.describe

Source: Internet | Published by: 东方财富网龙虎榜数据 | Editor: 程序博客网 | Time: 2024/04/30 19:30



DataFrame.describe(percentiles=None, include=None, exclude=None)

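Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.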
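A minimal sketch of the NaN handling described above, assuming pandas and NumPy are installed; the sample values are made up for illustration. The missing entry is left out of count and of every other statistic.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value
s = pd.Series([1.0, 2.0, np.nan, 3.0])
summary = s.describe()

# count reports only the non-missing values
print(summary["count"])  # 3.0

# mean, min and max are computed from 1.0, 2.0 and 3.0 alone
print(summary["mean"], summary["min"], summary["max"])  # 2.0 1.0 3.0
```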

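Parameters:

1. percentiles: list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th and 75th percentiles.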

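2. include: 'all', list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for Series. Here are the options:

- 'all': All columns of the input will be included in the output.
- A list-like of dtypes: Limits the results to the provided data types. To limit the result to numeric types, submit numpy.number. To limit it instead to object columns, submit the numpy.object data type (on NumPy 1.24 and later, where the numpy.object alias has been removed, pass the built-in object instead). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])).
- None (default): The result will include all numeric columns.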

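3. exclude: list-like of dtypes or None (default), optional

A black list of data types to omit from the result. Ignored for Series. Here are the options:

- A list-like of dtypes: Excludes the provided data types from the result. To exclude numeric types, submit numpy.number. To exclude object columns, submit the data type numpy.object (the built-in object on newer NumPy versions). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
- None (default): The result will exclude nothing.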

Returns: summary: Series/DataFrame of summary statistics
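The percentiles parameter can be sketched as follows, assuming a small made-up Series. Passing [0.1, 0.5, 0.9] replaces the default quartiles with the 10th, 50th and 90th percentiles, and each row label is the percentile written as a percentage. The values come from linear interpolation, which is the default of Series.quantile.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles
summary = s.describe(percentiles=[0.1, 0.5, 0.9])
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']

# With linear interpolation, the 10th percentile sits at position
# 0.1 * (5 - 1) = 0.4 in the sorted data, between 1 and 2
print(round(summary["10%"], 6), round(summary["90%"], 6))  # 1.4 4.6
```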

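Notes:

For numeric data, the result's index will include count, mean, std, min and max, as well as the lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.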
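To make the numeric rows concrete, here is a sketch that rebuilds them from ordinary Series reductions on made-up data. It assumes pandas defaults: std is the sample standard deviation (ddof=1) and percentiles use linear interpolation.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
summary = s.describe()

# Rebuild each row of the default numeric summary by hand
manual = pd.Series({
    "count": float(s.count()),
    "mean": s.mean(),
    "std": s.std(),            # sample standard deviation, ddof=1
    "min": float(s.min()),
    "25%": s.quantile(0.25),   # lower percentile
    "50%": s.median(),         # the 50th percentile is the median
    "75%": s.quantile(0.75),   # upper percentile
    "max": float(s.max()),
})

# Same labels in the same order, and the values agree
print(summary.index.equals(manual.index))         # True
print(((summary - manual).abs() < 1e-12).all())   # True
```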

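For object data (e.g. strings or timestamps), the result's index will include count, unique, top and freq. The top is the most common value. The freq is the most common value's frequency. Timestamps also include the first and last items. (This describes pandas before 2.0; since pandas 2.0, datetime data is always summarized as numeric data, and first and last are no longer reported.)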
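A sketch of the object summary on a made-up string Series, checked against value_counts(), which sorts values from most to least frequent. When there is a single most common value, top and freq are simply the first entry of those counts.

```python
import pandas as pd

s = pd.Series(["x", "y", "y", "z", "y"])
summary = s.describe()

# value_counts() sorts by frequency, most common first
counts = s.value_counts()

print(summary["count"], summary["unique"])  # 5 3
print(summary["top"], summary["freq"])      # y 3
print(summary["top"] == counts.index[0])    # True
print(summary["freq"] == counts.iloc[0])    # True
```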

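If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.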
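When values tie for the highest count, only freq is fully determined. The sketch below, on made-up data, treats top as one of the tied values rather than assuming which one pandas reports. Collecting all tied values through value_counts() is an illustrative approach of this sketch, not something describe() does itself.

```python
import pandas as pd

# "a" and "b" both occur twice and tie for the highest count
s = pd.Series(["a", "b", "a", "b", "c"])
summary = s.describe()

# freq is the shared highest count, so it is deterministic
print(summary["freq"])  # 2

# top is one of the tied values; do not rely on which one
counts = s.value_counts()
tied = set(counts[counts == counts.max()].index)
print(summary["top"] in tied)  # True
print(sorted(tied))            # ['a', 'b']
```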

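For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If include='all' is provided as an option, the result will include a union of attributes of each type.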
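A sketch of the default versus include='all' behaviour on a hypothetical two-column frame. The default keeps only the numeric column, while include='all' returns both columns, and its index is the numeric rows plus unique, top and freq. Row order has differed between pandas versions, so the sketch compares sets of labels.

```python
import pandas as pd

df = pd.DataFrame({"numeric": [1.5, 2.5, 3.5], "object": ["a", "a", "b"]})

default = df.describe()
full = df.describe(include="all")

print(default.columns.tolist())  # ['numeric']
print(full.columns.tolist())     # ['numeric', 'object']

# The combined index is the union of the numeric and object summaries
numeric_rows = set(default.index)
print(set(full.index) == numeric_rows | {"unique", "top", "freq"})  # True
```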

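The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.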
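One way to picture include and exclude is as a select_dtypes filter applied before describing. The sketch below uses a made-up frame whose string column is given an explicit object dtype, and shows that for this data both routes produce the same table. It illustrates the column filtering only and is not a claim about how describe() is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "object": pd.Series(["a", "b", "c"], dtype=object),  # explicit object dtype
})

# Keep only object columns, either through describe() or select_dtypes()
via_include = df.describe(include=["O"])
via_select = df.select_dtypes(include=["O"]).describe()
pd.testing.assert_frame_equal(via_include, via_select)  # passes: same table

# exclude drops the listed dtypes instead
print(df.describe(exclude=["O"]).columns.tolist())  # ['numeric']
```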

Examples:

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                       3
unique                      2
top       2010-01-01 00:00:00
freq                        2
first     2000-01-01 00:00:00
last      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...                   columns=['numeric', 'object'])
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
        numeric object
count       3.0      3
unique      NaN      3
top         NaN      b
freq        NaN      1
mean        2.0    NaN
std         1.0    NaN
min         1.0    NaN
25%         1.5    NaN
50%         2.0    NaN
75%         2.5    NaN
max         3.0    NaN

Describing a column from the DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         b
freq        1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       object
count       3
unique      3
top         b
freq        1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
