pandas做数据分析(五):统计相关函数
来源:互联网 发布:男士保湿 知乎 编辑:程序博客网 时间:2024/05/10 03:04
一.计数操作
1.pandas.Series.value_counts
Series.value_counts(normalize=False,sort=True,ascending=False, bins=None, dropna=True)
作用:返回一个包含值和该值出现次数的Series对象,次序按照出现的频率由高到低排序.
参数:
normalize : 布尔值,默认为False,如果是True的话,就会包含该值出现次数的频率.
sort : 布尔值,默认为True.排序控制.
ascending : 布尔值,默认为False,以升序排序
bins : integer, optional
Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data
dropna : 布尔型,默认为True,表示不包括NaN
2.pandas.DataFrame.count
DataFrame.count(axis=0, level=None, numeric_only=False)
Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None)
Parameters:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame
numeric_only : boolean, default False
Include only float, int, boolean data
Returns:
count : Series (or DataFrame if level specified)
二.最大最小值
三.标准统计函数
1.pandas.DataFrame.sum
返回指定轴上值的和.
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
参数:
axis : {index (0), columns (1)}
skipna : 布尔值,默认为True.表示跳过NaN值.如果整行/列都是NaN,那么结果也就是NaN
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns:
sum : Series or DataFrame (if level specified)
import numpy as npimport pandas as pddf=pd.DataFrame(data=[[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]], index=["a","b","c","d"], columns=["one","two"])print("df:")print(df)#直接使用sum()方法,返回一个列求和的Series,自动跳过NaN值print("df.sum()")print(df.sum())#当轴为1.就会按行求和print("df.sum(axis=1)")print(df.sum(axis=1))#选择skipna=False可以禁用跳过Nan值print("df.sum(axis=1,skipna=False):")print(df.sum(axis=1,skipna=False))
结果:
2.pandas.DataFrame.mean
返回指定轴上值的平均数.
DataFrame.mean(axis=None,skipna=None,level=None,numeric_only=None, **kwargs)
参数:
axis : {index (0), columns (1)}
skipna :布尔值,默认为True.表示跳过NaN值.如果整行/列都是NaN,那么结果也就是NaN
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series
numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
例子:
import numpy as npimport pandas as pddf=pd.DataFrame(data=[[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]], index=["a","b","c","d"], columns=["one","two"])print("df:")print(df)#直接使用mean()方法,返回一个列求平均数的Series,自动跳过NaN值print("df.mean()")print(df.mean())#当轴为1.就会按行求平均数print("df.mean(axis=1)")print(df.mean(axis=1))#选择skipna=False可以禁用跳过Nan值print("df.mean(axis=1,skipna=False):")print(df.mean(axis=1,skipna=False))
结果:
四.排序
1.pandas.DataFrame.sort_values
DataFrame.sort_values(by,axis=0,ascending=True,inplace=False, kind='quicksort', na_position='last')
Sort by the values along either axis
参数:
by : str or list of str
Name or list of names which refer to the axis items.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to direct sorting
ascending : bool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
inplace : bool, default False
if True, perform operation in-place
kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position : {‘first’, ‘last’}, default ‘last’
first puts NaNs at the beginning, last puts NaNs at the end
Returns:
sorted_obj : DataFrame
- pandas做数据分析(五):统计相关函数
- Python数据分析模块 | pandas做数据分析(三):统计相关函数
- pandas做数据分析(四):常用函数
- 利用Python数据分析:pandas入门(五)
- 入手pandas分析统计
- 数据分析之Pandas-03绘图函数
- 数据分析处理库Pandas-常用函数
- pandas 数据统计
- Pandas 描述统计函数
- Pandas统计特征函数
- pandas做数据分析(一):基本数据对象
- 用Python做数据分析:Pandas常用数据查询语法
- 数据分析之Pandas(三):汇总、统计、相关系数和协方差
- 利用Python进行数据分析(五)之pandas入门
- pandas做数据分析(三):常用预处理操作
- 用python做数据分析|pandas库:DataFrame基本操作
- 使用Bitmap做数据分析统计
- Python数据分析模块 | pandas做数据分析(一):基本数据对象
- 如何解决chrome flash 过期
- 找石油
- 构造函数与析构函数
- 数字段计数
- TCP流量控制与拥塞控制
- pandas做数据分析(五):统计相关函数
- 手机服务器的微架构(wifi互传)
- JSP的动作(一)
- 【数据迁移】Mysql 在线数据迁移的一点想法
- ImageMagic identify 获取图片尺寸
- java基础(1)
- 【解决方案】jquery circliful插件的边缘模糊问题
- Linux面试题&答案03
- StringBuffer类