Pandas GroupBy Objects

Source: Internet | Published by: 知天气哪个版本好 | Editor: 程序博客网 | Time: 2024/06/01 22:40

Creating a GroupBy object

A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().
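A minimal sketch of both entry points, assuming a recent pandas (1.x or 2.x). The DataFrame and its `team`/`points` columns are hypothetical example data, not part of the original text.

```python
import pandas as pd

# Hypothetical example data: a "team" key column and a numeric "points" column
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "points": [10, 20, 30, 40, 50],
})

# DataFrame.groupby with a column name returns a DataFrameGroupBy
df_grouped = df.groupby("team")

# Series.groupby with another, index-aligned Series returns a SeriesGroupBy
s_grouped = df["points"].groupby(df["team"])

print(type(df_grouped).__name__)  # DataFrameGroupBy
print(type(s_grouped).__name__)   # SeriesGroupBy

# The GroupBy object holds no results itself; an operation such as sum() produces them
totals = s_grouped.sum()
# a -> 90, b -> 60
```

Grouping by a column name is the most common form; passing a Series as the key works because pandas aligns it with the grouped object on the index.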

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters:

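by : mapping, function, str, or iterable (the keys that determine the groups, e.g. a column name or a list of column names)
axis : int, default 0 (split along rows (0) or columns (1); deprecated in pandas 2.1)
level : int, level name, or sequence of such, default None (for a MultiIndex, the index level(s) to group by)
as_index : boolean, default True (for aggregated output, use the group keys as the index of the result; as_index=False keeps them as ordinary columns)
sort : boolean, default True (sort the group keys; sort=False can be faster and keeps the groups in order of first appearance)
group_keys : boolean, default True (when calling apply, add the group keys to the index to identify the pieces)
squeeze : boolean, default False (reduce the dimensionality of the return type if possible; deprecated in pandas 1.1 and removed in 2.0)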

Returns:

GroupBy object
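The sketch below illustrates the as_index, sort and level parameters on small hypothetical data; the column names and MultiIndex level names are made up for illustration, and a pandas 1.x or 2.x install is assumed.

```python
import pandas as pd

# Hypothetical example data; "b" appears before "a" on purpose
df = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "points": [1, 2, 3, 4],
})

# as_index=True (default): the group keys become the index of the result
by_index = df.groupby("team").sum()                  # index a, b; points 6, 4

# as_index=False: the group keys stay an ordinary column
by_column = df.groupby("team", as_index=False).sum()  # columns team, points

# sort=False: groups keep their order of first appearance (b before a)
unsorted = df.groupby("team", sort=False)["points"].sum()

# level: group an object with a MultiIndex by one of its index levels
idx = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1), ("y", 2)], names=["outer", "inner"]
)
s = pd.Series([1, 2, 3, 4], index=idx)
per_outer = s.groupby(level="outer").sum()            # x -> 3, y -> 7
```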

Indexing and iteration

GroupBy.__iter__() : Groupby iterator
GroupBy.groups : dict {group name -> group labels}
GroupBy.indices : dict {group name -> group indices}
GroupBy.get_group(name[, obj]) : Constructs NDFrame from group with provided name
Grouper([key, level, freq, axis, sort]) : A Grouper allows the user to specify a groupby instruction for a target object
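A short sketch of iteration, groups, indices, get_group and Grouper on hypothetical data. The string index labels are chosen only so that groups (index labels) and indices (integer positions) visibly differ; the monthly Grouper with freq="MS" (month start) is likewise an illustrative assumption.

```python
import pandas as pd

# Hypothetical example data with non-default index labels
df = pd.DataFrame(
    {"team": ["a", "b", "a", "b", "a"], "points": [10, 20, 30, 40, 50]},
    index=["r1", "r2", "r3", "r4", "r5"],
)
grouped = df.groupby("team")

# Iterating yields (group name, sub-DataFrame) pairs, in sorted key order by default
for name, group in grouped:
    print(name, group["points"].tolist())  # a [10, 30, 50], then b [20, 40]

# groups: group name -> index labels of that group's rows
labels = {k: list(v) for k, v in grouped.groups.items()}
# a -> ['r1', 'r3', 'r5'], b -> ['r2', 'r4']

# indices: group name -> integer positions of that group's rows
positions = {k: v.tolist() for k, v in grouped.indices.items()}
# a -> [0, 2, 4], b -> [1, 3]

# get_group: the sub-DataFrame for a single group name
team_b = grouped.get_group("b")

# pd.Grouper with freq: bucket a datetime column into calendar months
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "amount": [1, 2, 3],
})
monthly = sales.groupby(pd.Grouper(key="date", freq="MS"))["amount"].sum()
# 2024-01-01 -> 3, 2024-02-01 -> 3
```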

Function application

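Function application is often combined with NumPy functions and lambda expressions.

GroupBy.apply(func, *args, **kwargs) : apply func to each group and combine the results
GroupBy.aggregate(func, *args, **kwargs) : reduce each group to summary values (agg is an alias)
Aggregation functions can be given as string shorthands for the corresponding methods, for example:
GroupBy.agg({"column1": "sum", "column2": "mean"})
GroupBy.transform(func, *args, **kwargs) : apply func group-wise and return an object indexed like the input
GroupBy.filter(func, dropna=True, *args, **kwargs) : keep the rows of the groups for which func returns True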
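A sketch of the four function-application methods on hypothetical data: a string-shorthand dict for agg, a lambda calling NumPy functions, a transform that keeps the original row index, a filter on group size, and an apply whose result keeps the group keys in the index (the group_keys=True default).

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "points": [10, 20, 30, 40, 50],
    "assists": [1, 2, 3, 4, 5],
})
grouped = df.groupby("team")

# agg with a dict: one string shorthand per column
summary = grouped.agg({"points": "sum", "assists": "mean"})
# points: a -> 90, b -> 60; assists: a -> 3.0, b -> 3.0

# agg with a lambda that calls NumPy functions: range of points per group
spread = grouped["points"].agg(lambda s: np.max(s) - np.min(s))
# a -> 40, b -> 20

# transform: one value per original row, here each row minus its group mean
centered = grouped["points"].transform(lambda s: s - s.mean())
# [-20.0, -10.0, 0.0, 10.0, 20.0]

# filter: keep all rows of the groups for which the function returns True
big_teams = grouped.filter(lambda g: len(g) >= 3)
# rows 0, 2 and 4 (team "a")

# apply: arbitrary per-group function; the two largest values of each group,
# with the group key prepended to the index because group_keys=True
top2 = grouped["points"].apply(lambda s: s.sort_values(ascending=False).head(2))
# (a, 4) -> 50, (a, 2) -> 30, (b, 3) -> 40, (b, 1) -> 20
```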

Descriptive statistics

Functions shared by DataFrame and Series groups

Statistical functions:
GroupBy.sum() : sum of each group
GroupBy.ohlc() : open, high, low and close values of each group, excluding missing values
GroupBy.cumcount([ascending]) : number each item in each group from 0 to the length of that group - 1
GroupBy.mean(*args, **kwargs) : mean, excluding missing values
GroupBy.prod() : product of the group values
GroupBy.var([ddof]) : variance, excluding missing values
GroupBy.std([ddof]) : standard deviation, excluding missing values
GroupBy.sem([ddof]) : standard error of the mean, excluding missing values

Descriptive functions:
GroupBy.size() : group sizes (number of rows, missing values included)
GroupBy.count() : number of non-missing values in each group
GroupBy.max() : group maximum
GroupBy.min() : group minimum
GroupBy.median() : group median

Indexing functions:
GroupBy.first() : first non-missing value in each group
GroupBy.head([n]) : first n rows of each group
GroupBy.last() : last non-missing value in each group
GroupBy.tail([n]) : last n rows of each group
GroupBy.nth(n[, dropna]) : the nth row of each group
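The sketch below, on hypothetical data with one missing value, shows the practical difference between size() and count(), what cumcount() and ohlc() return, and how first() (which skips missing values) differs from head(1). nth() is left out because its output layout changed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; team "b" has one missing value
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "points": [10.0, np.nan, 30.0, 40.0, 50.0],
})
grouped = df.groupby("team")

sizes = grouped.size()               # rows per group, NaN included: a -> 3, b -> 2
counts = grouped["points"].count()   # non-missing values only: a -> 3, b -> 1
means = grouped["points"].mean()     # NaN skipped: a -> 30.0, b -> 40.0

# cumcount numbers the rows within each group, starting at 0
order = grouped.cumcount()           # [0, 0, 1, 1, 2]

# ohlc: columns open, high, low, close for each group
bars = grouped["points"].ohlc()      # row a: 10.0, 50.0, 10.0, 50.0

# first() skips NaN; head(1) returns the literal first row of each group
firsts = grouped["points"].first()   # a -> 10.0, b -> 40.0
heads = grouped.head(1)              # original rows 0 and 1 (row 1 holds NaN)
```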

Functions whose DataFrame and Series versions differ

DataFrameGroupBy.agg(arg, *args, **kwargs) : Aggregate using input function or dict of {column -> function}
DataFrameGroupBy.all([axis, bool_only, …]) : Return whether all elements are True over requested axis
DataFrameGroupBy.any([axis, bool_only, …]) : Return whether any element is True over requested axis
DataFrameGroupBy.bfill([limit]) : Backward fill the values
DataFrameGroupBy.corr([method, min_periods]) : Compute pairwise correlation of columns, excluding NA/null values
DataFrameGroupBy.count() : Compute count of group, excluding missing values
DataFrameGroupBy.cov([min_periods]) : Compute pairwise covariance of columns, excluding NA/null values
DataFrameGroupBy.cummax([axis, skipna]) : Return cumulative max over requested axis
DataFrameGroupBy.cummin([axis, skipna]) : Return cumulative minimum over requested axis
DataFrameGroupBy.cumprod([axis]) : Cumulative product for each group
DataFrameGroupBy.cumsum([axis]) : Cumulative sum for each group
DataFrameGroupBy.describe([percentiles, …]) : Generate various summary statistics, excluding NaN values
DataFrameGroupBy.diff([periods, axis]) : First discrete difference of object
DataFrameGroupBy.ffill([limit]) : Forward fill the values
DataFrameGroupBy.fillna([value, method, …]) : Fill NA/NaN values using the specified method
DataFrameGroupBy.hist(data[, column, by, …]) : Draw histogram of the DataFrame's series using matplotlib / pylab
DataFrameGroupBy.idxmax([axis, skipna]) : Return index of first occurrence of maximum over requested axis
DataFrameGroupBy.idxmin([axis, skipna]) : Return index of first occurrence of minimum over requested axis
DataFrameGroupBy.mad([axis, skipna, level]) : Return the mean absolute deviation of the values for the requested axis (removed in pandas 2.0)
DataFrameGroupBy.pct_change([periods, …]) : Percent change over given number of periods
DataFrameGroupBy.plot : Class implementing the .plot attribute for groupby objects
DataFrameGroupBy.quantile([q, axis, …]) : Return values at the given quantile over requested axis, a la numpy.percentile
DataFrameGroupBy.rank([axis, method, …]) : Compute numerical data ranks (1 through n) along axis
DataFrameGroupBy.resample(rule, *args, **kwargs) : Provide resampling when using a TimeGrouper
DataFrameGroupBy.shift([periods, freq, axis]) : Shift each group by periods observations
DataFrameGroupBy.size() : Compute group sizes
DataFrameGroupBy.skew([axis, skipna, level, …]) : Return unbiased skew over requested axis
DataFrameGroupBy.take(indices[, axis, …]) : Analogous to ndarray.take
DataFrameGroupBy.tshift([periods, freq, axis]) : Shift the time index, using the index's frequency if available (removed in pandas 2.0)
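A sketch of several of these methods applied to one grouped column, assuming hypothetical data with a single missing value. Every result except idxmax and describe stays aligned with the original rows, so the comments list one value per row; all of these computations restart at each group boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in team "a"
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "points": [10.0, 20.0, np.nan, 40.0, 50.0],
})
points = df.groupby("team")["points"]

# Running total per group; the NaN row stays NaN and is skipped
running = points.cumsum()    # 10, 20, NaN, 60, 60
# Each value moves down one row within its own group
previous = points.shift(1)   # NaN, NaN, 10, 20, NaN
# Missing values are filled only from earlier rows of the same group
filled = points.ffill()      # 10, 20, 10, 40, 50
# Ranks 1..n within each group; NaN keeps no rank
ranks = points.rank()        # 1, 1, NaN, 2, 2
# Index label of each group's maximum
best = points.idxmax()       # a -> 4, b -> 3
# count, mean, std, min, quartiles and max per group
stats = points.describe()
```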

Series-only functions (SeriesGroupBy)

SeriesGroupBy.nlargest(*args, **kwargs) : Return the largest n elements of each group
SeriesGroupBy.nsmallest(*args, **kwargs) : Return the smallest n elements of each group
SeriesGroupBy.nunique([dropna]) : Return the number of unique elements in each group
SeriesGroupBy.unique() : Return the unique values of each group (one array per group)
SeriesGroupBy.value_counts([normalize, …]) : Count the occurrences of each unique value within each group
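A sketch of the SeriesGroupBy-only methods on hypothetical team/player data; the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "player": ["x", "y", "x", "z", "z"],
    "points": [10, 30, 50, 20, 40],
})

# nlargest: top n values per group; the group key is prepended to the index
top = df.groupby("team")["points"].nlargest(2)   # 50, 30 for a; 40, 20 for b

players = df.groupby("team")["player"]
distinct = players.nunique()     # a -> 2, b -> 1
uniques = players.unique()       # a -> ['x', 'y'], b -> ['z']
# value_counts: MultiIndex (team, player) -> number of occurrences
counts = players.value_counts()  # (a, x) -> 2, (a, y) -> 1, (b, z) -> 2
```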

DataFrame-only functions (DataFrameGroupBy)

DataFrameGroupBy.corrwith(other[, axis, drop]) : Compute pairwise correlation between rows or columns of two DataFrame objects
DataFrameGroupBy.boxplot(grouped[, …]) : Make box plots from DataFrameGroupBy data

0 0