[Learning Pandas with Stack Overflow] - Differences in using apply, applymap, and map

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 02:57

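I have recently been writing a series of blog posts in which I learn Pandas by working through Stack Overflow questions.

Column address: http://blog.csdn.net/column/details/16726.html

Search Stack Overflow with pandas as the keyword, then sort the results by number of votes: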
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15

Difference between map, applymap and apply methods in Pandas

https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas

Data preparation

    import pandas as pd
    import numpy as np

    # 4 x 3 frame of random normal values, columns b, d, e
    df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                      index=['Utah', 'Ohio', 'Texas', 'Oregon'])

apply

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
apply works on both DataFrame and Series data.

func : function
    Function to apply to each column/row.
    # This argument must be a function; its input is one row or one column of the DataFrame.
axis : {0 or 'index', 1 or 'columns'}, default 0
    0 or 'index': apply function to each column
    1 or 'columns': apply function to each row
    # Chooses whether the function receives columns or rows.
broadcast : boolean, default False
    For aggregation functions, return object of same size with values propagated.
raw : boolean, default False
    If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
reduce : boolean or None, default None
    Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply's return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned.
    # broadcast and reduce were deprecated in pandas 0.23 and removed in 1.0; the result_type argument covers their use.
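To make the raw parameter concrete, here is a minimal sketch. The small DataFrame and the helper make_spread are invented for illustration and are not from the original post. The helper records the type of object that apply hands to the function.

```python
import pandas as pd

# Hypothetical data, chosen so the results are easy to check by hand
df = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0]},
    index=["Utah", "Ohio", "Texas"],
)

def make_spread(seen):
    """Return a max-minus-min function that records the type of its input."""
    def spread(values):
        seen.append(type(values).__name__)
        return values.max() - values.min()
    return spread

seen_default, seen_raw = [], []
default_result = df.apply(make_spread(seen_default))     # raw=False: Series in
raw_result = df.apply(make_spread(seen_raw), raw=True)   # raw=True: ndarray in

print(set(seen_default))  # {'Series'}
print(set(seen_raw))      # {'ndarray'}
print(list(raw_result))   # [3.0, 20.0]
```

Both calls produce the same numbers; only the object passed to the function differs, which is why raw=True can be faster for plain NumPy reductions.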

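    f = lambda x: x.max() - x.min()
    df.apply(f)  # axis=0 by default: f receives each column, so this is max minus min per column
    # col1    6.253621
    # col2    5.970929
    # col3    6.128654
    # dtype: float64
    # (Output as originally posted. With the df defined above the labels are b, d, e, and the values change on every run.)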
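Since the direction of axis is easy to mix up, a small sketch with fixed numbers may help. The data here is made up for illustration so the results can be checked by hand; it is not the random df from the post.

```python
import pandas as pd

# Hypothetical fixed data instead of random values
df = pd.DataFrame(
    {"b": [1, 5, 3], "d": [10, 20, 40], "e": [7, 7, 9]},
    index=["Utah", "Ohio", "Texas"],
)

def spread(x):
    """Difference between the largest and smallest value."""
    return x.max() - x.min()

per_column = df.apply(spread)          # axis=0: one result per column
per_row = df.apply(spread, axis=1)     # axis=1: one result per row

print(per_column.to_dict())  # {'b': 4, 'd': 30, 'e': 2}
print(per_row.to_dict())     # {'Utah': 9, 'Ohio': 15, 'Texas': 37}
```

The result index tells you which way the function ran: column labels for axis=0, row labels for axis=1.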

applymap

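applymap does not distinguish between rows and columns; it applies the function to every single element.
It is a DataFrame method only. A Series has no applymap; use Series.map for element-wise work on a Series. (From pandas 2.1 on, DataFrame.applymap is deprecated and the same operation is available as DataFrame.map.)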

    fmt = lambda x: '%.2f' % x   # renamed from format to avoid shadowing the builtin
    print(df.applymap(fmt))
    #             b      d      e
    # Utah    -0.66   0.59   0.38
    # Ohio     1.65  -0.06  -1.24
    # Texas    0.62   0.03  -0.20
    # Oregon  -1.24   0.12  -1.10
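One way to see what "element-wise" means is to compare the whole-frame call with the same function applied column by column through Series.map. The sketch below uses made-up data and a hypothetical helper two_decimals. It also picks DataFrame.map when it exists (pandas 2.1 and later) and falls back to applymap otherwise, which is a choice made here for portability rather than something from the original post.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {"b": [0.5, -1.256], "d": [2.0, 3.14159]},
    index=["Utah", "Ohio"],
)

def two_decimals(x):
    """Format a single number with two decimal places."""
    return "%.2f" % x

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = getattr(df, "map", None) or df.applymap
whole_frame = elementwise(two_decimals)

# The same result, built by running Series.map on each column in turn
column_by_column = df.apply(lambda col: col.map(two_decimals))

print(whole_frame.values.tolist())
# [['0.50', '2.00'], ['-1.26', '3.14']]
print(whole_frame.values.tolist() == column_by_column.values.tolist())  # True
```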

map

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html

The map discussed here is Series.map, which operates only on Series data.

    fmt = lambda x: '%.2f' % x   # renamed from format to avoid shadowing the builtin
    print(df['e'].map(fmt))
    # Utah       0.38
    # Ohio      -1.24
    # Texas     -0.20
    # Oregon    -1.10
    # Name: e, dtype: object

According to the official documentation, map has several other uses beyond applying a function.

Value substitution

    x = pd.Series([1, 2, 3], index=['one', 'two', 'three'])
    print(x)
    # one      1
    # two      2
    # three    3
    # dtype: int64

    y = pd.Series(['foo', 'bar', 'baz'], index=[1, 2, 3])
    print(y)
    # 1    foo
    # 2    bar
    # 3    baz
    # dtype: object

    # Map with a Series: each value of x is looked up in the index of y
    x.map(y)
    # one      foo
    # two      bar
    # three    baz
    # dtype: object

    # Map with a dict: each value of x is looked up as a key
    z = {1: 'A', 2: 'B', 3: 'C'}
    x.map(z)
    # one      A
    # two      B
    # three    C
    # dtype: object
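One detail worth knowing when substituting values through a dict: a value with no matching key becomes NaN rather than raising an error. The sketch below extends the x and z from the example with a fourth value that has no key. The extra value and the "unknown" fallback are additions made here for illustration.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4], index=["one", "two", "three", "four"])
z = {1: "A", 2: "B", 3: "C"}   # deliberately no entry for 4

# Values without a matching key are mapped to NaN
mapped = x.map(z)
print(mapped.isna().tolist())  # [False, False, False, True]

# One possible way to supply a default: map with a function that calls dict.get
with_default = x.map(lambda v: z.get(v, "unknown"))
print(with_default.tolist())   # ['A', 'B', 'C', 'unknown']
```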

String concatenation

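    # s is not defined in the original post; the output below implies a float Series ending in NaN
    s = pd.Series([1, 2, 3, np.nan])

    s2 = s.map('this is a string {}'.format, na_action=None)
    print(s2)
    # 0    this is a string 1.0
    # 1    this is a string 2.0
    # 2    this is a string 3.0
    # 3    this is a string nan
    # dtype: object

    # Skip NaN values: they are passed through without calling the function
    s3 = s.map('this is a string {}'.format, na_action='ignore')
    print(s3)
    # 0    this is a string 1.0
    # 1    this is a string 2.0
    # 2    this is a string 3.0
    # 3                     NaN
    # dtype: object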
