[Learning Pandas with Stack Overflow] - Dropping Rows That Contain NaN

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 15:21

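I have recently been writing a blog series called "Learning Pandas with Stack Overflow".

Series page: http://blog.csdn.net/column/details/16726.html

The questions come from searching Stack Overflow for the pandas tag and sorting the results by votes:
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15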

How to drop rows of Pandas DataFrame whose value in certain columns is NaN (dropping rows that contain NaN)

Preparing the data

We generate a random DataFrame with 10 rows and 3 columns, then set some of its values to NaN.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
df.iloc[::2, 0] = np.nan   # every 2nd row of col1
df.iloc[::4, 1] = np.nan   # every 4th row of col2
df.iloc[::3, 2] = np.nan   # every 3rd row of col3
print(df)
#        col1      col2      col3
# 0       NaN       NaN       NaN
# 1 -0.498336 -0.960804  0.705309
# 2       NaN -2.120032  2.123329
# 3  0.791883 -0.283840       NaN
# 4       NaN       NaN -1.241788
# 5 -0.399644 -0.968515 -1.509056
# 6       NaN  0.897637       NaN
# 7  1.826128  1.015091 -0.497022
# 8       NaN       NaN -1.889871
# 9  0.379287 -1.762229       NaN

pandas.notnull

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.notnull.html

It accepts either a Series or a DataFrame.

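pandas.notnull can be used in place of np.isfinite / numpy.isnan when checking for missing values. Unlike the NumPy functions, it also works on non-numeric (object) columns. Note that np.isfinite additionally treats infinite values as False, whereas pandas.notnull by default does not.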

pd.notnull(df['col1'])
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# 6    False
# 7     True
# 8    False
# 9     True
# Name: col1, dtype: bool

print(pd.notnull(df))
#     col1   col2   col3
# 0  False  False  False
# 1   True   True   True
# 2  False   True   True
# 3   True   True  False
# 4  False  False   True
# 5   True   True   True
# 6  False   True  False
# 7   True   True   True
# 8  False  False   True
# 9   True   True  False
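To illustrate how pd.notnull differs from numpy.isnan on non-numeric data, here is a minimal sketch. The `demo` frame and its column names are made up for this example and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one float column and one object (string) column.
demo = pd.DataFrame({
    "score": pd.Series([1.5, np.nan, 3.0]),
    "name": pd.Series(["a", None, "c"], dtype=object),
})

# pd.notnull handles numeric and object columns alike.
mask = pd.notnull(demo)
print(mask)

# np.isnan is only defined for numeric input; on the object column
# it raises TypeError instead of returning a mask.
try:
    np.isnan(demo["name"])
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)
```

This is one reason to prefer pd.notnull when a DataFrame mixes numeric and string columns.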

np.isfinite / numpy.isnan

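np.isfinite tests each value and returns True when it is finite (neither NaN nor infinite). By combining the boolean masks of different columns we can express whatever selection rule we need.
numpy.isnan tests whether a value is NaN.

np.isfinite(df['col1'])
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5     True
# 6    False
# 7     True
# 8    False
# 9     True
# Name: col1, dtype: bool

# Keep only the rows where col1 is finite
df1 = df[np.isfinite(df['col1'])]
print(df1)
#        col1      col2      col3
# 1 -0.498336 -0.960804  0.705309
# 3  0.791883 -0.283840       NaN
# 5 -0.399644 -0.968515 -1.509056
# 7  1.826128  1.015091 -0.497022
# 9  0.379287 -1.762229       NaN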
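The text mentions combining the boolean masks of several columns, so a small sketch of that may help. It assumes a deterministic stand-in for the random data (np.arange instead of np.random.randn) with the same NaN pattern, so the resulting row labels are reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data, with the same NaN pattern.
df = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3),
                  columns=["col1", "col2", "col3"])
df.iloc[::2, 0] = np.nan
df.iloc[::4, 1] = np.nan
df.iloc[::3, 2] = np.nan

# Keep rows where col1 AND col3 are both finite.
both = df[np.isfinite(df["col1"]) & np.isfinite(df["col3"])]
print(both.index.tolist())    # [1, 5, 7]

# Keep rows where col1 OR col2 is finite.
either = df[np.isfinite(df["col1"]) | np.isfinite(df["col2"])]
print(either.index.tolist())  # [1, 2, 3, 5, 6, 7, 9]
```

The masks are combined with `&` and `|` rather than `and` / `or`, because the comparison has to happen element by element.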

dropna

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

dropna accepts several parameters:

axis : {0 or ‘index’, 1 or ‘columns’}, or tuple/list thereof
Pass tuple or list to drop on multiple axes

how : {‘any’, ‘all’}
any : if any NA values are present, drop that label
all : if all values are NA, drop that label

thresh : int, default None
int value : require that many non-NA values

subset : array-like
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include

inplace : boolean, default False
If True, do operation inplace and return None.

# By default, drop every row that contains at least one NaN
# (the values below come from a different random run than the data above)
print(df.dropna())
#        col1      col2      col3
# 1  1.944899 -1.792510 -0.612904
# 5 -0.609380  1.087689 -1.145582
# 7 -2.045037  1.043837  0.429135

# Drop only the rows in which every value is NaN
print(df.dropna(how='all'))
#        col1      col2      col3
# 1  1.944899 -1.792510 -0.612904
# 2       NaN  0.780487 -1.239197
# 3 -1.050320 -0.121033       NaN
# 4       NaN       NaN -0.537213
# 5 -0.609380  1.087689 -1.145582
# 6       NaN -0.721761       NaN
# 7 -2.045037  1.043837  0.429135
# 8       NaN       NaN -0.096989
# 9  1.514520  0.224193       NaN
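The parameter list also describes subset and thresh, which the examples so far do not exercise. The sketch below shows both; subset is the most direct answer to the original question about NaN in certain columns. It again assumes a deterministic stand-in for the random data so the row labels are reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data, with the same NaN pattern.
df = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3),
                  columns=["col1", "col2", "col3"])
df.iloc[::2, 0] = np.nan
df.iloc[::4, 1] = np.nan
df.iloc[::3, 2] = np.nan

# subset: only look at col1 when deciding which rows to drop.
by_col1 = df.dropna(subset=["col1"])
print(by_col1.index.tolist())       # [1, 3, 5, 7, 9]

# subset with several columns: drop a row if col1 or col3 is NaN.
by_two = df.dropna(subset=["col1", "col3"])
print(by_two.index.tolist())        # [1, 5, 7]

# thresh: keep rows that have at least 2 non-NaN values.
at_least_two = df.dropna(thresh=2)
print(at_least_two.index.tolist())  # [1, 2, 3, 5, 7, 9]
```

For data without infinite values, dropna(subset=['col1']) selects the same rows as df[np.isfinite(df['col1'])].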

For more details, see the official dropna documentation linked above.
