Getting Started with pandas: Merging Data with the merge Function

Source: Internet  Editor: 程序博客网  Time: 2024/06/04 18:18

Merging data with the merge function

  • Create the datasets
# Import pandas and numpy
import pandas as pd
import numpy as np

# Create two DataFrames
df_left = pd.DataFrame(data=np.ones((5, 6)),
                       columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(data=np.ones((5, 6)) * 2,
                        columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])

# Add two key columns to each DataFrame
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

print(df_right)
print(df_left)
      e    f    g    h    j    k  key1  key2
k3  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0
k4  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k5  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k6  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k7  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0

      a    b    c    d    e    f  key1  key2
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
  • The default join type for merge is inner (how="inner")
print(pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="inner"))
     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
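The inner merge returns 7 rows even though each input has only 5, because duplicated key pairs are matched many-to-many. The sketch below is one way to check that count by hand; the helper names (keys, left_counts, right_counts, expected_rows) are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the example above.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

keys = ["key1", "key2"]  # illustrative name for the join columns

# Count how often each (key1, key2) pair occurs on each side.
left_counts = df_left.groupby(keys).size()
right_counts = df_right.groupby(keys).size()

# A pair present on both sides contributes
# (left occurrences) * (right occurrences) rows to an inner merge;
# a pair missing on one side contributes 0.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

inner = pd.merge(df_left, df_right, on=keys, how="inner")
print(expected_rows, len(inner))
```

Here (k1, k0) appears twice on each side (2 × 2 = 4 rows) and (k0, k1) appears once on the left and three times on the right (1 × 3 = 3 rows), which accounts for the 7 rows.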
  • Merge with how="outer" and use indicator=True to show which side each row came from
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="outer",indicator=True)
     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
7  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
8  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  NaN  NaN  NaN  NaN  NaN  NaN  left_only
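The _merge column added by indicator=True can also be used programmatically, for example to count or isolate unmatched rows. The following is a minimal sketch under that assumption; the variable names (outer, n_both, left_only_rows) are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the example above.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

outer = pd.merge(df_left, df_right, on=["key1", "key2"],
                 how="outer", indicator=True)

# Count rows by origin using the _merge column.
n_both = int((outer["_merge"] == "both").sum())
n_left_only = int((outer["_merge"] == "left_only").sum())
n_right_only = int((outer["_merge"] == "right_only").sum())
print(n_both, n_left_only, n_right_only)

# Keep only the left rows that found no partner on the right.
left_only_rows = outer[outer["_merge"] == "left_only"]
print(left_only_rows[["key1", "key2"]])
```

Every key pair in df_right also occurs in df_left, so no row is marked right_only in this data.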
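  • Merge with how="left" and join on the index. Note: recent pandas versions raise a MergeError when on is combined with left_index/right_index; the output below comes from an older version that accepted this call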
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="left",left_index=True,right_index=True,indicator=True)
      a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
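For pandas versions that do not accept on together with the index flags, an index-only variant is sketched below. It drops the on argument, which is a change from the original call, so key1 and key2 become ordinary overlapping columns and receive the default _x/_y suffixes. The name by_index is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the example above.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

# Join purely on the row labels (k1..k5 vs. k3..k7); no `on` argument.
by_index = pd.merge(df_left, df_right, how="left",
                    left_index=True, right_index=True, indicator=True)

# Rows k1 and k2 exist only in df_left; k3..k5 exist in both frames.
print(by_index[["key1_x", "key1_y", "_merge"]])
```

As in the output above, labels k1 and k2 are left_only and k3 through k5 are both; the difference is that the key columns now appear once per side.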
  • Merge with how="right", join on the index, and append custom suffixes to the overlapping column names
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="right",left_index=True,right_index=True,indicator=True,suffixes=("_left","_right"))
      a    b    c    d  e_left  f_left  key1  key2  e_right  f_right    g    h    j    k      _merge
k3  1.0  1.0  1.0  1.0     1.0     1.0    k0    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k4  1.0  1.0  1.0  1.0     1.0     1.0    k1    k1      2.0      2.0  2.0  2.0  2.0  2.0        both
k5  1.0  1.0  1.0  1.0     1.0     1.0    k1    k0      2.0      2.0  2.0  2.0  2.0  2.0        both
k6  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
k7  NaN  NaN  NaN  NaN     NaN     NaN   NaN   NaN      2.0      2.0  2.0  2.0  2.0  2.0  right_only
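The suffixes argument only renames columns that appear in both frames and are not used as join keys. A minimal sketch of that behavior with an index-only right merge follows; it omits on (an assumption made so the call runs on recent pandas), and the names by_index_right and renamed are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the example above.
df_left = pd.DataFrame(np.ones((5, 6)), columns=list("abcdef"),
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=list("efghjk"),
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

# Right merge on the row labels, with custom suffixes for shared columns.
by_index_right = pd.merge(df_left, df_right, how="right",
                          left_index=True, right_index=True,
                          indicator=True, suffixes=("_left", "_right"))

# Only columns present in both frames (e, f, key1, key2) are renamed.
renamed = [c for c in by_index_right.columns
           if c.endswith("_left") or c.endswith("_right")]
print(renamed)
print(by_index_right["_merge"])
```

Columns unique to one frame (a through d, g through k) keep their original names, and labels k6 and k7, which exist only in df_right, are marked right_only.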