Getting started with pandas: multi-table operations

Source: Internet  Published: tensorflow函数  Editor: 程序博客网  Time: 2024/05/29 14:41

Multi-table operations

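  • The concat function
import pandas as pd

dictionary1 = {"A": ["A0", "A1", "A2", "A3"],
               "B": ["B0", "B1", "B2", "B3"],
               "C": ["C0", "C1", "C2", "C3"],
               "D": ["D0", "D1", "D2", "D3"]}
df1 = pd.DataFrame(data=dictionary1, index=[0, 1, 2, 3])

dictionary2 = {"A": ["A4", "A5", "A6", "A7"],
               "B": ["B4", "B5", "B6", "B7"],
               "C": ["C4", "C5", "C6", "C7"],
               "D": ["D4", "D5", "D6", "D7"]}
df2 = pd.DataFrame(data=dictionary2, index=[4, 5, 6, 7])

dictionary3 = {"A": ["A8", "A9", "A10", "A11"],
               "B": ["B8", "B9", "B10", "B11"],
               "C": ["C8", "C9", "C10", "C11"],
               "D": ["D8", "D9", "D10", "D11"]}
df3 = pd.DataFrame(data=dictionary3, index=[8, 9, 10, 11])

# concat combines several DataFrame objects. By default (axis=0) it stacks
# them vertically, appending the rows of each table below the previous one.
pd.concat(objs=[df1, df2, df3])

# axis=1 places the tables side by side. The row indexes here do not overlap,
# so the missing cells are filled with NaN. ignore_index=True renumbers the
# labels along the concatenation axis (here the columns) as 0, 1, 2, ...
pd.concat(objs=[df1, df2, df3], axis=1, ignore_index=True)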

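The concat function can combine any number of tables. By default (axis=0) it stacks them vertically, appending the rows of each table below the previous one. To place the tables side by side instead, set the axis parameter to 1. Where a table has no value for a given row or column label, the result is filled with missing values (NaN). The ignore_index parameter discards the original labels along the concatenation axis and renumbers them starting from 0.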
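The difference between the two axes is easier to see on smaller tables. The following is a minimal sketch; the frames top and bottom are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Two small illustrative tables with the same columns and disjoint indexes.
top = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
bottom = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# Default axis=0: rows of bottom are appended below the rows of top.
stacked = pd.concat([top, bottom])

# axis=1: tables are placed side by side and aligned on the row index.
# The indexes do not overlap, so half of the cells are NaN.
side_by_side = pd.concat([top, bottom], axis=1)

# ignore_index=True replaces the labels on the concatenation axis
# (the columns, because axis=1) with 0, 1, 2, ...
renumbered = pd.concat([top, bottom], axis=1, ignore_index=True)

print(stacked.shape)             # (4, 2)
print(side_by_side.shape)        # (4, 4)
print(list(renumbered.columns))  # [0, 1, 2, 3]
```

Note that with axis=1 the renumbering applies to the column labels, not to the row index.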

  • The merge function
left = {"A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "Key0": ["K0", "K0", "K1", "K2"],
        "Key1": ["K0", "K1", "K0", "K1"]}
right = {"C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
         "Key0": ["K0", "K1", "K1", "K2"],
         "Key1": ["K0", "K0", "K0", "K0"]}
left = pd.DataFrame(data=left)
right = pd.DataFrame(data=right)

pd.merge(left=left, right=right, how="inner", on="Key1", suffixes=("_left", "_right"))

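The merge function combines two tables. Unlike concat, it can only combine two tables per call, but it is more flexible than concat.
The left parameter specifies the left table.
The right parameter specifies the right table.
The how parameter specifies the join type: "inner", "left", "right", "outer" or "cross". The default is "inner", which keeps only the rows whose key appears in both tables; "left" keeps every row of the left table.
The on parameter specifies the column (or list of columns) used as the join key. The key must exist in both tables.
The suffixes parameter specifies the suffixes appended to non-key columns that have the same name in both tables. In the example above, Key0 exists in both tables and is not the join key, so the result contains Key0_left and Key0_right.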
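The effect of how and suffixes can be checked on a small example. This is a minimal sketch with made-up tables; the column names Key and X are assumptions chosen for illustration, not taken from the original example.

```python
import pandas as pd

# Illustrative tables: both have the join key "Key" and a clashing column "X".
left = pd.DataFrame({"Key": ["K0", "K1", "K2"],
                     "A": ["A0", "A1", "A2"],
                     "X": [1, 2, 3]})
right = pd.DataFrame({"Key": ["K0", "K1", "K3"],
                      "C": ["C0", "C1", "C3"],
                      "X": [10, 20, 30]})

# how is omitted, so the default "inner" join is used:
# only keys present in both tables (K0, K1) survive.
inner = pd.merge(left, right, on="Key", suffixes=("_left", "_right"))

# how="left" keeps every row of the left table; K2 has no match on the
# right, so its right-hand columns are NaN.
left_join = pd.merge(left, right, how="left", on="Key",
                     suffixes=("_left", "_right"))

print(list(inner.columns))  # ['Key', 'A', 'X_left', 'C', 'X_right']
print(len(inner), len(left_join))  # 2 3
```

The clashing column X is renamed with the suffixes, while the join key Key appears only once in the result.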
