Converting between rows and columns in pandas

Source: the internet. Editor: 程序博客网. Time: 2024/05/04 06:38

① Column-to-row methods

The stack function: pandas.DataFrame.stack(self, level=-1, dropna=True)

In IPython, the ?pandas.DataFrame.stack command shows the help text:

Signature: pandas.DataFrame.stack(self, level=-1, dropna=True)  
Docstring:  
Pivot a level of the (possibly hierarchical) column labels, returning a  
DataFrame (or Series in the case of an object with a single level of  
column labels) having a hierarchical index with a new inner-most level  
of row labels.  
The level involved will automatically get sorted.  

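a. For an ordinary DataFrame (single-level columns), stack moves the column labels directly into the innermost level of the row index and produces a Series. Because there is only one column level, level=0 and level=-1 give the same result.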

In [16]: import pandas as pd  
    ...: import numpy as np  
    ...: df = pd.DataFrame(np.arange(6).reshape(2,3),index=['AA','BB'],columns=  
    ...: ['three','two','one'])  
    ...: df  
    ...:  
Out[16]:  
    three  two  one  
AA      0    1    2  
BB      3    4    5  
  
In [17]: df.stack()  
Out[17]:  
AA  three    0  
    two      1  
    one      2  
BB  three    3  
    two      4  
    one      5  
dtype: int32  
  
In [18]: df.stack(level=0)  
Out[18]:  
AA  three    0  
    two      1  
    one      2  
BB  three    3  
    two      4  
    one      5  
dtype: int32  
  
In [19]: df.stack(level=-1)  
Out[19]:  
AA  three    0  
    two      1  
    one      2  
BB  three    3  
    two      4  
    one      5  
dtype: int32  
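
As a small complement to the session above, the following sketch rebuilds the same frame in a plain script and checks that the stacked result is a Series addressed by (row label, column label) pairs. The level names `row` and `col` and the column name `value` are made up here for illustration; they do not appear in the original session.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'],
                  columns=['three', 'two', 'one'])

# Column labels become the innermost level of the row index.
s = df.stack()
print(type(s).__name__, s.index.nlevels)

# Each value is addressed by a (row label, column label) pair.
print(s.loc[('AA', 'two')])

# Hypothetical names, used only to turn the Series into a long table.
long_table = s.rename_axis(['row', 'col']).reset_index(name='value')
print(long_table)
```

Turning the stacked Series into a three-column table like this is one convenient way to inspect it, not the only one.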

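b. For a DataFrame with hierarchical (MultiIndex) columns, stack can move a chosen column level into the rows. By default it moves the innermost column level to the innermost row level, so level=1 and level=-1 are equivalent here. Passing a list such as level=[0,1] stacks both column levels and leaves a Series.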

In [31]: import pandas as pd  
    ...: import numpy as np  
    ...: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=  
    ...: [['two','two','one','one'],['A','B','C','D']])  
    ...: df  
    ...:  
Out[31]:  
   two    one  
     A  B   C  D  
AA   0  1   2  3  
BB   4  5   6  7  
  
In [32]: df.stack()  
Out[32]:  
      one  two  
AA A  NaN  0.0  
   B  NaN  1.0  
   C  2.0  NaN  
   D  3.0  NaN  
BB A  NaN  4.0  
   B  NaN  5.0  
   C  6.0  NaN  
   D  7.0  NaN  
  
In [33]: df.stack(level=0)  
Out[33]:  
          A    B    C    D  
AA one  NaN  NaN  2.0  3.0  
   two  0.0  1.0  NaN  NaN  
BB one  NaN  NaN  6.0  7.0  
   two  4.0  5.0  NaN  NaN  
  
In [34]: df.stack(level=1)  
Out[34]:  
      one  two  
AA A  NaN  0.0  
   B  NaN  1.0  
   C  2.0  NaN  
   D  3.0  NaN  
BB A  NaN  4.0  
   B  NaN  5.0  
   C  6.0  NaN  
   D  7.0  NaN  
  
In [35]: df.stack(level=-1)  
Out[35]:  
      one  two  
AA A  NaN  0.0  
   B  NaN  1.0  
   C  2.0  NaN  
   D  3.0  NaN  
BB A  NaN  4.0  
   B  NaN  5.0  
   C  6.0  NaN  
   D  7.0  NaN  
  
In [36]: df.stack(level=[0,1])  
Out[36]:  
AA  one  C    2.0  
         D    3.0  
    two  A    0.0  
         B    1.0  
BB  one  C    6.0  
         D    7.0  
    two  A    4.0  
         B    5.0  
dtype: float64  
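
The level argument can also be given by name when the column levels are named. The sketch below assumes the names `outer` and `inner`, which are invented for this example. Row order, and how missing combinations are filled or dropped, can differ between pandas versions (newer releases may also emit a FutureWarning for stack), so only shapes and individual values are inspected.

```python
import numpy as np
import pandas as pd

# Same data as above, but with names attached to the two column levels.
cols = pd.MultiIndex.from_arrays(
    [['two', 'two', 'one', 'one'], ['A', 'B', 'C', 'D']],
    names=['outer', 'inner'])  # level names are illustrative
df = pd.DataFrame(np.arange(8).reshape(2, 4), index=['AA', 'BB'], columns=cols)

# Move the outer column level into the rows; the inner level stays as columns.
by_outer = df.stack(level='outer')
print(by_outer)

# Move both column levels into the rows, which leaves a Series.
all_levels = df.stack(level=['outer', 'inner'])
print(all_levels)
```

Referring to levels by name tends to be easier to read than positional integers once a frame has more than two levels.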

The unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)

In IPython, the ?pandas.DataFrame.unstack command shows the help text:

Signature: pandas.DataFrame.unstack(self, level=-1, fill_value=None)  
Docstring:  
Pivot a level of the (necessarily hierarchical) index labels, returning  
a DataFrame having a new level of column labels whose inner-most level  
consists of the pivoted index labels. If the index is not a MultiIndex,  
the output will be a Series (the analogue of stack when the columns are  
not a MultiIndex).  
The level involved will automatically get sorted.  

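a. For an ordinary DataFrame (the row index is not a MultiIndex), unstack produces a Series in which the column labels form the outermost index level and the original row labels form the innermost level.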

In [20]: df  
Out[20]:  
    three  two  one  
AA      0    1    2  
BB      3    4    5  
  
In [21]: df.unstack()  
Out[21]:  
three  AA    0  
       BB    3  
two    AA    1  
       BB    4  
one    AA    2  
       BB    5  
dtype: int32  
  
In [22]: df.unstack(0)  
Out[22]:  
three  AA    0  
       BB    3  
two    AA    1  
       BB    4  
one    AA    2  
       BB    5  
dtype: int32  
  
In [23]: df.unstack(-1)  
Out[23]:  
three  AA    0  
       BB    3  
two    AA    1  
       BB    4  
one    AA    2  
       BB    5  
dtype: int32  
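
One way to read the result above: for a frame with a single-level row index, unstack appears to behave like transposing the frame and then stacking it. The sketch below compares the two on the same data; it is an observation on this example rather than a statement about every case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'],
                  columns=['three', 'two', 'one'])

unstacked = df.unstack()      # outer level: column labels, inner level: row labels
via_transpose = df.T.stack()  # transpose first, then stack

print(unstacked.index.tolist())
print(unstacked.equals(via_transpose))
```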

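b. For the DataFrame with hierarchical columns used earlier, the hierarchy is in the columns while the row index still has a single level. unstack pivots levels of the row index, so the whole row index is moved beneath the two column levels and the result is a Series, whichever integer level is passed. Passing a list such as level=[0,1] raises an error because the row index has only one level.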

In [37]: df  
Out[37]:  
   two    one  
     A  B   C  D  
AA   0  1   2  3  
BB   4  5   6  7  
  
In [38]: df.unstack()  
Out[38]:  
two  A  AA    0  
        BB    4  
     B  AA    1  
        BB    5  
one  C  AA    2  
        BB    6  
     D  AA    3  
        BB    7  
dtype: int32  
  
In [39]: df.unstack(0)  
Out[39]:  
two  A  AA    0  
        BB    4  
     B  AA    1  
        BB    5  
one  C  AA    2  
        BB    6  
     D  AA    3  
        BB    7  
dtype: int32  
  
In [40]: df.unstack(1)  
Out[40]:  
two  A  AA    0  
        BB    4  
     B  AA    1  
        BB    5  
one  C  AA    2  
        BB    6  
     D  AA    3  
        BB    7  
dtype: int32  
  
In [41]: df.unstack(-1)  
Out[41]:  
two  A  AA    0  
        BB    4  
     B  AA    1  
        BB    5  
one  C  AA    2  
        BB    6  
     D  AA    3  
        BB    7  
dtype: int32  
  
In [42]: df.unstack(level=[0,1])  
  
IndexError: Too many levels: Index has only 1 level, not 2  

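Trying level=5 also works. How should level be interpreted here? This is left as an open question; presumably an integer level is simply not checked when the row index has only one level.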

In [44]: df  
Out[44]:  
   two    one  
     A  B   C  D  
AA   0  1   2  3  
BB   4  5   6  7  
  
In [45]: df.unstack(level=5)  
Out[45]:  
two  A  AA    0  
        BB    4  
     B  AA    1  
        BB    5  
one  C  AA    2  
        BB    6  
     D  AA    3  
        BB    7  
dtype: int32  
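
To probe the open question a little, the sketch below contrasts a single-level row index with a two-level one. The labels `x`, `y`, `z` and the variable names are invented for this example. It only shows the observed behavior and is not taken from the pandas documentation.

```python
import numpy as np
import pandas as pd

# Single-level row index: the integer level does not seem to matter.
single = pd.DataFrame(np.arange(8).reshape(2, 4),
                      index=['AA', 'BB'], columns=list('ABCD'))
same_result = single.unstack(level=5).equals(single.unstack())
print(same_result)

# Two-level row index (the labels x, y, z are made up for this sketch).
rows = pd.MultiIndex.from_product([['AA', 'BB'], ['x', 'y', 'z']])
multi = pd.DataFrame(np.arange(12).reshape(6, 2), index=rows, columns=['A', 'B'])

outer_moved = multi.unstack(level=0)   # AA/BB move under each column
inner_moved = multi.unstack(level=1)   # x/y/z move under each column
print(outer_moved.shape, inner_moved.shape)

# An out-of-range level is rejected once the index really has levels.
try:
    multi.unstack(level=5)
    raised = False
except IndexError as exc:
    raised = True
    print(exc)
```

So level starts to matter only when the row index is a MultiIndex, which is the case unstack is mainly meant for.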

The melt function: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

In IPython, ?pandas.melt shows the help text:

Signature: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)  
Docstring:  
"Unpivots" a DataFrame from wide format to long format, optionally leaving  
identifier variables set.  
  
This function is useful to massage a DataFrame into a format where one  
or more columns are identifier variables (`id_vars`), while all other  
columns, considered measured variables (`value_vars`), are "unpivoted" to  
the row axis, leaving just two non-identifier columns, 'variable' and  
'value'.  

First, try melt on an ordinary DataFrame to see how it transforms the data.

In [46]: df = pd.DataFrame(np.arange(8).reshape(2,4),index=['AA','BB'],columns=  
    ...: ['A','B','C','D'])  
    ...: df  
    ...:  
Out[46]:  
    A  B  C  D  
AA  0  1  2  3  
BB  4  5  6  7  
  
In [47]: pd.melt(df,id_vars=['A','C'],value_vars=['B','D'],var_name='B|D',  
    ...: value_name='(B|D)_value')  
Out[47]:  
   A  C B|D  (B|D)_value  
0  0  2   B            1  
1  4  6   B            5  
2  0  2   D            3  
3  4  6   D            7  
  
In [48]: pd.melt(df,id_vars=['A'],value_vars=['B','D'],var_name='B|D',  
    ...: value_name='(B|D)_value')  
Out[48]:  
   A B|D  (B|D)_value  
0  0   B            1  
1  4   B            5  
2  0   D            3  
3  4   D            7  
  
In [49]: pd.melt(df,id_vars=['A'],value_vars=['B'],var_name='B',  
    ...: value_name='B_value')  
Out[49]:  
   A  B  B_value  
0  0  B        1  
1  4  B        5  

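Conclusion: from the results above, id_vars can be read as the original columns to keep in the result, and value_vars as the names of the columns to turn into rows. var_name renames the column that holds the melted column names (default: variable), and value_name names the column that holds the corresponding values (default: value).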
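
Note that the outputs above no longer show the AA/BB row labels: by default melt does not carry the row index into the result. A minimal sketch of one way to keep it, assuming it is acceptable to turn the index into a regular column first (the names `column` and `val` are chosen only for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(2, 4),
                  index=['AA', 'BB'], columns=list('ABCD'))

# With no arguments every column is melted and the default names are used.
default_long = pd.melt(df)
print(default_long.columns.tolist())

# melt discards the row index by default, so turn it into a column first.
# reset_index() names that column 'index' because the index has no name.
long_table = pd.melt(df.reset_index(), id_vars=['index'],
                     value_vars=['B', 'D'],
                     var_name='column', value_name='val')
print(long_table)
```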

In [50]: df1 = pd.DataFrame(np.arange(8).reshape(2,4),  
    ...: columns=[list('ABCD'),list('EFGH')])  
    ...: df1  
    ...: df1  
    ...:  
Out[50]:  
   A  B  C  D  
   E  F  G  H  
0  0  1  2  3  
1  4  5  6  7  
  
In [51]: pd.melt(df1,col_level=0,id_vars=['A'],value_vars=['D'])  
Out[51]:  
   A variable  value  
0  0        D      3  
1  4        D      7  
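
The example above selects columns by the upper level of the column labels (col_level=0). As a hedged counterpart, the sketch below uses col_level=1 so that id_vars and value_vars are matched against the lower level (E to H) instead.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(8).reshape(2, 4),
                   columns=[list('ABCD'), list('EFGH')])

# col_level=1 makes melt match id_vars/value_vars against the lower level.
lower = pd.melt(df1, col_level=1, id_vars=['E'], value_vars=['H'])
print(lower)
```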

② Row-to-column methods

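The unstack function: pandas.DataFrame.unstack(self, level=-1, fill_value=None)

Here df is the 2×3 frame from the first stack example. Unstacking the stacked Series moves the chosen level of its row index back to the columns.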

In [26]: df2=df.stack()  
    ...: df2  
    ...:  
Out[26]:  
AA  three    0  
    two      1  
    one      2  
BB  three    3  
    two      4  
    one      5  
dtype: int32  
  
In [27]: df2.unstack()  
Out[27]:  
    three  two  one  
AA      0    1    2  
BB      3    4    5  
  
In [28]: df2.unstack(0)  
Out[28]:  
       AA  BB  
three   0   3  
two     1   4  
one     2   5  
  
In [29]: df2.unstack(1)  
Out[29]:  
    three  two  one  
AA      0    1    2  
BB      3    4    5  
  
In [30]: df2.unstack(-1)  
Out[30]:  
    three  two  one  
AA      0    1    2  
BB      3    4    5  
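
The session above can be summarized as a round trip: stack followed by unstack gives back the original frame, and unstacking the outer level instead gives its transpose. A minimal sketch of that check follows; the reindex step is a precaution added here in case a pandas version returns the labels in a different order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['AA', 'BB'],
                  columns=['three', 'two', 'one'])

long_series = df.stack()

# Default level=-1: the innermost row level (the old column labels) goes back.
restored = long_series.unstack()
# Align labels before comparing, in case a pandas version reorders them.
restored = restored.reindex(index=df.index, columns=df.columns)
print(restored.values.tolist() == df.values.tolist())

# level=0 moves the outer row level (AA/BB) instead, giving the transpose.
flipped = long_series.unstack(level=0)
print(flipped.loc['two', 'BB'])
```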
