Python 3 pandas Series usage (basic overview)

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 01:31

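Three ways to construct/initialize a Series:

(1) Build a Series from a list
(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict
(2) Build a Series from a dict, since a Series is essentially a key-value structure
(3) Build a Series from a NumPy array

Selecting data

(1) A Series can be treated like a list, supporting all kinds of slicing
(2) A Series also behaves like a dict: the index defined earlier is used to select data
(3) Boolean indexing, much like NumPy

Assigning Series elements

(1) Assign directly by index label
(2) Don't forget the boolean indexing above; it also works for assignment

Math operations

Missing data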


Three ways to construct/initialize a Series:

(1) Build a Series from a list

import pandas as pd
my_list = [7, 'Beijing', '19大', 3.1415, -10000, 'Happy']
s = pd.Series(my_list)
print(type(s))
print(s)

<class 'pandas.core.series.Series'>
0          7
1    Beijing
2        19大
3     3.1415
4     -10000
5      Happy
dtype: object

(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict

s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'], index=['A', 'B', 'C', 'D', 'E', 'F'])
print(s)

A          7
B    Beijing
C        19大
D     3.1415
E     -10000
F      Happy
dtype: object

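(2) Build a Series from a dict, since a Series is essentially a key-value structure

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')  # None becomes NaN, so the dtype is float64
print(apts)

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

Note: this output was recorded with an older pandas version that sorted the dict keys. Recent versions keep the dict's insertion order, so the row order may differ.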
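Since the dict keys become the index, it can help to see what happens when an explicit index is passed together with a dict. The following is a minimal sketch; the names `picked` and the label 'Tianjin' are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None}

# With an explicit index, only the listed keys are kept, in the listed order.
# 'Tianjin' is not a key of the dict, so its value becomes NaN.
picked = pd.Series(cities, index=['Shanghai', 'Tianjin', 'Beijing'], name='income')
print(picked)
```

The dict supplies the values and the `index` argument decides which labels appear and in what order.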

(3) Build a Series from a NumPy array

import numpy as np
d = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])  # random values, different on every run
print(d)

a   -0.329401
b   -0.435921
c   -0.232267
d   -0.846713
e   -0.406585
dtype: float64

Selecting data

(1) A Series can be treated like a list, supporting all kinds of slicing

import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

print(apts.iloc[3])  # position 3 in the row order printed above; plain apts[3] is deprecated in recent pandas

60000.0

print(apts.iloc[[3, 4, 1]])  # several positions at once

Shanghai     60000.0
Suzhou           NaN
Guangzhou    45000.0
Name: income, dtype: float64

print(apts[1:])

Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

print(apts[:-2])

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Name: income, dtype: float64

print(apts[1:] + apts[:-1])  # values are matched by index label; a label present on only one side gives NaN

Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
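The last result above may look surprising, so here is a small sketch of the same label alignment on a Series without missing values. The data and the names `s`, `tail`, `head`, and `total` are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

tail = s.iloc[1:]    # labels b, c, d
head = s.iloc[:-1]   # labels a, b, c

# Addition pairs values by label, not by position.
# 'a' and 'd' each exist on one side only, so they come out as NaN;
# 'b' and 'c' exist on both sides, so they are doubled.
total = tail + head
print(total)
```

Because one operand may lack a label, the result is converted to float so that NaN can be stored.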

(2) A Series also behaves like a dict: the index defined earlier is used to select data

import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts['Shanghai'])  # look up a value by its index label

60000.0

print('Hangzhou' in apts)  # `in` checks the index labels, like dict keys

True

print('Chongqing' in apts)

False

(3) Boolean indexing, much like NumPy

import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
at_most_50000 = (apts <= 50000)  # boolean Series; NaN compares as False
print(apts[at_most_50000])

Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64

Note: NumPy-style reductions such as mean, median, max, and min are available as Series methods

print(apts.mean())  # NaN values are skipped

46000.0
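Boolean masks can also be combined, and the other reductions named in the note work the same way as `mean`. The sketch below reuses the article's data; the name `mid_range` and the chosen thresholds are illustrative assumptions.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None},
                 name='income')

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses.
mid_range = apts[(apts >= 45000) & (apts < 60000)]
print(mid_range)

# Like mean(), these reductions skip NaN by default.
print(apts.median(), apts.max(), apts.min())
```

'Suzhou' is never selected by a comparison, because NaN compares as False.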

Assigning Series elements

(1) Assign directly by index label

import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0

apts['shenzhen'] = 70000  # overwrite the value stored under this label
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
New income of shenzhen:70000.0

(2) Don't forget the boolean indexing above; it also works for assignment

import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)  # boolean mask
print(less_than_50000)
apts[less_than_50000] = 40000  # assign to every element where the mask is True
print(apts)

New income of shenzhen:70000.0
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64

Math operations

import pandas as pd
import numpy as np
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)
print(apts / 2)      # element-wise division
print(apts ** 1.5)   # element-wise power
print(np.log(apts))  # NumPy functions apply element-wise and keep the index
apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000, 'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
print(apts + apts2)  # addition aligns the two Series on their index labels
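The listing above comes without output, so here is a smaller sketch of the same operations whose results are easy to check by hand. The Series `a` and `b` and the use of `Series.add` with `fill_value` are additions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

a = pd.Series({'Beijing': 55000.0, 'Shanghai': 60000.0, 'Hangzhou': 40000.0})
b = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'Tianjin': 40000})

halved = a / 2        # element-wise arithmetic keeps the index
logged = np.log(a)    # NumPy ufuncs also return a Series with the same index

summed = a + b        # labels found on only one side ('Hangzhou', 'Tianjin') give NaN
filled = a.add(b, fill_value=0)  # treat the missing side as 0 instead

print(summed)
print(filled)
```

Whether a missing label should count as 0 or stay NaN depends on the data; `fill_value=0` is only one possible choice.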

Missing data

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000, 'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)

Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64

apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000, 'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)

Beijing      10000
Chongqing    30000
Guangzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64
print('Hangzhou' in apts)   # the label exists in apts
print('Hangzhou' in apts2)  # but not in apts2

True
False

print(apts.notnull())  # boolean mask: True where a value is present

Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool

print(apts.isnull())  # boolean mask: True where the value is missing

Beijing      False
Guangzhou    False
Hangzhou     False
Shanghai     False
Suzhou        True
shenzhen     False
Name: income, dtype: bool

print(apts[apts.isnull()])  # use the boolean mask to select the missing entries

Suzhou   NaN
Name: income, dtype: float64
apts = apts + apts2  # add two Series whose indexes only partly overlap
print(apts)

Beijing      65000.0
Chongqing        NaN
Guangzhou    47000.0
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64

apts[apts.isnull()] = apts.mean()  # fill the missing positions with the mean of the existing values
print(apts)

Beijing      65000.0
Chongqing    64000.0
Guangzhou    47000.0
Hangzhou     64000.0
Shanghai     68000.0
Suzhou       64000.0
Tianjin      64000.0
shenzhen     76000.0
dtype: float64
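The mask assignment above changes the Series in place. As an alternative sketch, `fillna` and `dropna` return new Series and leave the original untouched. The shortened data and the names `s`, `filled`, and `dropped` are assumptions made for this example.

```python
import pandas as pd

s = pd.Series({'Beijing': 65000.0, 'Chongqing': None, 'Guangzhou': 47000.0,
               'Shanghai': 68000.0, 'shenzhen': 76000.0})

# mean() skips NaN, so the fill value is the mean of the four known values.
filled = s.fillna(s.mean())

# Or simply drop the rows that have no value.
dropped = s.dropna()

print(filled)
print(dropped)
```

Filling with the mean keeps every label but invents a value for it, while dropping keeps only observed data; which is appropriate depends on how the result will be used.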