Python 3 pandas Series Usage (Basic Notes)
Source: Internet. Editor: 程序博客网. Time: 2024/06/06 01:31
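Three ways to construct/initialize a Series:
(1) Build a Series from a list
(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict
(2) Build a Series from a dict, since a Series is itself a key-value structure
(3) Build a Series from a NumPy array
Selecting data
(1) A Series can be treated like a list and supports all the usual slicing operations
(2) A Series also behaves like a dict; the index defined earlier is what you select data by
(3) Boolean indexing, much like NumPy
Assigning to Series elements
(1) Assign directly by index label
(2) Boolean indexing also works in assignment
Math operations
Missing data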
Three ways to construct/initialize a Series:
(1) Build a Series from a list
import pandas as pd
my_list = [7, 'Beijing', '19大', 3.1415, -10000, 'Happy']
s = pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0          7
1    Beijing
2        19大
3     3.1415
4     -10000
5      Happy
dtype: object
(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict
s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
              index=['A', 'B', 'C', 'D', 'E', 'F'])
print(s)
A          7
B    Beijing
C        19大
D     3.1415
E     -10000
F      Happy
dtype: object
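To make the "index as dict key" idea concrete, here is a minimal sketch that reads the same element once by label and once by position. The variable names `labels`, `by_label` and `by_position` are illustrative only and do not appear in the original post.

```python
import pandas as pd

# Same mixed-type values as in the text, with explicit string labels.
s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
              index=['A', 'B', 'C', 'D', 'E', 'F'])

labels = list(s.index)     # the labels, in the order they were given
by_label = s['B']          # label-based lookup, like a dict key
by_position = s.iloc[1]    # position-based lookup, like a list index

print(labels)
print(by_label, by_position)
```

Using `.iloc` for positions keeps the two kinds of lookup unambiguous, which matters once the labels themselves are integers.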
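(2) Build a Series from a dict, since a Series is itself a key-value structure
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
Note: this output comes from an older pandas release that sorted the dict keys. Since pandas 0.23 (on Python 3.6+) the dict's insertion order is preserved, so current versions print Beijing, Shanghai, shenzhen, Hangzhou, Guangzhou, Suzhou in that order.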
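The dict example above turns `None` into `NaN` and ends up with dtype float64. The sketch below checks that on a smaller dict and also shows that passing `index=` together with a dict selects and orders the keys. The three-city dict and the name `subset` are made up for illustration.

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

# None is stored as NaN, which forces the dtype to float64.
print(apts.dtype)
print(apts.isnull()['Suzhou'])

# With a dict, index= picks keys in the given order;
# a label that is not in the dict ('Tianjin' here) becomes NaN.
subset = pd.Series(cities, index=['Shanghai', 'Beijing', 'Tianjin'])
print(subset)
```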
(3) Build a Series from a NumPy array
import numpy as np
d = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(d)
a   -0.329401
b   -0.435921
c   -0.232267
d   -0.846713
e   -0.406585
dtype: float64
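Because `np.random.randn` gives different numbers on every run, the values printed above will not match yours. A reproducible variant, sketched here with NumPy's `default_rng` generator and an arbitrarily chosen seed of 0, might look like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # the seed value is an arbitrary choice
d = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])
print(d)

# Going back from the Series to a plain NumPy array.
arr = d.to_numpy()
print(type(arr), arr.shape)
```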
Selecting data
(1) A Series can be treated like a list and supports all the usual slicing operations
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
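# Positional access: recent pandas deprecates plain apts[3] on a label index, so use .iloc.
# The results below assume the element order printed above.
print(apts.iloc[3])
60000.0
print(apts.iloc[[3, 4, 1]])
Shanghai     60000.0
Suzhou           NaN
Guangzhou    45000.0
Name: income, dtype: float64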
print(apts[1:])
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
print(apts[:-2])
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Name: income, dtype: float64
# Addition aligns on labels, not positions; labels present in only one operand give NaN.
print(apts[1:] + apts[:-1])
Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
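The last result above can be surprising: the two slices are added by label, not by position, so every label present in both is doubled and every label missing from one side becomes `NaN`. A smaller sketch of the same effect, using a three-city Series invented for the example:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Guangzhou': 45000, 'Hangzhou': 20000},
                 name='income')

tail = apts.iloc[1:]    # Guangzhou, Hangzhou
head = apts.iloc[:-1]   # Beijing, Guangzhou

# Only 'Guangzhou' exists in both slices, so only it gets a number.
total = tail + head
print(total)
```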
(2) A Series also behaves like a dict; the index defined earlier is what you select data by
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts['Shanghai'])  # look up a value by its label
60000.0
print('Hangzhou' in apts)  # `in` tests the index labels
True
print('Chongqing' in apts)
False
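Continuing the dict analogy, the following sketch shows two details that are easy to trip over: `in` looks at the labels rather than the values, and `Series.get` returns a default instead of raising `KeyError`. The two-city Series and the default of `-1` are arbitrary choices for the example.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

has_label = 'Shanghai' in apts   # True: 'Shanghai' is an index label
has_value = 60000 in apts        # False: 60000 is a value, not a label

# .get behaves like dict.get: a default comes back for a missing label.
missing = apts.get('Chongqing', -1)
present = apts.get('Beijing')

print(has_label, has_value, missing, present)
```

To test for a value rather than a label, check against `apts.values` or use `apts.isin([...])`.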
(3) Boolean indexing, much like NumPy
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
at_most_50000 = (apts <= 50000)  # boolean mask; renamed because the test is <=, not <
print(apts[at_most_50000])  # keep only the elements where the mask is True
Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64
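Note: the usual NumPy-style statistics such as mean, median, max and min are available as Series methods, and they skip NaN by default.
print(apts.mean())
46000.0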
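Two behaviours in this subsection are worth checking by hand: comparisons against `NaN` are always `False`, so a missing value never passes a mask, and `mean()` ignores `NaN` unless told otherwise. A small sketch under those assumptions, with a four-city Series made up for the example:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000,
                  'Hangzhou': 20000, 'Suzhou': None}, name='income')

# Combine conditions with & (and) or | (or); each side needs parentheses.
mid_range = apts[(apts >= 20000) & (apts < 60000)]
print(mid_range)

# mean() skips NaN by default; skipna=False lets the NaN propagate.
mean_skip = apts.mean()
mean_strict = apts.mean(skipna=False)
print(mean_skip, mean_strict)
```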
Assigning to Series elements
(1) Assign directly by index label
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0
apts['shenzhen'] = 70000  # overwrite the value stored under this label
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
New income of shenzhen:70000.0
(2) Boolean indexing, introduced above, also works in assignment
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)  # boolean mask; NaN compares as False
print(less_than_50000)
apts[less_than_50000] = 40000  # assign to every element where the mask is True
print(apts)
New income of shenzhen:70000.0
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
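The masked assignment above changes `apts` in place. As a sketch of two closely related spellings, the example below does the same update through `.loc` and then uses `Series.mask` to build a modified copy without touching the original. The replacement values 40000 and 50000 and the name `raised` are only for illustration.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Guangzhou': 45000,
                  'Hangzhou': 20000, 'Suzhou': None}, name='income')

# In place: .loc with a boolean mask (NaN never satisfies the comparison).
apts.loc[apts < 50000] = 40000
print(apts)

# Copy: mask() replaces values where the condition is True and returns a new Series.
raised = apts.mask(apts < 50000, 50000)
print(raised)
```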
Math operations
import pandas as pd
import numpy as np
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)
print(apts / 2)      # element-wise division
print(apts ** 1.5)   # element-wise power
print(np.log(apts))  # NumPy functions are applied element-wise
apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
print(apts + apts2)  # addition aligns the two Series on their labels
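The original shows no output for this block, so here is a small sketch with values simple enough to verify by hand. The two-city data and the names `half` and `logged` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 40000.0, 'Shanghai': 60000.0, 'Suzhou': np.nan},
                 name='income')

half = apts / 2        # element-wise: every value is divided by 2
logged = np.log(apts)  # a NumPy ufunc returns a Series with the same index

print(half)
print(logged)
```

In both results the labels are unchanged and `NaN` stays `NaN`.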
Missing data
import pandas as pd
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
Beijing      10000
Chongqing    30000
Guangzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64
print('Hangzhou' in apts)   # the label exists in apts
print('Hangzhou' in apts2)  # but not in apts2
True
False
print(apts.notnull())  # boolean mask: True where a value is present
Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool
print(apts.isnull())  # boolean mask: True where the value is missing
Beijing      False
Guangzhou    False
Hangzhou     False
Shanghai     False
Suzhou        True
shenzhen     False
Name: income, dtype: bool
print(apts[apts.isnull()])  # use the mask to select the missing elements
Suzhou   NaN
Name: income, dtype: float64
apts = apts + apts2  # labels missing from either Series give NaN
print(apts)
Beijing      65000.0
Chongqing        NaN
Guangzhou    47000.0
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64
apts[apts.isnull()] = apts.mean()  # fill the missing positions with the mean (not the median)
print(apts)
Beijing      65000.0
Chongqing    64000.0
Guangzhou    47000.0
Hangzhou     64000.0
Shanghai     68000.0
Suzhou       64000.0
Tianjin      64000.0
shenzhen     76000.0
dtype: float64
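The steps above fill the gaps by hand with a boolean mask. pandas also has dedicated helpers for the same job. The sketch below uses `Series.add` with `fill_value`, `fillna` and `dropna` on a small pair of Series invented for the example. Which strategy is appropriate depends on the data, so treat this as one possible approach rather than the recommended one.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Hangzhou': 40000, 'Suzhou': None},
                 name='income')
apts2 = pd.Series({'Beijing': 10000, 'Tianjin': 40000})

# Plain +: a label missing on either side gives NaN.
plain = apts + apts2

# add(fill_value=0): a label missing on ONE side counts as 0;
# 'Suzhou' is missing on both sides after alignment, so it stays NaN.
filled_add = apts.add(apts2, fill_value=0)

# fillna replaces every NaN with the given value (here the mean of the rest).
filled = plain.fillna(plain.mean())

# dropna discards the missing entries instead of filling them.
dropped = plain.dropna()

print(plain, filled_add, filled, dropped, sep='\n\n')
```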