python3 pandas

Source: Internet | Published: wwwurlencode php.net | Editor: 程序博客网 | Time: 2024/06/05 14:32

1. Import the pandas library

import pandas as pd
pandas functions are then called through the pd prefix, e.g. pd.read_csv.

2. Read a CSV file

data = pd.read_csv(filePath+fileName)
The file is comma-separated, which is read_csv's default separator.
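filePath + fileName is plain string concatenation, so it only works when filePath already ends with a path separator. A minimal sketch that builds the path with os.path.join instead; the temporary directory and the file name metrics.csv are hypothetical stand-ins for a real location, and the small file is written first so the read can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical directory (no trailing separator) and hypothetical file name
with tempfile.TemporaryDirectory() as filePath:
    fileName = 'metrics.csv'
    full_path = os.path.join(filePath, fileName)  # inserts the separator itself

    # Write a tiny comma-separated file so there is something to read
    with open(full_path, 'w') as f:
        f.write('SYSNM,VALUE\nA0402003,0\nA0402003,0.01\n')

    data = pd.read_csv(full_path)  # sep=',' is the default

print(data.shape)  # (2, 2): two data rows, two columns
```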

For example, a file with these contents:

SYSNM,DS_SYSNM,SUBOBJNM,OBJNM,VALUE,TS,DURATION
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:00:16,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:01:11,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:02:06,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:03:01,60
A0402003,A0402003,CPU,CPU_UTIL,0.01,2017-10-25 00:03:56,60

The first line is used as the column names (data.columns).
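A self-contained sketch of the header behaviour, assuming the sample rows are held in an in-memory string (io.StringIO stands in for the real file). With the default header=0 the first line becomes data.columns; passing header=None keeps it as an ordinary data row and numbers the columns instead.

```python
import io

import pandas as pd

# The sample rows from above, kept in memory instead of on disk
csv_text = """SYSNM,DS_SYSNM,SUBOBJNM,OBJNM,VALUE,TS,DURATION
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:00:16,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:01:11,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:02:06,60
A0402003,A0402003,CPU,CPU_UTIL,0,2017-10-25 00:03:01,60
A0402003,A0402003,CPU,CPU_UTIL,0.01,2017-10-25 00:03:56,60
"""

# Default header=0: the first line supplies the column names
data = pd.read_csv(io.StringIO(csv_text))
print(list(data.columns))  # ['SYSNM', 'DS_SYSNM', 'SUBOBJNM', 'OBJNM', 'VALUE', 'TS', 'DURATION']
print(len(data))           # 5: the header line is not counted

# header=None: the first line is read as data, columns are numbered 0..6
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(raw.columns))   # [0, 1, 2, 3, 4, 5, 6]
print(len(raw))            # 6
```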

print('data.columns:{}'.format(data.columns))  # the first line of the .csv file becomes the column names
print('data.index:{}'.format(data.index))  # row numbers starting at 0 become the index
print('len(data):{}'.format(len(data)))  # number of data rows, header line excluded
data.rename(columns={'TS': 'TS_2'}, inplace=True)  # rename column TS to TS_2
print('after rename() data.columns:{} '.format(data.columns))
data.rename(columns={'SYSNM': 'SYSNM_2', 'TS_2': 'TS'}, inplace=True)  # rename several columns at once
print('after rename() data.columns:{} '.format(data.columns))
print('print data by columns name = TS:{}'.format(data['TS']))  # select a column by name
print('print data by columns index = 5:{}'.format(data.iloc[:, 5]))  # select the 6th column by position (data.ix was removed in pandas 1.0)
print('print data by line index = 3:{}'.format(data.iloc[3, :]))  # select the 4th row by position; positions start at 0
print('print data by line index = [0,2] and columns name = [TS,VALUE]:{}'.format(data.loc[0:2, ['TS', 'VALUE']]))  # rows labelled 0 to 2 (inclusive), columns TS and VALUE
Output:

data.columns:Index(['SYSNM', 'DS_SYSNM', 'SUBOBJNM', 'OBJNM', 'VALUE', 'TS', 'DURATION'], dtype='object')
data.index:RangeIndex(start=0, stop=5, step=1)
len(data):5
after rename() data.columns:Index(['SYSNM', 'DS_SYSNM', 'SUBOBJNM', 'OBJNM', 'VALUE', 'TS_2', 'DURATION'], dtype='object')
after rename() data.columns:Index(['SYSNM_2', 'DS_SYSNM', 'SUBOBJNM', 'OBJNM', 'VALUE', 'TS', 'DURATION'], dtype='object')
print data by columns name = TS:0    2017-10-25 00:00:16
1    2017-10-25 00:01:11
2    2017-10-25 00:02:06
3    2017-10-25 00:03:01
4    2017-10-25 00:03:56
Name: TS, dtype: object
print data by columns index = 5:0    2017-10-25 00:00:16
1    2017-10-25 00:01:11
2    2017-10-25 00:02:06
3    2017-10-25 00:03:01
4    2017-10-25 00:03:56
Name: TS, dtype: object
print data by line index = 3:SYSNM_2                A0402003
DS_SYSNM               A0402003
SUBOBJNM                    CPU
OBJNM                  CPU_UTIL
VALUE                         0
TS          2017-10-25 00:03:01
DURATION                     60
Name: 3, dtype: object
print data by line index = [0,2] and columns name = [TS,VALUE]:                    TS  VALUE
0  2017-10-25 00:00:16    0.0
1  2017-10-25 00:01:11    0.0
2  2017-10-25 00:02:06    0.0
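data.ix mixed label-based and position-based lookup and was removed in pandas 1.0; .loc selects by label and .iloc by position. A small sketch, assuming a toy five-row frame built in place of the CSV data, shows the difference that matters for the last selection above: a .loc slice includes its end label, while an .iloc slice excludes its end position.

```python
import pandas as pd

# Toy frame (an assumption, not the real file) with the default RangeIndex 0..4
data = pd.DataFrame({
    'VALUE': [0.0, 0.0, 0.0, 0.0, 0.01],
    'TS': ['2017-10-25 00:00:16', '2017-10-25 00:01:11', '2017-10-25 00:02:06',
           '2017-10-25 00:03:01', '2017-10-25 00:03:56'],
})

by_label = data.loc[0:2, ['TS', 'VALUE']]  # labels 0, 1, 2: the end label is included
by_position = data.iloc[0:2, [1, 0]]      # positions 0, 1: the end position is excluded

print(len(by_label))          # 3
print(len(by_position))       # 2
print(data.iloc[3, :]['TS'])  # 2017-10-25 00:03:01
```

Because the index here is the default 0..n-1, labels and positions coincide; they diverge as soon as rows are filtered or the index is replaced, which is when the choice between .loc and .iloc starts to matter.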
3. Assigning values

Assigning values to a whole column

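import numpy as np

data['newColumn'] = 'new value'  # append a column; the scalar is assigned to every row
print('data append new column:{}'.format(data['newColumn']))
length = len(data)  # 5 rows, so the array below supplies exactly one value per row
data['newColumn_2'] = pd.Series(np.array([1, 2, 3, 4, 5]))  # aligned to data's index 0..4
print('data append new column:{}'.format(data['newColumn_2']))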

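Output (the int32 dtype depends on the platform's default NumPy integer type; int64 is typical on Linux and macOS):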

data append new column:0    new value
1    new value
2    new value
3    new value
4    new value
Name: newColumn, dtype: object
data append new column:0    1
1    2
2    3
3    4
4    5
Name: newColumn_2, dtype: int32
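Assigning a pd.Series to a column aligns on the index rather than on position. A minimal sketch, assuming a five-row toy frame like the one above: row labels the Series does not cover become NaN, while a plain list must have exactly len(data) elements or pandas raises ValueError. The column names partial and bad are hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'VALUE': [0.0, 0.0, 0.0, 0.0, 0.01]})  # index 0..4

data['newColumn'] = 'new value'  # a scalar is broadcast to every row

# A Series is matched by index label; labels 3 and 4 are missing, so they become NaN
data['partial'] = pd.Series([1, 2, 3])
print(data['partial'].tolist())  # [1.0, 2.0, 3.0, nan, nan]

# A list is matched by position and must supply one value per row
try:
    data['bad'] = [1, 2, 3]
except ValueError as exc:
    print('ValueError:', exc)

# Deriving the length from the frame keeps the array in step with the row count
data['newColumn_2'] = np.arange(1, len(data) + 1)
print(data['newColumn_2'].tolist())  # [1, 2, 3, 4, 5]
```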

4. Data types

data.dtypes  # the dtype of each column, returned as a Series indexed by column name
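A sketch of what data.dtypes reports, assuming a trimmed copy of the sample rows read from an in-memory string. read_csv infers float64 for VALUE and int64 for DURATION and reads TS as text, so pd.to_datetime is one way to get a real timestamp column and astype changes a column's dtype explicitly.

```python
import io

import pandas as pd

# Trimmed copy of the sample data (assumption: only four of the seven columns)
csv_text = """SYSNM,VALUE,TS,DURATION
A0402003,0,2017-10-25 00:00:16,60
A0402003,0.01,2017-10-25 00:03:56,60
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.dtypes)  # VALUE float64, DURATION int64, SYSNM and TS as text

# TS is read as text; convert it to datetime64 to use the .dt accessor
data['TS'] = pd.to_datetime(data['TS'])
print(data['TS'].dt.minute.tolist())  # [0, 3]

# astype converts a column to another dtype, here integer DURATION to float
data['DURATION'] = data['DURATION'].astype('float64')
print(data.dtypes['DURATION'])  # float64
```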

Reference: http://www.cnblogs.com/en-heng/p/5630849.html