Processing string-type data in Python

Source: Internet | Editor: 程序博客网 | Time: 2024/06/16 19:06

1. sklearn

1.1 LabelEncoder

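import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Col1': ['b', 'a', 'b']})  # hypothetical example data
le = preprocessing.LabelEncoder()
le.fit(df['Col1'])
df['Col3'] = le.transform(df['Col1'])  # transform the same column the encoder was fitted on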
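To make the `fit`/`transform` split concrete, here is a minimal NumPy-only sketch of the mapping LabelEncoder performs, assuming its documented behaviour: the learned classes are the sorted unique labels, and each label is encoded as its index in that sorted array. The helpers `fit_labels` and `transform_labels` are hypothetical names for illustration, not part of sklearn.

```python
import numpy as np

def fit_labels(values):
    # Hypothetical helper: like LabelEncoder.fit, keep the sorted unique labels
    return np.unique(np.asarray(values))

def transform_labels(classes, values):
    # Hypothetical helper: like LabelEncoder.transform, map each label to its
    # position in the sorted classes array; labels not seen in fit raise ValueError
    values = np.asarray(values)
    if not np.isin(values, classes).all():
        raise ValueError("values contain labels not seen during fit")
    return np.searchsorted(classes, values)

classes = fit_labels(["paris", "paris", "tokyo", "amsterdam"])
codes = transform_labels(classes, ["tokyo", "tokyo", "paris"])
print(classes.tolist())  # ['amsterdam', 'paris', 'tokyo']
print(codes.tolist())    # [2, 2, 1]
```

Because the classes are sorted, the codes follow alphabetical order rather than the order in which labels first appear.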

Here is another example:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])
print(le.classes_.tolist())  # ['amsterdam', 'paris', 'tokyo']
print(le.transform(["tokyo", "tokyo", "paris"]))  # [2 2 1]
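LabelEncoder also offers `fit_transform` (fit and transform in one call) and `inverse_transform` (codes back to strings). The sketch below reuses the same city labels; it also shows that a label not seen during `fit` cannot be encoded and raises a `ValueError`.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit_transform is equivalent to fit followed by transform on the same data
codes = le.fit_transform(["paris", "paris", "tokyo", "amsterdam"])
print(codes.tolist())  # [1, 1, 2, 0]

# inverse_transform maps integer codes back to the original strings
decoded = le.inverse_transform([2, 2, 1])
print(decoded.tolist())  # ['tokyo', 'tokyo', 'paris']

# A label that was not seen during fit cannot be encoded
try:
    le.transform(["berlin"])
except ValueError as err:
    print("ValueError:", err)
```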

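The following example combines this with reading a file: it finds the string-type columns of an Excel sheet and encodes them.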

import pandas as pd
from sklearn import preprocessing

# obtain the names of the columns whose dtype is xtype
def obtain_x(train_df, xtype):
    dtype_df = train_df.dtypes.reset_index()
    dtype_df.columns = ['col', 'type']
    return dtype_df[dtype_df.type == xtype].col.values

# reading .xlsx files requires the openpyxl package
train_df = pd.read_excel(r'G:\test_onehot.xlsx')

# obtain the string-type (object dtype) columns
str_col = obtain_x(train_df, 'object')
str_col_list = str_col.tolist()
print('str_col_list\n', str_col_list)
print('train_df[str_col_list]\n', train_df[str_col_list])

### encoding
# each string column gets its own LabelEncoder, since each column has its own set of labels
for col in str_col_list:
    le = preprocessing.LabelEncoder()
    train_df[col] = le.fit_transform(train_df[col])
print('encoded train_df\n', train_df)
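Since the Excel file is not available here, the sketch below swaps in a small hypothetical DataFrame and uses pandas' `select_dtypes` in place of `obtain_x` to find the string columns. It keeps each fitted encoder in a dict (`encoders`, a name chosen for illustration) so the integer codes can be decoded later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Excel data: two string columns and one numeric column
train_df = pd.DataFrame({
    "city": ["paris", "tokyo", "paris", "amsterdam"],
    "color": ["red", "blue", "blue", "green"],
    "price": [1.5, 2.0, 3.25, 4.0],
}).astype({"city": object, "color": object})

# Select the object-dtype (string) columns
str_cols = train_df.select_dtypes(include="object").columns.tolist()

encoders = {}
for col in str_cols:
    le = LabelEncoder()
    train_df[col] = le.fit_transform(train_df[col])  # replace strings with integer codes
    encoders[col] = le  # keep the fitted encoder for inverse_transform later

print(train_df)
# Decode one column back to its original strings
print(encoders["city"].inverse_transform(train_df["city"]).tolist())
```

Keeping one encoder per column matters: reusing a single encoder would overwrite its learned classes with the last column it was fitted on.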

2. Processing with pandas

2.1 One-hot encoding

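import pandas as pd

train_df = pd.read_excel(r'G:\test_onehot.xlsx')

# get_dummies
# obtain_x() is the helper defined in the LabelEncoder file example in section 1.1
str_col = obtain_x(train_df, 'object')  # names of the string-type columns
train_df_dummy = pd.get_dummies(train_df[str_col])  # one-hot encode the string columns
train_df = train_df.drop(str_col, axis=1)  # drop the original string columns
train_df = train_df.join(train_df_dummy)  # append the one-hot columns
print('train_df\n', train_df)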
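`pd.get_dummies` also accepts the whole DataFrame together with a `columns=` list, in which case it encodes only those columns and keeps the rest, which replaces the separate drop and join steps. The sketch below, on a small hypothetical DataFrame, checks that both routes give the same result.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data
train_df = pd.DataFrame({
    "city": ["paris", "tokyo", "paris"],
    "price": [1.5, 2.0, 3.25],
}).astype({"city": object})

str_cols = train_df.select_dtypes(include="object").columns.tolist()

# Route 1: encode selected columns, drop the originals, join the dummies
manual = train_df.drop(str_cols, axis=1).join(pd.get_dummies(train_df[str_cols]))

# Route 2: a single call; untouched columns come first, then the dummy columns
encoded = pd.get_dummies(train_df, columns=str_cols)

print(encoded.columns.tolist())  # ['price', 'city_paris', 'city_tokyo']
print(manual.equals(encoded))    # True
```

In pandas 2.x the dummy columns are boolean by default; pass `dtype=int` to `get_dummies` if 0/1 integers are preferred.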

