A Roundup of Commonly Used pandas Features

Source: Internet  Editor: 程序博客网  Date: 2024/06/01 15:37

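1. Reading and writing

Read a text file in a given format:

import pandas as pd
train = pd.read_table('/home/hadoop/jzzz/train/subsidy_train.txt', sep=',', header=None)  # subsidy data; header=None because the file has no header row
train.columns = ['id', 'money']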
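A minimal sketch of reading a header-less, comma-separated text file. The inline sample data is made up and only stands in for subsidy_train.txt. `header=None` tells pandas that the first line is data, and `names=` sets the column names in the same call.

```python
import io
import pandas as pd

# Hypothetical sample standing in for subsidy_train.txt (no header row).
raw = io.StringIO("1,0\n2,1000\n3,1500\n")

# header=None: treat the first line as data, not as column names.
# names=: assign the column names while reading.
train = pd.read_csv(raw, sep=",", header=None, names=["id", "money"])

print(train)
```

Passing `names=` is optional; assigning `train.columns` afterwards, as in the text, gives the same result.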

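Write and read CSV:

college.to_csv('/home/hadoop/jzzz/input/college1.csv', index=True)  # index=True also writes the index as the first column
college = pd.read_csv('/home/hadoop/jzzz/input/college1.csv')  # the saved index comes back as an ordinary column
college.columns = ['college', 'num']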
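The round trip can be tried without touching the real paths. This sketch assumes a small made-up table and writes it to a temporary directory. It shows why the columns are renamed after reading: the index saved by `index=True` returns as a regular, unnamed column.

```python
import os
import tempfile
import pandas as pd

# Hypothetical counts table: the index holds the college id.
college = pd.DataFrame({"num": [120, 80]}, index=[1, 2])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "college1.csv")  # temporary stand-in path
    college.to_csv(path, index=True)          # index written as first column
    restored = pd.read_csv(path)              # index read back as a column

# The saved index had no name, so give both columns explicit names.
restored.columns = ["college", "num"]
print(restored)
```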

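2. Computations over rows that share a value in a column

Maximum: college = pd.DataFrame(score_train_test.groupby(['college'])['score'].max())  # highest score among rows with the same college

Number of occurrences: college = pd.DataFrame(score_train_test['college'].value_counts())

Number of consumption records per student: card = pd.DataFrame(card_train_test.groupby(['id'])['consume'].count())  # count() ignores null values

Mean: ~.mean()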
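The aggregations above can be compared side by side on a toy table. The data and the name `scores` are invented for illustration; the calls themselves (`groupby`, `max`, `mean`, `count`, `value_counts`, `reset_index`) are standard pandas.

```python
import pandas as pd

# Hypothetical scores table.
scores = pd.DataFrame({
    "college": [1, 1, 2, 2, 2],
    "score": [90, 75, 60, 88, 70],
})

by_college = scores.groupby("college")["score"]
max_score = by_college.max()    # highest score per college
mean_score = by_college.mean()  # average score per college
n_scores = by_college.count()   # non-null scores per college

# value_counts: how often each college value occurs, most frequent first.
counts = scores["college"].value_counts()

# reset_index turns the group key back into a normal column,
# which is convenient before merging on that key.
summary = max_score.reset_index(name="max_score")
print(summary)
```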


3. Merging tables: score_train_test = pd.merge(score_train_test, college, how='left', on='college')  # left outer join of score_train_test and college, joined on the college column

Appending rows: card_train_test = pd.concat([card_train, card_test])
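A small sketch of both operations on made-up frames. It illustrates two details worth knowing: a left join keeps unmatched rows and fills the missing side with NaN, and `pd.concat` keeps the original index labels unless `ignore_index=True` is passed (that argument is an addition here, not part of the original snippet).

```python
import pandas as pd

# Hypothetical tables.
scores = pd.DataFrame({"id": [10, 11, 12], "college": [1, 2, 3]})
college = pd.DataFrame({"college": [1, 2], "num": [120, 80]})

# Left join: all rows of scores survive; college 3 has no match, so num is NaN.
merged = pd.merge(scores, college, how="left", on="college")

card_train = pd.DataFrame({"id": [10, 11], "consume": [5.0, 7.5]})
card_test = pd.DataFrame({"id": [12], "consume": [3.0]})

# Stack the rows of both frames; ignore_index=True renumbers the index.
card_train_test = pd.concat([card_train, card_test], ignore_index=True)

print(merged)
print(card_train_test)
```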


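4. Extract the rows that satisfy a condition: train_shitang = card_train.loc[card_train.how == '食堂']  # '食堂' means "canteen"

Extract a column's values: ids = test['id'].values

Keep the rows where a field is not null: train = train_test[train_test['money'].notnull()]    For null rows: ~.isnull()

Replace NaN with -1: train = train.fillna(-1)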
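The selections above, sketched on invented data. The pattern of splitting one combined table into labeled rows (`money` present) and unlabeled rows (`money` missing) is an assumption about how `train_test` is used, inferred from the variable names.

```python
import pandas as pd

# Hypothetical card records; '食堂' = canteen, '超市' = supermarket.
card_train = pd.DataFrame({
    "id": [1, 1, 2],
    "how": ["食堂", "超市", "食堂"],
    "consume": [8.0, 20.0, 6.5],
})
# Boolean mask inside .loc keeps only the matching rows.
train_shitang = card_train.loc[card_train["how"] == "食堂"]

# Hypothetical combined table: money is missing for the test rows.
train_test = pd.DataFrame({"id": [1, 2, 3], "money": [1000.0, float("nan"), 0.0]})
train = train_test[train_test["money"].notnull()]  # rows with a label
test = train_test[train_test["money"].isnull()]    # rows without a label
ids = test["id"].values                            # NumPy array of ids

# fillna returns a new frame; the original is left unchanged.
filled = train_test.fillna(-1)
print(filled)
```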


5. Select feature columns: predictors = [x for x in train.columns if x not in [target]]  # list of every column name in train except target
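A short sketch of how such a list is typically used, assuming `target` holds a single column name. The column names here are made up. `Index.drop` is shown as an equivalent way to get the same list.

```python
import pandas as pd

# Hypothetical training table; 'money' is the column to predict.
train = pd.DataFrame({"id": [1, 2], "consume_count": [30, 12], "money": [1000, 0]})
target = "money"

# Every column name except the target.
predictors = [x for x in train.columns if x not in [target]]

# Equivalent result using the column Index directly.
predictors_alt = list(train.columns.drop(target))

X = train[predictors]  # feature columns
y = train[target]      # label column
print(predictors)
```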


6. Remove duplicate rows: train_shitang = train_shitang.drop_duplicates()
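By default `drop_duplicates` compares whole rows and keeps the first occurrence. The sketch below, on invented data, also shows the `subset=` parameter for deduplicating on selected columns only.

```python
import pandas as pd

# Hypothetical records with one fully repeated row.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "how": ["食堂", "食堂", "超市", "食堂"],
})

# Rows identical in every column collapse to the first occurrence.
unique_rows = df.drop_duplicates()

# Compare only the id column: one row per id, keeping the first.
one_per_id = df.drop_duplicates(subset=["id"], keep="first")

print(unique_rows)
print(one_per_id)
```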


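7. Using range in a Python loop: for i in range(...):

Three forms (in Python 3, range returns a lazy sequence, so wrap it in list() to see the values):
1: range(10), list(range(10)) equals [0,1,2,3,4,5,6,7,8,9]
2: range(1,9), list(range(1,9)) equals [1,2,3,4,5,6,7,8]
3: range(1,9,2), list(range(1,9,2)) equals [1,3,5,7]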
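The three forms can be checked directly. The loop at the end is a made-up illustration of the `for i in range(...)` pattern; the stop value is always excluded.

```python
# Stop only: starts at 0, excludes 10.
print(list(range(10)))       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Start and stop: excludes 9.
print(list(range(1, 9)))     # [1, 2, 3, 4, 5, 6, 7, 8]

# Start, stop, step.
print(list(range(1, 9, 2)))  # [1, 3, 5, 7]

# Typical loop: collect the squares of the odd numbers below 9.
squares = []
for i in range(1, 9, 2):
    squares.append(i * i)
print(squares)               # [1, 9, 25, 49]
```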
