Machine learning in coding (Python): concatenating the raw data; generating higher-order features

Source: Internet | Published by: 齐鲁石化网络电视台 | Editor: 程序博客网 | Time: 2024/05/01 05:38

Concatenating the raw data:

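import numpy as np
import pandas as pd

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
# .iloc replaces the removed .ix: drop the first and last columns, then stack train rows on top of test rows.
all_data = np.vstack((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:-1]))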
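The same concatenation can be tried on two small in-memory tables, so it runs without train.csv and test.csv. This is only a sketch: the column names and values are made up, and it assumes, as in the snippet above, that the first and last columns of each table are not features. Recording the number of training rows makes it possible to split the stacked array back into its two parts later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv.
train_data = pd.DataFrame({
    "label": [1, 0, 1],
    "f1": [10, 11, 12],
    "f2": [20, 21, 22],
    "extra": [0, 0, 0],
})
test_data = pd.DataFrame({
    "id": [100, 101],
    "f1": [13, 14],
    "f2": [23, 24],
    "extra": [0, 0],
})

# Keep only the middle columns (drop the first and the last), then stack rows.
all_data = np.vstack((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:-1]))

# Remember where the training rows end so the array can be split again.
num_train = train_data.shape[0]
train_part = all_data[:num_train]
test_part = all_data[num_train:]
```

Stacking both tables first means any later encoding or feature construction sees the same set of values for train and test.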

NumPy's array-merging functions vstack and hstack:

>>> import numpy as np
>>> a = np.ones((2,2))
>>> b = np.eye(2)
>>> print(np.vstack((a,b)))
[[1. 1.]
 [1. 1.]
 [1. 0.]
 [0. 1.]]
>>> print(np.hstack((a,b)))
[[1. 1. 1. 0.]
 [1. 1. 0. 1.]]
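A short sketch of the shape requirement behind the two functions: vstack needs the arrays to agree on the number of columns, hstack on the number of rows. The array `c` is made up for illustration.

```python
import numpy as np

a = np.ones((2, 2))
c = np.zeros((3, 2))  # same number of columns as a, different number of rows

stacked = np.vstack((a, c))  # rows are appended: shape (5, 2)

try:
    np.hstack((a, c))  # row counts differ (2 vs 3), so this raises ValueError
    hstack_ok = True
except ValueError:
    hstack_ok = False
```

This is why the train and test tables must be cut down to the same columns before they are stacked.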

Generating higher-order (degree-2) features:

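from itertools import combinations
from numpy import array

def group_data(data, degree=2, hash=hash):
    # For every combination of `degree` columns, hash each row's tuple of values into one new feature.
    new_data = []
    m, n = data.shape
    for indices in combinations(range(n), degree):
        new_data.append([hash(tuple(v)) for v in data[:, list(indices)]])
    return array(new_data).T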

Before generating the higher-order features, apply "LabelEncoder" first.
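A minimal sketch of that order of operations, under some assumptions: `np.unique(..., return_inverse=True)` stands in here for scikit-learn's LabelEncoder (both map each distinct value to an integer code 0..k-1 in sorted order), the helper name `label_encode_columns` and the sample values are hypothetical, and `group_data` is repeated so the block runs on its own.

```python
from itertools import combinations
import numpy as np

def label_encode_columns(data):
    """Map each column's values to integer codes 0..k-1 (a stand-in for LabelEncoder)."""
    encoded = np.empty(data.shape, dtype=np.int64)
    for j in range(data.shape[1]):
        # return_inverse gives, for each entry, the index of its value
        # in the sorted array of that column's unique values.
        _, codes = np.unique(data[:, j], return_inverse=True)
        encoded[:, j] = codes
    return encoded

def group_data(data, degree=2, hash=hash):
    new_data = []
    m, n = data.shape
    for indices in combinations(range(n), degree):
        new_data.append([hash(tuple(v)) for v in data[:, list(indices)]])
    return np.array(new_data).T

# Made-up categorical data: 4 rows, 3 columns.
raw = np.array([
    ["red", "small", "x"],
    ["blue", "small", "y"],
    ["red", "large", "x"],
    ["red", "small", "y"],
])

encoded = label_encode_columns(raw)   # integer codes per column
pairs = group_data(encoded, degree=2) # one new column per pair of original columns
```

With three input columns there are three column pairs, so `pairs` has three columns; rows that share the same values in a pair of columns get the same hashed feature. One practical effect of encoding first is that the hashed tuples contain small integers, whose hashes do not change between interpreter runs, unlike string hashes in Python 3.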

From Kaggle.

1 0

machine learning in coding（python）：拼接原始数据；生成高次特征
machine learning in coding（python）：使用贪心搜索【进行特征选择】
machine learning in coding（python）：polynomial curve fitting，python拟合多项式
machine learning in coding（python）：根据关键字合并多个表（构建组合feature）
machine learning in coding（python）：pandas数据包DataFrame数据结构简介
machine learning in coding（python）：使用xgboost构建预测模型
machine learning in coding（python）：使用交叉验证【选择模型超参数】
Machine Learning in Python
机器学习实战(Machine Learning in Action)参考答案以及原始数据
Learning Scikit-learn Machine Learning in Python
scikit-learn: machine learning in Python系列（一）
scikit-learn: machine learning in Python
【ML】【python】Machine Learning in Action
Machine Learning in Python part 1
Machine Learning in Python part 2
Machine Learning in Python (Scikit-learn)-(转)
Large scale machine learning in Python
Machine Learning In Action：KNN(Python)
HDU 1874--畅通工程续【最短路 && floyd && 水题】
[ACM] hdu 2717 Catch That Cow (BFS）
Get Luffy Out （poj 2723 二分+2-SAT）
LeetCode之Recover Binary Search Tree
OC之集合家族
machine learning in coding（python）：拼接原始数据；生成高次特征
[ACM] hdu 1035 Robot Motion (模拟或DFS）
c++学习笔记
POJ 2750 Potted Flower（线段树 + DP）
No package 'theoraenc' found gstreamer
Maven私服搭建
抛硬币的模拟
四道面试题
HDU 1312 Red and Black