TensorFlow: multivariate fully-connected regression

The teacher handed out a dataset and asked for a regression, judged by R^2. In other words, he gave us classification-style data and then asked us to regress on it... a trap. He also said that only an R^2 above 0.07 earns you the right to discuss hyperparameter tuning with him.

Data download link: https://pan.baidu.com/s/1slUKBrn  Password: kyn4

With sklearn, multivariate regression and a 500-tree random forest both reach an R2 score of about 0.06.
Since I'm currently learning TensorFlow, I want to reproduce the same result in TF.
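For reference, here is a minimal sketch of those sklearn baselines. The exact settings behind the 0.06 score aren't given, so the chronological 70/30 split below (mirroring the TF code later in this post) is an assumption, and it presumes X and y have been loaded as in the next listing:

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# X, y as loaded in the listing below; chronological 70/30 split
train_size = int(len(X) * 0.7)
X_tr, X_te = X[:train_size], X[train_size:]
y_tr, y_te = y[:train_size], y[train_size:]

lr = LinearRegression().fit(X_tr, y_tr)
print('linear regression R2:', r2_score(y_te, lr.predict(X_te)))

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1).fit(X_tr, y_tr)
print('random forest (500 trees) R2:', r2_score(y_te, rf.predict(X_te)))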

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
import tensorflow as tf
from sklearn import preprocessing
from sklearn.decomposition import PCA  # missing from the original listing

ts = pd.read_csv('file:///E:/data/HFT_XY_unselected.csv').astype('float32')
ts = ts.dropna()
ts.drop('Unnamed: 0', axis=1, inplace=True)  # drop the leftover index column

# features X1..X332 and the regression target
feature_columns = ['X' + str(i) for i in range(1, 333)]
X = ts[feature_columns].copy()
y = ts.realY.copy()

# preprocessing: PCA down to 50 components
# (n_components='mle' would choose the number of components automatically)
pca = PCA(n_components=50)
reduce_X = pca.fit_transform(X)
print(sum(pca.explained_variance_ratio_))  # total variance ratio captured
PCA_X = reduce_X.astype('float32')

The PCA score printed here shows the 50 components capture roughly 90% of the variance, which I consider acceptable.
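To check that claim in more detail, a quick sketch (assuming pca has been fit as in the listing above) that prints and plots the cumulative explained variance:

import numpy as np
import matplotlib.pyplot as plt

cum_var = np.cumsum(pca.explained_variance_ratio_)  # cumulative variance ratio
print('variance explained by %d components: %.3f' % (len(cum_var), cum_var[-1]))

plt.plot(range(1, len(cum_var) + 1), cum_var)
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance ratio')
plt.show()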

# split into training and test sets (chronological 70/30)
train_size = int(PCA_X.shape[0] * 0.7)
X_train = PCA_X[:train_size].reshape(-1, 50)
y_train = y[:train_size].values.reshape(-1, 1)  # .values: y is a pandas Series
X_test = PCA_X[train_size:].reshape(-1, 50)
y_test = y[train_size:].values.reshape(-1, 1)

# standardize using statistics from the training set only
scaler = preprocessing.StandardScaler().fit(X_train)
print(scaler.mean_, scaler.scale_)
x_data_standard = scaler.transform(X_train)
n_samples = X_train.shape[0]  # number of training rows

learning_rate = 2  # unusually large for Adam; kept as in the original run

x = tf.placeholder(tf.float32, shape=[None, 50])  # input layer: 50 PCA features
y_ = tf.placeholder(tf.float32, shape=[None, 1])  # output layer: one target
keep_prob = tf.placeholder(tf.float32)            # dropout keep probability

# hidden layer 1: 50 -> 200, tanh, dropout
W1 = tf.Variable(tf.truncated_normal([50, 200], stddev=0.1), name='W1')
b1 = tf.Variable(tf.constant(0.1, shape=[200]), name='b1')
L1 = tf.nn.tanh(tf.matmul(x, W1) + b1, name='L1')
L1_drop = tf.nn.dropout(L1, keep_prob)

# hidden layer 2: 200 -> 50, tanh, dropout
W2 = tf.Variable(tf.truncated_normal([200, 50], stddev=0.1), name='W2')
b2 = tf.Variable(tf.constant(0.1, shape=[50]), name='b2')
L2 = tf.nn.tanh(tf.matmul(L1_drop, W2) + b2, name='L2')
L2_drop = tf.nn.dropout(L2, keep_prob)

# output layer: 50 -> 1, linear
W3 = tf.Variable(tf.truncated_normal([50, 1], stddev=0.1), name='W3')
b3 = tf.Variable(tf.constant(0.1, shape=[1]), name='b3')
pred = tf.matmul(L2_drop, W3) + b3

# squared-error loss, minimized with Adam
cost = tf.reduce_sum(tf.pow(pred - y_, 2)) / (2 * n_samples)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# (the original also built an argmax-based "accuracy", a leftover from a
# classification example; it is meaningless for a 1-d regression output
# and was never used, so it is dropped here)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # full-batch training loop (the original comment said "cross-validation",
    # but this is plain gradient descent on the whole training set)
    for step in range(500):
        sess.run(optimizer, feed_dict={x: x_data_standard, y_: y_train, keep_prob: 0.8})
        if step % 10 == 0:
            # evaluate on the test set; dropout is disabled here (keep_prob=1.0),
            # fixing a bug in the original, which evaluated with keep_prob=0.8
            prediction_value = sess.run(pred, feed_dict={x: scaler.transform(X_test), keep_prob: 1.0})
            score = r2_score(y_test, prediction_value)
            print('R2 score %i:' % step, score)

R2 score 0: -30244.3268412
R2 score 10: -1340.94961339
R2 score 20: -596.268456144
R2 score 30: -194.559941395
R2 score 40: -39.0541979543
R2 score 50: -3.91464581928
R2 score 60: -2.91056697861
R2 score 70: -2.67173714001
R2 score 80: -1.0624426114
R2 score 90: -0.165357421399
R2 score 100: -0.0385794292803
R2 score 110: -0.0354491289759
R2 score 120: -0.0131951483968
R2 score 130: 0.00014439312723
R2 score 140: 0.00127453319753
R2 score 150: 0.00167685360921
R2 score 160: 0.00177724882822
R2 score 170: 0.00193877464792
R2 score 180: 0.00233761433903
R2 score 190: 0.00232340264801
R2 score 200: 0.00220291985519

(The large negative R^2 values early in training simply mean the network still predicts worse than a constant at the mean of y.) After several training runs the R^2 can get close to about 0.05. The network here is poorly designed: training is slow and the result is weak.
This post is just meant to record a fully-connected network, as a contrast to the single-feature and multi-feature TF linear regression in the previous post.
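If I were to tune this further, a sketch of standard tweaks (my assumptions, not settings from the run above; it reuses the graph, placeholders, and data from the listing above): a much smaller Adam step, more iterations, and tracking the best test R^2 seen so far.

# assumed hyperparameters; the graph (pred, cost, x, y_, keep_prob) is built above
learning_rate = 1e-3
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

best_score = -np.inf
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(5000):
        sess.run(optimizer, feed_dict={x: x_data_standard, y_: y_train, keep_prob: 0.8})
        if step % 50 == 0:
            # dropout off at evaluation time
            pred_test = sess.run(pred, feed_dict={x: scaler.transform(X_test), keep_prob: 1.0})
            score = r2_score(y_test, pred_test)
            best_score = max(best_score, score)
            print('step %d  R2 %.4f  best %.4f' % (step, score, best_score))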