Titanic: Machine Learning from Disaster
Source: Internet | Editor: 程序博客网 | Time: 2024/06/04 18:46
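I used a logistic regression model for classification with only 7 features (Pclass, Sex, Age, SibSp, Parch, Fare, Embarked), and the result is clearly poor: a score of only 43.54%. My rough code is included as-is for now; I'll optimize it later.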
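The model and the update loop in `gradAscent` can be written out as follows. Here $X$ is the $m \times 8$ matrix whose rows are each passenger's 7 features preceded by a 1 for the intercept (the column the code adds with `np.hstack`), $\mathbf{y}$ is the `Survived` column, $\alpha = 0.001$, the update runs 500 times, and $\sigma$ is applied element-wise. The last line is the standard 0.5 decision threshold for turning a probability into a 0/1 label.

```latex
\begin{align*}
h_{\mathbf{w}}(\mathbf{x}) &= \sigma(\mathbf{w}^\top \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}}} \\
\ell(\mathbf{w}) &= \sum_{i=1}^{m} \Big[ y_i \log h_{\mathbf{w}}(\mathbf{x}_i) + (1 - y_i) \log\big(1 - h_{\mathbf{w}}(\mathbf{x}_i)\big) \Big] \\
\nabla_{\mathbf{w}} \ell(\mathbf{w}) &= X^\top \big(\mathbf{y} - \sigma(X\mathbf{w})\big) \\
\mathbf{w} &\leftarrow \mathbf{w} + \alpha \, X^\top \big(\mathbf{y} - \sigma(X\mathbf{w})\big) \\
\hat{y} &= \begin{cases} 1 & \text{if } \sigma(\mathbf{w}^\top \mathbf{x}) \ge 0.5, \text{ i.e. } \mathbf{w}^\top \mathbf{x} \ge 0 \\ 0 & \text{otherwise} \end{cases}
\end{align*}
```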
import numpy as np
import pandas as pd

# The 7 features used by the model, in column order
FEATURES = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']


def sigmoid(inX):
    # Logistic function; inputs are cast to float first
    inX = np.asarray(inX, dtype=float)
    return 1.0 / (1.0 + np.exp(-inX))


def gradAscent(dataMat, classLabel):
    # Batch gradient ascent on the log-likelihood: w += alpha * X^T (y - sigmoid(Xw))
    m, n = dataMat.shape
    alpha = 0.001
    maxCycles = 500
    weights = np.ones((n, 1))
    for _ in range(maxCycles):
        h = sigmoid(dataMat @ weights)
        error = classLabel - h
        weights = weights + alpha * (dataMat.T @ error)
    return weights


def preprocess(df):
    # Fill missing values, encode categorical columns as numbers,
    # and prepend a column of ones for the intercept term
    df = df.copy()
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())  # test.csv has one missing Fare (row 152)
    df['Sex'] = df['Sex'].map({'male': 1, 'female': 0})
    df['Embarked'] = df['Embarked'].fillna('S').map({'S': 0, 'C': 1, 'Q': 2})
    X = df[FEATURES].to_numpy(dtype=float)
    return np.hstack((np.ones((X.shape[0], 1)), X))


def getData(train, test):
    train = pd.read_csv(train, header=0)
    test = pd.read_csv(test, header=0)
    data_mat = preprocess(train)
    test_mat = preprocess(test)
    label_mat = train['Survived'].to_numpy(dtype=float).reshape(-1, 1)
    return data_mat, test_mat, label_mat, test['PassengerId'].to_numpy()


data, test, label, passenger_ids = getData('train.csv', 'test.csv')
weights = gradAscent(data, label)

# Predict 1 when the survival probability is at least 0.5;
# int() would truncate every probability below 1.0 to 0
probs = sigmoid(test @ weights).ravel()
with open('re.csv', 'w') as f:
    f.write('PassengerId,Survived\n')  # header row expected by Kaggle
    for pid, p in zip(passenger_ids, probs):
        f.write(f'{pid},{int(p >= 0.5)}\n')
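To check a change locally instead of relying on the leaderboard score alone, here is a minimal sketch of the same batch gradient-ascent update run on synthetic data (a stand-in for train.csv, which is not reproduced here). Two steps are added assumptions, not part of the original approach: the features are standardized with training-set statistics before fitting (raw Age and Fare have very different scales, which makes a fixed α = 0.001 hard to tune), and accuracy is measured on a held-out slice. The helper names `standardize`, `add_bias`, `grad_ascent` and `predict` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def standardize(train_x, test_x):
    """Scale columns to zero mean / unit variance using training statistics only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (train_x - mean) / std, (test_x - mean) / std


def add_bias(x):
    """Prepend a column of ones for the intercept term."""
    return np.hstack((np.ones((x.shape[0], 1)), x))


def grad_ascent(x, y, alpha=0.001, max_cycles=500):
    """Batch gradient ascent on the log-likelihood: w += alpha * X^T (y - sigmoid(Xw))."""
    w = np.ones(x.shape[1])
    for _ in range(max_cycles):
        error = y - sigmoid(x @ w)
        w = w + alpha * (x.T @ error)
    return w


def predict(x, w, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold, else 0."""
    return (sigmoid(x @ w) >= threshold).astype(int)


# Hypothetical synthetic data: two features on deliberately mismatched scales,
# label given by a linear rule plus noise
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 2)) * [10.0, 50.0]
labels = ((features[:, 0] / 10.0 - features[:, 1] / 50.0
           + rng.normal(scale=0.3, size=400)) > 0).astype(int)

# Fit on the first 300 rows, hold out the last 100 to estimate accuracy
train_x, val_x = standardize(features[:300], features[300:])
train_x, val_x = add_bias(train_x), add_bias(val_x)
train_y, val_y = labels[:300], labels[300:]

w = grad_ascent(train_x, train_y)
accuracy = (predict(val_x, w) == val_y).mean()
print(f"validation accuracy: {accuracy:.3f}")
```

Applied to the Titanic data, the held-out rows would have to come from train.csv, since test.csv has no Survived column.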