The AdaBoost Algorithm
Source: Internet · 程序博客网 · 2024/06/06 04:54
Algorithm Overview
AdaBoost combines weak classifiers into a strong one. Here the weak classifier is a DecisionStump: the algorithm iteratively trains a sequence of DecisionStump classifiers and then combines them with per-classifier weights. Note that AdaBoost labels positive and negative samples as +1 and -1, not 0 and 1.
Training Procedure
1. Initialize the sample weight vector to uniform weights, w_i = 1/N.
2. Train the DecisionStump classifier that is optimal under the current sample weights.
3. Compute the weighted misclassification rate ε = Σ w_i over the misclassified samples.
4. Compute the classifier weight α = ½ ln((1 − ε) / ε).
5. Update the sample weights: w_i ← w_i · exp(−α · y_i · h(x_i)), so a misclassified sample's weight is multiplied by e^α (it grows) while a correctly classified sample's weight shrinks; then normalize the weights to sum to 1.
6. Repeat steps 2-5 until the desired number of classifiers is reached.
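One round of the training loop can be sketched numerically. This is an illustrative toy example: the 1-D data and the stump threshold 3.5 are assumptions, not values from the article.

```python
import numpy as np

# Hypothetical 1-D dataset with labels in {+1, -1}
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1, 1, -1, -1, -1])

# Step 1: uniform sample weights
w = np.full(len(X), 1 / len(X))

# Step 2: an assumed decision stump h(x) = +1 if x < 3.5 else -1
pred = np.where(X < 3.5, 1, -1)        # misclassifies the sample at x = 3.0

# Step 3: weighted misclassification rate
eps = np.sum(w[pred != y])             # 0.2

# Step 4: classifier weight
alpha = 0.5 * np.log((1 - eps) / eps)  # 0.5 * ln(4) ~ 0.693

# Step 5: reweight and normalize; the misclassified sample's weight grows
w = w * np.exp(-alpha * y * pred)
w = w / np.sum(w)                      # misclassified sample now holds weight 0.5
```

With ε = 0.2, the misclassified sample's weight is multiplied by e^α = 2 while the others are halved, which concentrates the next round's attention on the mistake.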
Prediction Procedure
1. Compute each classifier's output h_t(x).
2. Compute the weighted sum Σ_t α_t · h_t(x) and take its sign as the prediction.
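The aggregation rule H(x) = sign(Σ_t α_t · h_t(x)) can be sketched directly; the α values and stump outputs below are hypothetical.

```python
import numpy as np

alphas = np.array([0.7, 0.4, 0.3])   # assumed classifier weights
h = np.array([[ 1, -1,  1],          # each row: the 3 stumps' outputs
              [-1, -1,  1]])         # for one sample
H = np.sign(h @ alphas)              # weighted vote: [1., -1.]
```

For the first sample the weighted sum is 0.7 − 0.4 + 0.3 = 0.6, so the ensemble predicts +1 even though one stump voted −1.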
Code
import math
import numpy as np


class DecisionStump():
    """Decision stump used as the weak classifier in Adaboost."""
    def __init__(self):
        self.polarity = 1
        self.feature_index = None
        self.threshold = None
        self.alpha = None


class Adaboost():
    """Boosting method that uses a number of weak classifiers in
    ensemble to make a strong classifier. This implementation uses
    decision stumps, which is a one level Decision Tree.

    Parameters:
    -----------
    n_clf: int
        The number of weak classifiers that will be used.
    """
    def __init__(self, n_clf=5):
        self.n_clf = n_clf
        # List of weak classifiers
        self.clfs = []

    def fit(self, X, y):
        n_samples, n_features = np.shape(X)

        # Initialize weights to 1/N
        w = np.full(n_samples, (1 / n_samples))

        # Iterate through classifiers
        for _ in range(self.n_clf):
            clf = DecisionStump()
            # Minimum error given for using a certain feature value threshold
            # for predicting sample label
            min_error = float('inf')
            # Iterate through every unique feature value and see what value
            # makes the best threshold for predicting y
            for feature_i in range(n_features):
                feature_values = np.expand_dims(X[:, feature_i], axis=1)
                unique_values = np.unique(feature_values)
                # Try every unique feature value as threshold
                for threshold in unique_values:
                    p = 1
                    # Set all predictions to '1' initially
                    prediction = np.ones(np.shape(y))
                    # Label the samples whose values are below threshold as '-1'
                    prediction[X[:, feature_i] < threshold] = -1
                    # Error = sum of weights of misclassified samples
                    error = sum(w[y != prediction])

                    if error > 0.5:
                        # E.g. error = 0.8 => (1 - error) = 0.2
                        # We flip the error and polarity
                        error = 1 - error
                        p = -1

                    # If this threshold resulted in the smallest error we save
                    # the configuration
                    if error < min_error:
                        clf.polarity = p
                        clf.threshold = threshold
                        clf.feature_index = feature_i
                        min_error = error

            # Calculate the alpha which is used to update the sample weights
            # and is an approximation of this classifier's proficiency
            clf.alpha = 0.5 * math.log((1.0 - min_error) / (min_error + 1e-10))

            # Set all predictions to '1' initially
            predictions = np.ones(np.shape(y))
            # The indexes where the sample values are below threshold
            negative_idx = (clf.polarity * X[:, clf.feature_index] <
                            clf.polarity * clf.threshold)
            # Label those as '-1'
            predictions[negative_idx] = -1
            # Calculate new weights: misclassified samples get larger weights,
            # correctly classified samples smaller ones (note the minus sign,
            # which the original listing was missing)
            w *= np.exp(-clf.alpha * y * predictions)
            # Normalize to one
            w /= np.sum(w)

            # Save classifier
            self.clfs.append(clf)

    def predict(self, X):
        n_samples = np.shape(X)[0]
        y_pred = np.zeros((n_samples, 1))
        # For each classifier => label the samples
        for clf in self.clfs:
            # Set all predictions to '1' initially (one column per classifier;
            # the original listing mistakenly used the growing shape of y_pred)
            predictions = np.ones((n_samples, 1))
            # The indexes where the sample values are below threshold
            negative_idx = (clf.polarity * X[:, clf.feature_index] <
                            clf.polarity * clf.threshold)
            # Label those as '-1'
            predictions[negative_idx] = -1
            # Add column of predictions weighted by the classifier's alpha
            # (alpha indicative of the classifier's proficiency)
            y_pred = np.concatenate((y_pred, clf.alpha * predictions), axis=1)
        # Sum weighted predictions and return sign of prediction sum
        y_pred = np.sign(np.sum(y_pred, axis=1))

        return y_pred