Traditional machine learning packages and how to use them

Source: Internet. Published by: 数据分析ppt分享. Editor: 程序博客网. Time: 2024/05/22 01:45

The traditional machine learning algorithms are all available in sklearn (scikit-learn).

1, Decision trees
from sklearn import preprocessing
from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion='entropy') # choose the split criterion; the default is 'gini' (scikit-learn's trees are CART-based)
clf = clf.fit(X, Y) # build the decision tree

predictY = clf.predict(newX)
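The snippet above leaves X, Y and newX undefined. A minimal runnable sketch, assuming a tiny hand-made dataset (the values are hypothetical and chosen only so that the label equals the first feature):

```python
from sklearn import tree

# Hypothetical toy data: the label simply equals the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 1, 1]

# random_state is fixed only to make the run reproducible.
clf = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)
clf = clf.fit(X, Y)  # build the decision tree

newX = [[0, 1], [1, 0]]
predictY = clf.predict(newX)
print(predictY)
```

Because only the first feature carries information, the tree splits on it once and the two new samples are assigned to classes 0 and 1 respectively.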

The tree module provides the following; see the documentation for details: http://scikit-learn.org/stable/modules/classes.html#
tree.DecisionTreeClassifier()
tree.DecisionTreeRegressor()
tree.ExtraTreeClassifier()
tree.ExtraTreeRegressor()
tree.export_graphviz exports a decision tree to a dot file
Install the Graphviz tool from http://www.graphviz.org/, then convert the dot file to a PDF file with the following command:
dot -Tpdf input.dot -o output.pdf
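To produce the input.dot file used by that command, one option is to ask export_graphviz for the dot source as a string and write it out yourself. This is only a sketch: the toy data and the feature names f0 and f1 are assumptions made for illustration.

```python
from sklearn import tree

# Hypothetical toy data, used only to have a fitted tree to export.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 1, 1]
clf = tree.DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, Y)

# With out_file=None, export_graphviz returns the dot source as a string.
dot_source = tree.export_graphviz(clf, out_file=None, feature_names=['f0', 'f1'])

# Write the dot source to the file name used by the dot command above.
with open('input.dot', 'w') as f:
    f.write(dot_source)
```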

2, KNN algorithm
from sklearn import neighbors

knn = neighbors.KNeighborsClassifier()
knn.fit(X, Y) # train the KNN classifier
predictY = knn.predict(newX)
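A runnable version of the same calls, assuming a small one-dimensional dataset with two well-separated groups (the data and n_neighbors=3 are illustrative choices, not values from the original post):

```python
from sklearn import neighbors

# Hypothetical toy data: one group near 0-2, another near 10-12.
X = [[0], [1], [2], [10], [11], [12]]
Y = [0, 0, 0, 1, 1, 1]

knn = neighbors.KNeighborsClassifier(n_neighbors=3)
knn.fit(X, Y)  # KNN stores the training samples for later lookups

newX = [[1.5], [10.5]]
predictY = knn.predict(newX)  # majority vote among the 3 nearest neighbors
print(predictY)
```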

The neighbors module provides the following; see the documentation for the parameters: http://scikit-learn.org/stable/modules/classes.html#
neighbors.NearestNeighbors()
neighbors.KNeighborsClassifier()
neighbors.RadiusNeighborsClassifier()
neighbors.KNeighborsRegressor()
neighbors.RadiusNeighborsRegressor()
neighbors.NearestCentroid()
neighbors.BallTree
neighbors.KDTree
neighbors.LSHForest (deprecated in scikit-learn 0.19 and removed in 0.21)
neighbors.DistanceMetric
neighbors.KernelDensity
neighbors.kneighbors_graph()
neighbors.radius_neighbors_graph()

3, SVM algorithm
from sklearn import svm
clf = svm.SVC(kernel='linear')

clf.fit(X, Y)

predictY = clf.predict(newX)
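As a concrete illustration of the linear-kernel classifier above, here is a sketch on linearly separable toy points (the data is hypothetical):

```python
from sklearn import svm

# Hypothetical toy data: two classes separated along the diagonal.
X = [[0, 0], [1, 1], [3, 3], [4, 4]]
Y = [0, 0, 1, 1]

clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

newX = [[0.5, 0.5], [3.5, 3.5]]
predictY = clf.predict(newX)
print(predictY)
print(clf.support_vectors_)  # the training points that define the margin
```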

The svm module provides the following:
svm.SVC() #support vector classifier
svm.LinearSVC()
svm.NuSVC()
svm.SVR() # support vector regressor
svm.LinearSVR()
svm.NuSVR()
svm.OneClassSVM()
svm.l1_min_c()

4, Linear regression and logistic regression
from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(X, Y)
predictY = regr.predict(newX)
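The section title also mentions logistic regression, which follows the same fit/predict pattern. A minimal sketch covering both models, assuming hypothetical toy data (y = 2x + 1 for the regression, and a threshold between 1 and 2 for the classification):

```python
from sklearn import linear_model

X = [[0], [1], [2], [3]]

# Linear regression on points that lie exactly on y = 2x + 1.
Y = [1, 3, 5, 7]
regr = linear_model.LinearRegression()
regr.fit(X, Y)
predictY = regr.predict([[4]])
print(regr.coef_, regr.intercept_, predictY)

# Logistic regression on binary labels for the same inputs.
labels = [0, 0, 1, 1]
logit = linear_model.LogisticRegression()
logit.fit(X, labels)
predicted_labels = logit.predict([[0], [3]])
print(predicted_labels)
```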

The linear_model module provides the following:
linear_model.ARDRegression([n_iter, tol, ...]) Bayesian ARD regression.
linear_model.BayesianRidge([n_iter, tol, ...]) Bayesian ridge regression
linear_model.ElasticNet([alpha, l1_ratio, ...]) Linear regression with combined L1 and L2 priors as regularizer.
linear_model.ElasticNetCV([l1_ratio, eps, ...]) Elastic Net model with iterative fitting along a regularization path
linear_model.HuberRegressor([epsilon, ...]) Linear regression model that is robust to outliers.
linear_model.Lars([fit_intercept, verbose, ...]) Least Angle Regression model.
linear_model.LarsCV([fit_intercept, ...]) Cross-validated Least Angle Regression model
linear_model.Lasso([alpha, fit_intercept, ...]) Linear Model trained with L1 prior as regularizer (aka the Lasso)
linear_model.LassoCV([eps, n_alphas, ...]) Lasso linear model with iterative fitting along a regularization path
linear_model.LassoLars([alpha, ...]) Lasso model fit with Least Angle Regression.
linear_model.LassoLarsCV([fit_intercept, ...]) Cross-validated Lasso, using the LARS algorithm
linear_model.LassoLarsIC([criterion, ...]) Lasso model fit with Lars using BIC or AIC for model selection
linear_model.LinearRegression([...]) Ordinary least squares Linear Regression.
linear_model.LogisticRegression([penalty, ...]) Logistic Regression (aka logit, MaxEnt) classifier.
linear_model.LogisticRegressionCV([Cs, ...]) Logistic Regression CV (aka logit, MaxEnt) classifier.
linear_model.MultiTaskLasso([alpha, ...]) Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer
linear_model.MultiTaskElasticNet([alpha, ...]) Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer
linear_model.MultiTaskLassoCV([eps, ...]) Multi-task L1/L2 Lasso with built-in cross-validation.
linear_model.MultiTaskElasticNetCV([...]) Multi-task L1/L2 ElasticNet with built-in cross-validation.
linear_model.OrthogonalMatchingPursuit([...]) Orthogonal Matching Pursuit model (OMP)
linear_model.OrthogonalMatchingPursuitCV([...]) Cross-validated Orthogonal Matching Pursuit model (OMP)
linear_model.PassiveAggressiveClassifier([...]) Passive Aggressive Classifier
linear_model.PassiveAggressiveRegressor([C, ...]) Passive Aggressive Regressor
linear_model.Perceptron([penalty, alpha, ...]) Perceptron linear classifier.
linear_model.RandomizedLasso([alpha, ...]) Randomized Lasso.
linear_model.RandomizedLogisticRegression([...]) Randomized Logistic Regression
linear_model.RANSACRegressor([...]) RANSAC (RANdom SAmple Consensus) algorithm.
linear_model.Ridge([alpha, fit_intercept, ...]) Linear least squares with l2 regularization.
linear_model.RidgeClassifier([alpha, ...]) Classifier using Ridge regression.
linear_model.RidgeClassifierCV([alphas, ...]) Ridge classifier with built-in cross-validation.
linear_model.RidgeCV([alphas, ...]) Ridge regression with built-in cross-validation.
linear_model.SGDClassifier([loss, penalty, ...]) Linear classifiers (SVM, logistic regression, a.o.) with SGD training.
linear_model.SGDRegressor([loss, penalty, ...]) Linear model fitted by minimizing a regularized empirical loss with SGD
linear_model.TheilSenRegressor([...]) Theil-Sen Estimator: robust multivariate regression model.
linear_model.lars_path(X, y[, Xy, Gram, ...]) Compute Least Angle Regression or Lasso path using LARS algorithm [1]
linear_model.lasso_path(X, y[, eps, ...]) Compute Lasso path with coordinate descent
linear_model.lasso_stability_path(X, y[, ...]) Stability path based on randomized Lasso estimates
linear_model.logistic_regression_path(X, y) Compute a Logistic Regression model for a list of regularization parameters.
linear_model.orthogonal_mp(X, y[, ...]) Orthogonal Matching Pursuit (OMP)
linear_model.orthogonal_mp_gram(Gram, Xy[, ...]) Gram Orthogonal Matching Pursuit (OMP)

5, K-Means algorithm

from sklearn import cluster

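kmeans = cluster.KMeans(n_clusters=k) # k is the number of clusters you choose
kmeans.fit(X) # run the k-means algorithm
print(kmeans.labels_)
print(kmeans.predict([[1, 3]])) # predict expects a 2D array: one row per sample
print(kmeans.cluster_centers_) # print the coordinates of the cluster centers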
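A self-contained sketch of the K-Means workflow, assuming six hypothetical 2D points that form two obvious groups. The values of n_clusters, n_init and random_state are illustrative choices.

```python
from sklearn import cluster

# Hypothetical toy data: three points with x = 1 and three with x = 10.
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = cluster.KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)  # run the k-means algorithm

print(kmeans.labels_)  # cluster index assigned to each training point
new_labels = kmeans.predict([[0, 0], [12, 3]])  # assign new points to the nearest center
print(new_labels)
print(kmeans.cluster_centers_)  # coordinates of the cluster centers
```

Which group receives index 0 and which receives index 1 is arbitrary, so compare labels with each other rather than relying on specific values.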

The cluster module provides the following:

cluster.AffinityPropagation([damping, ...]) Perform Affinity Propagation Clustering of data.
cluster.AgglomerativeClustering([...]) Agglomerative Clustering
cluster.Birch([threshold, branching_factor, ...]) Implements the Birch clustering algorithm.
cluster.DBSCAN([eps, min_samples, metric, ...]) Perform DBSCAN clustering from vector array or distance matrix.
cluster.FeatureAgglomeration([n_clusters, ...]) Agglomerate features.
cluster.KMeans([n_clusters, init, n_init, ...]) K-Means clustering
cluster.MiniBatchKMeans([n_clusters, init, ...]) Mini-Batch K-Means clustering
cluster.MeanShift([bandwidth, seeds, ...]) Mean shift clustering using a flat kernel.
cluster.SpectralClustering([n_clusters, ...]) Apply clustering to a projection to the normalized laplacian.

6, GBDT algorithm
from sklearn import ensemble
clf = ensemble.GradientBoostingRegressor(**params) # params: a dict of hyperparameters
clf.fit(X, Y)

predictY = clf.predict(newX)
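The original does not say what params contains. One plausible reading is a dict of keyword arguments; the sketch below assumes that, with hypothetical hyperparameter values and toy data on the line y = 2x:

```python
from sklearn import ensemble

# Assumed hyperparameters, for illustration only.
params = {'n_estimators': 100, 'max_depth': 2,
          'learning_rate': 0.1, 'random_state': 0}

# Hypothetical toy data: y = 2x for x in 0..19.
X = [[x] for x in range(20)]
Y = [2.0 * x for x in range(20)]

clf = ensemble.GradientBoostingRegressor(**params)
clf.fit(X, Y)

newX = [[5], [15]]
predictY = clf.predict(newX)
print(predictY)
```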

The ensemble module provides the following:
ensemble.AdaBoostClassifier([...]) An AdaBoost classifier.
ensemble.AdaBoostRegressor([base_estimator, ...]) An AdaBoost regressor.
ensemble.BaggingClassifier([base_estimator, ...]) A Bagging classifier.
ensemble.BaggingRegressor([base_estimator, ...]) A Bagging regressor.
ensemble.ExtraTreesClassifier([...]) An extra-trees classifier.
ensemble.ExtraTreesRegressor([n_estimators, ...]) An extra-trees regressor.
ensemble.GradientBoostingClassifier([loss, ...]) Gradient Boosting for classification.
ensemble.GradientBoostingRegressor([loss, ...]) Gradient Boosting for regression.
ensemble.IsolationForest([n_estimators, ...]) Isolation Forest Algorithm
ensemble.RandomForestClassifier([...]) A random forest classifier.
ensemble.RandomTreesEmbedding([...]) An ensemble of totally random trees.
ensemble.RandomForestRegressor([...]) A random forest regressor.
ensemble.VotingClassifier(estimators[, ...]) Soft Voting/Majority Rule classifier for unfitted estimators.

7, Naive Bayes algorithm
from sklearn import naive_bayes

clf = naive_bayes.GaussianNB()
clf.fit(X, Y)
predictY = clf.predict(newX)
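A runnable sketch of the Gaussian naive Bayes calls above, assuming two hypothetical classes placed symmetrically around the origin:

```python
from sklearn import naive_bayes

# Hypothetical toy data: class 1 in the lower-left, class 2 in the upper-right.
X = [[-3, -2], [-2, -1], [-1, -1], [1, 1], [2, 1], [3, 2]]
Y = [1, 1, 1, 2, 2, 2]

clf = naive_bayes.GaussianNB()
clf.fit(X, Y)  # estimates a per-class mean and variance for each feature

newX = [[-0.8, -1], [2.5, 1.5]]
predictY = clf.predict(newX)
print(predictY)
print(clf.predict_proba(newX))  # class probabilities for each new sample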

The naive_bayes module provides the following:
naive_bayes.GaussianNB([priors]) Gaussian Naive Bayes (GaussianNB)
naive_bayes.MultinomialNB([alpha, ...]) Naive Bayes classifier for multinomial models
naive_bayes.BernoulliNB([alpha, binarize, ...]) Naive Bayes classifier for multivariate Bernoulli models.

8, Gaussian mixture model
from sklearn import mixture
clf = mixture.GaussianMixture()
clf.fit(X_train)
...
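The original elides what happens after fit. One common continuation, shown here only as a sketch, is to read the fitted component means and assign each sample to a component with predict. The synthetic two-blob data, n_components=2 and the random seeds are assumptions.

```python
import numpy as np
from sklearn import mixture

# Synthetic data (assumed): two well-separated 2D Gaussian blobs.
rng = np.random.RandomState(0)
X_train = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
])

clf = mixture.GaussianMixture(n_components=2, random_state=0)
clf.fit(X_train)

labels = clf.predict(X_train)  # most likely component for each sample
print(clf.means_)              # estimated component means
print(clf.weights_)            # estimated mixing weights
```

As with K-Means, the numbering of the components is arbitrary.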

The mixture module provides the following:
mixture.GaussianMixture([n_components, ...]) Gaussian Mixture.
mixture.BayesianGaussianMixture([...]) Variational Bayesian estimation of a Gaussian mixture.

0 0