Common Knowledge Points in Machine Learning & Data Mining
Source: Internet. Editor: 程序博客网. Date: 2024/06/13 09:28
This list of knowledge points is drawn mainly from: http://www.36dsj.com/archives/20135
Basics
- MSE (Mean Squared Error)
- LMS (Least Mean Squares)
- LSM (Least Squares Method)
- MLE (Maximum Likelihood Estimation)
- QP (Quadratic Programming)
- CP (Conditional Probability)
- JP (Joint Probability)
- MP (Marginal Probability)
- Bayes' Formula
- L1/L2 Regularization (L1/L2 regularization, plus others such as the currently popular L2.5 regularization)
- GD (Gradient Descent)
- SGD (Stochastic Gradient Descent)
- Eigenvalue
- Eigenvector
- QR Decomposition
- Quantile
- Covariance Matrix
Common Distributions
Discrete Distributions
- Bernoulli Distribution / Binomial Distribution
- Negative Binomial Distribution
- Multinomial Distribution
- Geometric Distribution
- Hypergeometric Distribution
- Poisson Distribution
Continuous Distributions
- Uniform Distribution
- Normal Distribution / Gaussian Distribution
- Exponential Distribution
- Lognormal Distribution
- Gamma Distribution
- Beta Distribution
- Dirichlet Distribution
- Rayleigh Distribution
- Cauchy Distribution
- Weibull Distribution
Three Sampling Distributions
- Chi-square Distribution
- t-distribution
- F-distribution
Data Pre-processing
- Missing Value Imputation
- Discretization
- Mapping
- Normalization (normalization/standardization)
Sampling
- Simple Random Sampling
- Offline Sampling (offline equal-probability sampling of K items)
- Online Sampling (online equal-probability sampling of K items)
- Ratio-based Sampling (proportional random sampling)
- Acceptance-rejection Sampling
- Importance Sampling
- MCMC (Markov Chain Monte Carlo sampling algorithms: Metropolis-Hastings & Gibbs)
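As an illustration of the Online Sampling item in the list above, here is a minimal sketch of equal-probability sampling of K items from a stream of unknown length. The list does not name an algorithm; this sketch assumes the common reservoir sampling approach, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn with equal probability from an iterable.

    Assumption: reservoir sampling is one common way to do online
    equal-probability K sampling; the source list does not specify it.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(100), 5)` returns 5 distinct values from 0..99 after a single pass, without needing to know the stream length in advance.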
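Clustering
- K-Means
- K-Medoids
- Bisecting K-Means
- FK-Means
- Canopy
- Spectral-KMeans (spectral clustering)
- GMM-EM (Gaussian mixture model, solved with the expectation-maximization algorithm)
- K-Prototypes
- CLARANS (partition-based)
- BIRCH (hierarchy-based)
- CURE (hierarchy-based)
- DBSCAN (density-based)
- CLIQUE (density-based and grid-based)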
Clustering Effectiveness Evaluation
- Purity
- RI (Rand Index)
- ARI (Adjusted Rand Index)
- NMI (Normalized Mutual Information)
- F-measure, etc.
Classification & Regression
- LR (Linear Regression)
- LR (Logistic Regression)
- SR (Softmax Regression, i.e. multiclass logistic regression)
- GLM (Generalized Linear Model)
- RR (Ridge Regression, i.e. L2-regularized least squares regression)
- LASSO (Least Absolute Shrinkage and Selection Operator, i.e. L1-regularized least squares regression)
- RF (Random Forest)
- DT (Decision Tree)
- GBDT (Gradient Boosting Decision Tree)
- CART (Classification And Regression Tree)
- KNN (K-Nearest Neighbors)
- SVM (Support Vector Machine)
- KF (Kernel Function: Polynomial Kernel Function, Gaussian Kernel Function / Radial Basis Function (RBF), String Kernel Function)
- NB (Naive Bayes)
- BN (Bayesian Network / Bayesian Belief Network / Belief Network)
- LDA (Linear Discriminant Analysis / Fisher Linear Discriminant)
- EL (Ensemble Learning: Boosting, Bagging, Stacking)
- AdaBoost (Adaptive Boosting)
- MEM (Maximum Entropy Model)
Classification Effectiveness Evaluation
- Confusion Matrix
- Precision
- Recall
- Accuracy
- F-score
- ROC Curve
- AUC (area under the ROC curve)
- Lift Curve
- KS Curve
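To make the relationship between the first few items in this list concrete, the following minimal sketch derives precision, recall, accuracy, and the F-score from the four cells of a binary confusion matrix. It assumes labels are encoded as 0/1 and uses the balanced F1 variant of the F-score; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels.

    Assumption: 1 is the positive class, and F-score means F1
    (the harmonic mean of precision and recall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }
```

Calling `binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])` gives 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75.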
PGM (Probabilistic Graphical Models)
- BN (Bayesian Network / Bayesian Belief Network / Belief Network)
- MC (Markov Chain)
- HMM (Hidden Markov Model)
- MEMM (Maximum Entropy Markov Model)
- CRF (Conditional Random Field)
- MRF (Markov Random Field)
NN (Neural Network)
- ANN (Artificial Neural Network)
- BP (Error Back Propagation)
Deep Learning
- Auto-encoder
- SAE (Stacked Auto-encoders: Sparse Auto-encoders, Denoising Auto-encoders, Contractive Auto-encoders)
- RBM (Restricted Boltzmann Machine)
- DBN (Deep Belief Network)
- CNN (Convolutional Neural Network)
- Word2Vec (word vector learning model)
Dimensionality Reduction
- LDA (Linear Discriminant Analysis / Fisher Linear Discriminant)
- PCA (Principal Component Analysis)
- ICA (Independent Component Analysis)
- SVD (Singular Value Decomposition)
- FA (Factor Analysis)
Text Mining
- VSM (Vector Space Model)
- Word2Vec (word vector learning model)
- TF (Term Frequency)
- TF-IDF (Term Frequency-Inverse Document Frequency)
- MI (Mutual Information)
- ECE (Expected Cross Entropy)
- QEMI (quadratic information entropy)
- IG (Information Gain)
- IGR (Information Gain Ratio)
- Gini (Gini coefficient)
- χ² Statistic (chi-square statistic)
- TEW (Text Evidence Weight)
- OR (Odds Ratio)
- N-Gram Model
- LSA (Latent Semantic Analysis)
- PLSA (Probabilistic Latent Semantic Analysis)
- LDA (Latent Dirichlet Allocation)
- SLM (Statistical Language Model)
- NPLM (Neural Probabilistic Language Model)
- CBOW (Continuous Bag of Words Model)
- Skip-gram (Skip-gram Model)
Association Mining
- Apriori
- FP-growth (Frequent Pattern Tree Growth algorithm)
- AprioriAll
- Spade
Recommendation System
- DBR (Demographic-based Recommendation)
- CBR (Content-based Recommendation)
- CF (Collaborative Filtering)
- UCF (User-based Collaborative Filtering Recommendation)
- ICF (Item-based Collaborative Filtering Recommendation)
Similarity Measure & Distance Measure
- Euclidean Distance
- Manhattan Distance
- Chebyshev Distance
- Minkowski Distance
- Standardized Euclidean Distance
- Mahalanobis Distance
- Cos (Cosine similarity)
- Hamming Distance / Edit Distance
- Jaccard Distance
- Correlation Coefficient Distance
- Information Entropy
- KL (Kullback-Leibler Divergence / Relative Entropy)
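Several of the measures in this list are related: the Minkowski distance with p = 1 is the Manhattan distance, with p = 2 it is the Euclidean distance, and as p grows it approaches the Chebyshev distance. The sketch below shows these alongside cosine similarity, assuming plain equal-length numeric sequences as input; the function names are illustrative only.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5, the Manhattan distance is 7, and the Chebyshev distance is 4.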
Optimization
Unconstrained Optimization
- Cyclic Variable Methods
- Pattern Search Methods
- Variable Simplex Methods
- Gradient Descent Methods
- Newton Methods
- Quasi-Newton Methods
- Conjugate Gradient Methods
Constrained Optimization
- Approximation Programming Methods
- Feasible Direction Methods
- Penalty Function Methods
- Multiplier Methods
- Heuristic Algorithms
- SA (Simulated Annealing)
- GA (Genetic Algorithm)
Feature Selection
- Mutual Information
- Document Frequency
- Information Gain
- Chi-squared Test
- Gini (Gini coefficient)
Outlier Detection
- Statistics-based
- Distance-based
- Density-based
- Clustering-based
Learning to Rank
- Pointwise: McRank
- Pairwise: RankingSVM, RankNet, Frank, RankBoost
- Listwise: AdaRank, SoftRank, LambdaMART
Tools
- MPI
- Hadoop ecosystem
- Spark
- BSP
- Weka
- Mahout
- Scikit-learn
- PyBrain