Machine Learning Concepts

Source: Internet. Editor: 程序博客网. Time: 2024/06/09 08:09

References: Machine Learning Concepts; Zhou Zhihua's "watermelon book", Machine Learning (《机器学习》).

Machine learning (ML) can help you use historical data to make better business decisions. ML algorithms discover patterns in data, and construct mathematical models using these discoveries. Then you can use the models to make predictions on future data. For example, one possible application of a machine learning model would be to predict how likely a customer is to purchase a particular product based on their past behavior.

Building a Machine Learning Application

Building ML applications is an iterative process that involves a sequence of steps. To build an ML application, follow these general steps:

1. Frame the core ML problem(s) in terms of what is observed and what answer you want the model to predict.
2. Collect, clean, and prepare data to make it suitable for consumption by ML model training algorithms. Visualize and analyze the data to run sanity checks to validate the quality of the data and to understand the data.
3. Often, the raw data (input variables) and answer (target) are not represented in a way that can be used to train a highly predictive model. Therefore, you typically should attempt to construct more predictive input representations or features from the raw variables.
4. Feed the resulting features to the learning algorithm to build models and evaluate the quality of the models on data that was held out from model building.
5. Use the model to generate predictions of the target answer for new data instances.
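
The steps above can be sketched end to end in a few lines of Python. This is only a minimal illustration under stated assumptions: the customer records, the feature names (visits, past_purchases), the derived ratio feature, and the nearest-centroid learner are all invented here for demonstration and are not taken from the referenced material.

```python
import random

# Step 1 (framing): observed = (visits, past_purchases); answer = bought (1) or not (0).
# Hypothetical toy records, invented for illustration only.
RAW = [
    (1, 0, 0), (2, 0, 0), (3, 1, 0), (2, 1, 0), (1, 1, 0), (4, 0, 0),
    (8, 3, 1), (9, 4, 1), (7, 5, 1), (10, 3, 1), (8, 4, 1), (9, 5, 1),
]

def make_features(visits, past_purchases):
    """Step 3: derive a feature vector from the raw variables."""
    return [float(visits), float(past_purchases), past_purchases / visits]

def train_centroids(examples):
    """Step 4a: 'learn' one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Step 5: return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))

def run_workflow(seed=0):
    # Step 2: a basic sanity check on the raw data before using it.
    assert all(v > 0 for v, _, _ in RAW), "visits must be positive"
    examples = [(make_features(v, p), y) for v, p, y in RAW]
    # Hold out one third of the examples; they are never used for training.
    rng = random.Random(seed)
    rng.shuffle(examples)
    cut = len(examples) * 2 // 3
    train, held_out = examples[:cut], examples[cut:]
    model = train_centroids(train)
    # Step 4b: evaluate the model on the held-out examples.
    correct = sum(predict(model, x) == y for x, y in held_out)
    return model, correct / len(held_out)

model, held_out_accuracy = run_workflow()
new_customer = make_features(visits=9, past_purchases=4)
print(held_out_accuracy, predict(model, new_customer))
```

The two groups in this toy data are deliberately well separated, so the held-out score says little about real data; the point is only the order of the steps and the fact that evaluation uses examples the learner never saw.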

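Concepts and Meanings

data set (数据集)
instance (示例), sample (样本), feature vector (特征向量): one record of the data set
attribute (属性), feature (特征)
attribute space (属性空间), sample space (样本空间), input space (输入空间)
dimensionality (维度)
learning (学习), training (训练): the process of learning a model from data by running a learning algorithm
training data (训练数据): the data used during training
training sample (训练样本): one sample in the training data
training set (训练集): the set made up of the training samples
hypothesis (假设)
ground-truth (真相 or 真实)
prediction (预测)
label (标记)
example (样例): an instance that carries label information
label space (标记空间), output space (输出空间)
testing (测试): the process of using the learned model to make predictions
testing sample (测试样本), testing instance (测试示例): a sample used for testing
generalization (泛化): applying the model to new samples
induction (归纳): the generalization process, from the specific to the general
deduction (演绎): the specialization (特殊化) process, from the general to the specific
inductive learning (归纳学习)
concept (概念): concept learning (概念学习), also called concept formation (概念形成), is inductive learning in the narrow sense
version space (版本空间): the set of hypotheses consistent with the training set
inductive bias (归纳偏好)
Occam's razor (奥卡姆剃须刀): choose the simplest of the consistent hypotheses
error rate (错误率): the most common performance measure for classification
accuracy (精度): 1 - error rate
error (误差)
training error (训练误差), empirical error (经验误差)
generalization error (泛化误差)
overfitting (过拟合)
underfitting (欠拟合)
model selection (模型选择): the choice of learning algorithm and of its parameters
testing set (测试集)
testing error (测试误差)
hold-out (留出法)
sampling (采样)
stratified sampling (分层采样): a sampling method that preserves the class proportions
fidelity (保真性): how closely the model trained on the training set matches the model that would be trained on the full data set
cross validation (交叉验证法)
Leave-One-Out (留一法)
bootstrapping (自助法)
parameter tuning (调参)
validation set (验证集)
performance measure (性能度量)
MSE, mean squared error (均方误差): the most common performance measure for regression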
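
Several of the evaluation terms in the glossary (error rate, accuracy, MSE, hold-out with stratified sampling, cross validation, Leave-One-Out, bootstrapping) are easier to keep apart with a small sketch. The Python below is an illustrative assumption, not code from the referenced book: the function names, the index-based interface, and the toy labels are all made up here, and each function works on sample indices only.

```python
import random
from collections import defaultdict

def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the ground truth."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def accuracy(y_true, y_pred):
    """Accuracy is defined as 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between target and prediction (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def stratified_hold_out(labels, test_fraction, rng):
    """Hold-out split of sample indices that keeps each class's proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def k_fold(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross validation.

    With k == n_samples every fold holds one sample, which is Leave-One-Out.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield sorted(train), test

def bootstrap(n_samples, rng):
    """Draw n_samples indices with replacement; never-drawn indices form the test set."""
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train)
    test = [i for i in range(n_samples) if i not in chosen]
    return train, test

rng = random.Random(0)
labels = [0] * 6 + [1] * 4  # hypothetical labels: 60% class 0, 40% class 1
train_idx, test_idx = stratified_hold_out(labels, 0.5, rng)
boot_train, boot_test = bootstrap(len(labels), rng)
print(len(train_idx), len(test_idx), len(boot_train), len(boot_test))
```

Hold-out and cross validation train on a subset of the data, which is where the fidelity concern in the glossary comes from; bootstrapping keeps the training set at the original size but changes its distribution, because some samples are drawn more than once and others not at all.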
