Machine Learning Concepts

Source: Internet · Editor: 程序博客网 · Time: 2024/06/09 08:09

References: Machine Learning Concepts; Zhou Zhihua's "watermelon book", 《机器学习》 (Machine Learning).

Machine learning (ML) can help you use historical data to make better business decisions. ML algorithms discover patterns in data, and construct mathematical models using these discoveries. Then you can use the models to make predictions on future data. For example, one possible application of a machine learning model would be to predict how likely a customer is to purchase a particular product based on their past behavior.

Building a Machine Learning Application

Building ML applications is an iterative process that involves a sequence of steps. To build an ML application, follow these general steps:

  1. Frame the core ML problem(s) in terms of what is observed and what answer you want the model to predict.

  2. Collect, clean, and prepare data so that ML model training algorithms can consume it. Visualize and analyze the data to understand it and to run sanity checks on its quality.

  3. Often, the raw data (input variables) and answer (target) are not represented in a way that can be used to train a highly predictive model. Therefore, you typically should attempt to construct more predictive input representations or features from the raw variables.

  4. Feed the resulting features to the learning algorithm to build models and evaluate the quality of the models on data that was held out from model building.

  5. Use the model to generate predictions of the target answer for new data instances.
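
The five steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended pipeline: the customer records are synthetic, the labeling rule is made up, the names `make_synthetic_data`, `to_feature`, `train_threshold`, `predict`, and `run_workflow` are hypothetical, and the "model" is just a single threshold on one engineered feature.

```python
import random


def make_synthetic_data(n, seed=0):
    """Steps 1-2: observed inputs are (visits, past_purchases); the target is `bought`.

    The data is synthetic and the labeling rule is an assumption for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(1, 20)
        past_purchases = rng.randint(0, visits)
        bought = 1 if 2 * past_purchases > visits else 0
        rows.append((visits, past_purchases, bought))
    return rows


def to_feature(visits, past_purchases):
    """Step 3: build a more predictive feature (conversion ratio) from raw variables."""
    return past_purchases / visits


def train_threshold(features, labels):
    """Step 4a: pick the candidate threshold with the fewest training errors."""
    best_t, best_errors = 0.0, len(labels) + 1
    for t in sorted(set(features)):
        errors = sum((f > t) != (y == 1) for f, y in zip(features, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(threshold, visits, past_purchases):
    """Step 5: predict the target for a new data instance."""
    return 1 if to_feature(visits, past_purchases) > threshold else 0


def run_workflow(n=200, holdout_fraction=0.25, seed=0):
    rows = make_synthetic_data(n, seed)
    # Sanity check (step 2): every record must be internally consistent.
    assert all(0 <= p <= v for v, p, _ in rows)
    n_holdout = int(n * holdout_fraction)
    train, holdout = rows[n_holdout:], rows[:n_holdout]
    threshold = train_threshold([to_feature(v, p) for v, p, _ in train],
                                [y for _, _, y in train])
    # Step 4b: evaluate on data that was held out from model building.
    correct = sum(predict(threshold, v, p) == y for v, p, y in holdout)
    return threshold, correct / len(holdout)


if __name__ == "__main__":
    threshold, holdout_accuracy = run_workflow()
    print(f"threshold={threshold:.3f} holdout accuracy={holdout_accuracy:.2f}")
    print("prediction for a new customer:", predict(threshold, 10, 8))
```

In a real application each step is far more involved, and the threshold rule would be replaced by a proper learning algorithm. The point of the sketch is only the order of operations: the held-out records are never seen during training.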

Flowchart: data → learning algorithm → mathematical model
Flowchart: new data → mathematical model → predicted result
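Concept: Meaning

data set (数据集)
instance (示例), sample (样本), feature vector (特征向量): a single record in the data set
attribute (属性), feature (特征)
attribute space (属性空间), sample space (样本空间), input space (输入空间)
dimensionality (维度)
learning (学习), training (训练): the process of learning a model from data by running a learning algorithm
training data (训练数据): the data used during training
training sample (训练样本): one sample in the training data
training set (训练集): the set of training samples
hypothesis (假设)
ground-truth (真相, 真实)
prediction (预测)
label (标记)
example (样例): an instance that carries label information
label space (标记空间), output space (输出空间)
testing (测试): the process of applying the learned model to test samples
testing sample (测试样本), testing instance (测试示例): a sample used for testing
generalization (泛化): applying the model to new samples
induction (归纳): the generalization process
deduction (演绎): the specialization (特殊化) process
inductive learning (归纳学习)
concept (概念)
concept learning (概念学习), concept formation (概念形成): inductive learning in the narrow sense
version space (版本空间): the set of hypotheses consistent with the training set
inductive bias (归纳偏好)
Occam's razor (奥卡姆剃刀): choose the simplest hypothesis that is consistent with the data
error rate (错误率): the most commonly used performance measure for classification
accuracy (精度): 1 - error rate
error (误差)
training error (训练误差), empirical error (经验误差)
generalization error (泛化误差)
overfitting (过拟合)
underfitting (欠拟合)
model selection (模型选择): choosing the learning algorithm and its parameters
testing set (测试集)
testing error (测试误差)
hold-out (留出法)
sampling (采样)
stratified sampling (分层采样): sampling that preserves the class proportions
fidelity (保真性): how closely the model trained on the training set matches the model trained on the full data set
cross validation (交叉验证法)
Leave-One-Out (留一法)
bootstrapping (自助法)
parameter tuning (调参)
validation set (验证集)
performance measure (性能度量)
MSE, mean squared error (均方误差): the most commonly used performance measure for regression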
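
Several of the evaluation terms in the glossary are easier to pin down in code. The following is a minimal sketch using only the Python standard library. All function names are hypothetical and chosen for illustration. It shows error rate, accuracy, and MSE as performance measures, plus index-based versions of the hold-out method with stratified sampling, k-fold cross validation (Leave-One-Out when `k` equals the number of samples), and bootstrapping.

```python
import random
from collections import defaultdict


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (classification)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Accuracy = 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def stratified_hold_out(labels, test_fraction, seed=0):
    """Hold-out split that keeps each class's proportion in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) per fold; k == n gives Leave-One-Out."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, folds[i]


def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; unseen indices form the test part."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(n) for _ in range(n)]
    chosen = set(train_idx)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return train_idx, out_of_bag
```

The functions return indices rather than data so that the same split can be applied to features and labels alike. In practice a library such as scikit-learn provides tested equivalents of these helpers.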