Basic Operations with the Amazon Machine Learning Service -- Custom ML Model

Source: Internet. Editor: 程序博客网. Time: 2024/04/30 04:19

Amazon Machine Learning

https://aws.amazon.com/machine-learning/

Steps

Step 1: Prepare Your Data
Step 2: Create a Training Datasource
Step 3: Create an ML Model
Step 4: Review the ML Model's Predictive Performance
Step 5: Use the ML Model to Generate Predictions
Step 6: Clean Up

In short: prepare the data (clean, transform, ...) → choose a model → review the results → predict on new data → clean up.

1. Open the console

2. Select the datasource

3. Create the model (Custom)

4. Adjust the Recipe

Default Recipe contents:

{
  "groups": {
    "NUMERIC_VARS_QB_50": "group('emp_var_rate','cons_price_idx')",
    "NUMERIC_VARS_QB_500": "group('campaign','age')",
    "NUMERIC_VARS_QB_10": "group('duration','cons_conf_idx','previous','nr_employed','euribor3m','pdays')"
  },
  "assignments": {},
  "outputs": [
    "ALL_BINARY",
    "ALL_CATEGORICAL",
    "quantile_bin(NUMERIC_VARS_QB_50,50)",
    "quantile_bin(NUMERIC_VARS_QB_500,500)",
    "quantile_bin(NUMERIC_VARS_QB_10,10)"
  ]
}

Changed to:

{
  "groups": {
    "NUMERIC_VARS_QB_10": "group('emp_var_rate','campaign')"
  },
  "assignments": {
    "myassign": "quantile_bin(NUMERIC_VARS_QB_10, 10)"
  },
  "outputs": [
    "ALL_BINARY",
    "ALL_CATEGORICAL",
    "myassign"
  ]
}
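A missing brace or comma in a hand-edited recipe is easy to overlook. One way to avoid that is to build the recipe as a Python dict and let the json module serialize it. The sketch below does this for the modified recipe; undefined_outputs is a hypothetical helper written for this post, not part of any AWS SDK, and it only checks bare names, not function calls.

```python
import json

# Group names that Amazon ML predefines (they are listed in the appendix).
BUILT_IN_GROUPS = {"ALL_TEXT", "ALL_NUMERIC", "ALL_CATEGORICAL",
                   "ALL_BINARY", "ALL_INPUTS"}


def build_recipe():
    """Return the modified recipe from this step as a plain dict."""
    return {
        "groups": {
            "NUMERIC_VARS_QB_10": "group('emp_var_rate','campaign')",
        },
        "assignments": {
            "myassign": "quantile_bin(NUMERIC_VARS_QB_10, 10)",
        },
        "outputs": ["ALL_BINARY", "ALL_CATEGORICAL", "myassign"],
    }


def undefined_outputs(recipe, input_variables=()):
    """Hypothetical helper: list bare output names that nothing defines.

    Entries containing '(' are treated as function calls and skipped.
    input_variables holds the raw column names of the datasource.
    """
    known = (BUILT_IN_GROUPS | set(recipe["groups"])
             | set(recipe["assignments"]) | set(input_variables))
    return [name for name in recipe["outputs"]
            if "(" not in name and name not in known]


recipe = build_recipe()
recipe_text = json.dumps(recipe, indent=2)  # serialized text always has balanced braces
print(recipe_text)
print(undefined_outputs(recipe))
```

The resulting recipe_text can be pasted into the console's recipe editor.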

5. Set the training parameters

Training Parameters

Reference: http://docs.aws.amazon.com/machine-learning/latest/dg/training-parameters.html?icmpid=docs_machinelearning_console

The following parameters can be set:

Maximum model size

Maximum number of passes over training data

Shuffle type

Regularization type

Regularization amount
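When a model is created through the API instead of the console, these five settings travel as a map of strings. The sketch below builds such a map. The sgd.* key names are quoted from memory of the CreateMLModel API reference and should be checked there. The rule that L1 and L2 cannot be combined is likewise an assumption taken from that reference, and the numbers in the usage line are illustrative only.

```python
def training_parameters(max_model_bytes, max_passes, shuffle_type="auto",
                        l1_amount=None, l2_amount=None):
    """Build a string-to-string parameter map for a custom model.

    Key names are assumed from the CreateMLModel API reference.
    """
    if shuffle_type not in ("auto", "none"):
        raise ValueError("shuffle_type must be 'auto' or 'none'")
    if l1_amount is not None and l2_amount is not None:
        # Assumption: the service accepts only one regularization type.
        raise ValueError("choose either L1 or L2 regularization, not both")

    params = {
        "sgd.maxMLModelSizeInBytes": str(int(max_model_bytes)),
        "sgd.maxPasses": str(int(max_passes)),
        "sgd.shuffleType": shuffle_type,
    }
    if l1_amount is not None:
        params["sgd.l1RegularizationAmount"] = repr(float(l1_amount))
    if l2_amount is not None:
        params["sgd.l2RegularizationAmount"] = repr(float(l2_amount))
    return params


# Illustrative values: 100 MiB model, 10 passes, mild L2 regularization.
params = training_parameters(100 * 1024 * 1024, 10, "auto", l2_amount=1e-6)
print(params)
```

Every value is a string because the API expects a map of strings, not numbers.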

6. Set the evaluation parameters

7. Review

8. Switch to the Dashboard

9. Try a row of the raw data

36,admin.,married,university.degree,no,no,no,cellular,jun,mon,174,1,3,1,success,-2.9,92.963,-40.8,1.266,5076.2

(This is row 10 of the raw data; the original value of its target column is 1.)

The regression model returns a predicted value of 0.38.
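To send such a row to a real-time prediction endpoint, it first has to become a record that maps column names to string values. The sketch below does that conversion. The column order is an assumption: the numeric names appear in the recipe earlier in this post, but the categorical names (job, marital, and so on) are guessed from the public bank-marketing data set and should be checked against the actual datasource schema.

```python
import csv
import io

# Assumed column order; verify against the datasource schema.
COLUMNS = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "emp_var_rate", "cons_price_idx",
    "cons_conf_idx", "euribor3m", "nr_employed",
]

RAW_ROW = ("36,admin.,married,university.degree,no,no,no,cellular,jun,mon,"
           "174,1,3,1,success,-2.9,92.963,-40.8,1.266,5076.2")


def row_to_record(raw_row, columns=COLUMNS):
    """Turn one CSV line into a {column name: string value} record."""
    values = next(csv.reader(io.StringIO(raw_row)))
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


record = row_to_record(RAW_ROW)
print(record["age"], record["duration"], record["nr_employed"])
```

The values stay strings on purpose, since the prediction request carries the record as a string-to-string map.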

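Appendix

1. What is a Recipe?

Reference: http://docs.aws.amazon.com/machine-learning/latest/dg/feature-transformations-with-data-recipes.html

a) Feature variables usually need preprocessing. There are two ways to do it: transform the data yourself before uploading it to AWS, or use the data transformations that AWS predefines (that is, the Recipe).

Example:

Take the time at which an event occurs. As a full timestamp, each value occurs only once and carries no predictive meaning. If you split out the hour or the weekday, however, the model may be able to predict the periods in which the event occurs most often.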
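The timestamp example corresponds to the first option, preprocessing before upload. A minimal sketch with the standard library follows; the input format string and the feature names hour and weekday are assumptions chosen for illustration.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into coarse features a model can learn from."""
    moment = datetime.strptime(timestamp, fmt)
    return {
        "hour": moment.hour,                     # 0..23
        "weekday": WEEKDAYS[moment.weekday()],   # Monday is index 0
    }


print(time_features("2024-04-30 04:19:00"))
```

Both derived columns repeat across many rows, which gives the model something to count, unlike the unique raw timestamp.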

b) Three basic parts

Reference: http://docs.aws.amazon.com/machine-learning/latest/dg/recipe-format-reference.html

Comment marker: //

A Recipe is not strict JSON. It consists of only the following three sections:

Groups
Assignments
Outputs

The built-in groups are:

ALL_TEXT, ALL_NUMERIC, ALL_CATEGORICAL, ALL_BINARY

ALL_INPUTS

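The outputs section specifies which data the ML model can "see". Its entries can be

groups, variable names, or functions.

Groups are the built-in ones or those defined in the groups section.

Variables can be original field names or temporary variables defined in the assignments section.

For the list of available functions, see the relevant AWS documentation.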

c) Transformations

Syntax reference: http://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html

The most important ones are illustrated below:

quantile_bin(var1, 50)

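var1 is a numeric variable. Based on its value, each observation is assigned to one of 50 categorical bins, that is, 50 groups.

The number of bins must be at least 5 and at most 1000.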
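The idea behind quantile binning can be sketched in a few lines of Python. This illustrates the technique only and is not Amazon ML's implementation: the bin edges here come from statistics.quantiles, and the service may place its edges differently. The function names are made up for this post.

```python
import bisect
import statistics


def fit_quantile_edges(values, bins):
    """Compute bins - 1 cut points that split values into equal-sized bins."""
    if not 5 <= bins <= 1000:
        raise ValueError("the number of bins must be between 5 and 1000")
    return statistics.quantiles(values, n=bins)


def quantile_bin(value, edges):
    """Return the zero-based index of the bin that value falls into."""
    return bisect.bisect_right(edges, value)


values = list(range(1, 101))           # toy data: 1, 2, ..., 100
edges = fit_quantile_edges(values, 5)  # 4 cut points for 5 bins
binned = [quantile_bin(v, edges) for v in values]
print(edges)
print(binned[0], binned[49], binned[-1])
```

After binning, the model sees a categorical bin index instead of the raw number, which lets a linear model pick up non-linear effects of that variable.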

normalize(var1)

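Normalizes the values of the variable var1 to zero mean and unit variance.

More commonly it is applied to a whole group, as in normalize(ALL_NUMERIC), which normalizes every numeric variable.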
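As a rough illustration, normalizing one numeric column to zero mean and unit variance looks like the sketch below. It assumes the population standard deviation; which variant the service uses internally is not stated here. The sample ages are made up.

```python
import statistics


def normalize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [36, 41, 29, 58, 36]  # made-up sample values
scaled = normalize(ages)
print([round(x, 3) for x in scaled])
```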

cartesian(var1, var2)

Computes the Cartesian product of the variables var1 and var2.
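A small sketch of what a Cartesian product of two categorical or text variables produces. Joining the parts with an underscore is an assumption made for readability; the exact form of the combined feature names is up to the service.

```python
from itertools import product


def cartesian(*token_lists):
    """Combine every token of each variable with every token of the others."""
    return ["_".join(combo) for combo in product(*token_lists)]


# One categorical value per variable yields a single combined feature.
print(cartesian(["admin."], ["married"]))
# A text variable with two tokens yields one feature per token.
print(cartesian(["Moby", "Dick"], ["Hardcover"]))
```

The combined feature lets the model learn an effect for a specific pair of values, such as a particular job together with a particular marital status.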

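2. Binary classification (Binary Model)

The horizontal axis is the score. A sigmoid function is typically used to map the input value into the interval (0, 1).

The vertical axis is the frequency, that is, the number of times a given score occurs.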

ML Model Accuracy

Correct Predictions

True positive (TP): Amazon ML predicted the value as 1, and the true value is 1.
True negative (TN): Amazon ML predicted the value as 0, and the true value is 0.

Erroneous Predictions

False positive (FP): Amazon ML predicted the value as 1, but the true value is 0.
False negative (FN): Amazon ML predicted the value as 0, but the true value is 1.
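The four outcomes can be counted once each score has been turned into a 0/1 prediction. The sketch below assumes a cut-off of 0.5 and treats scores strictly above it as 1; both choices are assumptions, and the scores and labels are made up for illustration.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count TP, TN, FP and FN for scores compared against a cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and label == 1:
            counts["TP"] += 1
        elif predicted == 0 and label == 0:
            counts["TN"] += 1
        elif predicted == 1 and label == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


scores = [0.90, 0.38, 0.20, 0.70, 0.10]  # made-up model scores
labels = [1, 1, 0, 0, 0]                 # made-up true values
print(confusion_counts(scores, labels))
```

Moving the cut-off trades false positives against false negatives: lowering it to 0.3 would turn the 0.38 example from a false negative into a true positive.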

Advanced Metrics

Accuracy, precision, recall, and false positive rate.

Accuracy

Accuracy (ACC) measures the fraction of correct predictions. The range is 0 to 1. A larger value indicates better predictive accuracy:

ACC = (TP + TN) / (TP + TN + FP + FN)

Precision

Precision measures the fraction of actual positives among those examples that are predicted as positive. The range is 0 to 1. A larger value indicates better predictive accuracy:

Precision = TP / (TP + FP)
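All four advanced metrics follow directly from the TP, TN, FP and FN counts. The sketch below uses the standard textbook definitions; the counts in the usage line are made up, and the function assumes no denominator is zero.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the standard metrics from the four outcome counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),            # correct among predicted 1
        "recall": tp / (tp + fn),               # found among actual 1
        "false_positive_rate": fp / (fp + tn),  # false alarms among actual 0
    }


metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)  # made-up counts
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```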

3. The sigmoid function

Also known as the S-shaped function.

Derivative

Shape
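For reference, the standard logistic sigmoid is s(x) = 1 / (1 + e^(-x)), and its derivative is s'(x) = s(x) * (1 - s(x)). The sketch below implements both. Splitting on the sign of x is an implementation choice that avoids overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) cannot overflow
    return z / (1.0 + z)


def sigmoid_derivative(x):
    """Slope of the sigmoid; largest at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid(x), 4), round(sigmoid_derivative(x), 4))
```

The printed values trace the S shape: close to 0 on the left, exactly 0.5 at the origin, close to 1 on the right, with the steepest slope in the middle.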

Multiclass Model

Regression Model

4. The ReLU function

The rectified linear unit (ReLU) is an activation function commonly used in artificial neural networks.
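In its standard form, ReLU passes positive inputs through unchanged and replaces negative inputs with zero. A minimal sketch:

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


print([relu(x) for x in (-2.5, 0.0, 3.0)])
```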

0 0

亚马逊 机器学习 服务 的实例 基本操作 -- 自定义ML模型

亚马逊机器学习服务的实例基本操作 -- 自定义ML模型