Introduction to mvpa2.mappers.som.SimpleSOMMapper


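SOM (self-organizing map) is an unsupervised, self-organizing learning algorithm designed by Kohonen, used here to group samples.
For Python, the PyMVPA library (mvpa2) includes a SOM mapper.
This post walks through the general workflow and the key points.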

Import the modules

'''
Created on May,5th 2016
Email:996553232@qq.com
Author: Trich_Tu
'''
import numpy as np
import pandas as pd
import mvpa2.mappers.som as msom

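Data collection

# data collection
# read_csv with index_col=0 replaces DataFrame.from_csv, which newer pandas no longer provides
dataMat = pd.read_csv('septrain.csv', index_col=0)
# dataMat = dataMat[:1000]
# keep only the samples with at least 0.1 rain
dataMat = dataMat[dataMat['rain'] >= 0.1]
# keep the target as a plain array so it can be indexed by position later
rain = dataMat['rain'].values
# drop() returns a new frame, so the result has to be assigned back
dataMat = dataMat.drop('rain', axis=1)
dataMat = np.array(dataMat)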

Create a SOM instance

# create a SOM instance: 20x30 Kohonen layer, 400 training iterations
som = msom.SimpleSOMMapper((20, 30), 400, learning_rate=0.05)

Train the Kohonen layer on the data

# train the units on the data
som.train(dataMat)
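To give a feel for what training a Kohonen layer involves, here is a minimal NumPy sketch of a generic SOM training loop. It is not PyMVPA's actual implementation: the function name train_som, the Gaussian neighbourhood, and the exponential decay of radius and learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, kshape=(4, 5), niter=50, learning_rate=0.05, seed=0):
    """Train a tiny SOM; returns weights of shape kshape + (n_features,)."""
    rng = np.random.RandomState(seed)
    n_rows, n_cols = kshape
    # Start every unit at a random point inside the range of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    weights = rng.uniform(lo, hi, size=(n_rows, n_cols, data.shape[1]))
    # Grid coordinates of every unit, used for neighbourhood distances.
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    grid = np.stack([rows, cols], axis=-1)
    radius0 = max(n_rows, n_cols) / 2.0
    for it in range(niter):
        # Shrink the neighbourhood radius and the step size over time.
        decay = np.exp(-float(it) / niter)
        radius = radius0 * decay
        lrate = learning_rate * decay
        for sample in data[rng.permutation(len(data))]:
            # Best-matching unit: the unit whose weights are closest to the sample.
            dist2 = ((weights - sample) ** 2).sum(axis=-1)
            bmu = np.unravel_index(dist2.argmin(), dist2.shape)
            # Gaussian influence on the grid, centred on the best-matching unit.
            grid_dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            influence = np.exp(-grid_dist2 / (2.0 * radius ** 2))
            # Pull each unit towards the sample, scaled by its influence.
            weights += lrate * influence[..., None] * (sample - weights)
    return weights

# Hypothetical toy data: two well-separated blobs in two dimensions.
toy_rng = np.random.RandomState(1)
toy = np.vstack([toy_rng.normal(0, 0.3, (20, 2)), toy_rng.normal(5, 0.3, (20, 2))])
toy_weights = train_som(toy, kshape=(3, 4), niter=5)
```

The returned array plays the same role as the Kohonen layer exposed by the mapper: one weight vector per unit of the grid.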

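After training, you can map the original data again to see which unit each sample belongs to.
Here, for example, mapped is an array of shape (number of samples, 2), giving the row and column of the Kohonen-layer unit that each sample falls in.

# get the unit each sample belongs to
mapped = som(dataMat)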
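The mapping step can be pictured as a nearest-unit lookup. The sketch below is a plain NumPy illustration under the assumption that each sample goes to the unit with the smallest Euclidean distance; map_to_units and the toy weights are hypothetical names and values, not part of mvpa2.

```python
import numpy as np

def map_to_units(data, weights):
    """Return an (n_samples, 2) array with the (row, col) of each sample's nearest unit."""
    n_rows, n_cols, n_features = weights.shape
    flat = weights.reshape(n_rows * n_cols, n_features)
    # Squared Euclidean distance between every sample and every unit.
    dist2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    best = dist2.argmin(axis=1)
    # Convert flat unit indices back to (row, col) grid coordinates.
    rows, cols = np.unravel_index(best, (n_rows, n_cols))
    return np.column_stack([rows, cols])

# Hypothetical 2x3 layer: all units at the origin except the one at (1, 2).
demo_weights = np.zeros((2, 3, 2))
demo_weights[1, 2] = [5.0, 5.0]
demo_samples = np.array([[0.1, -0.1], [4.8, 5.1]])
demo_mapped = map_to_units(demo_samples, demo_weights)
```

Each row of the result is a pair of grid coordinates, which is why the array has two columns regardless of how many features the data has.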

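Compute the average rainfall for each unit

# check average for each unit
averain = np.zeros((20, 30))
number = np.zeros((20, 30))
for index in range(len(mapped)):
    i, j = mapped[index]
    averain[i, j] += rain[index]
    number[i, j] += 1
# turn the per-unit sums into means (units with no samples become NaN)
averain = averain / number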
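The same per-unit average can be computed without an explicit loop. This is a sketch using np.add.at; the helper name unit_means and the toy inputs are assumptions for illustration.

```python
import numpy as np

def unit_means(mapped, values, kshape):
    """Return (means, counts) per unit; units with no samples get NaN as mean."""
    sums = np.zeros(kshape)
    counts = np.zeros(kshape)
    rows, cols = mapped[:, 0], mapped[:, 1]
    # np.add.at accumulates correctly when the same unit appears repeatedly.
    np.add.at(sums, (rows, cols), values)
    np.add.at(counts, (rows, cols), 1)
    # Leave empty units as NaN instead of dividing by zero.
    means = np.full(kshape, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return means, counts

# Hypothetical example: two samples land in unit (0, 0), one in unit (1, 2).
toy_mapped = np.array([[0, 0], [0, 0], [1, 2]])
toy_values = np.array([1.0, 3.0, 5.0])
toy_means, toy_counts = unit_means(toy_mapped, toy_values, (2, 3))
```

A plain `sums[rows, cols] += values` would silently drop repeated indices, which is why np.add.at is used here.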

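Next, look at the center (weight vector) of each unit. By default, each sample is assigned to a unit by Euclidean (straight-line) distance, so if the features contain redundant information, the features are no longer equally weighted. One option is to run PCA first and then compute distances weighted by the explained variance of each component.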
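One possible reading of the PCA idea above, sketched with NumPy only: project the centred data onto its principal components, then scale each component by the square root of its explained-variance ratio, so that ordinary Euclidean distance on the result equals a variance-weighted distance on the PCA scores. The function name pca_weighted_features and this particular weighting are assumptions, and whether the weighting suits a given dataset has to be checked case by case.

```python
import numpy as np

def pca_weighted_features(data):
    """Return (weighted_scores, ratio) for a PCA computed via SVD."""
    centered = data - data.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered.dot(vt.T)
    # Variance explained by each component, and its share of the total.
    explained = s ** 2 / (len(data) - 1)
    ratio = explained / explained.sum()
    # Scaling by sqrt(ratio) makes squared Euclidean distance on the output
    # equal to the ratio-weighted squared distance on the scores.
    return scores * np.sqrt(ratio), ratio

# Hypothetical data in which the first feature is duplicated (redundant).
base = np.random.RandomState(0).normal(size=(50, 2))
redundant = np.column_stack([base[:, 0], base[:, 0], base[:, 1]])
weighted, ratio = pca_weighted_features(redundant)
```

With a duplicated feature, the last component carries essentially no variance, so it contributes almost nothing to the distance. The array returned here could be passed to the mapper in place of the raw features.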

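# check the unit values
# print(som.K)
# predict for new data
newdata = dataMat[1000:2000]
# take the rainfall values that belong to the same rows as newdata
newrain = np.asarray(rain)[1000:2000]
labels = som(newdata)
# check average for each unit
averain = np.zeros((20, 30))
number = np.zeros((20, 30))
for index in range(len(newdata)):
    i, j = labels[index]
    averain[i, j] += newrain[index]
    number[i, j] += 1
# per-unit mean (units with no samples become NaN)
averain = averain / number
print(averain)
print(number)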

The resulting sample counts and average rainfall rates for the 20x30 units:

[Figure: sample count and average rainfall rate for each of the 20x30 units]

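I also tried this on data with only two features, and the rainfall-rate plot came out beautifully.
[Figure: rainfall rate per unit for the two-feature data]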

Below are the attributes and methods of this mapper, as listed in the official documentation.

Available conditional attributes:
calling_time+: Time (in seconds) it took to call the node
raw_results: Computed results before invoking postproc. Stored only if postproc is not None.
trained_dataset: The dataset it has been trained on
trained_nsamples+: Number of samples it has been trained on
trained_targets+: Set of unique targets (or any other space) it has been trained on (if present in the dataset trained on)
training_time+: Time (in seconds) it took to train the learner
(Conditional attributes enabled by default suffixed with +)

Attributes
K Provide access to the Kohonen layer (the unit weights).
auto_train Whether the Learner performs automatic training when called untrained.
descr Description of the object, if any.
force_train Whether the Learner enforces training upon every call (my guess: a switch for online training).
is_trained Whether the Learner is currently trained.
pass_attr Which attributes of the dataset or self.ca to pass into the result dataset upon call.
postproc Node to perform post-processing of results.
space Processing space name of this node.

Methods
(I am not yet very familiar with how to use the ones below.)
__call__(ds)
forward(data) Map data from input to output space.
forward1(data) Wrapper method to map single samples.
generate(ds) Yield processing results.
get_postproc() Returns the post-processing node or None.
get_space() Query the processing space name of this node.
reset()
reverse(data) Reverse-map data from output back into input space.
reverse1(data) Wrapper method to map single samples.
set_postproc(node) Assigns a post-processing node.
set_space(name) Set the processing space name of this node.
train(ds) The default implementation calls _pretrain(), _train(), and finally _posttrain().
untrain() Reverts changes in the state of this node caused by previous training.
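For a SOM, my understanding is that reverse-mapping goes from unit coordinates back to feature space by returning the weight vector of each unit. Treat that as an assumption about the library rather than documented behaviour. The NumPy sketch below only illustrates the idea; reverse_map and the toy weights are hypothetical.

```python
import numpy as np

def reverse_map(coords, weights):
    """Return the weight vector of the unit at each (row, col) coordinate."""
    coords = np.asarray(coords)
    # Index the layer with one array of rows and one array of columns.
    return weights[coords[:, 0], coords[:, 1]]

# Hypothetical 2x3 layer with 2 features per unit, filled with 0..11.
layer = np.arange(12).reshape(2, 3, 2)
restored = reverse_map([[0, 0], [1, 2]], layer)
```

The result has one row per coordinate pair and one column per feature, so mapping forward and then back replaces each sample with the center of its unit.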

0 0