A MATLAB Implementation of DBSCAN

Source: Internet  Published: 海岛研究所数据大全  Editor: 程序博客网  Time: 2024/05/24 07:12

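DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as a maximal set of density-connected points, so it can turn any sufficiently dense region into a cluster and can discover clusters of arbitrary shape in a spatial database that contains noise. What follows is the most basic MATLAB implementation of DBSCAN. It uses no spatial index, so every neighborhood query scans all points, and it is only suitable for small data sets. The code standardizes the data first; if your problem does not need that, just comment out this line: x=zscore(x);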
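To make the density idea concrete, here is a minimal sketch that labels the points of a tiny data set as core, border, or noise. The data, `epsRadius`, and `k` are made-up values for illustration only, and the variable names are not part of the original code. A point is treated as core when its neighborhood holds at least `k+1` points including itself, which mirrors the `length(ind)>=k+1` test used in this post's listing.

```matlab
% Hypothetical toy data: four points in a unit square, one point on the
% edge of that group, and one point far away from everything.
X = [0 0; 0 1; 1 0; 1 1; 2 1; 10 10];
epsRadius = 1.5;   % assumed neighborhood radius
k = 3;             % assumed number of neighbors, not counting the point itself

m = size(X, 1);

% Pairwise Euclidean distances, computed one row at a time.
D = zeros(m);
for i = 1:m
    diffs = X - repmat(X(i, :), m, 1);
    D(i, :) = sqrt(sum(diffs.^2, 2))';
end

% neighbors(i,j) is true when point j lies within epsRadius of point i
% (every point is its own neighbor).
neighbors = D <= epsRadius;

% Core: at least k+1 points in the neighborhood, the point itself included.
isCore = sum(neighbors, 2) >= k + 1;

% Border: not core, but inside the neighborhood of at least one core point.
isBorder = ~isCore & (double(neighbors) * double(isCore) > 0);

% Noise: neither core nor border.
isNoise = ~isCore & ~isBorder;

disp([isCore, isBorder, isNoise]);
```

With these values the four corner points come out as core, the point (2,1) as border, and (10,10) as noise. A full DBSCAN then grows each cluster outward from its core points; this sketch only shows the labeling step.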

% -------------------------------------------------------------------------
% Function: [class,type,clusteridx]=clu_dbscan_fn(x,k,Eps)
% -------------------------------------------------------------------------
% Aim:
% Clustering the data with Density-Based Scan Algorithm with Noise (DBSCAN)
% -------------------------------------------------------------------------
% Input:
% x - data set (m,n); m-objects, n-variables
% k - number of objects in a neighborhood of an object
% (minimal number of objects considered as a cluster)
% Eps - neighborhood radius, if not known avoid this parameter or put []
% -------------------------------------------------------------------------
% Output:
% class - vector specifying assignment of the i-th object to certain
% cluster (1,m); outliers are labeled -1
% type - vector specifying type of the i-th object (1,m)
% (core: 1, border: 0, outlier: -1)
% clusteridx - one row per cluster holding the indices of its members,
% padded with zeros to length m
% -------------------------------------------------------------------------
% Example of use:
% x=[randn(30,2)*.4;randn(40,2)*.5+ones(40,1)*[4 4]];
% [class,type]=clu_dbscan_fn(x,5,[])
% clusteringfigs('Dbscan',x,[1 2],class,type)
% -------------------------------------------------------------------------
% References:
% [1] M. Ester, H. Kriegel, J. Sander, X. Xu, A density-based algorithm for
% discovering clusters in large spatial databases with noise, proc.
% 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, 1996,
% p. 226, available from:
% www.dbs.informatik.uni-muenchen.de/cgi-bin/papers?query=--CO
% [2] M. Daszykowski, B. Walczak, D. L. Massart, Looking for
% Natural Patterns in Data. Part 1: Density Based Approach,
% Chemom. Intell. Lab. Syst. 56 (2001) 83-92
% -------------------------------------------------------------------------
% Written by Michal Daszykowski
% Department of Chemometrics, Institute of Chemistry,
% The University of Silesia
% December 2004
% http://www.chemometria.us.edu.pl

function [class,type,clusteridx]=clu_dbscan_fn(x,k,Eps)

x=zscore(x); % standardize; comment this line out if it is not needed
[m,~]=size(x);

if nargin<3||isempty(Eps)
   [Eps]=epsilon(x,k);
end

x=[(1:m)',x];        % prepend the object index as the first column
[m,n]=size(x);
type=zeros(1,m);
class=zeros(1,m);    % preallocated; 0 means "not assigned yet"
no=1;                % label of the cluster currently being grown
touched=zeros(m,1);

for i=1:m
    if touched(i)==0
       ob=x(i,:);
       D=dist(ob(2:n),x(:,2:n));
       ind=find(D<=Eps);

       % fewer than k neighbors besides itself: provisional border point
       if length(ind)>1 && length(ind)<k+1
          type(i)=0;
          class(i)=0;
       end
       % no neighbor except itself: outlier
       if length(ind)==1
          type(i)=-1;
          class(i)=-1;
          touched(i)=1;
       end
       % at least k neighbors besides itself: core point, start a cluster
       if length(ind)>=k+1
          type(i)=1;
          class(ind)=no;

          % ind acts as a queue of objects still to be expanded
          while ~isempty(ind)
                ob=x(ind(1),:);
                touched(ind(1))=1;
                ind(1)=[];
                D=dist(ob(2:n),x(:,2:n));
                i1=find(D<=Eps);

                if length(i1)>1
                   class(i1)=no;
                   if length(i1)>=k+1
                      type(ob(1))=1;
                   else
                      type(ob(1))=0;
                   end
                   % note: neighbors are queued even when ob is a border point
                   for k1=1:length(i1)
                       if touched(i1(k1))==0
                          touched(i1(k1))=1;
                          ind=[ind,i1(k1)];
                          class(i1(k1))=no;
                       end
                   end
                end
          end
          no=no+1;
       end
    end
end

% objects that never joined a cluster are outliers
i1=find(class==0);
class(i1)=-1;
type(i1)=-1;

% collect the member indices and the size of every cluster
maxlab=max(class);
clusteridx=[];
clun=[];
for ck=1:maxlab
    tidx=find(class==ck);
    clusteridx=[clusteridx;[tidx,zeros(1,m-length(tidx))]];
    clun=[clun,length(tidx)];
end
disp(clun);

%...........................................
function [Eps]=epsilon(x,k)

% Function: [Eps]=epsilon(x,k)
%
% Aim:
% Analytical way of estimating neighborhood radius for DBSCAN
%
% Input:
% x - data matrix (m,n); m-objects, n-variables
% k - number of objects in a neighborhood of an object
% (minimal number of objects considered as a cluster)

[m,n]=size(x);

Eps=((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);
disp('EPS:');
disp(Eps);

%............................................
function [D]=dist(i,x)

% function: [D]=dist(i,x)
%
% Aim:
% Calculates the Euclidean distances between the i-th object and all objects in x
%
% Input:
% i - an object (1,n)
% x - data matrix (m,n); m-objects, n-variables
%
% Output:
% D - Euclidean distance (1,m)

[m,n]=size(x);
D=sqrt(sum((((ones(m,1)*i)-x).^2)'));

% with a single variable, sum would collapse the row vector to a scalar
if n==1
   D=abs((ones(m,1)*i-x))';
end
%********************************************************
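The formula in `epsilon` can be read as a volume argument. This is one interpretation of the code rather than something the post states. Assume the $m$ points were spread uniformly over their bounding box, whose volume is $V=\prod_{j=1}^{n}(\max_j-\min_j)$. A ball expected to contain $k$ of them then has volume $Vk/m$, and solving for its radius gives the expression in the code (note that `sqrt(pi.^n)` is $\pi^{n/2}$):

```latex
\begin{aligned}
\underbrace{\frac{\pi^{n/2}}{\Gamma\!\left(\tfrac{n}{2}+1\right)}\,\mathrm{Eps}^{\,n}}_{\text{volume of an $n$-ball of radius Eps}}
  &= \frac{V\,k}{m},
  \qquad V=\prod_{j=1}^{n}\bigl(\max_j x_{\cdot j}-\min_j x_{\cdot j}\bigr) \\[4pt]
\Longrightarrow\quad
\mathrm{Eps}
  &= \left(\frac{V\,k\,\Gamma\!\left(\tfrac{n}{2}+1\right)}{m\,\pi^{n/2}}\right)^{1/n}
\end{aligned}
```

Because the estimate depends on the bounding box, a single extreme value in any variable inflates $V$ and therefore Eps, so it may be worth checking the printed EPS value against the scale of the data.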

Hope this is of some use to everyone.
