R dbscan() documentation

 dbscan(fpc)
dbscan() belongs to R package: fpc

                                        DBSCAN density reachability and connectivity clustering

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

Generates a density-based clustering of arbitrary shape, as introduced in Ester et al. (1996).


----------Usage----------


  dbscan(data, eps, MinPts = 5, scale = FALSE, method = c("hybrid", "raw",
    "dist"), seeds = TRUE, showplot = FALSE, countmode = NULL)
  ## S3 method for class 'dbscan'
print(x, ...)
  ## S3 method for class 'dbscan'
plot(x, data, ...)
  ## S3 method for class 'dbscan'
predict(object, data, newdata = NULL,
predict.max=1000, ...)



----------Arguments----------

Argument: data
data matrix, data.frame, dissimilarity matrix or dist object


Argument: eps
Reachability distance


Argument: MinPts
Reachability minimum number of points


Argument: scale
Scale the data


Argument: method
"dist" treats data as a distance matrix (relatively fast but memory expensive), "raw" treats data as raw data and avoids calculating a distance matrix (saves memory but may be slow), "hybrid" also expects raw data, but calculates partial distance matrices (very fast with moderate memory requirements)


Argument: seeds
FALSE to not include the isseed vector in the dbscan object


Argument: showplot
0 = no plot, 1 = plot per iteration, 2 = plot per subiteration


Argument: countmode
NULL or vector of point numbers at which to report progress


Argument: x
Object of class dbscan.


Argument: object
Object of class dbscan.


Argument: newdata
matrix or data.frame with raw data to predict


Argument: predict.max
Maximum batch size for predictions


Argument: ...
Further arguments passed on to plot methods.
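
The "memory expensive" remark for method = "dist" can be made concrete with a little arithmetic. The following is a rough sketch in base R, not part of fpc: dist_bytes is a hypothetical helper, and it assumes that a dist object stores n*(n-1)/2 doubles of 8 bytes each and ignores any object overhead.

```r
# Rough memory estimate for a full "dist" object on n points.
# Assumption: 8 bytes per double; object overhead is ignored.
# dist_bytes is a hypothetical helper name, not an fpc function.
dist_bytes <- function(n) {
  n_pairs <- n * (n - 1) / 2   # number of pairwise distances kept by dist()
  n_pairs * 8
}

n_small <- 600       # sample size used in the examples on this page
n_large <- 100000    # sample size mentioned in the Details section

mb_small <- dist_bytes(n_small) / 1024^2
gb_large <- dist_bytes(n_large) / 1024^3

cat(sprintf("n = %d: about %.2f MB\n", n_small, mb_small))
cat(sprintf("n = %d: about %.1f GB\n", n_large, gb_large))
```

Under these assumptions a few hundred points cost almost nothing, while a data set with 100,000 rows would need tens of gigabytes for the distance matrix alone, which is presumably why "raw" and "hybrid" exist.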


----------Details----------

A cluster requires a minimum number of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps of a point that satisfies the seed condition is also a cluster member (applied recursively). Some points may not belong to any cluster (noise).
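
The rule above can be sketched in a few lines of base R. This is only an illustration of the seed/border/noise logic, not the code used by fpc: simple_dbscan is a hypothetical name, the sketch builds the full distance matrix, and it assumes that a point counts itself among its neighbours when the MinPts condition is checked.

```r
# Minimal DBSCAN sketch in base R (illustrative only, O(n^2) time and memory).
# simple_dbscan is a hypothetical name and is not part of fpc.
simple_dbscan <- function(x, eps, MinPts = 5) {
  d <- as.matrix(dist(x))                  # full pairwise distance matrix
  n <- nrow(d)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  # Assumption: a point is counted as its own neighbour (distance 0).
  is_seed <- lengths(neighbors) >= MinPts
  cluster <- integer(n)                    # 0 = noise or not yet assigned
  current <- 0L
  for (i in seq_len(n)) {
    if (!is_seed[i] || cluster[i] != 0L) next
    current <- current + 1L                # start a new cluster at seed i
    cluster[i] <- current
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      nb <- neighbors[[p]]
      new_pts <- nb[cluster[nb] == 0L]     # unassigned points within eps of p
      cluster[new_pts] <- current
      # Only seeds keep expanding the cluster; border points stop here.
      queue <- c(queue, new_pts[is_seed[new_pts]])
    }
  }
  list(cluster = cluster, isseed = is_seed, eps = eps, MinPts = MinPts)
}

# Toy data: two tight groups of five points and one isolated point.
x <- rbind(
  cbind(c(0, 0.1, 0.2, 0.1, 0), c(0, 0.1, 0, -0.1, 0.2)),
  cbind(c(5, 5.1, 5.2, 5.1, 5), c(5, 5.1, 5, 4.9, 5.2)),
  c(10, 0)
)
res <- simple_dbscan(x, eps = 0.5, MinPts = 3)
res$cluster
```

In this toy run the two tight groups form clusters 1 and 2, and the isolated point keeps the code 0 because it has too few neighbours to be a seed and is not within eps of any seed.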

We have clustered a 100,000 x 2 dataset in 40 minutes on a Pentium M 1600 MHz.

print.dbscan shows, for each cluster, the number of points that are seeds and the number that are border points.

plot.dbscan distinguishes between seed and border points by plot symbol.


----------Value----------

predict.dbscan returns a vector of predicted clusters for the points in newdata.

dbscan returns an object of class 'dbscan', which is a list with components:

  cluster   integer vector coding cluster membership, with noise observations (singletons) coded as 0
  isseed    logical vector indicating whether a point is a seed (not border, not noise)
  eps       parameter eps
  MinPts    parameter MinPts
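
As a sketch of how these components might be used together, the block below cross-tabulates seed, border and noise points per cluster. The list res is a hand-written stand-in with made-up values, not output from dbscan(); only the four component names are taken from the description above.

```r
# Hypothetical stand-in for a dbscan result: only the four documented
# components are present, and the values are made up for illustration.
res <- list(
  cluster = c(1L, 1L, 1L, 2L, 2L, 0L, 1L, 2L),
  isseed  = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE),
  eps     = 0.2,
  MinPts  = 2
)

# Label each point as noise, border (in a cluster but not a seed), or seed.
role <- ifelse(res$cluster == 0L, "noise",
               ifelse(res$isseed, "seed", "border"))
role <- factor(role, levels = c("noise", "border", "seed"))

# Cross-tabulate roles by cluster id; column "0" holds the noise points.
counts <- table(role, cluster = res$cluster)
print(counts)

# Number of noise observations.
n_noise <- sum(res$cluster == 0L)
```

The same two lines that build role and counts should work on a real result, provided the object was created with seeds = TRUE so that isseed is present.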


----------Note----------

This is a simplified version of the original algorithm (no K-D-trees).


----------Author(s)----------


Jens Oehlschlaegel, based on a draft by Christian Hennig.



----------References----------

Ester et al. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data

----------Examples----------


  library(fpc)  # the dbscan() documented here comes from package fpc

  # Simulate 600 points scattered around 10 random centres
  set.seed(665544)
  n <- 600
  x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n,
    sd=0.2))
  par(bg="grey40")
  ds <- dbscan(x, 0.2)
  # run with showplot=1 to see how dbscan works.
  ds
  plot(ds, x)

  # Predict cluster membership for four new points
  x2 <- matrix(0,nrow=4,ncol=2)
  x2[1,] <- c(5,2)
  x2[2,] <- c(8,3)
  x2[3,] <- c(4,4)
  x2[4,] <- c(9,9)
  predict(ds, x, x2)

  # Compare elapsed time of the three methods on 600 points around 3 centres
  n <- 600
  x <- cbind((1:3)+rnorm(n, sd=0.2), (1:3)+rnorm(n, sd=0.2))

  system.time(ds <- dbscan(x, 0.3, countmode=NULL, method="raw"))[3]
  system.time(dsb <- dbscan(x, 0.3, countmode=NULL, method="hybrid"))[3]
  system.time(dsc <- dbscan(dist(x), 0.3, countmode=NULL,
    method="dist"))[3]

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).


