R dbscan() documentation

Source: Internet | Published by: 蜗牛睡眠知乎 | Editor: 程序博客网 | Time: 2024/05/21 10:01

dbscan(fpc)

dbscan() belongs to the R package fpc

                                    DBSCAN density reachability and connectivity clustering

                                       Translator: LoveR, the translation bot of 生物统计家园网

----------Description----------

Generates a density-based clustering of arbitrary shape, as introduced in Ester et al. (1996).

----------Usage----------

  dbscan(data, eps, MinPts = 5, scale = FALSE,
         method = c("hybrid", "raw", "dist"), seeds = TRUE,
         showplot = FALSE, countmode = NULL)
  ## S3 method for class 'dbscan'
  print(x, ...)
  ## S3 method for class 'dbscan'
  plot(x, data, ...)
  ## S3 method for class 'dbscan'
  predict(object, data, newdata = NULL, predict.max = 1000, ...)

----------Arguments----------

Argument: data
Data matrix, data.frame, dissimilarity matrix, or dist object.

Argument: eps
Reachability distance.

Argument: MinPts
Reachability minimum number of points.

Argument: scale
Whether to scale the data.

Argument: method
"dist" treats data as a distance matrix (relatively fast but memory-expensive); "raw" treats data as raw data and avoids calculating a distance matrix (saves memory but may be slow); "hybrid" also expects raw data, but calculates partial distance matrices (very fast, with moderate memory requirements).

Argument: seeds
Set to FALSE to omit the isseed vector from the dbscan object.

Argument: showplot
0 = no plot, 1 = plot per iteration, 2 = plot per subiteration.

Argument: countmode
NULL, or a vector of point numbers at which to report progress.

Argument: x
Object of class dbscan.

Argument: object
Object of class dbscan.

Argument: newdata
Matrix or data.frame with raw data to predict.

Argument: predict.max
Maximum batch size for predictions.

Argument: ...
Further arguments passed on to plot methods.
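
The memory cost noted for method = "dist" can be estimated with simple arithmetic. The sketch below assumes a dist object stores n(n-1)/2 values as 8-byte doubles and ignores object overhead; dist_bytes is a hypothetical helper introduced only for this illustration.

```r
# Approximate size of a 'dist' object for n points.
# Assumption: n * (n - 1) / 2 doubles of 8 bytes each, overhead ignored.
# dist_bytes is a hypothetical helper, not part of fpc.
dist_bytes <- function(n) n * (n - 1) / 2 * 8

dist_bytes(600) / 2^20   # size in MiB for n = 600
dist_bytes(1e5) / 2^30   # size in GiB for n = 100,000

# Cross-check the element count against a real dist object for a small n.
x <- matrix(rnorm(200), ncol = 2)
n_pairs <- length(dist(x))
n_pairs == 100 * 99 / 2
```

Because the size grows quadratically with n, "raw" or "hybrid" may be the more practical choice for large inputs.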

----------Details----------

A cluster requires a minimum number of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps of a point that satisfies the seed condition is a cluster member (recursively). Some points may not belong to any cluster (noise).
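
The seed condition and recursive expansion described above can be sketched in base R. This is a minimal illustration, not the fpc implementation: the name dbscan_sketch and its internals are assumptions made for this example, it always builds a full distance matrix, and it counts a point as its own neighbour when checking MinPts.

```r
# Minimal DBSCAN sketch in base R (illustrative only; not the fpc code).
# dbscan_sketch is a hypothetical name introduced for this example.
dbscan_sketch <- function(x, eps, MinPts = 5) {
  d <- as.matrix(dist(x))          # full pairwise distance matrix
  n <- nrow(d)
  cluster <- integer(n)            # 0 = noise / not yet assigned
  # Seed condition: at least MinPts points within eps
  # (assumption: the point itself is included in the count).
  isseed <- rowSums(d <= eps) >= MinPts
  current <- 0L
  for (i in which(isseed)) {
    if (cluster[i] != 0L) next     # seed already absorbed by an earlier cluster
    current <- current + 1L
    cluster[i] <- current
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!isseed[p]) next         # border points join a cluster but do not expand it
      nb <- which(d[p, ] <= eps & cluster == 0L)
      cluster[nb] <- current
      queue <- c(queue, nb)
    }
  }
  list(cluster = cluster, isseed = isseed, eps = eps, MinPts = MinPts)
}

# Two tight groups of five points plus one far-away point.
pts <- rbind(cbind(c(0, 0.1, 0, 0.1, 0.05), c(0, 0, 0.1, 0.1, 0.05)),
             cbind(c(5, 5.1, 5, 5.1, 5.05), c(5, 5, 5.1, 5.1, 5.05)),
             c(20, 20))
res <- dbscan_sketch(pts, eps = 0.5, MinPts = 3)
table(res$cluster)
```

In this toy data the isolated point never satisfies the seed condition and lies within eps of no seed, so it keeps the label 0 (noise).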

We have clustered a 100,000 x 2 dataset in 40 minutes on a Pentium M 1600 MHz.

print.dbscan shows a count of the points belonging to the clusters that are seeds and border points.

plot.dbscan distinguishes between seed and border points by plot symbol.

----------Value----------

predict.dbscan returns a vector of predicted clusters for the points in newdata.

dbscan returns an object of class 'dbscan', which is a list with the following components:

Component: cluster
Integer vector coding cluster membership, with noise observations (singletons) coded as 0.

Component: isseed
Logical vector indicating whether a point is a seed (not border, not noise).

Component: eps
Parameter eps.

Component: MinPts
Parameter MinPts.
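
As a rough illustration of how the cluster and isseed components can be read together, the sketch below uses a hand-built list with the same four component names. The values are invented for this example and do not come from running dbscan.

```r
# Hand-built stand-in for a 'dbscan' result (values invented for illustration).
res <- list(cluster = c(1L, 1L, 1L, 2L, 2L, 0L, 1L, 2L),
            isseed  = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE),
            eps = 0.2, MinPts = 5)

noise  <- res$cluster == 0        # noise points are coded as 0
border <- !res$isseed & !noise    # cluster members that are not seeds
sum(border)                       # number of border points

# Cross-tabulate point type against cluster label.
type <- ifelse(noise, "noise", ifelse(res$isseed, "seed", "border"))
counts <- table(type, cluster = res$cluster)
counts
```

The same indexing should apply to a real result object, assuming it was created with seeds = TRUE so that isseed is present.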

----------Note----------

This is a simplified version of the original algorithm (no K-D-trees).

----------Author(s)----------

Jens Oehlschlaegel, based on a draft by Christian Hennig.

----------References----------

Ester et al. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data

----------Examples----------

  library(fpc)  # provides dbscan()
  set.seed(665544)
  n <- 600
  # 10 random centres, recycled across n points, plus Gaussian noise
  x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
             runif(10, 0, 10) + rnorm(n, sd = 0.2))
  par(bg = "grey40")
  ds <- dbscan(x, 0.2)
  # run with showplot=1 to see how dbscan works.
  ds
  plot(ds, x)

  # predict cluster membership for four new points
  x2 <- matrix(0, nrow = 4, ncol = 2)
  x2[1,] <- c(5,2)
  x2[2,] <- c(8,3)
  x2[3,] <- c(4,4)
  x2[4,] <- c(9,9)
  predict(ds, x, x2)

  n <- 600
  x <- cbind((1:3) + rnorm(n, sd = 0.2), (1:3) + rnorm(n, sd = 0.2))

  # compare elapsed time of the three methods
  system.time(ds <- dbscan(x, 0.3, countmode = NULL, method = "raw"))[3]
  system.time(dsb <- dbscan(x, 0.3, countmode = NULL, method = "hybrid"))[3]
  system.time(dsc <- dbscan(dist(x), 0.3, countmode = NULL, method = "dist"))[3]

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
