The Standard Data Mining Process: CRISP-DM

Source: Internet  Published by: 莱汀rei 知乎  Editor: 程序博客网  Time: 2024/05/18 00:06

CRISP-DM on Wikipedia

CRISP-DM stands for Cross Industry Standard Process for Data Mining[1]. It is a data mining process model that describes the approaches expert data miners commonly use to tackle problems. Polls conducted in 2002, 2004, and 2007 showed that it was the leading methodology used by data miners.[2][3][4]

Major phases

CRISP-DM breaks the process of data mining into six major phases[5]:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment
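
As an illustration only, the six phases listed above can be written down as an ordered enumeration. The sketch below is not part of CRISP-DM itself: the names `Phase`, `next_phase`, and `phase_label` are hypothetical, and the strictly linear ordering is a simplifying assumption made for the example.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in the order listed above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase listed after `current`, or None after Deployment.

    Assumption of this sketch: phases are visited strictly in list order.
    """
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current + 1)


def phase_label(phase: Phase) -> str:
    """Turn an enum name such as DATA_PREPARATION into 'Data Preparation'."""
    return phase.name.replace("_", " ").title()


if __name__ == "__main__":
    # Print the phases as a numbered list, in order.
    for phase in Phase:
        print(f"{phase.value}. {phase_label(phase)}")
```

A project tracker could, for example, call `next_phase(Phase.MODELING)` to find that Evaluation comes next; a real workflow would likely also need to allow moving back to an earlier phase.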

History

CRISP-DM was conceived in 1996. In 1997 it got underway as a European Union project under the ESPRIT funding initiative. The project was led by four companies: ISL, NCR Corporation, Daimler-Benz and OHRA.

This core consortium brought different experiences to the project. ISL was later acquired by and merged into SPSS Inc. The computer giant NCR Corporation produced the Teradata data warehouse and its own data mining software. Daimler-Benz had a significant data mining team. OHRA, an insurance company, was just starting to explore the potential use of data mining.

The first version of the methodology was released as CRISP-DM 1.0 in 1999.

CRISP-DM 2.0

In July 2006 the consortium announced that it would begin work on a second version of CRISP-DM. On 26 September 2006, the CRISP-DM SIG met to discuss potential enhancements for CRISP-DM 2.0 and the subsequent roadmap. However, these efforts appear to have stalled. The SIG has not met, updated the CRISP website, or communicated anything to members since early 2007. As of June 22, 2011, the website redirects to an IBM page about SPSS.

Advantages

  • Industry neutral
  • Tool neutral
  • Closely related to the Knowledge Discovery in Databases Process Model
  • Anchors the data mining process

References

  1. Shearer C. The CRISP-DM model: the new blueprint for data mining. J Data Warehousing 2000;5:13–22.
  2. Gregory Piatetsky-Shapiro (2002). KDnuggets Methodology Poll.
  3. Gregory Piatetsky-Shapiro (2004). KDnuggets Methodology Poll.
  4. Gregory Piatetsky-Shapiro (2007). KDnuggets Methodology Poll.
  5. Harper, Gavin; Stephen D. Pickett (August 2006). "Methods for mining HTS data". Drug Discovery Today 11 (15–16): 694–699. doi:10.1016/j.drudis.2006.06.006. PMID 16846796. http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6T64-4KDJSRH-4&_user=793840&_coverDate=08%2F31%2F2006&_rdoc=4&_fmt=full&_orig=browse&_srch=doc-info(%23toc%235020%232006%23999889984%23627946%23FLA%23display%23Volume)&_cdi=5020&_sort=d&_docanchor=&view=c&_ct=17&_acct=C000043460&_version=1&_urlVersion=0&_userid=793840&md5=f7f5b2376172e12b63177a32b03de111

External links

  • CRoss Industry Standard Process for Data Mining Blog
  • Le site des dataminers: article published by Pascal BIZZARI, May 2009
  • The Data Mining Group (DMG): The DMG is an independent, vendor-led group that develops data mining standards, such as the Predictive Model Markup Language (PMML)