Introduction to the KDD Cup Competition

Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 02:18

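A few days ago I read a manuscript submitted to the journal Industrial Engineering (《工业工程》) that my advisor was reviewing. It studied the use of decision trees for intrusion detection, and its experiments used the KDD Cup 99 data, which led me to look into the competition. Below is the material I collected on it; I hope you find it useful.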

About the KDD Cup

 

KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners.
(The competition is held each year alongside the SIGKDD international conference and is open to both academia and industry.)

 

Here is the KDD Cup Center:

http://www.sigkdd.org/kddcup/index.php

 

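Past KDD Cup themes:
2004, supervised classification under multiple performance metrics
2003, network mining and usage log analysis
2002, bioinformatics and text mining (molecular biology domain)
2001, bioinformatics and pharmaceuticals (predicting biological activity for drug design; predicting gene/protein function and localization)
2000, web mining (based on clickstream and transaction data)
1999, network intrusion detection and knowledge discovery reports
1998, generating the best direct-marketing mailing list
1997, predicting the most likely charitable donors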

KDD Cup 1997

http://www-aig.jpl.nasa.gov/public/kdd97/kdd_cup.html
Task
Given data on past responders to fund-raising, predict the most likely responders for a new campaign
Dataset
321 fields/variables; significant effort was spent on data preprocessing
Participants
45 companies/institutions participated
16 contestants turned in their results
Shared 1st and 2nd place
Charles Elkan, Ph.D., University of California, San Diego (BNB, Boosted Naive Bayesian Classifier)
Urban Science Applications, Inc. (Gain, Direct Marketing Selection System)
3rd place
Silicon Graphics, Inc. (MineSet)

KDD Cup 1998

http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
Task: select the best list of recipients for a mailed solicitation
Dataset: 95,412 records and 481 fields
Participants: 21 teams completed the challenge and submitted results
1st place: Urban Science Applications, Inc. (software: GainSmarts)
2nd place: SAS Institute, Inc. (software: Enterprise Miner)
3rd place: Quadstone Limited (software: Decisionhouse)

KDD Cup 1999

1. Classifier learning contest
http://www-cse.ucsd.edu/users/elkan/clresults.html
The goal was to build a predictive model for identifying network intrusions.
24 entries were submitted.
2. Knowledge discovery "report" contest
http://www.cse.ucsd.edu/users/elkan/kdresults.html
The goal was to apply a range of knowledge discovery techniques to the same data used in the 1998 competition and to discover higher-level knowledge from it.
Co-winners
J. Georges and A. Milley (SAS)
S. Rosset and A. Inger (Amdocs, Israel)
Honorable mention
Paola Sebastiani, Marco Ramoni, and Alexander Crea of Bayesware Ltd.

KDD Cup 2000

http://www.ecn.purdue.edu/KDDCUP/
Task: five questions about clickstream and purchase data from an e-tailer
Dataset: Obtained from Gazelle.com, a legwear and legcare Web retailer
Over 150 teams requested data, 30 teams submitted the answers.
Questions 1 & 5 Winner: Amdocs
Exploratory data analysis: SAS, S-Plus
Classification tree, rules extraction: Amdocs Business Insight Tool
Questions 2 & 3 Winner: Salford Systems
Question 4 Winner: e-steam

KDD Cup 2001

http://www.cs.wisc.edu/~dpage/kddcup2001/
Problems from bioinformatics
Data set 1
Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin (task 1)
Data set 2
Prediction of Gene/Protein Function (task 2) and Localization (task 3)
136 groups, 200 submissions
Task 1 winner (Thrombin)
Jie Cheng (Canadian Imperial Bank of Commerce).
Bayesian network learner and classifier
Task 2 winner (Function)
Mark-A. Krogel (University of Magdeburg).
Inductive logic programming
Task 3 winner (Localization)
Hisashi Hayashi, Jun Sese, and Shinichi Morishita (University of Tokyo).
k-nearest neighbor

KDD Cup 2002

http://www.biostat.wisc.edu/~craven/kddcup/
Two tasks from molecular biology domains
Task 1: construct models that can assist genome annotators by automatically extracting information from scientific articles
Task 2: learn models that characterize the behavior of individual genes in a hidden experimental setting.
Task 1 winner
Yizhar Regev and Michal Finkelstein
ClearForest and Celera, USA
Task 2 winner
Adam Kowalczyk and Bhavani Raskutti
Telstra Research Laboratories, Australia
Single-class SVM

KDD Cup 2003

http://www.cs.cornell.edu/projects/kddcup/
Data set
A very large archive of research papers
Citation structure and (partial) data on the downloading of papers by users
Task
Task 1: predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference
Task 2: build the citation graph of a large subset of the archive using only the LaTeX sources
Task 3: estimate each paper's popularity from partial download logs
Task 4: participants devise their own questions
Task 1 winners:
Claudia Perlich, Foster Provost, Sofus Macskassy
New York University
Task 2 winner:
David Vogel
AI Insight Inc.
Task 3 winners:
Janez Brank and Jure Leskovec
Jozef Stefan Institute, Slovenia
Task 4 winners:
Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen
University of Massachusetts, Amherst, USA

KDD Cup 2004

http://kodiak.cs.cornell.edu/kddcup/
April 28 to July 14, 2004
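Two problems, with data from:
Bioinformatics
Quantum physics
Data mining problems evaluated under different performance metrics
Registrations from 49 countries (including .com)
Winners came from China, Germany, India, New Zealand and the USA
Half of the winners came from companies and half from universities
Protein task winners: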
Bernhard Pfahringer
University of Waikato, Computer Science Department
1st Place Overall

Yan Fu, RuiXiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao
Institute of Computing Technology, Chinese Academy of Sciences
Tied for 1st Place Overall
Honorable Mention for Squared Error
Honorable Mention for Average Precision

David S. Vogel, Eric Gottschalk, and Morgan C. Wang
MEDai / A.I. Insight / University of Central Florida
Tied for 1st Place Overall
Honorable Mention for Top-1 Accuracy

Dirk Dach, Holger Flick, Christophe Foussette, Marcel Gaspar, Daniel Hakenjos, Felix Jungermann, Christian Kullmann, Anna Litvina, Lars Michele, Katharina Morik, Martin Scholz, Siehyun Strobel, Marc Twiehaus, Nazif Veliu
Artificial Intelligence Unit, University of Dortmund, Germany
Honorable Mention for Rank of Last
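The honorable mentions above refer to separate metrics for the protein task: Top-1 accuracy, average precision, squared error and rank of last. The Python sketch below shows what each of those metrics measures for a single query (a block of candidate items with predicted scores), and how a per-query score is averaged over all queries.

It rests on several assumptions:
- Each query is a pair (scores, labels) with 0/1 labels and at least one positive.
- Higher scores rank first, and ties keep their input order.
- Squared error is taken as a plain mean over the items.

The function names and toy data are hypothetical. The official KDD Cup 2004 scoring software may handle ties, averaging and the squared-error normalization differently.

```python
from typing import Callable, List, Sequence, Tuple

Query = Tuple[Sequence[float], Sequence[int]]  # (predicted scores, 0/1 labels)


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return labels ordered by descending score (stable sort, so ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def top1(scores: Sequence[float], labels: Sequence[int]) -> float:
    """1.0 if the highest-scored item is positive, else 0.0."""
    return 1.0 if _ranked_labels(scores, labels)[0] == 1 else 0.0


def rank_of_last(scores: Sequence[float], labels: Sequence[int]) -> int:
    """1-based rank of the lowest-ranked positive item (lower is better)."""
    ranked = _ranked_labels(scores, labels)
    return max(i for i, y in enumerate(ranked, start=1) if y == 1)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each positive item."""
    ranked = _ranked_labels(scores, labels)
    hits, total = 0, 0.0
    for i, y in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / i
    if hits == 0:
        raise ValueError("query has no positive items")
    return total / hits


def squared_error(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted scores and 0/1 labels (assumed normalization)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


def per_query_mean(metric: Callable[[Sequence[float], Sequence[int]], float],
                   queries: Sequence[Query]) -> float:
    """Compute the metric separately for each query, then average over queries."""
    return sum(metric(s, y) for s, y in queries) / len(queries)


if __name__ == "__main__":
    # Two made-up queries for illustration only.
    queries = [([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
               ([0.2, 0.7, 0.4], [1, 0, 0])]
    for name, fn in [("top1", top1), ("rank_of_last", rank_of_last),
                     ("average_precision", average_precision),
                     ("squared_error", squared_error)]:
        print(name, round(per_query_mean(fn, queries), 4))
```

The metrics reward different behavior. Top-1 accuracy only looks at the first item, and rank of last penalizes a single badly ranked positive. Average precision rewards the overall ordering, while squared error depends on the calibration of the scores rather than their order. This is why one model can earn an honorable mention under one metric but not under another.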
