Introduction to the KDD Cup Competition

Source: Internet | Editor: 程序博客网 | Date: 2024/05/17 18:02


About the KDD Cup

KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners. It is held alongside the SIGKDD international conference and is open to both academia and industry.

Here is the KDD Cup Center:

http://www.sigkdd.org/kddcup/index.php

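KDD Cup topics by year:
2004, multiple performance metrics for supervised classification
2003, network mining and usage log analysis
2002, bioinformatics and text mining (molecular biology domain)
2001, bioinformatics and medicine (bioactivity prediction for drug design; prediction of gene/protein function and localization)
2000, web mining (from clickstream and transaction data)
1999, network intrusion detection, and knowledge discovery reports
1998, generating the best direct-mail list
1997, predicting the most likely charitable donors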

KDD Cup 1997

http://www-aig.jpl.nasa.gov/public/kdd97/kdd_cup.html
Task
Given data on past responders to fund-raising, predict the most likely responders for a new campaign
Dataset
321 fields/variables; significant effort went into data preprocessing
Participants
45 companies/institutions participated
16 contestants turned in their results
Shared 1st/2nd place
Charles Elkan, Ph.D. from University of California, San Diego (BNB, Boosted Naive Bayesian Classifier)
Urban Science Applications, Inc. (Gain, Direct Marketing Selection System)
3rd Place 
Silicon Graphics, Inc. (MineSet)

KDD Cup 1998

http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
Task: select the best list of recipients for a mailed solicitation
Dataset: 95,412 records and 481 fields
Participants: 21 teams completed the challenge and submitted results 
1st place: Urban Science Applications, Inc. (Software GainSmarts)
2nd place: SAS Institute, Inc. (Software Enterprise Miner)
3rd place: Quadstone Limited (Software Decisionhouse)
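As we understand the 1998 rules, a mailing list was judged by the net profit it would have earned: gifts received from the people mailed, minus the cost of each piece mailed. The sketch below shows one simple way to turn model estimates into such a list, assuming a model that outputs a response probability and an expected gift per prospect. `Prospect`, `select_mailing_list` and `net_profit` are hypothetical names, the expected-value rule (mail when probability × gift exceeds cost) is just one reasonable decision rule rather than what the winning entries did, and the default cost of $0.68 per piece is the figure commonly cited for the KDD Cup 1998 data, treated here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    # Hypothetical record holding a model's estimates for one person.
    pid: int
    p_respond: float      # estimated probability that this person donates
    expected_gift: float  # estimated gift amount, given that they donate


def select_mailing_list(prospects, cost_per_piece=0.68):
    """Mail a prospect only if the expected gift exceeds the cost of mailing them.

    cost_per_piece=0.68 is an assumed per-piece mailing cost.
    """
    return [p.pid for p in prospects if p.p_respond * p.expected_gift > cost_per_piece]


def net_profit(mailed_ids, actual_gifts, cost_per_piece=0.68):
    """Gifts actually received from mailed prospects minus the total mailing cost."""
    received = sum(actual_gifts.get(pid, 0.0) for pid in mailed_ids)
    return received - cost_per_piece * len(mailed_ids)


prospects = [Prospect(1, 0.05, 20.0), Prospect(2, 0.01, 10.0), Prospect(3, 0.10, 5.0)]
mailed = select_mailing_list(prospects)
print(mailed)  # [1]
print(net_profit(mailed, {1: 20.0, 3: 5.0}))
```

Only prospect 1 clears the cost threshold (0.05 × 20.0 = 1.0 > 0.68); prospect 3 would have donated but was not mailed, so its gift does not count toward the list's profit.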

KDD Cup 1999

1. Classifier learning contest
http://www-cse.ucsd.edu/users/elkan/clresults.html
The goal was to build a predictive model for identifying network intrusions. 
24 entries were submitted. 
2.Knowledge discovery "report" contest 
http://www.cse.ucsd.edu/users/elkan/kdresults.html
The goal was to apply a range of knowledge discovery techniques to the same data used in the 1998 competition and discover higher-level knowledge from the data.
Co-winners
J. Georges and A. Milley (SAS)
S. Rosset and A. Inger (Amdocs, Israel). 
Honorable mention
Paola Sebastiani, Marco Ramoni, and Alexander Crea of Bayesware Ltd.

KDD Cup 2000

http://www.ecn.purdue.edu/KDDCUP/
Task: five questions about clickstream and purchase data from an e-tailer
Dataset: Obtained from Gazelle.com, a legwear and legcare Web retailer
Over 150 teams requested the data; 30 teams submitted answers.
Questions 1 & 5 Winner: Amdocs 
Exploratory Data Analysis – SAS, S-PLUS
Classification Tree, Rules Extraction – Amdocs Business Insight Tool
Questions 2 & 3 Winner: Salford Systems
Question 4 Winner: e-steam

KDD Cup 2001

http://www.cs.wisc.edu/~dpage/kddcup2001/
Problems from bioinformatics
Data set 1
Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin (task 1)
Data set 2
Prediction of Gene/Protein Function (task 2) and Localization (task 3)
136 groups, 200 submissions
Task 1 winner (Thrombin)
Jie Cheng (Canadian Imperial Bank of Commerce). 
Bayesian network learner and classifier
Task 2 winner (Function)
Mark-A. Krogel (University of Magdeburg). 
Inductive logic programming
Task 3 winner (Localization)
Hisashi Hayashi, Jun Sese, and Shinichi Morishita (University of Tokyo). 
k-nearest neighbor

KDD Cup 2002

http://www.biostat.wisc.edu/~craven/kddcup/
Two tasks from molecular biology domains 
Task 1: construct models that can assist genome annotators by automatically extracting information from scientific articles 
Task 2: learn models that characterize the behavior of individual genes in a hidden experimental setting. 
Task 1 winner
Yizhar Regev and Michal Finkelstein
ClearForest and Celera, USA
Task 2 winner
Adam Kowalczyk and Bhavani Raskutti
Telstra Research Laboratories, Australia
Single-class SVM

KDD Cup 2003

http://www.cs.cornell.edu/projects/kddcup/
Data set
A very large archive of research papers 
Citation structure and (partial) data on the downloading of papers by users 
Task
Task 1: predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference
Task 2: build a citation graph of a large subset of the archive from only the LaTeX sources
Task 3: estimate each paper's popularity from partial download logs
Task 4: participants devise their own questions
Task 1 winner:
Claudia Perlich, Foster Provost, Sofus Macskassy
New York University
Task 2 winner:
David Vogel
AI Insight Inc.
Task 3 winner:
Janez Brank and Jure Leskovec
Jozef Stefan Institute, Slovenia
Task 4 winner:
Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen 
University of Massachusetts, Amherst, USA

KDD Cup 2004

http://kodiak.cs.cornell.edu/kddcup/ 
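April 28 to July 14, 2004
Two problems, with data from:
bioinformatics
quantum physics
Data mining under different performance metrics
Registrations from 49 countries (including .com)
Winners came from China, Germany, India, New Zealand, and the USA
Half of the winners came from companies and half from universities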
Protein Winners:
Bernhard Pfahringer 
University of Waikato, Computer Science Department
1st Place Overall

Yan Fu, RuiXiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao 
Institute of Computing Technology, Chinese Academy of Sciences
Tied for 1st Place Overall 
Honorable Mention for Squared Error 
Honorable Mention for Average Precision

David S. Vogel, Eric Gottschalk, and Morgan C. Wang 
MEDai / A.I. Insight / University of Central Florida
Tied for 1st Place Overall 
Honorable Mention for Top-1 Accuracy

Dirk Dach, Holger Flick, Christophe Foussette, Marcel Gaspar, Daniel Hakenjos, Felix Jungermann, Christian Kullmann, Anna Litvina, Lars Michele, Katharina Morik, Martin Scholz, Siehyun Strobel, Marc Twiehaus, Nazif Veliu 
Artificial Intelligence Unit, University of Dortmund, Germany
Honorable Mention for Rank of Last
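The protein task was scored under several criteria at once, which is why the honorable mentions above name squared error, average precision, top-1 accuracy and rank of last separately. The sketch below shows how these four criteria could be computed for one block of candidate sequences, assuming binary labels (1 = homologous), higher scores meaning "more likely homologous", and each block containing at least one homologous candidate. It is not the official evaluation script: the function names are hypothetical, "squared error" is computed here as root mean squared error, and tie handling (stable input order) is an assumption.

```python
import math


def ranked_labels(scores, labels):
    """Labels sorted by descending score; Python's stable sort keeps input order on ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def top1_accuracy(scores, labels):
    """1.0 if the highest-scoring candidate is homologous, else 0.0."""
    return 1.0 if ranked_labels(scores, labels)[0] == 1 else 0.0


def rank_of_last(scores, labels):
    """1-based rank of the lowest-ranked homologous candidate (lower is better).

    Assumes at least one label is 1; otherwise max() raises ValueError.
    """
    ranked = ranked_labels(scores, labels)
    return max(r for r, y in enumerate(ranked, start=1) if y == 1)


def average_precision(scores, labels):
    """Mean of precision@r over every rank r that holds a homologous candidate."""
    hits, total = 0, 0.0
    for r, y in enumerate(ranked_labels(scores, labels), start=1):
        if y == 1:
            hits += 1
            total += hits / r
    return total / hits  # assumes at least one homologous candidate


def rms_error(scores, labels):
    """Root mean squared error, treating scores as predicted probabilities."""
    return math.sqrt(sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores))


def mean_over_blocks(metric, blocks):
    """Average a per-block metric over a list of (scores, labels) pairs."""
    return sum(metric(s, y) for s, y in blocks) / len(blocks)


block = ([0.9, 0.2, 0.7, 0.1], [0, 1, 1, 0])
print(top1_accuracy(*block))  # 0.0
print(rank_of_last(*block))   # 3
print(average_precision(*block))
print(rms_error(*block))
```

In the example block the top-scored candidate is not homologous, so top-1 accuracy is 0, while the two homologous candidates sit at ranks 2 and 3, giving a rank of last of 3 and an average precision of (1/2 + 2/3) / 2. The criteria reward different things, so one model need not win all of them.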
