7-16 paper reading


Paper 1 title: Knowledge Discovery and Data Mining: Towards a Unifying Framework

This paper was published in 1996.
Definition of KDD: the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Definition of data mining: a step in the KDD process consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.

Definition of the KDD process:

Abstract description:
use the database, along with any required selection, preprocessing, subsampling, and transformation of it ---->
apply data mining methods to enumerate patterns ---->
evaluate the products of data mining to identify the subset of enumerated patterns deemed "knowledge"

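Concrete description (each step and what it produces):
Data selection: target data
Data preprocessing: preprocessed data
Data transformation: transformed data
Data mining: patterns
Interpreting the mined patterns: knowledge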
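
The five steps above can be sketched as a chain of small functions, each consuming the previous step's output. This is only an illustration and does not come from the paper: the toy records, the function names, the binning rule, and the `min_support` threshold are all assumptions chosen to keep the example short.

```python
from collections import Counter

def select(records, field):
    """Selection: keep only the field of interest (the target data)."""
    return [r[field] for r in records if field in r]

def preprocess(values):
    """Preprocessing: drop missing values."""
    return [v for v in values if v is not None]

def transform(values, bin_width):
    """Transformation: discretize numeric values into bins of width bin_width."""
    return [int(v // bin_width) * bin_width for v in values]

def mine(values):
    """Data mining: enumerate patterns, here simply the frequency of each bin."""
    return Counter(values)

def interpret(patterns, min_support):
    """Interpretation: keep only patterns frequent enough to call knowledge."""
    return {p: n for p, n in patterns.items() if n >= min_support}

# Hypothetical toy database: one record lacks the field, one has a missing value.
records = [{"age": 23}, {"age": 27}, {"age": None},
           {"age": 41}, {"name": "x"}, {"age": 25}]

target = select(records, "age")
clean = preprocess(target)
binned = transform(clean, 10)
patterns = mine(binned)
knowledge = interpret(patterns, min_support=2)
print(knowledge)
```

In this sketch only the last step decides what counts as knowledge; the mining step itself just enumerates patterns, which mirrors the distinction the paper draws between data mining and the overall KDD process.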

The data mining step of the KDD process:
Most data mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics.
5.1 Data mining methods: classification, regression, clustering, summarization, dependency modeling, change and deviation detection.
5.2 Components of a data mining algorithm: model representation, model evaluation criteria, search method (parameter search and model search).

Once the model representation and the model evaluation criteria are fixed,
the data mining problem reduces to purely an optimization task.
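
A minimal sketch of that reduction, under assumptions that are not from the paper: the representation is fixed to a one-parameter linear model y = w * x, the evaluation criterion is fixed to mean squared error, and the search method is plain gradient descent over w (a parameter search). The data, learning rate, and step count are made up for illustration.

```python
def mse(w, xs, ys):
    """Evaluation criterion: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.01, steps=1000):
    """Parameter search: gradient descent on the single parameter w."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit(xs, ys)
print(w, mse(w, xs, ys))
```

With the first two components pinned down, the only thing left to do is minimize `mse` over `w`. Changing the representation (for example, adding an intercept) would be a model search rather than a parameter search.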

5.3 Data mining algorithms:
No single algorithm is best everywhere; each technique typically suits some problems better than others.

6. Application issues:
This section covers research and application challenges.

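Paper 2: Data Mining Technology and Its Application in Bioinformatics, 《湖南农业大学学报:自然科学版》 (Journal of Hunan Agricultural University, Natural Sciences)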

The paper begins with an overview of data mining.

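Typical applications include:
(1) Semantic integration of heterogeneous, distributed gene databases.
(2) Similarity search and comparison between DNA sequences.
(3) Association analysis: the effect of multiple genes on a disease.
(4) Path analysis: the roles of different genes at each stage of a disease.
(5) Cluster analysis.
(6) Visualization tools and genetic data analysis.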
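
As an illustration of item (2), here is a minimal sketch of similarity search over DNA sequences. The notes do not say which similarity measure the paper uses, so this assumes plain edit distance (Levenshtein distance) as one common choice; the function names and the toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def most_similar(query, database):
    """Return the database sequence with the smallest edit distance to query."""
    return min(database, key=lambda s: edit_distance(query, s))

# Hypothetical toy database of short sequences.
database = ["TTTTTT", "ACGTTC", "GGGCCC"]
print(most_similar("ACGTAC", database))
```

This linear scan is fine for a handful of short sequences. Real sequence databases are far larger and typically rely on indexed or heuristic alignment tools instead.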