A brief introduction to feature selection
Source: Internet · Editor: 程序博客网 · Time: 2024/04/25 09:12
With the development of machine learning and the abundant benefits it has brought over the past few decades, people have become increasingly enthusiastic about it, even dreaming of a future in which we need only feed every piece of data we can acquire into a learning algorithm and all our problems will be solved automatically. Behind this dream lies a naive faith that more information is hidden in bigger data, and that the machine learning algorithm is responsible for discovering the answer as long as we provide enough data. However, this faith seems to be more a dream than something achievable, because research has shown that more data does not always mean better results. For example, most machine learning problems are very sensitive to the set of features in the input data, and the final result may change dramatically with even a small shift in the features. In real-world scenarios such as bioinformatics and text analysis, where the number of features is often 10,000 or more, most features are not only irrelevant to the final result but can actively prevent the algorithm from making the right decision. Thus, to improve the performance of machine learning algorithms, it is both necessary and useful to pick out a subset of the 'right' features before applying them to real problems. This is what we call Feature Selection.
In this series of posts, I will try to explain what feature selection is and how it works. I will introduce the latest research in the area and group it into different categories. At best, I hope to work out some new algorithms that can handle this problem.
Feature Selection (sometimes called variable selection or attribute selection) is the selection of a subset of the best features from the whole set of original features. From this definition, one can see that the fundamental question in feature selection is: what is the best subset of features?
There are three main types of feature selection algorithms: 'filter', 'wrapper', and 'embedded'. A filter works by assigning a score to each individual feature and then picking the top-scored features. A wrapper works by evaluating a subset of features as a whole and choosing the best subset; the size of the candidate subsets is not restricted to the number of desired features. For embedded algorithms, there is no independent scoring process for the features; instead, the selection is embedded in the machine learning algorithm itself (that is to say, for some algorithms, such as sparse methods, the final result depends on only a few features).
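To make the 'filter' idea concrete, here is a minimal sketch (the function name and the scoring choice are my own illustration, not a standard API): each feature is scored independently by the absolute value of its Pearson correlation with the target, and the top-k features are kept.

```python
import numpy as np

def filter_select(X, y, k):
    """Basic 'filter' method: score each feature independently by its
    absolute Pearson correlation with y, then keep the k best features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the dot products below give correlations
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / denom
    # Indices of the k highest-scoring features
    return np.argsort(scores)[::-1][:k]

# Toy data: feature 0 drives the target, features 1-2 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)
print(filter_select(X, y, 1))  # feature 0 should be selected
```

Note that each feature is scored on its own, which is exactly why filters are fast but can miss interactions between features.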
One can see that filters are computationally efficient and run fast because they only consider the individual score of each feature, but for the same reason they miss complicated relationships among features. Wrappers generally work better than filters, but their computational cost grows exponentially with the number of candidate features, due to combinatorial explosion. With embedded algorithms, the number of selected features may differ from what we expect.
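A full wrapper that evaluates every subset is exponential, so in practice wrappers are often approximated greedily. The sketch below (names and the least-squares evaluation criterion are my own illustrative choices) uses forward selection: at each step, it adds the single feature that most improves the fit of the whole current subset.

```python
import numpy as np

def subset_error(X, y, idx):
    """Evaluate a feature subset as a whole: sum of squared residuals
    of a least-squares fit using only those features."""
    A = X[:, list(idx)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, k):
    """Greedy 'wrapper': repeatedly add the feature whose inclusion
    yields the lowest subset error, until k features are chosen."""
    chosen = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: subset_error(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: only features 1 and 3 influence the target
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = X[:, 1] - 3.0 * X[:, 3] + 0.05 * rng.normal(size=80)
print(forward_select(X, y, 2))
```

Because the subset is re-evaluated jointly at every step, this greedy wrapper can capture interactions a filter would miss, while only fitting O(k·d) models instead of all 2^d subsets.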