A brief introduction to feature selection


With the rapid development of machine learning over the past few decades, and the abundant benefits that have followed, we have become increasingly enamored with it, dreaming of a future in which we only need to feed every piece of data we can acquire into a learning algorithm and all our problems will be solved automatically. Behind this dream lies a naive faith that more information is hidden in bigger data, and that the learning algorithm is responsible for discovering the answer as long as we provide enough data. However, this faith looks more like a dream than something we can achieve, because research has shown that more data does not always mean better results. For example, most machine learning problems are very sensitive to the set of features in the input data, and the final result may change dramatically with even a small shift in the features. In real scenarios such as bioinformatics and text analysis, where the number of features is often 10k or more, most features are not only irrelevant to the final result but can also distract the algorithm from making the right decision. Thus, to improve the performance of machine learning algorithms, it is necessary and useful to pick a subset of the 'right' features before applying them to a real problem (this is what we call Feature Selection).

In this series of posts, I will try to explain what feature selection is and how it works. I will introduce the latest research in the area and group it into different categories and, with luck, I will try to work out some new algorithms for this problem.

Feature Selection (sometimes called variable selection or attribute selection) is the selection of a subset of the best features from the whole set of original features. From this definition, one can see that the fundamental problem in feature selection is deciding what the best subset of features is.
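To make the definition concrete, here is a small sketch of my own (not from the original post) using scikit-learn: a synthetic classification problem in which only a few features carry any signal, so one can compare a classifier trained on all features against one trained on the 'right' subset alone. The exact numbers depend on the random seed and the estimator, so treat it only as an illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 100 features in total, but only 5 are informative; the rest are pure noise.
# With shuffle=False, the informative features occupy the first 5 columns.
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=5, n_redundant=0,
                           shuffle=False, random_state=0)

clf = LogisticRegression(max_iter=1000)

acc_all    = cross_val_score(clf, X,         y, cv=5).mean()   # all 100 features
acc_subset = cross_val_score(clf, X[:, :5],  y, cv=5).mean()   # only the 5 informative ones

print(f"all 100 features  : {acc_all:.3f}")
print(f"5 'right' features: {acc_subset:.3f}")
```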

There are three main types of feature selection algorithms: 'filter', 'wrapper' and 'embedded'. A filter assigns a score to each individual feature and then keeps the top-scoring ones. A wrapper evaluates a subset of features as a whole and chooses the best subset; the size of the candidate subsets is not restricted to the number of desired features. An embedded algorithm has no separate scoring step for the features; instead, the selection is embedded in the learning algorithm itself (that is to say, for some algorithms, such as sparse ones, the final result depends on only a few features).
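The following sketch (my own illustration, not part of the original post) shows one typical representative of each family in scikit-learn; the choice of estimators, the ANOVA F-test score, and k = 5 are arbitrary assumptions made only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=100, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# Filter: score every feature on its own (here an ANOVA F-test) and keep the top k.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)
filter_idx = filt.get_support(indices=True)

# Wrapper: repeatedly fit a model and evaluate feature subsets as a whole;
# recursive feature elimination (RFE) is one common wrapper strategy.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
wrapper_idx = wrap.get_support(indices=True)

# Embedded: the selection falls out of the learning algorithm itself; an
# L1-penalised (sparse) model simply drives most coefficients to zero.
sparse = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
embedded_idx = np.flatnonzero(sparse.coef_[0])

print("filter  :", filter_idx)
print("wrapper :", wrapper_idx)
print("embedded:", embedded_idx)   # note: we do not control how many survive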

One can see that filters are computationally efficient and run fast because they only consider each feature's individual score, but for the same reason they miss complicated relationships among features. Wrappers work better than filters, but their computational cost grows combinatorially with the number of features and the size of the desired subset, due to combinatorial explosion. With embedded algorithms, the number of selected features may differ from what we expect.
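To see where the combinatorial explosion comes from, here is a naive exhaustive wrapper (again my own sketch under the same toy-data assumptions): it cross-validates every subset of size k, so the number of model fits is C(d, k), which already reaches ~1.7e11 for d = 10000 and k = 3, the kind of dimensionality mentioned above for bioinformatics and text data.

```python
from itertools import combinations
from math import comb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Keep d small so the exhaustive search actually terminates.
X, y = make_classification(n_samples=200, n_features=12, n_informative=3,
                           n_redundant=0, random_state=0)
d, k = X.shape[1], 3
print(f"subsets to evaluate: C({d},{k}) = {comb(d, k)}")   # 220 even for d=12

clf = LogisticRegression(max_iter=1000)
best_score, best_subset = -1.0, None
for subset in combinations(range(d), k):
    # Evaluate this subset "as a whole" by cross-validating the model on it.
    score = cross_val_score(clf, X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print("best subset:", best_subset, "cv accuracy:", round(best_score, 3))
```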
