Why data normalization in SVM
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 03:03
Data normalization is generally performed during the data pre-processing step.
1. Why we need normalization
There are two major reasons why data normalization is so essential for machine learning algorithms.
- Data normalization can improve performance on common machine learning problems.
- Data normalization can speed up the convergence of the gradient descent algorithm.
This is illustrated by a screenshot from Andrew Ng's machine learning course [2] (screenshot omitted).
2. How to normalize data
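Three common methods are used to perform feature normalization in machine learning algorithms.
- Rescaling
The simplest method is to rescale the range of each feature:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \quad (1)$$
$$x' = \frac{2x - \max(x) - \min(x)}{\max(x) - \min(x)} \quad (2)$$
where $x$ is the original value and $x'$ is the normalized value.
Equation (1) rescales the data into [0,1], and equation (2) rescales the data into [-1,1].
Note: the parameters $\min(x)$ and $\max(x)$ should be computed on the training data only, and then reused unchanged for the training, validation, and testing data.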
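As a minimal sketch of the rescaling step and of the note about fitting on the training data only, the following plain-Python example may help. The function names `fit_min_max` and `rescale` and the sample values are hypothetical, chosen only for illustration.

```python
def fit_min_max(train_column):
    """Compute the rescaling parameters from the training data only."""
    return min(train_column), max(train_column)


def rescale(column, lo, hi, symmetric=False):
    """Apply equation (1) (range [0, 1]) or, if symmetric, equation (2) (range [-1, 1])."""
    span = hi - lo
    if span == 0:
        # Constant feature: there is no spread to rescale, so map everything to 0.
        return [0.0 for _ in column]
    if symmetric:
        return [(2 * x - hi - lo) / span for x in column]
    return [(x - lo) / span for x in column]


train = [2.0, 4.0, 6.0, 10.0]
test = [3.0, 12.0]  # 12.0 lies outside the range seen during training

lo, hi = fit_min_max(train)                          # parameters come from train only
train_01 = rescale(train, lo, hi)                    # [0.0, 0.25, 0.5, 1.0]
train_pm1 = rescale(train, lo, hi, symmetric=True)   # [-1.0, -0.5, 0.0, 1.0]
test_01 = rescale(test, lo, hi)                      # [0.125, 1.25]
```

Because the parameters are reused rather than recomputed, a test value outside the training range can land outside [0,1], as the last result shows.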
Features can also be normalized with non-linear functions, such as the logarithmic function, the inverse tangent function, or the sigmoid function.
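The sketch below shows one plausible form for each of these non-linear transforms. The exact forms are assumptions rather than something stated in the text: `log10(x) / log10(max(x))` for the logarithm, `(2 / pi) * atan(x)` for the inverse tangent, and `1 / (1 + exp(-x))` for the sigmoid. The function names are hypothetical.

```python
import math


def log_scale(column):
    """Assumed form: log10(x) / log10(max(x)); requires x >= 1 and max(x) > 1."""
    top = math.log10(max(column))
    return [math.log10(x) / top for x in column]


def atan_scale(column):
    """Assumed form: (2 / pi) * atan(x); maps real values into (-1, 1)."""
    return [2.0 * math.atan(x) / math.pi for x in column]


def sigmoid(x):
    """Numerically stable 1 / (1 + exp(-x)); avoids overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_scale(column):
    """Assumed form: sigmoid applied element-wise; mathematically maps into (0, 1)."""
    return [sigmoid(x) for x in column]


values = [1.0, 10.0, 100.0, 1000.0]
log_scaled = log_scale(values)        # compresses the range into [0, 1]
atan_scaled = atan_scale(values)      # every value ends up below 1
sigmoid_scaled = sigmoid_scale([-2.0, 0.0, 2.0])
```

All three squash large values far more than small ones, which is why they are sometimes preferred for heavy-tailed features.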
- Standardization
Feature standardization makes the values of each feature in the data have zero mean and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and neural networks). The general formula is given as:
$$x' = \frac{x - \bar{x}}{\sigma}$$
where $\bar{x}$ is the mean and $\sigma$ is the standard deviation of the feature $x$.
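Standardization can be sketched in the same way. This example assumes the population standard deviation (dividing by n rather than n - 1), which is a choice the text does not specify; the names `fit_standardizer` and `standardize` are hypothetical.

```python
import math


def fit_standardizer(train_column):
    """Return the mean and population standard deviation of the training data."""
    n = len(train_column)
    mean = sum(train_column) / n
    variance = sum((x - mean) ** 2 for x in train_column) / n
    return mean, math.sqrt(variance)


def standardize(column, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    if std == 0:
        # Constant feature: nothing to scale, so map everything to 0.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = fit_standardizer(train)            # mean 5.0, std 2.0
train_z = standardize(train, mean, std)        # zero mean, unit variance
test_z = standardize([1.0, 11.0], mean, std)   # reuses the training parameters
```

As with rescaling, the mean and standard deviation are estimated on the training data and then applied unchanged to the other splits.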
- Scaling to unit length
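Each feature vector is divided by its norm (usually the Euclidean length) so that the whole vector has length one:
$$x' = \frac{x}{\lVert x \rVert}$$
This is especially important if the scalar metric is used as a distance measure in the following learning steps.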
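A minimal sketch of scaling to unit length, assuming the Euclidean norm. The dot product at the end is included on the assumption that the "scalar metric" refers to the scalar (dot) product; for unit-length vectors it equals the cosine similarity. The function names are hypothetical.

```python
import math


def to_unit_length(vector):
    """Divide a feature vector by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        # The zero vector has no direction; return it unchanged.
        return list(vector)
    return [v / norm for v in vector]


def dot(a, b):
    """Scalar (dot) product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


a = to_unit_length([3.0, 4.0])    # [0.6, 0.8]
b = to_unit_length([40.0, 30.0])  # same direction as [4, 3], much larger magnitude
similarity = dot(a, b)            # depends only on the angle between the vectors
```

After scaling, the magnitude of the original vectors no longer influences the result; only their direction does.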
3. Some cases where you don't need data normalization
3.1 using a similarity function instead of a distance function
You can propose a similarity function rather than a distance function and plug it into a kernel (technically, this function must generate positive-definite matrices).
3.2 random forest
A random forest never compares the magnitude of one feature with that of another, so the ranges of the features don't matter.
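The reason can be sketched with a toy decision stump, the building block of the trees in a forest. This is not a full random forest: the helper names, the Gini criterion, and the data are assumptions made for illustration. Each split thresholds a single feature, so only the ordering of that feature's values matters, and rescaling the feature leaves the chosen partition unchanged.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(feature, labels):
    """Return the set of sample indices sent left by the lowest-impurity threshold."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best_score, best_left = None, None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        if feature[left[-1]] == feature[right[0]]:
            continue  # no threshold can separate equal values
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right]))
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


income = [20.0, 35.0, 50.0, 80.0, 120.0]
labels = [0, 0, 1, 1, 1]

split_raw = best_split(income, labels)
# Rescale and shift the feature: the ordering of the samples is preserved.
split_scaled = best_split([x * 1000.0 + 7.0 for x in income], labels)
same_partition = split_raw == split_scaled   # True
```

Distance-based methods such as an SVM with an RBF kernel behave differently, because they combine all features into a single distance in which large-range features dominate.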
References
[1] http://en.wikipedia.org/wiki/Feature_scaling
[2] http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling
[3] http://stats.stackexchange.com/questions/57010/is-it-essential-to-do-normalization-for-svm-and-random-forest