Machine Learning Record (1-1): Polynomial Curve Fitting

Source: Internet · Editor: 程序博客网 · Date: 2024/05/18 17:40

This article is the first in a series of notes from my reading of *Pattern Recognition and Machine Learning*.

Article 1: Polynomial curve fitting

1.1 Polynomial curve fitting

Real model: sin(2πx).

We sample points from the real model and corrupt them with Gaussian noise. The input data set X in Figure 1.2 was sampled uniformly from 0 to 1; the target set Z is obtained by computing sin(2πx) at each input and then adding a small amount of Gaussian noise.
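The sampling procedure above can be sketched as follows. The text does not give the sample size or the noise level, so N = 10 (as in PRML's Figure 1.2) and a noise standard deviation of 0.3 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

N = 10                             # assumed number of points, as in PRML Figure 1.2
X = rng.uniform(0.0, 1.0, size=N)  # inputs sampled uniformly from [0, 1]
# targets: the real model sin(2πx) plus Gaussian noise (std 0.3 is an assumption)
Z = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.3, size=N)
```

The noise makes the targets scatter around the true curve, which is what the polynomial fit must later cope with.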



In particular, we fit the data with a polynomial function

    y(x, w) = w0 + w1·x + w2·x^2 + … + wM·x^M = Σ_{j=0}^{M} wj·x^j

where M is the order of the polynomial and w denotes the vector of coefficients.
Minimize the error function

    E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − z_n }^2

which measures the misfit between the polynomial y(x, w) and the targets z_n.
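Because E(w) is quadratic in w, its minimizer has a closed form via least squares on the design matrix. A minimal sketch (the helper names `fit_polynomial` and `predict` are mine, not from the text):

```python
import numpy as np

def fit_polynomial(X, Z, M):
    """Minimize E(w) = 1/2 * sum_n (y(x_n, w) - z_n)^2 in closed form.

    Phi[n, j] = x_n ** j, so Phi @ w evaluates the polynomial at every x_n;
    lstsq solves the resulting linear least-squares problem.
    """
    Phi = np.vander(X, M + 1, increasing=True)  # columns 1, x, x^2, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, Z, rcond=None)
    return w

def predict(w, x):
    """Evaluate y(x, w) = sum_j w_j * x**j (polyval wants highest power first)."""
    return np.polyval(w[::-1], x)
```

Fitting with M = 0, 1, 3, 9 and plotting the resulting curves reproduces the comparison discussed below.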

The remaining problem is model comparison, or model selection:

Order M = 0, 1, 3, 9:


There is an over-fitting problem when the order M is 9. In fact, least squares is a specific case of maximum likelihood, and the over-fitting problem is a general property of maximum likelihood. By adopting a Bayesian approach, the over-fitting problem can be avoided. We shall see that there is no difficulty from a Bayesian perspective in employing models for which the number of parameters greatly exceeds the number of data points. Indeed, in a Bayesian model the effective number of parameters adapts automatically to the size of the data set.

One technique that is often used to control the over-fitting phenomenon in such cases is regularization, which involves adding a penalty term to the error function in order to discourage the coefficients from reaching large values. The simplest such penalty is the sum of the squares of all the coefficients, giving the regularized error

    Ẽ(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − z_n }^2 + (λ/2) ‖w‖^2

where ‖w‖^2 = w0^2 + w1^2 + … + wM^2, and the coefficient λ governs the relative importance of the regularization term compared with the sum-of-squares error.
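Ẽ(w) is still quadratic in w, so the regularized minimizer also has a closed form: (Φᵀ Φ + λ I) w = Φᵀ Z. A minimal sketch (the function name is mine; this is the standard ridge-regression solution, not code from the text):

```python
import numpy as np

def fit_polynomial_ridge(X, Z, M, lam):
    """Minimize E~(w) = 1/2 * sum_n (y(x_n, w) - z_n)^2 + lam/2 * ||w||^2.

    Setting the gradient to zero gives the regularized normal equations
    (Phi^T Phi + lam * I) w = Phi^T Z, solved directly below.
    """
    Phi = np.vander(X, M + 1, increasing=True)  # columns 1, x, ..., x^M
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ Z)
```

With λ = 0 this reduces to the unregularized least-squares fit; increasing λ shrinks the coefficient magnitudes, which is exactly how it suppresses the wild oscillations of the M = 9 fit.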


When λ = 1, ln λ = 0.

As λ → 0, ln λ → −∞.


The impact of the regularization term on the generalization error can be seen by plotting the RMS error, E_RMS = √(2E(w*)/N), for both the training and test sets against ln λ, as shown in Figure 1.8. We see that ln λ now in effect controls the effective complexity of the model and hence determines the degree of over-fitting.


In fact, determining a suitable value for the complexity is a hard problem. For this example, we can partition the data set into a training set and a test set and use the RMS error to judge the complexity. The training set is used to determine the coefficients w, and a separate validation set, also called a hold-out set, is used to determine the complexity (M or λ). In many cases, however, this proves too wasteful of valuable training data, and we have to seek more sophisticated approaches (Section 1.3).
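The hold-out procedure above can be sketched end to end: fit on the training portion for each candidate λ, score each fit by its validation RMS error, and keep the λ with the lowest score. The data sizes, split, and λ grid are all assumptions for illustration:

```python
import numpy as np

def fit_ridge(X, Z, M, lam):
    """Regularized least-squares polynomial fit (normal equations)."""
    Phi = np.vander(X, M + 1, increasing=True)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ Z)

def rms_error(w, X, Z):
    """E_RMS = sqrt(2 * E(w) / N), i.e. the root-mean-square residual."""
    Phi = np.vander(X, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - Z) ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 30)
Z = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.3, 30)

# hold-out split (assumed): first 20 points train, last 10 validate
X_tr, Z_tr, X_val, Z_val = X[:20], Z[:20], X[20:], Z[20:]

M = 9  # the order that over-fits without regularization
best_lam = min((np.exp(ln_lam) for ln_lam in range(-20, 1)),
               key=lambda lam: rms_error(fit_ridge(X_tr, Z_tr, M, lam), X_val, Z_val))
```

By construction, `best_lam` minimizes the validation RMS error over the grid of ln λ values, mirroring the minimum of the test-set curve in Figure 1.8.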
