03-Data Resampling

Source: Internet  Published by: 北京骏溢至信 淘宝  Editor: 程序博客网  Time: 2024/06/06 12:49


1. Bootstrap

Draw a "bootstrap sample" by sampling n times with replacement from the original sample of size n.

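Repeating this many times and computing the statistic of interest (e.g., the mean) on each bootstrap sample gives an estimate of that statistic's sampling variability, which is why the bootstrap works well for estimating standard errors and confidence intervals.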
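A minimal sketch of the resampling loop, assuming the statistic of interest is the mean and using only Python's standard library. The function names, the `n_boot` default, and the sample values are illustrative assumptions, not taken from the text above.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) values from data, with replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` as the spread of its bootstrap replicates."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    replicates = [stat(bootstrap_sample(data, rng)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


# Hypothetical sample data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(f"bootstrap SE of the mean: {bootstrap_se(sample):.3f}")
```

For the mean, the result should land near the textbook s / sqrt(n). The appeal of the bootstrap is that the same code works for statistics without a simple formula, e.g. `bootstrap_se(sample, stat=statistics.median)`.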

A confidence interval provides a range of values which is likely to contain the population parameter of interest.

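Ex.: we are 95% confident that the population mean lies in the interval (x1, x2), meaning that about 95% of intervals constructed this way from repeated samples would contain the true mean.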
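One common way to turn bootstrap replicates into a confidence interval is the percentile method: take the middle 95% of the sorted replicates. The sketch below assumes that method (basic and BCa intervals are alternatives not shown here); the function name, defaults, and data are hypothetical.

```python
import random
import statistics


def percentile_bootstrap_ci(data, stat=statistics.mean, level=0.95, n_boot=5000, seed=0):
    """Percentile bootstrap CI: the middle `level` share of the bootstrap replicates."""
    rng = random.Random(seed)
    replicates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    alpha = 1 - level
    # Positions of the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = round(alpha / 2 * (n_boot - 1))
    hi_idx = round((1 - alpha / 2) * (n_boot - 1))
    return replicates[lo_idx], replicates[hi_idx]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # hypothetical data
low, high = percentile_bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `level` (e.g. to 0.99) widens the interval, since it keeps a larger share of the replicates.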

[Figure: confidence interval]



2. Permutation

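Pool the two datasets A and B, randomly shuffle the pooled observations, then split them into a new A and a new B of the original sizes. This is sampling without replacement: every observation is used exactly once.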
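The pool, shuffle, and split step can be sketched with the standard library as follows; `permute_groups` is a hypothetical helper name.

```python
import random


def permute_groups(a, b, rng):
    """Pool A and B, shuffle, and split back into groups of the original sizes."""
    pooled = list(a) + list(b)  # concatenate the two datasets
    rng.shuffle(pooled)         # random relabelling; each observation used once
    return pooled[:len(a)], pooled[len(a):]


rng = random.Random(42)
new_a, new_b = permute_groups([1, 2, 3], [4, 5, 6, 7], rng)
print(new_a, new_b)
```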

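Permutation tests test a specific null hypothesis of exchangeability: if the null is true, the group labels are arbitrary, so the statistic computed on the observed split should look like a typical value among the permuted splits. The p-value is the fraction of permutations whose statistic is at least as extreme as the observed one.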
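A sketch of a full two-sided permutation test, assuming the test statistic is the absolute difference in group means (the text does not fix a statistic) and approximating the permutation distribution with random shuffles rather than enumerating every split. Names and data are hypothetical.

```python
import random
import statistics


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts treats the observed labelling as one of the
    # permutations, so the estimated p-value is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


group_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]  # hypothetical measurements
group_b = [12.0, 11.8, 12.3, 12.1, 11.9, 12.2]
print(f"p = {permutation_test(group_a, group_b):.4f}")
```

Well-separated groups give a small p-value; two identical groups give p = 1, since every relabelling is at least as extreme as an observed difference of zero.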


3. Cross-validation

In its simplest form (leave-one-out), cross-validation removes one point at a time, fits the model to the remaining points, and then checks how well the fitted model predicts the removed point.
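A minimal leave-one-out sketch, assuming the model is a simple least-squares line y = intercept + slope * x (the text does not name a model) and that the score is mean squared prediction error. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_mse(xs, ys):
    """Leave-one-out CV: drop point i, refit on the rest, score the prediction at x_i."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        prediction = intercept + slope * xs[i]
        squared_errors.append((ys[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


xs = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical data
ys = [1.2, 2.9, 5.3, 6.8, 9.1, 11.2, 12.7, 15.1]
intercept, slope = fit_line(xs, ys)
in_sample = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"in-sample MSE: {in_sample:.4f}  LOOCV MSE: {loocv_mse(xs, ys):.4f}")
```

For a least-squares line the leave-one-out error is never smaller than the in-sample error, which illustrates why in-sample fit alone is an optimistic guide to predictive performance.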

Cross-validation is primarily a way of measuring the predictive performance of a statistical model: it is used to judge how the model performs outside the sample, on a new dataset (also known as test data).
The motivation for cross-validation is that when we fit a model, we fit it to a training dataset. Without cross-validation we only know how the model performs on the in-sample data. Ideally we would like to know how accurately the model predicts new data. In science, theories are judged by their predictive performance.
There are two common types of cross-validation: leave-one-out and k-fold. Leave-one-out is the special case of k-fold in which k equals the number of data points.
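A k-fold sketch under the same assumptions as a simple least-squares line and squared-error scoring: shuffle the indices, split them into k folds of near-equal size, hold out each fold once, and fit on the rest. All names and data are hypothetical; setting `k` to the number of points reproduces leave-one-out.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra index.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def k_fold_mse(xs, ys, k, seed=0):
    """Hold out each fold once, fit on the other k-1 folds, average squared errors."""
    squared_errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            squared_errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return sum(squared_errors) / len(squared_errors)


xs = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical data
ys = [1.2, 2.9, 5.3, 6.8, 9.1, 11.2, 12.7, 15.1]
print(f"4-fold MSE: {k_fold_mse(xs, ys, k=4):.4f}")
print(f"LOO (k = n) MSE: {k_fold_mse(xs, ys, k=len(xs)):.4f}")
```

Smaller k means fewer refits (cheaper) but smaller training sets per fit; leave-one-out uses the most training data per fit at the cost of n refits.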
