Bagging algorithm in Java (drawing N times with replacement from N samples)

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 04:37

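This experiment implements bagging, the classic bootstrap sampling method from statistics. It repeats the experiment many times (iterations) and plots the resulting distribution. This verifies two things: the number of distinct values per bagging round is approximately normally distributed, and the fraction of distinct values under random sampling converges to a fixed probability.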
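The convergence mentioned above follows from a standard derivation. It is added here for reference and is not part of the original post. Let D be the number of distinct values after N draws with replacement from N values:

```latex
\[ P(\text{value } i \text{ is never drawn}) = \left(1 - \frac{1}{N}\right)^{N} \]
\[ E[D] = \sum_{i=1}^{N} P(\text{value } i \text{ is drawn at least once}) = N\left[1 - \left(1 - \frac{1}{N}\right)^{N}\right] \]
\[ \lim_{N \to \infty} \frac{E[D]}{N} = 1 - e^{-1} \approx 0.632 \]
```

For N = 100 this gives E[D] = 100(1 - 0.99^100) ≈ 63.4, which approaches 63.2% of N as N grows.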


package cn.melina.classification.test;

import java.text.DecimalFormat;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

import org.apache.mahout.common.RandomUtils;

public class BaggingTest {

  // Create the generator once and reuse it for every bagging round
  private static final Random RNG = RandomUtils.getRandom();

  /**
   * If data has N cases, sample N cases at random WITH replacement
   * (one bagging / bootstrap round).
   *
   * @author melina
   *
   * @param N number of cases
   *
   * @return number of distinct values left after drawing N times and removing duplicates
   */
  public static int runBagging(int N) {
    // A set merges identical draws automatically
    Set<Integer> distinct = new HashSet<Integer>();
    for (int i = 0; i < N; i++) {
      distinct.add(RNG.nextInt(N)); // index in [0, N), drawn with replacement
    }
    return distinct.size();
  }

  public static void main(String[] args) {
    int itr_num = 10000; // number of iterations
    int datasize = 100;  // bagging sample size: here the 100 numbers 0~99 are bagged

    // Count how often each distinct count occurs over all iterations;
    // a TreeMap keeps the keys in ascending order, which is convenient for plotting
    Map<Integer, Integer> hash = new TreeMap<Integer, Integer>();
    for (int i = 0; i < itr_num; i++) {
      int num = runBagging(datasize);
      Integer old = hash.get(num);
      hash.put(num, old == null ? 1 : old + 1);
    }

    // Print "distinct count,relative frequency" lines (CSV)
    DecimalFormat df = new DecimalFormat("0.00%");
    for (Map.Entry<Integer, Integer> e : hash.entrySet()) {
      double value = e.getValue() / (double) itr_num;
      System.out.println(e.getKey() + "," + df.format(value));
    }
  }
}
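The program above only counts distinct indices. In actual bagging, those indices select training examples. The sketch below shows one bagging round over an arbitrary data list. `BootstrapSampler` and `resample` are hypothetical names, not from the original post. The fixed seed is an assumption added so that runs are reproducible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: one bagging (bootstrap) round over any data list. */
public class BootstrapSampler {

  /** Draws data.size() elements uniformly at random, with replacement. */
  public static <T> List<T> resample(List<T> data, Random rng) {
    int n = data.size();
    List<T> bag = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      bag.add(data.get(rng.nextInt(n))); // the same element may be picked several times
    }
    return bag;
  }

  public static void main(String[] args) {
    List<String> data = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
    Random rng = new Random(42); // assumed fixed seed, for reproducible runs
    List<String> bag = resample(data, rng);
    System.out.println("bag      = " + bag);
    System.out.println("distinct = " + new HashSet<>(bag).size() + " of " + data.size());
  }
}
```

Passing the `Random` in as a parameter lets a caller reuse one generator across many rounds, as `BaggingTest` does with its static `RNG`.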


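The bagging sample size is set to 100, so 100 draws are made at random with replacement from the 100 numbers 0~99. The distribution of the distinct count then peaks near 63.2, i.e. roughly 63.2% of the 100 values appear at least once. Repeating the experiment 10,000 times produces the following statistical chart: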
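The simulated frequencies can also be checked against the exact distribution of the distinct count. The sketch below uses a simple dynamic program: after each draw, the count either stays at k (the draw repeats one of the k values already seen, probability k/N) or moves to k + 1 (probability (N - k)/N). The class and method names are hypothetical, and this check is not part of the original post.

```java
/** Hypothetical helper: exact distribution of the distinct count in one bagging round. */
public class ExactBaggingDistribution {

  /** Returns p where p[k] = P(exactly k distinct values after n draws with replacement from n values). */
  public static double[] distribution(int n) {
    double[] p = new double[n + 1];
    p[0] = 1.0; // before any draw, zero distinct values with certainty
    for (int draw = 1; draw <= n; draw++) {
      double[] next = new double[n + 1];
      for (int k = 0; k <= n; k++) {
        next[k] += p[k] * k / n;             // draw repeats an already-seen value
        if (k < n) {
          next[k + 1] += p[k] * (n - k) / n; // draw hits a new value
        }
      }
      p = next;
    }
    return p;
  }

  /** Index of the largest probability (the peak of the histogram). */
  public static int mode(double[] p) {
    int best = 0;
    for (int k = 1; k < p.length; k++) {
      if (p[k] > p[best]) {
        best = k;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    int n = 100;
    double[] p = distribution(n);
    double mean = 0.0;
    for (int k = 0; k < p.length; k++) {
      mean += k * p[k];
    }
    System.out.printf("N = %d: peak at %d distinct values, mean = %.2f%n", n, mode(p), mean);
    System.out.printf("1 - 1/e = %.4f%n", 1.0 - Math.exp(-1.0));
  }
}
```

Analytically the mean for N = 100 is 100(1 - 0.99^100) ≈ 63.4. The values p[k] should match the percentages printed by `BaggingTest` up to simulation noise.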


 GOOD LUCK!

Keep it up, everyone! If you have questions, feel free to add me as a friend and discuss.
