Collaborative Filtering Algorithm

Source: Internet | Published by: 中国网络全世界最自由 | Editor: 程序博客网 | Date: 2024/05/16 08:54

1. Build the user behavior matrix, i.e. count how many items of each product type each user has purchased

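public double[] getNumByCustomer(Customer customer) {
    // Order items of this customer; 1 and 2 are passed as the alive flag and the state
    List<OrderItem> list = orderItemDao.findByCustomerAndAliveAndState(customer.getId(), 1, 2);
    double[] vectore = new double[totalNum];
    int index = 0;
    for (ProductType type : productTypes) {
        for (OrderItem orderItem : list) {
            // Assumes id is a primitive; use equals() if it is a boxed Long
            if (orderItem.getProduct().getProductType().id == type.id) {
                vectore[index] = vectore[index] + orderItem.getNum();
            }
        }
        index++; // move on to the slot of the next product type
    }
    return vectore;
}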
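The nested loops above scan every order item once per product type. As a minimal sketch of the same counting step done in a single pass, assuming the column order is given as a list of product type ids (the `Purchase` class and `toVector` method are hypothetical stand-ins, not part of the original code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BehaviorVectorSketch {

    /** Hypothetical stand-in for an order item: product type id plus quantity bought. */
    static final class Purchase {
        final long productTypeId;
        final int quantity;

        Purchase(long productTypeId, int quantity) {
            this.productTypeId = productTypeId;
            this.quantity = quantity;
        }
    }

    /** Builds one user's row of the behavior matrix; typeIds fixes the column order. */
    static double[] toVector(List<Long> typeIds, List<Purchase> purchases) {
        // Map each product type id to its column index once.
        Map<Long, Integer> column = new HashMap<>();
        for (int i = 0; i < typeIds.size(); i++) {
            column.put(typeIds.get(i), i);
        }
        double[] vector = new double[typeIds.size()];
        for (Purchase p : purchases) {
            Integer idx = column.get(p.productTypeId);
            if (idx != null) { // ignore types that have no column in the matrix
                vector[idx] += p.quantity;
            }
        }
        return vector;
    }
}
```

With this layout each user's vector costs one pass over that user's purchases, which may matter once the number of product types grows.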

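2. Use cosine similarity to measure how similar each user's behavior is to every other user's
The code below computes the similarity between two users; looping over all pairs of users yields every similarity.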

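public double countSimilarity(double[] a, double[] b) {
    double total = 0;   // dot product of a and b
    double alength = 0; // squared length of a
    double blength = 0; // squared length of b
    for (int i = 0; i < a.length; i++) {
        total = total + a[i] * b[i];
        alength = alength + a[i] * a[i];
        blength = blength + b[i] * b[i];
    }
    double down = Math.sqrt(alength) * Math.sqrt(blength);
    double result = 0;
    if (down != 0) { // a zero vector has no direction; treat its similarity as 0
        result = total / down;
    }
    return result;
}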
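To make the "loop over all pairs" step concrete, here is a minimal sketch that computes the similarity of every user to every other user. It assumes the vectors are held in a `Map<Long, double[]>` keyed by user id; the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

final class PairwiseSimilaritySketch {

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0 ? 0 : dot / denom;
    }

    /** For every user id, the similarity to each other user id (self is skipped). */
    static Map<Long, Map<Long, Double>> allPairs(Map<Long, double[]> vectors) {
        Map<Long, Map<Long, Double>> result = new HashMap<>();
        for (Map.Entry<Long, double[]> u : vectors.entrySet()) {
            Map<Long, Double> row = new HashMap<>();
            for (Map.Entry<Long, double[]> v : vectors.entrySet()) {
                if (!u.getKey().equals(v.getKey())) {
                    row.put(v.getKey(), cosine(u.getValue(), v.getValue()));
                }
            }
            result.put(u.getKey(), row);
        }
        return result;
    }
}
```

Two users whose purchase vectors point in the same direction, such as (1, 2, 0) and (2, 4, 0), score approximately 1, while users with no product type in common score 0.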

3. Take the n users with the highest similarity to form the set of similar users
This is done by sorting a Map by value.

public List<Map.Entry<Long, Double>> getMaxSimilarity(Customer customer) {
    Map<Long, Double> result = new HashMap<Long, Double>();
    double vector[] = (double[]) users.get(customer.getId());
    for (Map.Entry<Long, Object> entry : users.entrySet()) {
        // Compare ids with equals(): != on boxed Long values compares references
        if (!entry.getKey().equals(customer.getId())) {
            double[] temp = (double[]) entry.getValue();
            double similarity = countSimilarity(temp, vector);
            result.put(entry.getKey(), similarity);
        }
    }
    // Sort the entries by similarity, highest first
    List<Map.Entry<Long, Double>> list = new LinkedList<Map.Entry<Long, Double>>(result.entrySet());
    Collections.sort(list, new Comparator<Map.Entry<Long, Double>>() {
        public int compare(Map.Entry<Long, Double> o1, Map.Entry<Long, Double> o2) {
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });
    return list;
}
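The method above returns all users sorted by similarity and leaves the cut-off to the caller. As a sketch of the "top n" selection in one step, assuming Java 8 streams are available (`topN` is a hypothetical helper name):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopNSketch {

    /** Returns the n entries with the largest similarity, highest first. */
    static List<Map.Entry<Long, Double>> topN(Map<Long, Double> similarities, int n) {
        return similarities.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

The explicit `<Long, Double>` type witness is needed because the compiler cannot infer the entry type once `reversed()` is chained onto the comparator.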

4. Collect the products bought by the similar users, total the quantity bought per product, and sort

public Map<Long, ProductNumModel> getProducts(List<Map.Entry<Long, Double>> list) {
    // list is sorted by similarity, highest first; keep the first 3 users
    List<Customer> simCustomers = new ArrayList<Customer>();
    System.out.println("Top 3 most similar users");
    for (int i = 0; i < list.size() && i < 3; i++) {
        Long id = list.get(i).getKey();
        Customer customer = customerDao.findByIdAndAlive(id, 1);
        simCustomers.add(customer);
    }
    // Sum the purchased quantity of each product over the similar users
    Map<Long, ProductNumModel> map = new HashMap<Long, ProductNumModel>();
    for (Customer customer : simCustomers) {
        Map<Long, ProductNumModel> hashSet = getCustomerProduct(customer);
        for (Map.Entry<Long, ProductNumModel> entry : hashSet.entrySet()) {
            ProductNumModel model = null;
            if (map.containsKey(entry.getKey())) {
                model = map.get(entry.getKey());
                model.num += entry.getValue().num;
            } else {
                model = new ProductNumModel();
                model.product = entry.getValue().product;
                model.num = entry.getValue().num;
            }
            map.put(entry.getKey(), model);
        }
    }
    return map;
}
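The sorting half of this step is done by a `sortProduct` method that the post calls later but does not show. The following is only a guess at what such a step could look like, reduced to product ids and quantities; `rank` and its input shape are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RankProductsSketch {

    /**
     * Sums per-product quantities over all similar users and returns
     * the product ids ordered by total quantity, most-bought first.
     * Each input map is one similar user's purchases: product id -> quantity.
     */
    static List<Long> rank(List<Map<Long, Integer>> purchasesPerSimilarUser) {
        Map<Long, Integer> totals = new HashMap<>();
        for (Map<Long, Integer> purchases : purchasesPerSimilarUser) {
            for (Map.Entry<Long, Integer> e : purchases.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<Long, Integer>comparingByValue().reversed());
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : entries) {
            ids.add(e.getKey());
        }
        return ids;
    }
}
```

Only purchases made by the similar users enter the totals, so a product that sells well overall but not among those users does not rise in the ranking.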

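The top-level function chains the functions above together and stores the result in a file. If the file has no content yet, the recommendations are computed with the algorithm; if it already has content, that content is read back directly. A scheduled task that deletes the recommendation file every day or every week then causes the recommendations to be refreshed automatically.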

public Map<String, Object> getAllSimilarity(Customer customer) throws IOException {
    changeCustomerToVector();

    // Read the path of the recommendation file from the configuration
    Properties p = new Properties();
    try (InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("cxtx.properties")) {
        p.load(inputStream);
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    String folderPath = p.getProperty("recommendFile");
    File file = new File(folderPath);
    if (!file.exists()) {
        file.createNewFile();
    }

    Map<String, Object> map = new HashMap<String, Object>();
    com.alibaba.fastjson.JSONObject jsonObject = null;
    // try-with-resources closes the stream, which the original code never did
    try (FileInputStream fileInputStream = new FileInputStream(file)) {
        jsonObject = com.alibaba.fastjson.JSON.parseObject(IOUtils.toString(fileInputStream, "UTF-8"));
    } catch (IOException e) {
        map.put("msg", "Invalid JSON format");
        map.put("content", "");
        return map;
    }

    Object content = null;
    if (jsonObject == null) { // Nothing cached yet: compute the recommendations for every user
        Map<Long, Object> temp = new HashMap<Long, Object>();
        for (Customer c : customers) {
            List<Map.Entry<Long, Double>> list = this.getMaxSimilarity(c);
            Map<Long, ProductNumModel> result = getProducts(list);
            List<Product> list1 = sortProduct(result);
            temp.put(c.getId(), list1);
        }
        // JSONObject is whichever JSON class the file imports (the import is not shown in the source)
        JSONObject object = new JSONObject(temp);
        // Overwrite rather than append so the file always holds a single JSON document
        try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(file))) {
            bufferedWriter.write(object.toString());
        }
        content = object.get(customer.getId() + "");
    } else {
        content = jsonObject.get(customer.getId() + "");
    }
    map.put("msg", "Success");
    map.put("content", content);
    return map;
}
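The caching idea (compute once, read from the file afterwards, delete the file to force a refresh) can be isolated from the recommendation logic. Below is a minimal sketch using `java.nio.file`; the class, the method names, and the use of a `Supplier` for the expensive computation are all assumptions for illustration, not the post's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

final class RecommendationCacheSketch {

    /** Returns the cached text if the file has content; otherwise computes, stores, and returns it. */
    static String loadOrCompute(Path cacheFile, Supplier<String> compute) throws IOException {
        if (Files.exists(cacheFile) && Files.size(cacheFile) > 0) {
            return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
        }
        String fresh = compute.get();
        // Files.write truncates an existing file by default, so nothing is appended.
        Files.write(cacheFile, fresh.getBytes(StandardCharsets.UTF_8));
        return fresh;
    }

    /** What a scheduled job would call: removing the file forces a recompute on the next request. */
    static void invalidate(Path cacheFile) throws IOException {
        Files.deleteIfExists(cacheFile);
    }
}
```

One way to run `invalidate` daily or weekly would be a `ScheduledExecutorService` with `scheduleAtFixedRate`; an external cron job that deletes the file would have the same effect.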

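Points to note:

1. When computing user similarity, handle the case where the denominator is 0. Also guard against values growing so large that they exceed the range a double can represent. Some preprocessing helps here, for example dividing by the largest sales quantity of any product to obtain the value of each dimension, or subtracting some constant.

2. The closer the cosine is to 1, the more similar the two vectors are. In other words, the larger the computed value, the more alike the two users' behavior. Because purchase counts are never negative, the value here always lies between 0 and 1.

3. The final list of recommended products may be long or short, so it should be sorted by some strategy. Sort by the quantity bought by the similar users rather than by the product's total sales, because data from dissimilar users easily introduces noise.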
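As an illustration of the overflow concern in note 1, here is a sketch of one possible reading of the "divide by the largest value" suggestion: each vector is divided by its own largest component before the cosine is computed. This per-vector scaling is an assumption; it leaves the cosine unchanged in exact arithmetic while keeping the squared terms small. All names are hypothetical.

```java
final class ScaledCosineSketch {

    /** Divides every component by the largest absolute component; a zero vector is returned as a copy. */
    static double[] scaleByMax(double[] v) {
        double max = 0;
        for (double x : v) {
            max = Math.max(max, Math.abs(x));
        }
        if (max == 0) {
            return v.clone();
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / max;
        }
        return out;
    }

    /** Plain cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0 ? 0 : dot / denom;
    }

    /** Cosine similarity computed on the scaled vectors. */
    static double scaledCosine(double[] a, double[] b) {
        return cosine(scaleByMax(a), scaleByMax(b));
    }
}
```

With components around 1e200 the squares overflow to infinity and the unscaled result is NaN, whereas the scaled version still returns a value close to 1 for parallel vectors. Real purchase counts are unlikely to get anywhere near that size, so this is mainly a safeguard.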

0 0