Setting up a Mahout runtime environment in Eclipse, with a recommender example

Source: Internet. Editor: 程序博客网. Date: 2024/06/04 18:18
First, set up the Eclipse Mahout development environment.
1. Install the Java EE edition of the Eclipse IDE: eclipse-jee-juno-SR2-linux-gtk.tar.gz
Download URL: http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR2/eclipse-jee-juno-SR2-linux-gtk.tar.gz

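2. Install the Maven plugin for Eclipse (this part is taken from: http://blog.sina.com.cn/s/blog_72e282a50101455i.html)

To run Mahout on Hadoop from within Eclipse, you first need to install the eclipse-hadoop plugin; see the separate blog post on that.

Step 1: install the Maven plugin M2eclipse in Eclipse

Below is the description from the official site; the screenshots further down illustrate roughly what it says.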

To install m2eclipse, use the following Eclipse update site to install the core of the m2eclipse plugin. This Core update site contains a single component: "Maven Integration for Eclipse (Required)". When you install this component you will be installing all of the core Wizards, the POM Editor, Maven Repository integration, and Maven integration

m2eclipse Core Update Site: http://m2eclipse.sonatype.org/sites/m2e

The content at that address has since changed: it now lists several versions. Pick any one of them; the new URL then becomes, for example: http://m2eclipse.sonatype.org/sites/m2e/0.12.1.20110112-1712

IMPORTANT NOTE: You cannot upgrade from m2eclipse 0.9.8 or m2eclipse 0.9.9 to m2eclipse 0.10.0. If you are running m2eclipse 0.9.8 or 0.9.9 you must either uninstall m2eclipse from your Eclipse installation or start with a fresh installation of Eclipse.

To install this plugin in the Eclipse IDE:


Select Help > Install New Software. This should display the "Install" dialog.
Paste the Update Site URL into the field named "Work with:" and press Enter. Pressing Enter should cause Eclipse to update list of available plugins and components.
Choose the component listed under m2eclipse: "Maven Integration for Eclipse (Required)".
Click Next. Eclipse will then check to see if there are any issues which would prevent a successful installation.
Click Next and agree to the terms of the Eclipse Public License v1.0.
Click Finish to begin the installation process. Eclipse will then download and install the necessary components.
Once the installation process is finished, Eclipse will ask you if you want to restart the IDE. Sonatype strongly recommends that you restart your IDE after installing m2eclipse.

[Figures: screenshots of installing the Maven plugin in Eclipse on Linux]
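At this point the Maven installation is essentially complete. Next comes importing the Hadoop and Mahout packages through Maven.

Step 2: create a new Maven project, as shown in the figures below.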

[Figures: screenshots of creating the Maven project and adding the Hadoop packages]
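This demonstrates adding the Hadoop packages (when pom.xml is saved, the packages are downloaded from the network automatically); Mahout can be added in the same way.
Importing packages this way failed when I set up my environment, so I used another method:
1) Right-click the project and choose the last item, "Properties". In the window that pops up, click "Java Build Path".
2) Then click "Libraries", click "Add External JARs", and add the required jar files.
The steps are as follows (original post: http://coresun.blog.sohu.com/72521381.html):
Right-click the project and choose Build Path --> Configure Build Path to open the build path dialog.
Select the Libraries tab.
At this point only the JRE System Library is available. Click the Add JARs button on the right to open the JAR selection dialog.
Select the three .jar files (shown in the original post's screenshot) and click OK; the build path dialog now lists them.
Click OK again, and the change is visible in the project.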


Example
1. Create a text file that stores user IDs, item IDs, and ratings, and save it as intro.csv. The first column is the UserID, the second is the ItemID, and the third is the preference value (the rating). The file contains only the following lines:
  1,101,5
  1,102,3
  1,103,2.5
  2,101,2
  2,102,2.5
  2,103,5
  2,104,2
  3,101,2.5
  3,104,4
  3,105,4.5
  3,107,5
  4,101,5
  4,103,3
  4,104,4.5
  4,106,4
  5,101,4
  5,102,3
  5,103,2
  5,104,4
  5,105,3.5
  5,106,4
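As a quick sanity check of the file format, the following plain-Java sketch parses lines of the form userID,itemID,rating and rejects anything else. It is not part of Mahout: the class name IntroCsvSketch, the Rating type, and the parse method are hypothetical names introduced only for illustration, and the code assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Mahout class): checks that every line is userID,itemID,rating.
class IntroCsvSketch {

    /** One parsed line of the ratings file. */
    static final class Rating {
        final long userId;
        final long itemId;
        final float value;

        Rating(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Parses the given lines; blank lines are skipped, malformed lines raise an exception. */
    static List<Rating> parse(List<String> lines) {
        List<Rating> ratings = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            ratings.add(new Rating(Long.parseLong(fields[0].trim()),
                                   Long.parseLong(fields[1].trim()),
                                   Float.parseFloat(fields[2].trim())));
        }
        return ratings;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to intro.csv as the first command-line argument.
        List<Rating> ratings = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println("Parsed " + ratings.size() + " ratings");
    }
}
```

If the file holds exactly the 21 data lines listed above, the sketch should report 21 ratings. A descriptive header line in the file would make it fail, which is the point of the check.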
The recommender program:
package com.ben;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderIntro {

    public static void main(String[] args) throws TasteException {
        try {
            // Load the userID,itemID,rating triples from the CSV file.
            DataModel model = new FileDataModel(new File("/home/srp/Downloads/intro.csv"));
            // Measure user-to-user similarity with the Pearson correlation.
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            // Use the 2 most similar users as the neighborhood.
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Ask for 1 recommendation for user 1.
            List<RecommendedItem> recommendations = recommender.recommend(1, 1);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        } catch (IOException e) {
            // Thrown by FileDataModel if the file cannot be read.
            e.printStackTrace();
        }
    }
}
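The RecommenderIntro program will certainly report errors at first: mahout-core-0.4.jar has been imported into the project, but some other jars it depends on have not been imported yet.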
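Instead of discovering missing jars one failed run at a time, a small check like the following can report them up front. This is only a sketch: the class ClasspathCheckSketch and its methods are hypothetical, and the comments mapping each probed class to a jar are my assumptions about which jar supplies it, not something stated in the original post. It assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the probed classes cannot be found on the classpath.
class ClasspathCheckSketch {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of the given class names that are not on the classpath. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isPresent(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed jar for each class is given in the trailing comment.
        List<String> notFound = missing(
            "org.apache.mahout.cf.taste.impl.model.file.FileDataModel", // mahout-core
            "org.apache.mahout.math.Vector",                            // mahout-math
            "org.slf4j.LoggerFactory",                                  // slf4j-api
            "org.apache.commons.logging.LogFactory",                    // commons-logging
            "com.google.common.base.Preconditions");                    // guava
        if (notFound.isEmpty()) {
            System.out.println("All probed classes were found.");
        } else {
            System.out.println("Missing classes: " + notFound);
        }
    }
}
```

Run it with the same build path as the recommender project. A class reported as missing points to a jar that still has to be added, although the probe list here is not exhaustive.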

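Next, choose Run As > Java Application to run it on a single machine (if the cluster has started successfully, it can of course also run on the cluster). When run, the program reports one missing package after another; following the messages, import each one into the project in turn. During my testing I imported, in order, slf4j-api-1.6.1.jar, slf4j-jcl-1.6.1.jar, slf4j-log4j12-1.6.1.jar, commons-logging-1.1.1.jar, uncommons-maths-1.2.2.jar, mahout-math-0.6.jar, guava-r09.jar, and mahout-collections-1.0.jar. All of these can be found in the lib directory under the Mahout installation directory. After tracking down and importing the missing jars one by one and debugging, the program finally ran successfully, with the following result: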

RecommendedItem[item:104, value:4.257081]
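To see where a value like this comes from, the following plain-Java sketch recomputes the estimate for user 1 and item 104 without Mahout. It is a simplified illustration, not Mahout's implementation: it computes the Pearson correlation over co-rated items, keeps the 2 most similar users, and takes a similarity-weighted average of their ratings. The class UserBasedSketch and its method names are hypothetical, and Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified re-computation of a user-based estimate (not Mahout code).
class UserBasedSketch {

    /** The ratings from intro.csv as userID -> (itemID -> rating). */
    static Map<Integer, Map<Integer, Double>> data() {
        double[][] rows = {
            {1, 101, 5}, {1, 102, 3}, {1, 103, 2.5},
            {2, 101, 2}, {2, 102, 2.5}, {2, 103, 5}, {2, 104, 2},
            {3, 101, 2.5}, {3, 104, 4}, {3, 105, 4.5}, {3, 107, 5},
            {4, 101, 5}, {4, 103, 3}, {4, 104, 4.5}, {4, 106, 4},
            {5, 101, 4}, {5, 102, 3}, {5, 103, 2}, {5, 104, 4}, {5, 105, 3.5}, {5, 106, 4}
        };
        Map<Integer, Map<Integer, Double>> prefs = new HashMap<>();
        for (double[] r : rows) {
            prefs.computeIfAbsent((int) r[0], k -> new HashMap<>()).put((int) r[1], r[2]);
        }
        return prefs;
    }

    /** Pearson correlation over the items both users rated; NaN if it is undefined. */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double vb = b.get(e.getKey());
            if (vb != null) {
                n++;
                sumA += e.getValue();
                sumB += vb;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = sumA / n, meanB = sumB / n;
        double sab = 0, saa = 0, sbb = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double vb = b.get(e.getKey());
            if (vb != null) {
                double da = e.getValue() - meanA, db = vb - meanB;
                sab += da * db;
                saa += da * da;
                sbb += db * db;
            }
        }
        double denom = Math.sqrt(saa * sbb);
        return denom == 0 ? Double.NaN : sab / denom;
    }

    /** Similarity-weighted average rating of the item among the n most similar users. */
    static double estimate(Map<Integer, Map<Integer, Double>> prefs, int user, int item, int n) {
        List<double[]> sims = new ArrayList<>(); // each entry: {otherUserId, similarity}
        for (Map.Entry<Integer, Map<Integer, Double>> e : prefs.entrySet()) {
            if (e.getKey() == user) {
                continue;
            }
            double s = pearson(prefs.get(user), e.getValue());
            if (!Double.isNaN(s)) {
                sims.add(new double[] {e.getKey().doubleValue(), s});
            }
        }
        sims.sort((x, y) -> Double.compare(y[1], x[1])); // most similar first
        double weighted = 0, total = 0;
        for (double[] s : sims.subList(0, Math.min(n, sims.size()))) {
            Double rating = prefs.get((int) s[0]).get(item);
            if (rating != null) {
                weighted += s[1] * rating;
                total += s[1];
            }
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        System.out.printf("Estimate for user 1, item 104: %.4f%n", estimate(data(), 1, 104, 2));
    }
}
```

With this data the two nearest neighbours of user 1 are user 4 (correlation 1.0) and user 5 (about 0.945), so the estimate is (1.0 × 4.5 + 0.945 × 4) / 1.945, or about 4.2571, which agrees with the value printed by the Mahout program.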
