Oozie Installation

Source: Internet  Published by: 懒猫软件  Editor: 程序博客网  Date: 2024/04/29 10:18

Building from source

1. If a dependency cannot be downloaded, look for it at: https://repo.spring.io/simple/hortonworks/org/apache/

    $ bin/mkdistro.sh [-DskipTests]

    Running mkdistro.sh will create the binary distribution of Oozie. By default, oozie war will not contain hadoop and hcatalog libraries, however they are required for oozie to work. There are 2 options to add these libraries:

    1. At install time, copy the hadoop and hcatalog libraries to libext and run oozie-setup.sh to setup oozie war. This is suitable when same oozie package needs to be used in multiple set-ups with different hadoop/hcatalog versions.
    2. Build with -Puber which will bundle the required libraries in the oozie war.

    Further, the following options are available to customise the versions of the dependencies:

    -P<profile> - default hadoop-1. Valid are hadoop-1, hadoop-0.23, hadoop-2 or hadoop-3. Choose the correct hadoop profile depending on the hadoop version used.
    -Dhadoop.version=<version> - default 1.2.1 for hadoop-1, 0.23.5 for hadoop-0.23, 2.3.0 for hadoop-2 and 3.0.0-SNAPSHOT for hadoop-3
    -Dhadoop.auth.version=<version> - defaults to hadoop version
    -Ddistcp.version=<version> - defaults to hadoop version
    -Dpig.version=<version> - default 0.12.1
    -Dpig.classifier=<classifier> - default none
    -Dsqoop.version=<version> - default 1.4.3
    -Dsqoop.classifier=<classifier> - default hadoop100
    -Dtomcat.version=<version> - default 6.0.41
    -Dopenjpa.version=<version> - default 2.2.2
    -Dxerces.version=<version> - default 2.10.0
    -Dcurator.version=<version> - default 2.5.0
    -Dhive.version=<version> - default 0.13.1
    -Dhbase.version=<version> - default 0.94.2

Source: http://oozie.apache.org/docs/4.2.0/DG_QuickStart.html#Building_Oozie

cd  oozie-4.2.0/bin
./mkdistro.sh -DskipTests  -Puber -P hadoop-2

-Puber bundles the third-party jars into the war, which is convenient. Without -Puber, the resulting oozie.war contains none of the dependency jars; you then have to download the dependencies yourself, put them in the libext directory, and build the war yourself.
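The build invocation can be wrapped in a small command builder so the profile and Hadoop version are chosen in one place. This is only a sketch: the function name mkdistro_cmd and its two parameters are made up for illustration, and only the flags come from the option list above. It prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oozie): print the mkdistro.sh command
# line for a given Hadoop profile and, optionally, an explicit version.
mkdistro_cmd() {
    profile="${1:-hadoop-2}"   # hadoop-1, hadoop-0.23, hadoop-2 or hadoop-3
    version="${2:-}"           # optional; empty keeps the profile default
    cmd="./mkdistro.sh -DskipTests -Puber -P${profile}"
    if [ -n "$version" ]; then
        cmd="$cmd -Dhadoop.version=${version}"
    fi
    printf '%s\n' "$cmd"
}

# Example: build against Hadoop 2.3.0 with bundled libraries.
mkdistro_cmd hadoop-2 2.3.0
```

The printed line can be pasted into a shell inside oozie-4.2.0/bin once it looks right.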

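2. hbase-1.0.3.jar could not be downloaded, so in the local m2 repository I substituted hbase-0.9.XX.jar for it, simply renaming the file to 1.0.3. I do not know whether this will cause problems later.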

3. Failed to execute goal org.apache.maven.plugins:maven-site-plugin:2.0-beta-6:site (default) on project oozie-docs: The site descriptor cannot be resolved from the repository: Could not transfer artifact org.apache:apache:xml:site_en:16 from/to Codehaus repository (http://repository.codehaus.org/): repository.codehaus.org: unknown error
Solution:

I was able to resolve it by editing the parent pom.xml and removing the Codehaus repository entry:

<repository>
<id>Codehaus repository</id>
<url>http://repository.codehaus.org/</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
Or replace the URL:
<repositories>
<repository>
<id>Codehaus repository</id>
<name>codehaus-mule-repo</name>
<url>https://repository-master.mulesoft.org/nexus/content/groups/public/</url>
<layout>default</layout>
</repository>
</repositories>
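Before re-running the build, it can help to confirm that no reference to the Codehaus host from the error message is left in the pom. A minimal sketch; the function name find_codehaus is an assumption, and it only inspects the file it is given:

```shell
#!/bin/sh
# Hypothetical helper: list every line of a pom file that still points at
# repository.codehaus.org. Returns non-zero when nothing is found.
find_codehaus() {
    grep -n 'repository\.codehaus\.org' "$1"
}

# Run only when a pom.xml exists in the current directory.
if [ -f pom.xml ]; then
    find_codehaus pom.xml || echo "no Codehaus repository left in pom.xml"
fi
```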

After the build, the output is at oozie-4.2.0/distro/target/oozie-4.2.0-distro.tar.gz

Installation:

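Extract the oozie-4.2.0-distro.tar.gz produced by the build into the directory of your choice; this is the Oozie installation we need. Extraction yields oozie-4.2.0; cd into that directory.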

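1. Download ext-2.2.zip (the ExtJS library used by the Oozie web console) and place it in the libext directory created below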

    Extract the oozie-4.2.0-distro.tar.gz package

    mkdir libext

Copy the Hadoop jars into the libext directory:
cp /usr/local/hadoop/share/hadoop/*/*.jar libext/
cp /usr/local/hadoop/share/hadoop/*/lib/*.jar libext/

Disable the Hadoop jars that conflict with Tomcat (run these inside libext). This step comes from other posts online; I am not sure it is required.
mv servlet-api-2.5.jar servlet-api-2.5.jar.bak
mv jsp-api-2.1.jar jsp-api-2.1.jar.bak
mv jasper-compiler-5.5.23.jar jasper-compiler-5.5.23.jar.bak
mv jasper-runtime-5.5.23.jar jasper-runtime-5.5.23.jar.bak
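The four renames fail with an error if a jar is not present, which happens when the Hadoop version ships different file names. A minimal sketch that skips missing files; the function name disable_conflicting_jars is an assumption, and the jar list is exactly the one above:

```shell
#!/bin/sh
# Hypothetical helper: append .bak to each listed jar inside the given
# directory, skipping any jar that is not there.
disable_conflicting_jars() {
    dir="$1"
    for jar in servlet-api-2.5.jar jsp-api-2.1.jar \
               jasper-compiler-5.5.23.jar jasper-runtime-5.5.23.jar; do
        if [ -f "$dir/$jar" ]; then
            mv "$dir/$jar" "$dir/$jar.bak"
            echo "disabled $jar"
        fi
    done
}

# Run only when libext exists in the current directory.
if [ -d libext ]; then
    disable_conflicting_jars libext
fi
```

Renaming rather than deleting keeps the step reversible if the war turns out to need one of these jars.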

2. Prepare the war: bin/oozie-setup.sh prepare-war

3. Upload the sharelib
tar -zxvf oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-sharelib-4.2.0.tar.gz
This creates a share directory, which is needed in a moment when creating the sharelib directory on HDFS.
bin/oozie-setup.sh  sharelib  create -fs hdfs://node01:8020
Replace hdfs://node01:8020 with your own HDFS URL.
Alternatively:  hdfs dfs -put  /opt/oozie-4.2.0/share  /user/{username}
Note: the location must match the value of oozie.service.WorkflowAppService.system.libpath in oozie-site.xml, which is why the share directory has to go under /user/{username}.
Reference: http://blog.csdn.net/u014729236/article/details/47188631/

4. Proxy user settings
http://dongxicheng.org/mapreduce-nextgen/hadoop-secure-impersonation/
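Without this setting, submitting a job fails with an error like:
 hadoop is not allowed to impersonate hadoop
In other words, the hadoop user has no permission to submit jobs on behalf of the hadoop user.
The cause is that Oozie itself does not execute any tasks, nor does it dispatch tasks to TaskTrackers. Its only interaction with the Hadoop cluster is to submit jobs to the JobTracker (the ResourceManager on YARN) and to learn their status through a callback URL or by polling. Because Oozie submits those jobs on behalf of the users who own the workflows, Hadoop must explicitly allow the Oozie account to impersonate them.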
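Suppose the Hadoop cluster is installed under account A, and Oozie runs on some node under account B and submits jobs for users in group C. The proxy setting then means: account B, connecting from that node, is allowed to submit jobs on behalf of users in group C.
Add the following to core-site.xml:
        <!-- OOZIE -->
        <property>
                <name>hadoop.proxyuser.hadoop.hosts</name>
                <value>IP</value>
        </property>
        <property>
                <name>hadoop.proxyuser.hadoop.groups</name>
                <value>hadoop</value>
        </property>

In these properties, the hadoop inside the names hadoop.proxyuser.hadoop.hosts and hadoop.proxyuser.hadoop.groups is the proxy user, that is, the account the Oozie server runs as (B above; in this setup it is the same hadoop account the cluster runs under). The value of hadoop.proxyuser.hadoop.hosts is the IP of the node where Oozie is installed, and the value of hadoop.proxyuser.hadoop.groups is the user group C described above.
Since Hadoop and Oozie are usually both installed under the hadoop account, and that account belongs to the hadoop group, you end up with this funny-looking configuration in which hadoop submits jobs on behalf of hadoop.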
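To apply the configuration without restarting the Hadoop cluster:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
Note: the user name must not contain a dot; for example, hadoop.proxyuser.xing.ming.groups will not work.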
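The two properties follow a fixed pattern, so they can be generated for any account. A minimal sketch: the function name proxyuser_props is an assumption, 192.0.2.10 is a placeholder address, and the dot check simply encodes the note above.

```shell
#!/bin/sh
# Hypothetical helper: print the two proxy-user properties for core-site.xml.
# Arguments: proxy user, allowed host(s), allowed group(s).
proxyuser_props() {
    user="$1"; allowed_hosts="$2"; allowed_groups="$3"
    # Reject user names containing a dot, per the note above.
    case "$user" in
        *.*) echo "user name must not contain a dot: $user" >&2; return 1 ;;
    esac
    for kv in "hosts=$allowed_hosts" "groups=$allowed_groups"; do
        printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>%s</value>\n</property>\n' \
            "$user" "${kv%%=*}" "${kv#*=}"
    done
}

# Example with placeholder values; paste the output into core-site.xml.
proxyuser_props hadoop 192.0.2.10 hadoop
```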
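5. Create the database: bin/oozie-setup.sh db create -run
The settings for the Oozie metadata database can be changed in conf/oozie-site.xml
6. Start Oozie: bin/oozied.sh start
7. Check the status: bin/oozie admin -oozie http://localhost:11000/oozie -status
You can also open http://localhost:11000/oozie directly in a browser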
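A healthy server answers the status command with the line "System mode: NORMAL", which makes the check easy to script. A minimal sketch: the function name oozie_is_normal is an assumption, and so is having the oozie client on the PATH (in the layout above it lives at bin/oozie).

```shell
#!/bin/sh
# Hypothetical helper: read the output of `oozie admin -status` on stdin
# and succeed only if the server reports NORMAL mode.
oozie_is_normal() {
    grep -q 'System mode: NORMAL'
}

OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Query the server only when the oozie client is on the PATH.
if command -v oozie >/dev/null 2>&1; then
    if oozie admin -oozie "$OOZIE_URL" -status 2>&1 | oozie_is_normal; then
        echo "Oozie is up at $OOZIE_URL"
    else
        echo "Oozie is not in NORMAL mode (or not reachable) at $OOZIE_URL"
    fi
fi
```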

Tutorials

Usage: http://www.open-open.com/lib/view/open1453606606995.html
           http://www.ibm.com/developerworks/cn/data/library/bd-hadoopoozie/#shell
           http://www.infoq.com/cn/articles/introductionOozie/
Scheduling: http://shiyanjun.cn/archives/684.html