Compile and build specific Hadoop source code branch using Azure VM
Source: Internet | Published: hr抢购软件 | Editor: 程序博客网 | Time: 2024/06/05 22:47
Sometimes you may want to test a Hadoop feature that exists in a specific source branch but is not yet available as a binary release. In my case, I want to try accessing Azure Data Lake Store (ADLS) via its WebHDFS endpoint. Access to ADLS requires OAuth2, support for which was added in Hadoop 2.8 (HDFS-8155) but is not available in the current Hadoop 2.7.x releases.
The Hadoop source code is available in the GitHub mirror https://github.com/apache/hadoop. Code specific to version 2.8 lives in the branch appropriately named "branch-2.8".
Deploy Azure VM with Ubuntu 14.04-LTS
As described in the Hadoop building instructions, "the easiest way to get an environment with all the appropriate tools is by means of the provided Docker config" (for Linux or Mac). Since my primary laptop runs Windows 10, I deploy an Ubuntu 14.04 LTS virtual machine in my Azure subscription, use it to build the Hadoop 2.8 binary tar.gz file, download the resulting file, and delete the VM once I am done.
I am using a Standard_DS2 VM created from the Canonical Ubuntu 14.04 LTS Azure gallery image: https://portal.azure.com/#create/Canonical.UbuntuServer1404LTS-ARM
Install Docker on Ubuntu 14.04
After the VM is deployed, I SSH into it using its public IP and quickly install Docker following the instructions for Ubuntu 14.04 from https://docs.docker.com/engine/installation/linux/ubuntulinux/
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" | sudo tee --append /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get purge lxc-docker
apt-cache policy docker-engine
sudo apt-get install linux-image-extra-$(uname -r)
sudo apt-get install docker-engine
sudo service docker start
sudo docker run hello-world
By default, my user account (azureuser) cannot run "docker run hello-world" without sudo. When I try, I get this message: "docker: Cannot connect to the Docker daemon. Is the docker daemon running on this host?" This happens because the Docker daemon's Unix socket is owned by root, and other users can access it only through sudo.
To let azureuser run docker without sudo, I follow Docker's instructions: create a group called "docker", add my user to that group, log out, log back in, and try docker run again.
sudo groupadd docker
sudo usermod -aG docker `whoami`
logout
After logging back in, I can run "docker run hello-world" without problems.
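The logout matters because group changes only apply to new login sessions. `id -nG USER` reads the group database, while plain `id -nG` reports the groups of the current shell. The sketch below makes that difference checkable. The function names are hypothetical, and it assumes GNU coreutils `id`, as on Ubuntu.

```shell
# Hypothetical helpers for checking docker-group membership.

# user_in_group USER GROUP: does the group database list USER in GROUP?
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# session_in_group GROUP: does the *current shell* already carry GROUP?
# After "usermod -aG", this stays false until you log out and back in.
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

Right after `sudo usermod -aG docker azureuser`, `user_in_group azureuser docker` should succeed while `session_in_group docker` still fails. The session check should only pass after logging back in.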
Clone Hadoop 2.8 Branch
Since I want to compile specifically the branch called "branch-2.8", I use Git to clone only that specific branch to my home directory (/home/azureuser/hadoop-2.8) using this command:
git clone -b branch-2.8 --single-branch https://github.com/apache/hadoop.git hadoop-2.8
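To confirm that the clone really is limited to one branch, you can inspect two pieces of Git metadata: the checked-out branch, and the `remote.origin.fetch` refspec, which `--single-branch` narrows to the requested branch. Here is a minimal sketch. The function name is hypothetical, and `git -C` requires Git 1.8.5 or later.

```shell
# Hypothetical helper: verify that REPO_DIR was cloned with
# "-b BRANCH --single-branch". It only reads Git metadata.
check_single_branch() {   # usage: check_single_branch REPO_DIR BRANCH
  local repo="$1" branch="$2" head refspec
  head=$(git -C "$repo" rev-parse --abbrev-ref HEAD) || return 1
  if [ "$head" != "$branch" ]; then
    echo "checked out: $head (expected $branch)" >&2
    return 1
  fi
  # A full clone fetches +refs/heads/*; --single-branch narrows it to one branch
  refspec=$(git -C "$repo" config --get-all remote.origin.fetch) || return 1
  [ "$refspec" = "+refs/heads/$branch:refs/remotes/origin/$branch" ]
}
```

For the clone above, `check_single_branch hadoop-2.8 branch-2.8` should succeed. Later `git fetch` runs in that directory will then only update `origin/branch-2.8`.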
Start Docker Container with Hadoop Build Environment
Following instructions from https://github.com/apache/hadoop/blob/trunk/BUILDING.txt, I start the Hadoop build environment using the provided script:
cd hadoop-2.8/
./start-build-env.sh
This process will take some time (~5-10 min) since it installs all of the required build environment tools (JDK, Maven, etc.) in the container.
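A small pre-flight check before running the script can catch problems early. This sketch assumes that the branch's `start-build-env.sh` bind-mounts `$HOME/.m2` into the container (check your copy of the script). If that directory does not exist yet, Docker creates it on the host owned by root, which can later cause Maven permission errors. The function name is hypothetical, and `stat -c` assumes GNU coreutils.

```shell
# Hypothetical pre-flight helper; run it on the VM, before ./start-build-env.sh.
preflight_build_env() {   # usage: preflight_build_env HADOOP_SRC_DIR
  local src="$1"
  if [ ! -x "$src/start-build-env.sh" ]; then
    echo "not found or not executable: $src/start-build-env.sh" >&2
    return 1
  fi
  # Create ~/.m2 as the current user; if it is missing when the container
  # starts, Docker creates the bind-mount source itself, owned by root
  mkdir -p "$HOME/.m2" || return 1
  # stat -c %u (GNU coreutils) prints the owner's numeric uid
  if [ "$(stat -c %u "$HOME/.m2")" != "$(id -u)" ]; then
    echo "$HOME/.m2 is not owned by $(id -un)" >&2
    return 1
  fi
}
```

For example, `preflight_build_env ~/hadoop-2.8 && cd ~/hadoop-2.8 && ./start-build-env.sh`.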
Building Hadoop within the Docker Container
After the creation process is finished, I see my Hadoop Dev docker container running.
I try to start the Maven binary distribution build without native code, without running the tests, and without documentation.
mvn package -Pdist -DskipTests -Dtar
Resolving Permissions Error
However, I get a permissions error regarding the /home/azureuser/.m2 directory (used by Maven).
To fix this problem, I exit the docker container, and set the ownership of the /home/azureuser/.m2 directory to azureuser:azureuser.
sudo chown azureuser:azureuser ~/.m2
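The `chown` above changes only the directory itself. If files inside `~/.m2` had also ended up owned by root, you would need a recursive `chown -R`. The small check below reports whether any entry under a directory is owned by someone else. The function name is hypothetical, and `-quit` is a GNU find extension available on Ubuntu.

```shell
# Hypothetical helper: succeed only if every entry under DIR belongs to USER.
m2_owned_by() {   # usage: m2_owned_by DIR USER
  local dir="$1" user="$2" offender
  [ -d "$dir" ] || return 1
  # Print the first entry NOT owned by USER, then stop (GNU -quit).
  # An unknown user name makes find itself fail, which is treated as "no".
  offender=$(find "$dir" ! -user "$user" -print -quit 2>/dev/null) || return 1
  if [ -n "$offender" ]; then
    echo "not owned by $user: $offender" >&2
    return 1
  fi
}
```

For example: `m2_owned_by ~/.m2 azureuser || sudo chown -R azureuser:azureuser ~/.m2`.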
Restarting Container and Starting Maven Build
After the permission problem is resolved, I restart the docker container:
cd hadoop-2.8/
./start-build-env.sh
Once within the container, I again try to start the Maven build and package:
mvn package -Pdist -DskipTests -Dtar
The build takes a while to complete; on the Standard_DS2 Azure VM it took me about 9 minutes.
Download Binary Distribution File
When the build completes, the output files are in the hadoop-dist/target directory.
I download the hadoop-2.8.0-SNAPSHOT.tar.gz file (about 200 MB) from the Ubuntu Azure VM to my local machine (e.g. using WinSCP or the SFTP browser in MobaXterm).
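A large download can be cut short without an obvious error, so it is worth checking the archive before deleting the VM. The sketch below lists the archive end to end, which in practice fails on a truncated gzip stream. It also writes a checksum file that can be verified after the copy. The function name is hypothetical, and it assumes GNU tar and coreutils `sha256sum`.

```shell
# Hypothetical helper: sanity-check a .tar.gz and record its SHA-256.
check_tarball() {   # usage: check_tarball FILE.tar.gz
  local f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  # Listing decompresses the whole archive, so a truncated file normally fails
  tar -tzf "$f" >/dev/null || return 1
  # Store the checksum with a bare file name so "sha256sum -c" works
  # from whatever directory holds the copied file
  (cd "$(dirname "$f")" && sha256sum "$(basename "$f")") > "$f.sha256"
}
```

On the VM, run `check_tarball hadoop-dist/target/hadoop-2.8.0-SNAPSHOT.tar.gz`, then copy both files. On Linux, verify with `sha256sum -c hadoop-2.8.0-SNAPSHOT.tar.gz.sha256`. On Windows, compare the file's contents with the digest printed by `certutil -hashfile hadoop-2.8.0-SNAPSHOT.tar.gz SHA256`.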
I also store this file as a block blob in an Azure Storage container, so that I can quickly download it from there without rebuilding (https://avdatarepo1.blob.core.windows.net:443/hadoop/hadoop-2.8.0-SNAPSHOT.tar.gz).
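Fetching the stored tarball on another machine can be scripted with curl. This sketch assumes the blob is readable anonymously; a private container would need a SAS token appended to the URL. The function name is hypothetical. The function downloads to a temporary `.part` file and renames it only on success, so an interrupted download is never mistaken for a complete one.

```shell
# Hypothetical helper: download URL to DEST, replacing DEST only on success.
fetch_dist() {   # usage: fetch_dist URL DEST
  local url="$1" dest="$2"
  # -f: fail on HTTP errors instead of saving an error page
  # -sS: hide the progress bar but still print errors; -L: follow redirects
  if ! curl -fsSL --retry 3 -o "$dest.part" "$url"; then
    rm -f "$dest.part"
    return 1
  fi
  # Rename only after the transfer finished
  mv "$dest.part" "$dest"
}
```

For example: `fetch_dist https://avdatarepo1.blob.core.windows.net:443/hadoop/hadoop-2.8.0-SNAPSHOT.tar.gz hadoop-2.8.0-SNAPSHOT.tar.gz`.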
Once I have the binary distribution file ready, I can go ahead and delete my Azure VM.
Conclusion
Using an Azure VM running Ubuntu 14.04 LTS together with Docker is a quick and convenient way to set up a temporary Hadoop build environment. Although in this case I built the "branch-2.8" branch, the same process can be used to build other Hadoop branches (or trunk) from source.
I’m looking forward to your feedback and questions via Twitter https://twitter.com/ArsenVlad