Copy data between different versions of Hadoop
Source: Internet · Editor: 程序博客网 · Time: 2024/04/30 00:59
I had to copy data from one Hadoop cluster to another recently. However, the two clusters ran different versions of Hadoop, which made using distcp a little tricky.
Some notes on distcp: By default, distcp skips files that already exist in the destination, but they can be overwritten by supplying the -overwrite option. You can also update only the files that have changed using the -update option. distcp is implemented as a MapReduce job in which the copying is done by map tasks running in parallel across the cluster; there are no reducers. Each file is copied by a single map, and distcp tries to give each map approximately the same amount of data by bucketing files into roughly equal allocations.
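The options above can be sketched as follows. This is a minimal illustration, not a command from the original post; the host names and paths (src-namenode, dst-namenode, /data/logs) are hypothetical, and the commands assume a running Hadoop cluster.

```shell
# Plain copy: files that already exist at the destination are skipped.
hadoop distcp hdfs://src-namenode/data/logs hdfs://dst-namenode/data/logs

# -update: re-copy only files that differ from the destination copy.
hadoop distcp -update hdfs://src-namenode/data/logs hdfs://dst-namenode/data/logs

# -overwrite: unconditionally overwrite files at the destination.
hadoop distcp -overwrite hdfs://src-namenode/data/logs hdfs://dst-namenode/data/logs
```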
The following command copies the contents of a folder on one Hadoop cluster to a folder on another. Using hftp is necessary because the clusters run different versions of Hadoop. The command must be run on the destination cluster, and your user must have write access to the destination folder.
hadoop distcp -pb hftp://namenode:50070/tmp/* hdfs://namenode/tmp/
Note: The -pb option will preserve the block size.
Note also: when copying between two different versions of Hadoop, we must use the HftpFileSystem, which is a read-only file system. This is why the distcp must be run on the destination cluster.
The following command copies data between Hadoop clusters running the same version.
hadoop distcp -pb hdfs://namenode/tmp/* hdfs://namenode/tmp/
Using Cloudera Manager, I found it very easy to configure single-node Hadoop development machines that our developers can use to test their Pig scripts. However, out of the box dfs.replication is set to 3, which is great for a cluster but throws warnings on a single-node development workstation. I set dfs.replication to 1, but any blocks written previously are still reported as under-replicated. Having a quick way to change the replication factor is very handy.
There are other reasons for managing the replication level of data on a running Hadoop system. For example, if you don’t have even distribution of blocks across your DataNodes, you can increase replication temporarily and then bring it back down.
To set replication of an individual file to 4:
sudo -u hdfs hadoop dfs -setrep -w 4 /path/to/file
You can also do this recursively. To change the replication of the entire HDFS to 1:
sudo -u hdfs hadoop dfs -setrep -R -w 1 /
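To verify that the change took effect, you can check for under-replicated blocks with fsck. This is an assumption on my part rather than part of the original instructions, and it requires a running cluster:

```shell
# fsck reports replication health; the summary includes an
# "Under-replicated blocks" count, which should drop to 0.
sudo -u hdfs hadoop fsck / | grep -i 'under-replicated'
```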
I found these easy instructions on the Streamy Development Blog.