Big Data: Hadoop Study Notes 05

Source: Internet | Published by: 博士知乎 | Editor: 程序博客网 | Time: 2024/05/04 08:27

19. Rolling the edit log and merging the image

1. Roll the edit log

$>hdfs dfsadmin -rollEdits

2. Merge the image file (must be run in safe mode)

$>hdfs dfsadmin -saveNamespace

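3. Hadoop safe mode and non-safe mode

    While the cluster is in safe mode, operations that modify the file system cannot be performed. The NameNode enters safe mode automatically at startup and normally leaves it on its own once enough block reports have arrived.

    1. Safe mode operations
    - Check the current mode
    $>hdfs dfsadmin -safemode get
    - Enter safe mode
    $>hdfs dfsadmin -safemode enter
    - Leave safe mode
    $>hdfs dfsadmin -safemode leave
    - Wait until safe mode ends before continuing
    $>hdfs dfsadmin -safemode wait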
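
The safe mode check can also be scripted from Java. The following is a minimal sketch, not an official client: it shells out to the same `hdfs dfsadmin -safemode get` command and looks for the "Safe mode is ON" text that the command prints. The class name `SafeModeCheck` and its methods are hypothetical, and the sketch assumes the `hdfs` binary is on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hadoop.
class SafeModeCheck {

    // Interprets the text printed by `hdfs dfsadmin -safemode get`,
    // which reports "Safe mode is ON" or "Safe mode is OFF".
    // With several NameNodes this returns true if any of them reports ON.
    static boolean isSafeModeOn(String commandOutput) {
        return commandOutput.contains("Safe mode is ON");
    }

    // Runs the CLI and returns its combined stdout/stderr.
    // Assumption: the `hdfs` command is installed and on the PATH.
    static String runSafeModeGet() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("hdfs", "dfsadmin", "-safemode", "get")
                .redirectErrorStream(true)
                .start();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        p.waitFor();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String out = runSafeModeGet();
        System.out.println(isSafeModeOn(out)
                ? "safe mode is ON"
                : "safe mode was not reported as ON");
    }
}
```

Keeping the parsing in its own method makes it possible to test the logic without a running cluster.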

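20. Consistency model

    A file system's consistency model describes the visibility of data for file reads and writes. HDFS trades some consistency for performance: data written to a file is not guaranteed to be visible to readers immediately.
    HDFS provides a way to force all buffers to be flushed to the datanodes: calling sync() on FSDataOutputStream (deprecated in later releases in favor of hflush()). Once the call returns successfully, HDFS guarantees that the data written to the file so far has reached all datanodes in the write pipeline and is visible to all new readers.
    hflush()    //flushes the client buffer to the datanodes; the data becomes visible to new readers immediately
    sync()      //deprecated, use hflush()
    hsync()     //like hflush(), and additionally asks the datanodes to write the data to disk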

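    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    @Test
    public void writeFile() throws Exception {
        // fs is assumed to be a FileSystem field initialized elsewhere in the test class.
        // A path without a scheme resolves against the default file system (fs.defaultFS).
        Path path = new Path("/write.txt");
        FSDataOutputStream dos = fs.create(path);
        dos.write("hello write!\n".getBytes());
        // after hflush() returns, the first line is visible to new readers
        dos.hflush();
        dos.write("how are you".getBytes());
        // close() flushes whatever is still buffered
        dos.close();
        System.out.println("----- over -----");
    }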
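
The effect of client-side buffering can be observed without a cluster. The sketch below is only a local-file analogy, not HDFS code: a `BufferedOutputStream` plays the role of the client buffer, `flush()` stands in for `hflush()`, and `FileDescriptor.sync()` stands in for `hsync()`. The class name `FlushVisibilityDemo` is hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-file analogy for buffered writes; it does not talk to HDFS.
class FlushVisibilityDemo {

    // Returns {size before flush, size after flush, size after sync}.
    static long[] observeSizes() throws IOException {
        Path file = Files.createTempFile("flush-demo", ".txt");
        long[] sizes = new long[3];
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos, 8192)) {
            out.write("hello write!\n".getBytes(StandardCharsets.UTF_8));
            sizes[0] = Files.size(file);  // data still sits in the client buffer
            out.flush();                  // push the buffer out (role of hflush())
            sizes[1] = Files.size(file);  // now other readers can see the bytes
            fos.getFD().sync();           // force to the storage device (role of hsync())
            sizes[2] = Files.size(file);
        } finally {
            Files.deleteIfExists(file);   // runs after the streams are closed
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        long[] s = observeSizes();
        System.out.println("before flush: " + s[0]
                + ", after flush: " + s[1]
                + ", after sync: " + s[2]);
    }
}
```

The sync step does not change what readers see; it only strengthens durability, which mirrors the difference between flushing and syncing described above.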

21. Copying data between clusters

    A typical use of distcp is transferring data between two HDFS clusters. It is a good fit when both clusters run the same version of Hadoop.
    $>hadoop distcp hdfs://namenode1/foo hdfs://namenode2/bar

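22. Archive files

[Archive]
    $>hadoop archive -archiveName myhar.har -p filePath harPath
    (the -archiveName option gives the archive's name, which must end with the .har extension; filePath is the directory to archive; harPath is the output directory)
[Unarchive]
    $>hdfs dfs -ls -R har:///harPath/myhar.har               //list the archive's contents
    $>hdfs dfs -cp har:///harPath/myhar.har hdfs:///user/    //extract the archive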
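
HDFS and HAR locations are URIs of the form scheme://authority/path, so the number of slashes after the scheme matters. The following sketch uses only `java.net.URI` from the JDK (Hadoop's own `Path` class follows the same scheme/authority/path layout) to show how a two-slash and a three-slash form are split. The class name `HarUriDemo` is hypothetical, and `harPath` is the placeholder directory from the archive command.

```java
import java.net.URI;

// Illustration only: shows how a URI string is split into its parts.
class HarUriDemo {

    static String describe(String s) {
        URI u = URI.create(s);
        return "scheme=" + u.getScheme()
                + " authority=" + u.getAuthority()
                + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Two slashes: the text up to the next slash is read as the authority (host).
        System.out.println(describe("har://myhar.har"));
        // Three slashes: empty authority, everything after it is the path.
        System.out.println(describe("har:///harPath/myhar.har"));
    }
}
```

With two slashes the archive name ends up as the authority and the path is empty; with three slashes the authority is absent and the archive is addressed by its path on the default file system.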

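23. Data integrity

    1. Plain checksum verification detects errors but has no mechanism to correct them
    2. A checksum is computed for every fixed number of bytes, configured by io.bytes.per.checksum (named dfs.bytes-per-checksum in newer releases)
    3. When data is written through the HDFS datanode pipeline, the last datanode in the pipeline verifies the checksum
    4. Each datanode runs a background daemon thread, DataBlockScanner, which periodically verifies all blocks stored on that datanode
    [Ignore checksums]
    $>hdfs dfs -get -ignoreCrc path
    [Show a file's checksum]
    $>hdfs dfs -checksum path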

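    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    public class TestCheckSum {
        @Test
        public void testLocalFileSystem() throws Exception {
            Configuration conf = new Configuration();
            // LocalFileSystem is checksummed: it writes a .crc file next to the data file
            LocalFileSystem fs = FileSystem.getLocal(conf);
            Path path = new Path("/pp.txt");
            FSDataOutputStream fos = fs.create(path);
            fos.write("hello world!".getBytes());
            fos.close();
            fs.close();
            System.out.println("over");
        }
    }

Writing the file produces pp.txt in the local directory together with a hidden .pp.txt.crc file that holds the checksums used for verification.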
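
The idea of checksumming a fixed number of bytes at a time can be sketched with the JDK alone. This is an illustration of the principle, not Hadoop's implementation: it uses `java.util.zip.CRC32`, a chunk size of 512 bytes chosen for the example, and the hypothetical class name `ChunkChecksums`. As the notes say, a mismatch only locates the damage; it cannot repair it.

```java
import java.util.zip.CRC32;

// Illustration of per-chunk checksums; not Hadoop's actual code.
class ChunkChecksums {

    // Computes one CRC-32 value for every `bytesPerChecksum` bytes of data.
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum no longer matches, or -1.
    static int firstCorruptChunk(byte[] data, long[] expected, int bytesPerChecksum) {
        long[] actual = checksums(data, bytesPerChecksum);
        for (int i = 0; i < expected.length; i++) {
            if (i >= actual.length || actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2000];
        long[] stored = checksums(data, 512);   // 4 chunks: 512 + 512 + 512 + 464 bytes
        data[700] ^= 0x01;                      // flip one bit inside the second chunk
        System.out.println("first corrupt chunk: " + firstCorruptChunk(data, stored, 512));
    }
}
```

A smaller chunk size pinpoints corruption more precisely at the cost of storing more checksum values.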

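24. Compression and decompression

    ZipInputStream     //decompression
    ZipOutputStream    //compression
    ZipEntry           //an entry in the archive

1. Common compression formats used with Hadoop

    Format     Tool     File extension    Splittable
    DEFLATE    none     .deflate          No
    Gzip       gzip     .gz               No
    Bzip2      bzip2    .bz2              Yes
    LZO        lzop     .lzo              No
    LZ4        none     .lz4              No
    Snappy     none     .snappy           No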
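
The three JDK classes listed at the start of this section can be tried out in memory before moving on to the Hadoop codecs. The following is a minimal sketch with a hypothetical class name `ZipRoundTrip`: it writes one entry with `ZipOutputStream` and reads it back with `ZipInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration of the JDK zip classes; independent of Hadoop.
class ZipRoundTrip {

    // Compresses `content` into a zip archive that holds a single entry.
    static byte[] zip(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Decompresses the first entry of the archive and returns its bytes.
    static byte[] unzipFirstEntry(byte[] archive) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = zis.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello zip!".getBytes(StandardCharsets.UTF_8);
        byte[] restored = unzipFirstEntry(zip("hello.txt", original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```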

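2. codec (encoding)

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.util.ReflectionUtils;

    // JUnit test methods must be non-static
    @org.junit.Test
    public void compressDeflate() throws Exception {
        String codecClassname = "org.apache.hadoop.io.compress.DefaultCodec";
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        // create the codec reflectively so that it receives the configuration
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        FileInputStream fis = new FileInputStream("/xx.pdf");
        FileOutputStream fos = new FileOutputStream("/Users/zhaozhe/Downloads/xx.deflate");
        CompressionOutputStream out = codec.createOutputStream(fos);
        // false: do not close the streams after copying, they are closed below
        IOUtils.copyBytes(fis, out, 4096, false);
        // finish() writes out the remaining compressed data
        out.finish();
        out.close();
        fos.close();
        fis.close();
        System.out.println("over");
    }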

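3. decodec (decoding)

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.CompressionInputStream;
    import org.apache.hadoop.io.compress.DeflateCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    // Variant 1: let the factory pick the codec from the file extension (.deflate)
    @org.junit.Test
    public void deCompressDeflate() throws Exception {
        Configuration conf = new Configuration();
        CompressionCodecFactory f = new CompressionCodecFactory(conf);
        CompressionCodec codec = f.getCodec(new Path("/xx.deflate"));
        CompressionInputStream cis = codec.createInputStream(new FileInputStream("/xx.deflate"));
        FileOutputStream fos = new FileOutputStream("/xx.pdf");
        IOUtils.copyBytes(cis, fos, 1024);
        fos.close();
        cis.close();
        System.out.println("over");
    }

    // Variant 2: instantiate DeflateCodec directly
    @org.junit.Test
    public void deCompressDeflate2() throws Exception {
        Configuration conf = new Configuration();
        Class<?> codecClass = DeflateCodec.class;
        DeflateCodec code = (DeflateCodec) ReflectionUtils.newInstance(codecClass, conf);
        CompressionInputStream cis = code.createInputStream(new FileInputStream("/xx.deflate"));
        FileOutputStream fos = new FileOutputStream("/xx.pdf");
        IOUtils.copyBytes(cis, fos, 1024);
        fos.close();
        cis.close();
        System.out.println("over");
    }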

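4. Performance comparison of the compression algorithms

    Space savings (best first): Bzip2 > Deflate > Gzip > Lz4
    Compression speed (fastest first): Lz4 > Gzip > Deflate > Bzip2
    Decompression speed (fastest first): Lz4 > Gzip > Deflate > Bzip2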
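
A ranking like this depends on the data and the hardware, so it is worth measuring on your own files. The sketch below shows one way to measure size and time, under clearly limited assumptions: it uses only the JDK's `java.util.zip.Deflater` at different levels rather than the Hadoop codecs, the input is synthetic repetitive text, and the class name `DeflateTradeoff` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration of measuring the space/time trade-off; JDK deflate only.
class DeflateTradeoff {

    // Compresses `input` with the given Deflater level and returns the result.
    static byte[] deflate(byte[] input, int level) {
        Deflater d = new Deflater(level);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            int n = d.deflate(buf);
            bos.write(buf, 0, n);
        }
        d.end();
        return bos.toByteArray();
    }

    // Synthetic, highly repetitive sample data (an assumption for the demo).
    static byte[] sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("hello hadoop ").append(i % 100).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = sampleText();
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] out = deflate(input, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %d -> %d bytes (ratio %.3f) in %d us%n",
                    level, input.length, out.length,
                    (double) out.length / input.length, micros);
        }
    }
}
```

A single timed run is noisy; for a real comparison, repeat each measurement several times and use representative input files.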

0 0