Hadoop Source Code Analysis Notes (1): Hadoop Configuration Explained

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 10:58

Hadoop Configuration in detail:

This article focuses on the base class of Hadoop's configuration module: org.apache.hadoop.conf.Configuration.

Java configuration files

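The JDK itself provides the java.util.Properties class for handling simple configuration files. Properties extends Hashtable and represents a persistent set of properties that can be saved to a stream or loaded from one. Every key in the property list, and its corresponding value, is a String. The main methods that Properties provides are listed below.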

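// Load properties from a byte stream
public void load(InputStream inStream)

// Load properties from a Reader
public void load(Reader reader)

// Load properties from an XML document stream
public void loadFromXML(InputStream in)

// Look up the property with the given key in the property list
public String getProperty(String key)

// Look up the property with the given key; return the given default value if it is absent
public String getProperty(String key, String defaultValue)

// Synchronized method that adds a value to the property list; it ends up calling Hashtable's put method
public synchronized Object setProperty(String key, String value)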

With the API above, we can implement a simple configuration file.
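As a small illustration of the methods listed above, the following sketch loads key=value text through a Reader and reads it back. The class name PropertiesDemo and the keys host, port, and timeout are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesDemo {
    /** Parses key=value text into a Properties object using load(Reader). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("host=node1\nport=49000\n");

        // Plain lookup of an existing key.
        System.out.println(props.getProperty("host"));

        // "timeout" is absent, so the supplied default is returned.
        System.out.println(props.getProperty("timeout", "30"));

        // After setProperty the stored value wins over the default.
        props.setProperty("timeout", "60");
        System.out.println(props.getProperty("timeout", "30"));
    }
}
```

Keys and values are always strings here; any conversion to numbers or booleans is left to the caller.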

Hadoop configuration files

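Hadoop does not use the JDK's java.util.Properties to manage its configuration files. It has its own configuration management system with its own API: configuration data is handled through org.apache.hadoop.conf.Configuration. An excerpt from a Hadoop configuration file follows: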

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:49000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_home/var</value>
  </property>
</configuration>

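From the configuration above, it is easy to see that the root element of a Hadoop configuration file is configuration, which normally contains only property child elements. Each property element is one configuration item; the file format does not support nesting or hierarchy. Each item generally consists of the property's name (name), its value (value), and a description of the item (description).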

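In Configuration, every item is stored as a String, but the values can be read back as most of Java's basic types, for example through getInt and getBoolean.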
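The idea of storing strings and converting them on read can be sketched as follows. TypedConfig and the property names demo.retries and demo.verbose are hypothetical; this is not Hadoop's implementation, and falling back to the default on a malformed number is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedConfig {
    // Every value is kept as a String, as in a parsed configuration file.
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    String get(String name, String defaultValue) {
        String v = props.get(name);
        return v != null ? v : defaultValue;
    }

    /** Converts the stored string to an int; uses the default if absent or malformed. */
    int getInt(String name, int defaultValue) {
        String v = props.get(name);
        if (v == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(v.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // assumption of this sketch
        }
    }

    /** Accepts only the literals "true" and "false"; anything else yields the default. */
    boolean getBoolean(String name, boolean defaultValue) {
        String v = props.get(name);
        if ("true".equals(v)) {
            return true;
        }
        if ("false".equals(v)) {
            return false;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        TypedConfig conf = new TypedConfig();
        conf.set("demo.retries", "3");
        conf.set("demo.verbose", "true");
        System.out.println(conf.getInt("demo.retries", 1));
        System.out.println(conf.getBoolean("demo.verbose", false));
        System.out.println(conf.getInt("demo.missing", 7));
    }
}
```

Keeping the storage uniform and pushing the conversion into the getters means the XML format never needs to carry type information.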

Members of Configuration

// Members of Configuration
boolean quietmode;
ArrayList<Object> resources;
Set<String> finalParameters;
boolean loadDefaults;
ArrayList<String> defaultResources;
Properties properties;
Properties overlay;
ClassLoader classLoader;

// Methods
public void addResource(InputStream in);
public void addResource(Path file);
public void addResource(String name);
public void addResource(URL url);
private synchronized void addResourceObject(Object resource);
public synchronized void reloadConfiguration();
private synchronized Properties getProps();

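The boolean field loadDefaults in Configuration determines whether the default resources are loaded; those default resources are kept in defaultResources. In HDFS, hdfs-default.xml and hdfs-site.xml are treated as default resources and are stored in defaultResources through addDefaultResource. In MapReduce, the default resources are mapred-default.xml and mapred-site.xml.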

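properties, overlay, and finalParameters are the member variables related to configuration items. properties and overlay are both of type java.util.Properties, introduced earlier. The key-value pairs parsed from Hadoop's configuration files are stored in properties, while overlay records the values that the application sets programmatically through set(). finalParameters is a Set<String> that holds the keys of all key-value pairs declared final in the configuration files.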
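The interplay between properties and finalParameters can be sketched in a few lines. FinalAwareProps and its apply method are hypothetical names for this illustration; the field names mirror Configuration's members, but the class is only a simplified model of the rule that a key marked final cannot be overridden by a resource loaded later.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

final class FinalAwareProps {
    private final Properties properties = new Properties();
    private final Set<String> finalParameters = new HashSet<>();

    /**
     * Applies one parsed item. Returns true if the value was stored and
     * false if it was ignored because the key had already been marked final.
     */
    boolean apply(String name, String value, boolean isFinal) {
        boolean stored = false;
        if (!finalParameters.contains(name)) {
            properties.setProperty(name, value);
            stored = true;
        }
        if (isFinal) {
            finalParameters.add(name);
        }
        return stored;
    }

    String get(String name) {
        return properties.getProperty(name);
    }

    public static void main(String[] args) {
        FinalAwareProps conf = new FinalAwareProps();
        conf.apply("hadoop.tmp.dir", "/tmp/a", false); // from a default resource
        conf.apply("hadoop.tmp.dir", "/tmp/b", true);  // a later resource marks it final
        conf.apply("hadoop.tmp.dir", "/tmp/c", false); // ignored: key is final
        System.out.println(conf.get("hadoop.tmp.dir"));
    }
}
```

Because only the key is recorded in finalParameters, the check is a cheap set lookup before each write.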

Hadoop's configuration files are all XML. Java itself provides stable, reliable XML processing APIs, the two main approaches being SAX and DOM.

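DOM differs from SAX, which streams through the document and reports parsing events, in that DOM first loads the entire XML document into memory, then builds a document object model in memory from the elements and attributes defined in the document, and provides an object-based API for working with it. An excerpt of the source code that parses the XML with DOM follows: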

private void loadResource(Properties properties, Object name, boolean quiet) {
  try {
    // Obtain the factory used to create the DOM parser
    DocumentBuilderFactory docBuilderFactory
      = DocumentBuilderFactory.newInstance();
    //ignore all comments inside the xml file
    docBuilderFactory.setIgnoringComments(true);

    //allow includes in the xml file
    docBuilderFactory.setNamespaceAware(true);
    try {
      docBuilderFactory.setXIncludeAware(true);
    } catch (UnsupportedOperationException e) {
      LOG.error("Failed to set setXIncludeAware(true) for parser "
              + docBuilderFactory
              + ":" + e,
              e);
    }
    // Obtain the builder object that parses the XML
    DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
    Document doc = null;
    Element root = null;

    // Handle each kind of resource differently
    if (name instanceof URL) {                  // an URL resource
      URL url = (URL)name;
      if (url != null) {
        if (!quiet) {
          LOG.info("parsing " + url);
        }
        doc = builder.parse(url.toString());
      }
    } else if (name instanceof String) {        // a CLASSPATH resource
      URL url = getResource((String)name);
      if (url != null) {
        if (!quiet) {
          LOG.info("parsing " + url);
        }
        doc = builder.parse(url.toString());
      }
    } else if (name instanceof Path) {          // a file resource
      // Can't use FileSystem API or we get an infinite loop
      // since FileSystem uses Configuration API.  Use java.io.File instead.
      File file = new File(((Path)name).toUri().getPath())
        .getAbsoluteFile();
      if (file.exists()) {
        if (!quiet) {
          LOG.info("parsing " + file);
        }
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
          doc = builder.parse(in);
        } finally {
          in.close();
        }
      }
    } else if (name instanceof InputStream) {
      try {
        doc = builder.parse((InputStream)name);
      } finally {
        ((InputStream)name).close();
      }
    } else if (name instanceof Element) {
      root = (Element)name;
    }

    if (doc == null && root == null) {
      if (quiet)
        return;
      throw new RuntimeException(name + " not found");
    }

    if (root == null) {
      root = doc.getDocumentElement();
    }
    // Parse the nodes
    if (!"configuration".equals(root.getTagName()))
      LOG.fatal("bad conf file: top-level element not <configuration>");
    NodeList props = root.getChildNodes();
    for (int i = 0; i < props.getLength(); i++) {
      Node propNode = props.item(i);
      if (!(propNode instanceof Element))
        continue;
      Element prop = (Element)propNode;
      if ("configuration".equals(prop.getTagName())) {
        loadResource(properties, prop, quiet);
        continue;
      }
      if (!"property".equals(prop.getTagName()))
        LOG.warn("bad conf file: element not <property>");
      NodeList fields = prop.getChildNodes();
      String attr = null;
      String value = null;
      boolean finalParameter = false;
      for (int j = 0; j < fields.getLength(); j++) {
        Node fieldNode = fields.item(j);
        if (!(fieldNode instanceof Element))
          continue;
        Element field = (Element)fieldNode;
        if ("name".equals(field.getTagName()) && field.hasChildNodes())
          attr = ((Text)field.getFirstChild()).getData().trim();
        if ("value".equals(field.getTagName()) && field.hasChildNodes())
          value = ((Text)field.getFirstChild()).getData();
        if ("final".equals(field.getTagName()) && field.hasChildNodes())
          finalParameter = "true".equals(((Text)field.getFirstChild()).getData());
      }

      // Ignore this parameter if it has already been marked as 'final'
      if (attr != null) {
        if (value != null) {
          if (!finalParameters.contains(attr)) {
            properties.setProperty(attr, value);
            if (storeResource) {
              updatingResource.put(attr, name.toString());
            }
          } else if (!value.equals(properties.getProperty(attr))) {
            LOG.warn(name+":a attempt to override final parameter: "+attr
                   +";  Ignoring.");
          }
        }
        if (finalParameter) {
          finalParameters.add(attr);
        }
      }
    }

  } catch (IOException e) {
    LOG.fatal("error parsing conf file: " + e);
    throw new RuntimeException(e);
  } catch (DOMException e) {
    LOG.fatal("error parsing conf file: " + e);
    throw new RuntimeException(e);
  } catch (SAXException e) {
    LOG.fatal("error parsing conf file: " + e);
    throw new RuntimeException(e);
  } catch (ParserConfigurationException e) {
    LOG.fatal("error parsing conf file: " + e);
    throw new RuntimeException(e);
  }
}
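To try the DOM approach in isolation, here is a much reduced sketch that depends only on the JDK. MiniConfParser is a hypothetical class, not part of Hadoop: it reads name and value from each property element of an in-memory XML string, and leaves out final handling, nested configuration elements, XInclude, and logging. It also uses getTextContent() instead of the Text cast in the excerpt, which is a simplification of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class MiniConfParser {
    /** Parses <configuration><property>...</property></configuration> into a map. */
    static Map<String, String> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException("error parsing conf: " + e, e);
        }

        Element root = doc.getDocumentElement();
        if (!"configuration".equals(root.getTagName())) {
            throw new IllegalArgumentException("top-level element not <configuration>");
        }

        Map<String, String> result = new LinkedHashMap<>();
        NodeList props = root.getChildNodes();
        for (int i = 0; i < props.getLength(); i++) {
            Node propNode = props.item(i);
            // Skip whitespace text nodes and anything that is not <property>.
            if (!(propNode instanceof Element)) continue;
            Element prop = (Element) propNode;
            if (!"property".equals(prop.getTagName())) continue;

            String name = null;
            String value = null;
            NodeList fields = prop.getChildNodes();
            for (int j = 0; j < fields.getLength(); j++) {
                Node fieldNode = fields.item(j);
                if (!(fieldNode instanceof Element)) continue;
                Element field = (Element) fieldNode;
                if ("name".equals(field.getTagName())) {
                    name = field.getTextContent().trim();
                } else if ("value".equals(field.getTagName())) {
                    value = field.getTextContent();
                }
            }
            if (name != null && value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<configuration>"
            + "<property><name>fs.default.name</name>"
            + "<value>hdfs://node1:49000</value></property>"
            + "</configuration>";
        System.out.println(parse(xml).get("fs.default.name"));
    }
}
```

The instanceof Element checks matter because the whitespace between tags shows up as text nodes in the child list.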

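Copyright notice: parts of this article are excerpted from the book 【Hadoop技术内幕深入解析Hadoop Common和HDFS架构设计与实现原理】 by 【蔡斌、陈湘萍】. It is kept only as study notes for technical exchange, and the commercial copyright remains with the original authors. Readers are encouraged to buy the book for further study. Please credit the original authors when reposting. Thank you!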