Parsing XML with DOM (Part 4)

Source: Internet · Editor: 程序博客网 · Time: 2024/05/21 11:21

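In the previous post we discussed how the parser turns an XML file into a Document (I have a rough picture of the parsing flow, but still need to study the underlying code properly later). Now let's look at how element nodes are retrieved from the Document once it has been built.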

Before reading the code, there is some background to cover:

java.util.Vector: as I understand it, this is essentially a resizable array, wrapped with methods for storing and retrieving data.
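As a quick refresher, here is a minimal sketch of the Vector methods that show up later in the Xerces code (addElement, elementAt, lastElement, size). The class name VectorDemo and the string values are made up for illustration.

```java
import java.util.Vector;

class VectorDemo {
    /** Builds a small Vector using the legacy methods that appear in the Xerces code. */
    static Vector<String> build() {
        Vector<String> nodes = new Vector<>(); // starts empty, grows on demand
        nodes.addElement("book-1");            // append at the end
        nodes.addElement("book-2");
        nodes.addElement("book-3");
        return nodes;
    }

    public static void main(String[] args) {
        Vector<String> nodes = build();
        System.out.println(nodes.size());        // number of stored elements
        System.out.println(nodes.elementAt(0));  // indexed read, like an array
        System.out.println(nodes.lastElement()); // most recently appended element
    }
}
```

Vector is also synchronized, which is one reason older code bases such as this one use it where newer code would typically reach for ArrayList.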

From the previous post we know that Document document = builder.parse(inputStream); returns a DeferredDocumentImpl.

Let's continue going through our example code line by line: Element element = document.getDocumentElement();

This calls the method defined in the parent class com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl:

    /**
     * Convenience method, allowing direct access to the child node
     * which is considered the root of the actual document content. For
     * HTML, where it is legal to have more than one Element at the top
     * level of the document, we pick the one with the tagName
     * "HTML". For XML there should be only one top-level
     *
     * (HTML not yet supported.)
     */
    public Element getDocumentElement() {
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        return docElement;
    }

The object finally returned is a com.sun.org.apache.xerces.internal.dom.ElementImpl (in this example, its deferred subclass DeferredElementImpl).

Next, let's look at this line: NodeList bookNodes = element.getElementsByTagName("book");

    /**
     * Returns a NodeList of all descendent nodes (children,
     * grandchildren, and so on) which are Elements and which have the
     * specified tag name.
     * <p>
     * Note: NodeList is a "live" view of the DOM. Its contents will
     * change as the DOM changes, and alterations made to the NodeList
     * will be reflected in the DOM.
     *
     * @param tagname The type of element to gather. To obtain a list of
     * all elements no matter what their names, use the wild-card tag
     * name "*".
     *
     * @see DeepNodeListImpl
     */
    public NodeList getElementsByTagName(String tagname) {
        return new DeepNodeListImpl(this, tagname);
    }
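The "live view" remark in the Javadoc can be observed from the outside with the public JAXP API. The sketch below uses a made-up XML string (the article's own sample file is not shown here) and a hypothetical class name LiveNodeListDemo: it obtains the NodeList once, appends another book element, and reads the length again from the same list object.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LiveNodeListDemo {
    // Hypothetical sample document, used only for this illustration.
    static final String XML = "<books><book>A</book><book>B</book></books>";

    /** Returns {length before, length after} appending one more book element. */
    static int[] lengthsBeforeAndAfterAppend() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element root = document.getDocumentElement();

            // The list is created once and never re-queried below.
            NodeList bookNodes = root.getElementsByTagName("book");
            int before = bookNodes.getLength();

            // Change the tree after the list was obtained.
            root.appendChild(document.createElement("book"));
            int after = bookNodes.getLength();

            return new int[] { before, after };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] lengths = lengthsBeforeAndAfterAppend();
        System.out.println(lengths[0] + " -> " + lengths[1]);
    }
}
```

The second length reflects the appended element even though getElementsByTagName was called only once, which is what "live" means in practice.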

Now this line: bookNodes.getLength():

    /** Returns the length of the node list. */
    public int getLength() {
        // Preload all matching elements. (Stops when we run out of subtree!)
        item(java.lang.Integer.MAX_VALUE);
        return nodes.size();
    }

This is where the Vector background from above comes in:

    /** Returns the node at the specified index. */
    public Node item(int index) {
        Node thisNode;

        // Tree changed. Do it all from scratch!
        if (rootNode.changes() != changes) {
            nodes   = new Vector();
            changes = rootNode.changes();
        }

        // In the cache
        if (index < nodes.size())
            return (Node)nodes.elementAt(index);

        // Not yet seen
        else {

            // Pick up where we left off (Which may be the beginning)
            if (nodes.size() == 0)
                thisNode = rootNode;
            else
                thisNode = (NodeImpl)(nodes.lastElement());

            // Add nodes up to the one we're looking for
            while (thisNode != null && index >= nodes.size()) {
                thisNode = nextMatchingElementAfter(thisNode);
                if (thisNode != null)
                    nodes.addElement(thisNode);
            }

            // Either what we want, or null (not avail.)
            return thisNode;
        }
    } // item(int):Node
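The caching idea in item() and getLength() can be isolated in a toy class. This is a simplified sketch and not the Xerces implementation: LazyMatchList is a hypothetical name, an Iterator stands in for the tree walk, and the change-counter check that resets the cache is left out.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

class LazyMatchList {
    private final Iterator<String> source;               // stands in for the tree walk
    private final Vector<String> cache = new Vector<>(); // matches found so far

    LazyMatchList(List<String> matches) {
        this.source = matches.iterator();
    }

    /** Returns the match at index, scanning only as far as needed; null if unavailable. */
    String item(int index) {
        if (index < 0) {
            return null;
        }
        // Extend the cache until it covers the index or the source runs dry.
        while (index >= cache.size() && source.hasNext()) {
            cache.addElement(source.next());
        }
        return index < cache.size() ? cache.elementAt(index) : null;
    }

    /** Same trick as getLength(): ask for an unreachable index to force a full scan. */
    int getLength() {
        item(Integer.MAX_VALUE);
        return cache.size();
    }

    /** Exposes how much has been scanned, for demonstration only. */
    int cachedSoFar() {
        return cache.size();
    }
}
```

Asking for item(0) scans a single match, while getLength() has to walk everything, so calling getLength() in a loop condition is cheap only because the results stay cached afterwards.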

In the method below, the Node current that is passed in initially is a DeferredElementImpl.

    /**
     * Iterative tree-walker. When you have a Parent link, there's often no
     * need to resort to recursion. NOTE THAT only Element nodes are matched
     * since we're specifically supporting getElementsByTagName().
     */
    protected Node nextMatchingElementAfter(Node current) {

        Node next;
        while (current != null) {
            // Look down to first child.
            if (current.hasChildNodes()) {
                current = (current.getFirstChild());
            }

            // Look right to sibling (but not from root!)
            else if (current != rootNode && null != (next = current.getNextSibling())) {
                current = next;
            }

            // Look up and right (but not past root!)
            else {
                next = null;
                for (; current != rootNode; // Stop when we return to starting point
                     current = current.getParentNode()) {

                    next = current.getNextSibling();
                    if (next != null)
                        break;
                }
                current = next;
            }

            // Have we found an Element with the right tagName?
            // ("*" matches anything.)
            if (current != rootNode
                && current != null
                && current.getNodeType() == Node.ELEMENT_NODE) {
                if (!enableNS) {
                    if (tagName.equals("*") ||
                        ((ElementImpl) current).getTagName().equals(tagName))
                    {
                        return current;
                    }
                } else {
                    // DOM2: Namespace logic.
                    if (tagName.equals("*")) {
                        if (nsName != null && nsName.equals("*")) {
                            return current;
                        } else {
                            ElementImpl el = (ElementImpl) current;
                            if ((nsName == null
                                 && el.getNamespaceURI() == null)
                                || (nsName != null
                                    && nsName.equals(el.getNamespaceURI())))
                            {
                                return current;
                            }
                        }
                    } else {
                        ElementImpl el = (ElementImpl) current;
                        if (el.getLocalName() != null
                            && el.getLocalName().equals(tagName)) {
                            if (nsName != null && nsName.equals("*")) {
                                return current;
                            } else {
                                if ((nsName == null
                                     && el.getNamespaceURI() == null)
                                    || (nsName != null &&
                                        nsName.equals(el.getNamespaceURI())))
                                {
                                    return current;
                                }
                            }
                        }
                    }
                }
            }

            // Otherwise continue walking the tree
        }

        // Fell out of tree-walk; no more instances found
        return null;

    } // nextMatchingElementAfter(int):Node
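The walk itself (down to the first child, else right to the next sibling, else up and right, never past the root) can be reproduced with nothing but the public org.w3c.dom API. The following is a simplified sketch with the namespace branch removed; TreeWalkDemo, countMatches, and the sample XML in main are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class TreeWalkDemo {
    /** Returns the next matching element after current in document order, or null. */
    static Node nextMatchingElementAfter(Node root, Node current, String tagName) {
        Node next;
        while (current != null) {
            if (current.hasChildNodes()) {
                current = current.getFirstChild();              // look down
            } else if (current != root && (next = current.getNextSibling()) != null) {
                current = next;                                 // look right
            } else {
                next = null;                                    // look up and right
                for (; current != root; current = current.getParentNode()) {
                    next = current.getNextSibling();
                    if (next != null) {
                        break;
                    }
                }
                current = next;
            }
            if (current != null && current != root
                    && current.getNodeType() == Node.ELEMENT_NODE
                    && ("*".equals(tagName) || current.getNodeName().equals(tagName))) {
                return current;
            }
        }
        return null; // walked off the subtree
    }

    /** Counts matches by calling the walker repeatedly, starting from the root element. */
    static int countMatches(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node root = doc.getDocumentElement();
            int count = 0;
            for (Node n = nextMatchingElementAfter(root, root, tagName); n != null;
                 n = nextMatchingElementAfter(root, n, tagName)) {
                count++;
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book><shelf><book/></shelf></books>";
        System.out.println(countMatches(xml, "book"));
    }
}
```

Because each call resumes from the previously returned node, no stack or recursion is needed; the parent links carry all the state.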

Let's first see what current.hasChildNodes(), defined in com.sun.org.apache.xerces.internal.dom.ParentNode, actually does:

    /**
     * Test whether this node has any children. Convenience shorthand
     * for (Node.getFirstChild()!=null)
     */
    public boolean hasChildNodes() {
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        return firstChild != null;
    }

The synchronizeChildren() method called here is implemented by the concrete class DeferredElementImpl as follows:

    protected final void synchronizeChildren() {
        DeferredDocumentImpl ownerDocument =
            (DeferredDocumentImpl) ownerDocument();
        ownerDocument.synchronizeChildren(this, fNodeIndex);
    } // synchronizeChildren()

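For this example only: note that the ChildNode returned by getNodeObject(index) is actually a DeferredTextImpl, because the element has a text node beneath it.

This is where the text object gets attached to the element, via p.firstChild = firstNode;, which is why it can be retrieved later.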

    /**
     * Synchronizes the node's children with the internal structure.
     * Fluffing the children at once solves a lot of work to keep
     * the two structures in sync. The problem gets worse when
     * editing the tree -- this makes it a lot easier.
     * This is not directly used in this class but this method is
     * here so that it can be shared by all deferred subclasses of ParentNode.
     */
    protected final void synchronizeChildren(ParentNode p, int nodeIndex) {

        // we don't want to generate any event for this so turn them off
        boolean orig = getMutationEvents();
        setMutationEvents(false);

        // no need to sync in the future
        p.needsSyncChildren(false);

        // create children and link them as siblings
        ChildNode firstNode = null;
        ChildNode lastNode = null;
        for (int index = getLastChild(nodeIndex);
             index != -1;
             index = getPrevSibling(index)) {

            ChildNode node = (ChildNode) getNodeObject(index);
            if (lastNode == null) {
                lastNode = node;
            }
            else {
                firstNode.previousSibling = node;
            }
            node.ownerNode = p;
            node.isOwned(true);
            node.nextSibling = firstNode;
            firstNode = node;
        }
        if (lastNode != null) {
            p.firstChild = firstNode;
            firstNode.isFirstChild(true);
            p.lastChild(lastNode);
        }

        // set mutation events flag back to its original value
        setMutationEvents(orig);

    } // synchronizeChildren(ParentNode,int):void
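The loop above visits the children from last to first and still ends up with a correctly ordered sibling chain. A toy version of just that linking step, with hypothetical names (BackwardLinkDemo, Item) and plain strings instead of DOM nodes, might look like this:

```java
class BackwardLinkDemo {
    /** Minimal stand-in for ChildNode: a name plus sibling links. */
    static final class Item {
        final String name;
        Item previousSibling;
        Item nextSibling;

        Item(String name) {
            this.name = name;
        }
    }

    /** Visits the names from last to first, links them, and returns the first item. */
    static Item linkFromLast(String[] namesInDocumentOrder) {
        Item firstNode = null;
        Item lastNode = null;
        for (int i = namesInDocumentOrder.length - 1; i >= 0; i--) {
            Item node = new Item(namesInDocumentOrder[i]);
            if (lastNode == null) {
                lastNode = node;                  // first one visited is the last child
            } else {
                firstNode.previousSibling = node; // back-link from the item visited before
            }
            node.nextSibling = firstNode;         // forward-link to the item visited before
            firstNode = node;                     // the newest item is now the head
        }
        return firstNode;
    }

    /** Follows nextSibling links and joins the names with commas. */
    static String joinForward(Item first) {
        StringBuilder sb = new StringBuilder();
        for (Item it = first; it != null; it = it.nextSibling) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(it.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinForward(linkFromLast(new String[] { "a", "b", "c" })));
    }
}
```

Each newly created item is pushed onto the front of the chain, so when the loop finishes the head is the first child and the item created first is the last child.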

We can see that when type == 3 (Node.TEXT_NODE), a new DeferredTextImpl object is created:

    /** Instantiates the requested node object. */
    public DeferredNode getNodeObject(int nodeIndex) {

        // is there anything to do?
        if (nodeIndex == -1) {
            return null;
        }

        // get node type
        int chunk = nodeIndex >> CHUNK_SHIFT;
        int index = nodeIndex & CHUNK_MASK;
        int type = getChunkIndex(fNodeType, chunk, index);
        if (type != Node.TEXT_NODE && type != Node.CDATA_SECTION_NODE) {
            clearChunkIndex(fNodeType, chunk, index);
        }

        // create new node
        DeferredNode node = null;
        switch (type) {

            //
            // Standard DOM node types
            //

            case Node.ATTRIBUTE_NODE: {
                if (fNamespacesEnabled) {
                    node = new DeferredAttrNSImpl(this, nodeIndex);
                } else {
                    node = new DeferredAttrImpl(this, nodeIndex);
                }
                break;
            }

            case Node.CDATA_SECTION_NODE: {
                node = new DeferredCDATASectionImpl(this, nodeIndex);
                break;
            }

            case Node.COMMENT_NODE: {
                node = new DeferredCommentImpl(this, nodeIndex);
                break;
            }

            // NOTE: Document fragments can never be "fast".
            //
            //       The parser will never ask to create a document
            //       fragment during the parse. Document fragments
            //       are used by the application *after* the parse.
            //
            // case Node.DOCUMENT_FRAGMENT_NODE: { break; }
            case Node.DOCUMENT_NODE: {
                // this node is never "fast"
                node = this;
                break;
            }

            case Node.DOCUMENT_TYPE_NODE: {
                node = new DeferredDocumentTypeImpl(this, nodeIndex);
                // save the doctype node
                docType = (DocumentTypeImpl)node;
                break;
            }

            case Node.ELEMENT_NODE: {

                if (DEBUG_IDS) {
                    System.out.println("getNodeObject(ELEMENT_NODE): "+nodeIndex);
                }

                // create node
                if (fNamespacesEnabled) {
                    node = new DeferredElementNSImpl(this, nodeIndex);
                } else {
                    node = new DeferredElementImpl(this, nodeIndex);
                }

                // save the document element node
                if (docElement == null) {
                    docElement = (ElementImpl)node;
                }

                // check to see if this element needs to be
                // registered for its ID attributes
                if (fIdElement != null) {
                    int idIndex = binarySearch(fIdElement, 0,
                                               fIdCount-1, nodeIndex);
                    while (idIndex != -1) {

                        if (DEBUG_IDS) {
                            System.out.println("  id index: "+idIndex);
                            System.out.println("  fIdName["+idIndex+
                                               "]: "+fIdName[idIndex]);
                        }

                        // register ID
                        String name = fIdName[idIndex];
                        if (name != null) {
                            if (DEBUG_IDS) {
                                System.out.println("  name: "+name);
                                System.out.print("getNodeObject()#");
                            }
                            putIdentifier0(name, (Element)node);
                            fIdName[idIndex] = null;
                        }

                        // continue if there are more IDs for
                        // this element
                        if (idIndex + 1 < fIdCount &&
                            fIdElement[idIndex + 1] == nodeIndex) {
                            idIndex++;
                        }
                        else {
                            idIndex = -1;
                        }
                    }
                }
                break;
            }

            case Node.ENTITY_NODE: {
                node = new DeferredEntityImpl(this, nodeIndex);
                break;
            }

            case Node.ENTITY_REFERENCE_NODE: {
                node = new DeferredEntityReferenceImpl(this, nodeIndex);
                break;
            }

            case Node.NOTATION_NODE: {
                node = new DeferredNotationImpl(this, nodeIndex);
                break;
            }

            case Node.PROCESSING_INSTRUCTION_NODE: {
                node = new DeferredProcessingInstructionImpl(this, nodeIndex);
                break;
            }

            case Node.TEXT_NODE: {
                node = new DeferredTextImpl(this, nodeIndex);
                break;
            }

            //
            // non-standard DOM node types
            //

            case NodeImpl.ELEMENT_DEFINITION_NODE: {
                node = new DeferredElementDefinitionImpl(this, nodeIndex);
                break;
            }

            default: {
                throw new IllegalArgumentException("type: "+type);
            }

        } // switch node type

        // store and return
        if (node != null) {
            return node;
        }

        // error
        throw new IllegalArgumentException();

    } // createNodeObject(int):Node
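The first lines of getNodeObject split nodeIndex into a chunk number and a position inside that chunk using a shift and a mask. Here is a small sketch of that arithmetic. The constant values are assumptions chosen for illustration (the real ones are defined in DeferredDocumentImpl and are not shown in this article), and ChunkIndexDemo is a hypothetical name.

```java
class ChunkIndexDemo {
    // Assumed values for illustration only, not taken from the Xerces source.
    static final int CHUNK_SHIFT = 8;                // log2 of the chunk size
    static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;  // 256 slots per chunk
    static final int CHUNK_MASK = CHUNK_SIZE - 1;    // low 8 bits set

    /** Which chunk (row) the node index falls into. */
    static int chunkOf(int nodeIndex) {
        return nodeIndex >> CHUNK_SHIFT;
    }

    /** Position of the node index inside its chunk (column). */
    static int offsetOf(int nodeIndex) {
        return nodeIndex & CHUNK_MASK;
    }

    public static void main(String[] args) {
        int nodeIndex = 600;
        // With a chunk size of 256: 600 = 2 * 256 + 88
        System.out.println(chunkOf(nodeIndex) + ", " + offsetOf(nodeIndex));
    }
}
```

With a power-of-two chunk size, the shift and mask are equivalent to integer division and remainder, which lets the document store node data in a two-dimensional table that grows one chunk at a time.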

There is one thing I don't understand here: once the statement node = new DeferredElementImpl(this, nodeIndex); has executed, the name property of node already has a value.

After stepping through it many more times, I noticed that the nodeIndex passed in is different on every call. Could that be the reason?
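A toy model may help with thinking about this question. The sketch below is not the Xerces code, and ToyDocument and ToyDeferredElement are hypothetical names. It only illustrates the general deferred pattern, under the assumption that a deferred node stores nothing but its index at construction time and reads its name from the owning document's tables the first time something asks for it (a debugger displaying the object would count as asking).

```java
class DeferredNameDemo {
    /** Stand-in for the document's internal tables: names addressed by node index. */
    static final class ToyDocument {
        private final String[] names;

        ToyDocument(String... names) {
            this.names = names;
        }

        String getNodeName(int nodeIndex) {
            return names[nodeIndex];
        }
    }

    /** Stand-in for a deferred element: holds only an index until its data is needed. */
    static final class ToyDeferredElement {
        private final ToyDocument owner;
        private final int nodeIndex;
        private String name;                  // stays null until first access
        private boolean needsSyncData = true;

        ToyDeferredElement(ToyDocument owner, int nodeIndex) {
            this.owner = owner;
            this.nodeIndex = nodeIndex;       // the constructor copies no name
        }

        boolean isSynced() {
            return !needsSyncData;
        }

        String getNodeName() {
            if (needsSyncData) {              // first access: look the name up by index
                name = owner.getNodeName(nodeIndex);
                needsSyncData = false;
            }
            return name;
        }
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument("bookstore", "book");
        ToyDeferredElement element = new ToyDeferredElement(doc, 1);
        System.out.println(element.isSynced());
        System.out.println(element.getNodeName());
        System.out.println(element.isSynced());
    }
}
```

In a model like this, a different nodeIndex naturally produces a different name, because the index is the only thing that ties the node object to its data.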

0 0