Inside MSXML Performance (6)

Source: Internet. Editor: 程序博客网. Date: 2024/05/20 18:52

Validation


Validation compares the types of elements in an XML document against a Document Type Definition (DTD) or XML Schema. For example, the DTD may say that all "Customer" elements must contain a child "Name" element. Take a look at the DTD for Hamlet.xml (hamletdtd.htm) and the XML Schema for Hamlet.xml (hamletschema.htm).


Validation is another huge area for performance analysis, but I only have time for a brief mention today. Validation is expensive for several reasons. First, it involves loading a separate file (the DTD or XML Schema) and compiling it. Second, it requires state machinery for performing the validation itself. Third, when the schema also includes information about data types, any data types also have to be validated. For example, if an XML element or attribute is typed as an integer, that text has to be parsed to see if it is a valid integer.
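To make the third cost concrete, here is a minimal sketch of the kind of per-value check a data-typed schema implies. The helper name isValidIntegerText is hypothetical and is not part of MSXML; it only assumes that an integer is an optional sign followed by decimal digits, with surrounding whitespace ignored.

```javascript
// Hypothetical helper, not an MSXML API: reports whether element or
// attribute text is a valid integer (optional sign, then decimal digits).
function isValidIntegerText(text) {
    // Ignore leading and trailing whitespace before checking the digits.
    var trimmed = String(text).replace(/^\s+|\s+$/g, "");
    return /^[+-]?[0-9]+$/.test(trimmed);
}

console.log(isValidIntegerText("42"));   // an integer
console.log(isValidIntegerText("4.2"));  // not an integer
```

A validating parser has to run a check of this sort on every typed value it loads, which is work a non-validating load never does.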


The following table shows the difference between loading without validation, with DTD validation, and with XML Schema validation.


Sample          Load (ms)   DTD (ms)   Schema (ms)   Schema plus datatypes (ms)
Ado.xml               662      2,230         2,167                        3,064
Hamlet.xml            106        215           220                          N/A
Ot.xml              1,069      2,168         2,193                          N/A
Northwind.xml          64        123           127                          N/A

The bottom line is to expect validation to double or triple the time it takes to load your documents, and more when data types are also checked (Ado.xml takes more than four times as long with data type validation). New in the MSXML January 2000 Web Release is the SchemaCollection object, which allows you to load the XML Schema once and then share it across your documents for validation. This will be discussed in a future article.
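As a quick check on that estimate, the slowdown factors can be worked out directly from the table. The sketch below only does the arithmetic; the variable and function names are illustrative, and the numbers are the ones reported above.

```javascript
// Load times in milliseconds, copied from the table above.
var timings = [
    { sample: "Ado.xml",       load: 662,  dtd: 2230, schema: 2167 },
    { sample: "Hamlet.xml",    load: 106,  dtd: 215,  schema: 220 },
    { sample: "Ot.xml",        load: 1069, dtd: 2168, schema: 2193 },
    { sample: "Northwind.xml", load: 64,   dtd: 123,  schema: 127 }
];

// Returns how many times longer a validating load takes than a plain load.
function slowdown(row) {
    return {
        sample: row.sample,
        dtd: row.dtd / row.load,
        schema: row.schema / row.load
    };
}

for (var i = 0; i < timings.length; i++) {
    var r = slowdown(timings[i]);
    console.log(r.sample + ": DTD x" + r.dtd.toFixed(2) +
                ", Schema x" + r.schema.toFixed(2));
}
```

On these numbers the factor ranges from roughly 1.9 for Northwind.xml to roughly 3.4 for Ado.xml.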


XSL

XSL can be a big performance win over using DOM code for generating "transformed" reports from an XML document. For example, suppose you wanted to print out all the speeches by Hamlet in the sample Hamlet.xml. You might use selectNodes to find all the speeches by Hamlet, then use another selectNodes call to iterate through the lines of each of those speeches, as follows:


function Method1(doc)
{
    // Select every SPEECH whose SPEAKER is HAMLET.
    var speeches = doc.selectNodes("/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']");
    var s = speeches.nextNode();
    var out = "";
    while (s)
    {
        // Append the text of each LINE in the current speech.
        var lines = s.selectNodes("LINE");
        var line = lines.nextNode();
        while (line)
        {
            out += line.text;
            line = lines.nextNode();
        }
        // Separate speeches with a horizontal rule.
        out += "<hr>";
        s = speeches.nextNode();
    }
    return out;
}

This works, but it takes about 1,500 milliseconds. A better way to tackle this problem is to use XSL. The following XSL style sheet (or template) does exactly the same thing:


<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:for-each select="LINE">
      <xsl:value-of/>
    </xsl:for-each>
    <hr/>
  </xsl:for-each>
</xsl:template>

You can then write the following simpler script code that uses this template:


function Method2(doc)
{
    // Load the style sheet synchronously so it is ready before transforming.
    var xsl = new ActiveXObject("Microsoft.XMLDOM");
    xsl.async = false;
    xsl.load("hamlet.xsl");
    return doc.transformNode(xsl);
}

This takes only 203 milliseconds—it is more than seven times faster. This is a rather compelling reason to use XSL. In addition, it is easier to update the XSL template than it is to rewrite your code every time you want to get a different report.
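The article does not show how these timings were taken. One way to take comparable measurements is a small wall-clock helper like the sketch below; timeCall is a hypothetical name, and keeping the best of several runs is an assumption made here to reduce noise, not something the article describes.

```javascript
// Hypothetical timing helper: calls fn(arg) several times and returns
// the shortest elapsed wall-clock time in milliseconds.
function timeCall(fn, arg, runs) {
    var best = Infinity;
    for (var i = 0; i < runs; i++) {
        var start = new Date().getTime();
        fn(arg);
        var elapsed = new Date().getTime() - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

// Stand-in workload so the sketch runs anywhere; with MSXML available
// you would pass Method1 or Method2 and a loaded document instead.
function busyWork(n) {
    var total = 0;
    for (var i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

console.log("best of 5 runs: " + timeCall(busyWork, 1000000, 5) + " ms");
```

With a loaded document, timeCall(Method1, doc, 5) and timeCall(Method2, doc, 5) would give a like-for-like comparison of the two approaches.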


The problem is that XSL is very powerful. You have a lot of rope with which to hang yourself, so to speak. XSL has a rich expression language that can be used to walk all over the document in any order. It is highly recursive, and the MSXML parser includes script support for added extensibility. Using all these features with reckless abandon will result in slow XSL style sheets. The following sections describe a few specific traps to watch out for.


Scripting


It is convenient to call script from within an XSL style sheet, and it is a great extensibility mechanism. But as always, there is a catch. Script code is slow. For purposes of illustration, imagine that we wrote the following style sheet instead of the one shown previously:


<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:eval>this.text</xsl:eval>
    <hr/>
  </xsl:for-each>
</xsl:template>

This produces the same result, but it takes 266 milliseconds instead of 203 milliseconds, which is 63 milliseconds or roughly 31 percent more time. The more frequently your xsl:eval statements are executed, the slower the performance becomes. For purposes of illustration only, let's move the xsl:eval inside the inner for-each loop:


    <xsl:for-each select="LINE">
      <xsl:eval>this.text</xsl:eval>
    </xsl:for-each>

This one takes 516 milliseconds, more than twice as long as the 203 milliseconds of the script-free style sheet. The bottom line is to be careful with script code in XSL.


The Dreaded "//" Operator


Watch out for the "//" operator. This little operator walks the entire subtree looking for matches. Developers use it more than they should just because they are too lazy to type in the full path. (I catch myself using it all the time, too.) For example, try switching the select statement in the previous example to the following:


  <xsl:for-each select="//SPEECH[SPEAKER='HAMLET']">

The time it takes to perform the selection jumps from 203 milliseconds to 234 milliseconds. My laziness just cost me a 15 percent tax.
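The extra cost comes from the additional nodes that have to be examined. The toy model below counts node visits for a fixed path versus a descendant search on a tiny hand-built tree. It uses plain JavaScript objects, not MSXML nodes, and it is not how MSXML implements selection; every name in it is illustrative.

```javascript
// Toy element: a name plus child elements. Not an MSXML node.
function el(name, children) {
    return { name: name, children: children || [] };
}

// Models a fixed path such as ACT/SCENE/SPEECH: at each step only the
// children of the nodes matched so far are examined.
function selectByPath(root, steps, counter) {
    var current = [root];
    for (var i = 0; i < steps.length; i++) {
        var next = [];
        for (var j = 0; j < current.length; j++) {
            var kids = current[j].children;
            for (var k = 0; k < kids.length; k++) {
                counter.visited++;
                if (kids[k].name === steps[i]) {
                    next.push(kids[k]);
                }
            }
        }
        current = next;
    }
    return current;
}

// Models //NAME: every descendant is examined, matching or not.
function selectDescendants(node, name, counter, out) {
    out = out || [];
    for (var i = 0; i < node.children.length; i++) {
        var child = node.children[i];
        counter.visited++;
        if (child.name === name) {
            out.push(child);
        }
        selectDescendants(child, name, counter, out);
    }
    return out;
}

var play = el("PLAY", [
    el("TITLE"),
    el("ACT", [
        el("TITLE"),
        el("SCENE", [
            el("TITLE"),
            el("SPEECH", [el("SPEAKER"), el("LINE"), el("LINE")]),
            el("SPEECH", [el("SPEAKER"), el("LINE")])
        ])
    ])
]);

var pathCounter = { visited: 0 };
var deepCounter = { visited: 0 };
var byPath = selectByPath(play, ["ACT", "SCENE", "SPEECH"], pathCounter);
var byDescent = selectDescendants(play, "SPEECH", deepCounter);
console.log("path: " + byPath.length + " matches, " + pathCounter.visited + " visits");
console.log("//:   " + byDescent.length + " matches, " + deepCounter.visited + " visits");
```

Both searches find the same speeches, but the descendant search also walks into every SPEAKER and LINE, and that gap grows with the size of the document.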


Prune the Search Tree


If there's anything you can do to "prune" the search tree, by all means do it. For example, suppose you were reporting all speeches by Bernardo from Hamlet.xml. All Bernardo's speeches happen to be in Act I. If you already knew this, you could skip the entire search of Act II through Act V. The following shows what the new select statement would look like:


select="/PLAY/ACT[TITLE='ACT I']/SCENE/SPEECH[SPEAKER='BERNARDO']"

This chops the time down from 141 milliseconds to 125 milliseconds, a healthy 11 percent improvement.
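The effect of the extra [TITLE='ACT I'] predicate can be pictured with a toy model that counts how many speeches are examined with and without it. The data below is a made-up miniature, not the contents of Hamlet.xml, and countSpeeches is a hypothetical helper rather than anything MSXML provides.

```javascript
// Made-up miniature of a play: each act lists the speakers of its speeches.
var acts = [
    { title: "ACT I",   speakers: ["BERNARDO", "FRANCISCO", "BERNARDO"] },
    { title: "ACT II",  speakers: ["POLONIUS", "HAMLET"] },
    { title: "ACT III", speakers: ["HAMLET", "OPHELIA"] }
];

// Counts speeches by one speaker. When actTitle is given, acts with a
// different title are skipped entirely, which is the pruning step.
function countSpeeches(allActs, speaker, actTitle) {
    var examined = 0;
    var matches = 0;
    for (var i = 0; i < allActs.length; i++) {
        if (actTitle && allActs[i].title !== actTitle) {
            continue; // pruned: none of this act's speeches are examined
        }
        var speakers = allActs[i].speakers;
        for (var j = 0; j < speakers.length; j++) {
            examined++;
            if (speakers[j] === speaker) {
                matches++;
            }
        }
    }
    return { examined: examined, matches: matches };
}

var full = countSpeeches(acts, "BERNARDO");
var pruned = countSpeeches(acts, "BERNARDO", "ACT I");
console.log("unpruned: " + full.matches + " matches, " + full.examined + " examined");
console.log("pruned:   " + pruned.matches + " matches, " + pruned.examined + " examined");
```

The result is the same either way; the pruned search simply never looks inside the acts that cannot contain a match.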


Cross-Threading Models


Previously, the transformNode and transformNodeToObject methods required that the style sheet and the document being transformed use the same threading model. In the MSXML January 2000 Web Release, you can use free-threaded style sheets on rental documents and vice versa. This means you can get the performance benefit of using rental documents at the same time as the performance win of sharing free-threaded style sheets across threads.


Conclusion

