Xml and the Nametable
Quote from "Scott Hanselman's ComputerZen.com"
I got a number (~dozen) of emails about my use of the Nametable in my XmlReader post recently. Charles Cook tried it out and noticed about a 10% speedup. I also received a number of poo-poo emails that said "use XPath" or "don't bother" and "the performance is good enough."
Sure, if that works for you, that's great. Of course, always measure before you make broad statements. That said, here's a broad statement. Using an XmlReader will always be faster than the DOM and/or XmlSerializer. Always.
Why? Because what do you think is underneath the DOM and inside of XmlSerialization? An XmlReader of course.
For documents larger than about 50k, you're looking at at least one order of magnitude faster when plucking a single value out. When grabbing dozens of values, the gap widens.
Moshe is correct in pointing out that a nice middle-place perf-wise is the XPathReader (for a certain subset of XPath). There are a number of nice XmlReader implementations that fill the space between XmlTextReader and XPathDocument by providing more-than-XmlReader functionality:
- XPathReader
- XmlBookmarkReader
- SgmlReader
BTW, I would also point out that an XmlReader is what I call a "cursor-based pull implementation." While it's similar to the SAX parsers in that it exposes the infoset rather than the angle brackets, it's not SAX.
Now, all that said, what was the deal with my Nametable usage? Charles explains it well, but I will expand. You can do this if you like:
XmlTextReader tr =
    new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
while (tr.Read())
{
    // String comparison against "enclosure" on every element node
    if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "enclosure")
    {
        // Print each attribute of the <enclosure> element as name:value
        while (tr.MoveToNextAttribute())
        {
            Console.WriteLine(String.Format("{0}:{1}",
                tr.LocalName, tr.Value));
        }
    }
}
The tr.LocalName == "enclosure" check in the if statement does a string compare as you look at each element. Not a big deal, but it adds up over hundreds or thousands of executions when spinning through a large document.
The NameTable is used by XmlDocument, XmlReader(s), XPathNavigator, and XmlSchemaCollection. It's a table that maps a string to an object reference. This is called "atomization" (think atoms, think small). If these classes see "enclosure" more than once, they reuse the one object reference rather than holding n separate "enclosure" strings internally.
It's not exactly like a Hashtable, as adding a string that has already been atomized makes the NameTable return the existing object reference.
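To make the atomization behavior concrete, here is a minimal sketch that uses System.Xml.NameTable on its own, without a reader. The class and method names (AtomizationDemo and friends) are made up for illustration; only NameTable.Add and NameTable.Get come from the framework.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class AtomizationDemo
{
    // Adding an equal string a second time should hand back the first object.
    public static bool AddTwiceReturnsSameReference()
    {
        NameTable table = new NameTable();
        string first = table.Add("enclosure");

        // Build an equal string at run time so it is a distinct object.
        string runtimeCopy = new string("enclosure".ToCharArray());
        string second = table.Add(runtimeCopy);

        // Same atom comes back, and it is not the copy we passed in.
        return Object.ReferenceEquals(first, second)
            && !Object.ReferenceEquals(second, runtimeCopy);
    }

    // Get only looks up; it returns null for a name that was never added.
    public static bool GetUnknownReturnsNull()
    {
        NameTable table = new NameTable();
        return table.Get("never-added") == null;
    }
}
```

The point of the first method is that two strings with equal contents collapse to one object once they pass through the table, which is what makes a reference comparison safe later on.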
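XmlTextReader tr =
    new XmlTextReader("http://feeds.feedburner.com/ScottHanselman");
// Atomize the name up front; Add returns the table's single reference for it
object enclosure = tr.NameTable.Add("enclosure");
while (tr.Read())
{
    // Reference comparison instead of a string comparison
    if (tr.NodeType == XmlNodeType.Element &&
        Object.ReferenceEquals(tr.LocalName, enclosure))
    {
        // Print each attribute of the <enclosure> element as name:value
        while (tr.MoveToNextAttribute())
        {
            Console.WriteLine(String.Format("{0}:{1}",
                tr.LocalName, tr.Value));
        }
    }
}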
The easiest way, IMHO, to think about it is this:
- If you know that you're going to look for an element or attribute with a specific name within any System.Xml class that has an XmlNameTable, preload or warn the parser that you'll be watching for these names.
- When you do a comparison between the current element or attribute and your target, use Object.ReferenceEquals. Instead of a string comparison, you'll just be asking "are these the same object" - which is about the fastest thing that the CLR can do.
- Yes, you can use == rather than Object.ReferenceEquals (because enclosure is declared as object, == is a reference comparison here too), but the latter makes it totally clear what your intent is, while the former is more vague.
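The points above apply to attribute names as well as element names. As a rough sketch, the following variation reads from an in-memory string instead of the feed URL and atomizes both the element name and one attribute name. EnclosureScanner and FindEnclosureUrls are hypothetical names, and the "url" attribute is an assumption about what an enclosure element carries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original post.
public static class EnclosureScanner
{
    // Collects the "url" attribute value of every <enclosure> element.
    public static List<string> FindEnclosureUrls(string xml)
    {
        List<string> urls = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // Preload the names we plan to watch for.
            object enclosure = reader.NameTable.Add("enclosure");
            object url = reader.NameTable.Add("url");

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    Object.ReferenceEquals(reader.LocalName, enclosure))
                {
                    while (reader.MoveToNextAttribute())
                    {
                        // Attribute names are compared by reference too.
                        if (Object.ReferenceEquals(reader.LocalName, url))
                        {
                            urls.Add(reader.Value);
                        }
                    }
                }
            }
        }
        return urls;
    }
}
```

Text content that happens to say "enclosure" is never matched, because only element nodes are compared against the atom.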
This kind of optimization makes a big perf difference (~10% depending) when using an XmlReader. It makes less of one when using an XPathDocument because you are using Select(ing)Nodes in a loop.
Stealing Charles' words: "...because it involves very little extra code it is perhaps an optimization worth making prematurely."
Even the designers agree: "...using the XmlNameTable gives you enough of a performance benefit to make it worthwhile especially if your processing starts to spans multiple XML components in a piplelining scenario and the XmlNameTable is shared across them i.e. XmlTextReader->XmlDocument->XslTransform."
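A small sketch of the shared-table idea from that quote, covering only the XmlTextReader->XmlDocument step: one NameTable is handed to both components, so a name atomized before loading is the same object the document reports afterwards. SharedNameTableDemo and RootNameIsSharedAtom are hypothetical names, and the sketch assumes XmlDocument atomizes element names through the table passed to its constructor.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of the original post.
public static class SharedNameTableDemo
{
    // True when the loaded document's root name is the atom added up front.
    public static bool RootNameIsSharedAtom(string xml, string rootName)
    {
        NameTable shared = new NameTable();
        object atom = shared.Add(rootName);

        // Reader and document are both built on the same table.
        XmlDocument doc = new XmlDocument(shared);
        using (XmlTextReader reader =
            new XmlTextReader(new StringReader(xml), shared))
        {
            doc.Load(reader);
        }

        return Object.ReferenceEquals(doc.DocumentElement.LocalName, atom);
    }
}
```

With separate tables the same check would fall back to needing a string comparison, which is the cost the shared table avoids.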
Oleg laments: "...that something needs to be done to fix this particular usage pattern of XmlReader to not ignore great NameTable idea."
Conclusion: The NameTable is there for a reason, no matter what System.Xml solution you use. This is a correct and useful pattern and not using it is just silly. If you're going to develop a habit, why not make it a best-practice-habit?