Examples of processing XML and HTML with the xmllint command

Source: Internet. Editor: 程序博客网. Date: 2024/05/17 08:49

Install on macOS: brew install libxml2

Example

curl http://www.111cn.net /ip/?q=8.8.8.8 2>/dev/null | xmllint --html --xpath "//ul[@id='csstb']" - 2>/dev/null | sed -e 's/<[^>]*>//g'
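This example looks up the geographic attribution of an IP address on 123cha, extracts the result element (ul#csstb) from the page, and keeps only its text content. Running the pipeline prints the following: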


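[Your query]: 8.8.8.8
Site primary data:
United States
Site secondary data: Google Public DNS, provided by: hypo
United States, Google's free Google Public DNS, provided by: zwstar
Reference data 1: United States
Reference data 2: United States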
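The last stage of the pipeline, the sed substitution that deletes tags, can be sketched in Python with one regular expression. This is only an illustration: the sample markup is made up and is not the site's real response, and strip_tags is a hypothetical helper name.

```python
import re

# Made-up fragment standing in for the <ul id="csstb"> element that
# xmllint --xpath would print; it is not the site's real response.
SAMPLE = '<ul id="csstb"><li>[query]: 8.8.8.8</li><li>country: <b>US</b></li></ul>'


def strip_tags(markup: str) -> str:
    """Delete every <...> run, like sed -e 's/<[^>]*>//g'."""
    return re.sub(r"<[^>]*>", "", markup)


print(strip_tags(SAMPLE))  # [query]: 8.8.8.8country: US
```

Text from neighbouring elements runs together once the tags are gone, which is also why parts of the real output above are fused. Unlike sed, which works line by line, this version also removes a tag that spans several lines.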
The sections below use examples to show how the other main options are used.

1. --format

This option reformats XML so that it is easier to read.
Suppose person.xml has the following content:


<person><name>ball</name><age>30</age><sex>male</sex></person>
Running the following command prints the same XML in a more readable layout:


#xmllint --format person.xml
    <?xml version="1.0"?>
    <person>
      <name>ball</name>
      <age>30</age>
      <sex>male</sex>
    </person> 
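For comparison, a similar re-indentation can be sketched with Python's standard library. This is an analogue of --format, not xmllint itself: pretty is a hypothetical helper name, and the XML declaration that minidom prints differs slightly from the one xmllint prints.

```python
from xml.dom import minidom

# Compact, well-formed input equivalent to person.xml.
COMPACT = "<person><name>ball</name><age>30</age><sex>male</sex></person>"


def pretty(xml_text: str) -> str:
    """Re-indent compact XML with two spaces per nesting level."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


print(pretty(COMPACT))
```

The sketch assumes compact input; if the input is already indented, minidom keeps the existing whitespace nodes and adds its own, which produces blank lines.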

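2. --noblanks

This is the opposite of --format: to reduce the amount of data transferred, we sometimes want to strip the whitespace from an XML file, and the --noblanks option does that.
Suppose person.xml has the following content: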


<?xml version="1.0"?>
    <person>
      <name>ball</name>
      <age>30</age>
      <sex>male</sex>
    </person>  
Running xmllint with this option prints:


#xmllint --noblanks person.xml
    <?xml version="1.0"?>
    <person><name>ball</name><age>30</age><sex>male</sex></person>  
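The same whitespace removal can be sketched in Python by dropping every text node that contains only whitespace. This is a rough analogue of --noblanks rather than its actual implementation, and drop_blank_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Indented input equivalent to person.xml.
INDENTED = """<person>
  <name>ball</name>
  <age>30</age>
  <sex>male</sex>
</person>"""


def drop_blank_text(xml_text: str) -> str:
    """Remove whitespace-only text between elements and serialize again."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        # .text is the text before the first child, .tail the text after the node.
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return ET.tostring(root, encoding="unicode")


print(drop_blank_text(INDENTED))
```

Text that carries real content, such as ball or 30, is left untouched because it is not whitespace-only.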
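3. --schema

This option validates an XML file against a schema (XML Schema is an XML-based alternative to DTD).
Suppose we have an XML file (person.xml) and a schema file (person.xsd) with the following contents: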

person.xml


<?xml version="1.0"?>
    <person>
      <name>ball</name>
      <age>30</age>
      <sex>male</sex>
    </person>  
person.xsd


<?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
      <xs:element name="sex">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="male"/>
            <xs:enumeration value="female"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="person">
        <xs:complexType>
          <xs:all>
            <xs:element ref="name"/>
            <xs:element ref="age"/>
            <xs:element ref="sex"/>
          </xs:all>
        </xs:complexType>
      </xs:element>
    </xs:schema>  
Running the following command gives this result:


#xmllint --schema person.xsd person.xml
    <?xml version="1.0"?>
    <person>
      <name>ball</name>
      <age>30</age>
      <sex>male</sex>
    </person>
    person.xml validates  
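Note: by default xmllint prints the validated document after validating it. The --noout option suppresses that output so that only the validation result is shown.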


#xmllint --noout --schema person.xsd person.xml
person.xml validates  
Now change person.xml so that both the age and sex fields violate the XSD definitions.


#xmllint --noout --schema person.xsd person.xml
person.xml:4: element age: Schemas validity error : Element 'age': 'not age' is not a valid value of the atomic type 'xs:integer'.
person.xml:5: element sex: Schemas validity error : Element 'sex': [facet 'enumeration'] The value 'test' is not an element of the set {'male', 'female'}.
person.xml:5: element sex: Schemas validity error : Element 'sex': 'test' is not a valid value of the local atomic type.
person.xml fails to validate 
As you can see, xmllint reports the errors.

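4. About the output of --schema

Before looking at the output, consider this scenario: you want to run xmllint from PHP and read the result. The code would typically look like this (valid.php):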


<?php
    $command = "xmllint --noout --schema person.xsd person.xml";
    exec($command, $output, $retval);
    // The exit status is non-zero when validation fails
    if ($retval != 0) {
        var_dump($output);
    } else {
        echo "yeah!";
    }
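We keep the errors introduced into person.xml above.
Run this script and you will find that $output does not contain the errors; it is just array(0) {}.
Why?

When validation fails, xmllint --schema writes its error messages to standard error (stderr), not to standard output (stdout).
The $output argument of exec only receives what the command writes to stdout.
To capture the error messages, redirect stderr to stdout and change the code accordingly: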


$command = "xmllint --noout --schema person.xsd person.xml 2>&1";
Run valid.php again and the error messages are captured.
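The difference between the two streams can be reproduced without xmllint. The sketch below uses a stand-in child process that writes one diagnostic line to stderr and exits with a non-zero status (the value 1 is arbitrary); capturing stdout alone yields nothing, while merging stderr into stdout, the equivalent of the shell's 2>&1, captures the message.

```python
import subprocess
import sys

# Stand-in for a failing validation run: prints to stderr, exits non-zero.
# The message text and the exit status 1 are chosen for illustration only.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('person.xml fails to validate\\n'); sys.exit(1)",
]

# Capture stdout only, as PHP's exec() does when no redirect is given.
stdout_only = subprocess.run(
    CHILD, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
)

# Merge stderr into stdout, the equivalent of appending 2>&1.
merged = subprocess.run(
    CHILD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)

print(repr(stdout_only.stdout))  # ''
print(repr(merged.stdout))       # 'person.xml fails to validate\n'
```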

Example

First create an XML document named po.xml with the following content:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

Next, write a schema for po.xml named po.xsd with the following content:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:annotation>
  <xsd:documentation xml:lang="en">
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
 </xsd:annotation>
 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
 <xsd:element name="comment" type="xsd:string"/>
 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>
 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN"
     fixed="US"/>
 </xsd:complexType>
 <xsd:complexType name="Items">
  <xsd:sequence>
   <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
     <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
       <xsd:simpleType>
        <xsd:restriction base="xsd:positiveInteger">
         <xsd:maxExclusive value="100"/>
        </xsd:restriction>
       </xsd:simpleType>
      </xsd:element>
      <xsd:element name="USPrice"  type="xsd:decimal"/>
      <xsd:element ref="comment"   minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>
     <xsd:attribute name="partNum" type="SKU" use="required"/>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>
 <!-- Stock Keeping Unit, a code for identifying products -->
 <xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:schema>

Validate po.xml with xmllint:

$ xmllint --schema po.xsd po.xml

If no error messages are printed, validation passed.
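The SKU type in po.xsd constrains the partNum attribute with a pattern. As a rough check of what that pattern accepts, here is a sketch in Python, assuming the intended pattern is three digits, a hyphen, and two uppercase letters (\d{3}-[A-Z]{2}); is_sku is a hypothetical helper name.

```python
import re

# Assumed intent of the SKU pattern: three digits, a hyphen, two capitals.
SKU = re.compile(r"\d{3}-[A-Z]{2}")


def is_sku(value: str) -> bool:
    # XSD patterns must match the whole value, hence fullmatch.
    return SKU.fullmatch(value) is not None


for part in ("872-AA", "926-AA", "87-AA", "872-aa"):
    print(part, is_sku(part))
```

The two part numbers used in po.xml, 872-AA and 926-AA, match; the two deliberately malformed values do not.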

The xmllint Shell

Peter Lavin

2013-06-15

Table of Contents

1. xmllint Options
2. The xmllint Shell
3. Using Shell Commands
4. Working with Multiple Files
5. Using Namespaces
6. About the Author

XML files are human-readable text files, so it is easy to search them from the command line using grep or from within a text editor. But if you want to do something a little more sophisticated (counting the number of elements, for example), you'll need to take a different approach. You could write a transformation style sheet to extract such information, but this would be overkill. It is much easier to use xmllint from the command line to find out this kind of information.

This command is available on Mac OS X and Linux. It is installed by default on Mac OS X and, on Linux, if it isn't already installed, you can quickly do so by installing the libxml2 package.

1. xmllint Options

One of the primary uses for the xmllint command is to check that an XML file is well formed and that it conforms to a specific DTD or schema. Well-formedness is checked on every run; the --valid option validates against the document's DTD, and the --schema option validates against an XML Schema. If your XML file contains other XIncluded files you can also use xmllint in the following way to resolve included files and output the result to a file:

shell> xmllint --xinclude manual.xml --output tmp.xml

The output file tmp.xml will include the contents of any xi:include elements. Also, the --format option is very useful for quickly formatting files from the command line. However, the most interesting option is the --shell option.

[Note] 

For a complete list of all the options available view the xmllint man page.

2. The xmllint Shell

Use xmllint with the --shell option in the following way:

shell> xmllint --shell file_name.xml

You can use other options with the --shell option. For example, if you wish to resolve included files, use the --xinclude option as well.

You can display the list of the commands available from the shell by typing help. You should see output similar to the following:

  base         display XML base of the node
  setbase URI  change the XML base of the node
  bye          leave shell
  cat [node]   display node or current node
  cd [path]    change directory to path or to root
  dir [path]   dumps informations about the node
               (namespace, attributes, content)
  du [path]    show the structure of the subtree under
               path or the current node
  exit         leave shell
  help         display this help
  free         display memory usage
  load [name]  load a new document with name
  ls [path]    list contents of path or the current directory
  set xml_fragment replace the current node content with the
               fragment parsed in context
  xpath expr   evaluate the XPath expression in that context
               and print the result
  setns nsreg  register a namespace to a prefix in the
               XPath evaluation context
               format for nsreg is: prefix=[nsuri]
               (i.e. prefix= unsets a prefix)
  setrootns    register all namespace found on the
               root element the default namespace
               if any uses 'defaultns' prefix
  pwd          display current working directory
  quit         leave shell
  save [name]  save this document to name or the original name
  write [name] write the current node to the filename
  validate     check the document for errors
  relaxng rng  validate the document against the Relax-NG schemas
  grep string  search for a string in the subtree

There are a number of relatively trivial but necessary commands such as help and exit. All the commands are useful but this article deals primarily with the following commands:

  • cat node - output all nodes below the current node

  • cd path - change to another node; you can only use this command with unique nodes.

  • dir - dump information about the current node

  • xpath expression - evaluate and print the XPath expression

  • setns - register a namespace

  • write filename - write the current node to file

[Note] 

If you want to write your complete shell session to file run the shell after first issuing the script command. This can be particularly useful on Mac OS X where the write command does not work.

3. Using Shell Commands

When you first open the xmllint shell, the prompt, / >, indicates that you are at the root node. You will likely want to navigate to specific nodes and view the file contents below that node. You can do this with the cd and cat commands.

/ > cd /options/option[@name = 'address_metrics_lifetime']
option >

On success the prompt changes to the name of the current node. To view the current node, use the cat command, which displays its output on the screen. To create a text file of the output of cat, use write file_name.xml.

You can only use cd to navigate to unique nodes. Attempt to navigate to a non-unique node and you will see output such as the following:

/ > cd /options/option
/options/option is a 353 Node Set

If there is no unique identifier for the node that you wish to navigate to, you can use a subscript in the following way:

/ > cd /options/option[1]
option >

To output information about the current node use the dir command:

option > dir
ELEMENT option
  ATTRIBUTE name
    TEXT
      content=address_metrics_cleanse_interval
  ATTRIBUTE type
    TEXT
      content=sending
option >

4. Working with Multiple Files

You can open the xmllint shell specifying multiple files but the behaviour is not intuitive. In the following example, the shell is opened with two different files that have the same structure. The options.xml has a root element <options> with 353 <option>s while the smpp_options.xml has a root element <options> containing only 57 <option>s.

shell> xmllint --shell options.xml smpp_options.xml
/ > base
options.xml
/ > xpath count(//option)
Object is a number : 353
/ > bye
/ > base
smpp_options.xml
/ > xpath count(//option)
Object is a number : 57
/ > setbase options.xml
/ > base
options.xml

If you invoke help from the shell the bye command is tersely described as leave shell. As this sequence of commands shows, bye also exits the first file passed to the --shell option.

Once you have exited the first shell, you cannot return to it by using setbase, even though the command seems to have performed its function, as the output of base erroneously indicates. For this reason it is perhaps less confusing to open the shell specifying only one file and then use the load command to switch to a different file:

shell> xmllint --shell options.xml
/ > base
options.xml
/ > xpath count(//option)
Object is a number : 353
/ > load smpp_options.xml
/ > base
smpp_options.xml
/ > xpath count(//option)
Object is a number : 57

The second count indicates that the load command executed successfully.
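The element counts that xpath count(//option) reports in the shell can also be computed in a script. The following is a minimal sketch using Python's ElementTree on a three-element stand-in document; the attribute values are invented, and only the element and attribute names follow the files described above.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for options.xml; the attribute values are made up.
DOC = """<options>
  <option name="a" type="sending"/>
  <option name="b" type="sending"/>
  <option name="c" type="receiving"/>
</options>"""

root = ET.fromstring(DOC)

# Roughly count(//option): every option element below the root.
total = len(root.findall(".//option"))

# A predicate on an attribute narrows the node set, as in the cd example.
sending = len(root.findall(".//option[@type='sending']"))

# A positional subscript selects a single node, like /options/option[1].
first = root.find("option[1]")

print(total, sending, first.get("name"))  # 3 2 a
```

ElementTree implements only a subset of XPath, so functions such as count() are replaced here by len() on the returned list.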

5. Using Namespaces

To this point none of the examples use namespaces. To use an XML file with namespaces you must use the setns command. Use it in the following way:

shell> xmllint --xinclude --shell manual.xml
/ > setns x=http://docbook.org/ns/docbook
/ > dir
DOCUMENT
version=1.0
URL=manual.xml
standalone=true
namespace xml href=http://www.w3.org/XML/1998/namespace
/ > cd /x:book/x:chapter[@xml:id='apis']
chapter > dir
ELEMENT chapter
  ATTRIBUTE id
    TEXT
      content=apis

The dir command shown above confirms that you have navigated to the specified node. From that node you can execute xpath commands using absolute or relative paths.

chapter > xpath count(/x:book/x:chapter[@xml:id='apis']/x:section)
Object is a number : 15
chapter > xpath count(/x:book/x:chapter[@xml:id='apis']/x:section/x:refentry)
Object is a number : 135
chapter > xpath count(/x:book/x:chapter[@xml:id='structs']/x:section/x:section)
Object is a number : 18
chapter > xpath count(//x:chapter[@xml:id='apis']/x:section/x:refentry)
Object is a number : 135
chapter > xpath count(//x:section/x:refentry)
Object is a number : 140
chapter > xpath count(x:section/x:refentry)
Object is a number : 135

There are 15 sections in the apis chapter and these 15 sections have 135 refentries. Note the difference in output between the paths //x:section/x:refentry and x:section/x:refentry. The difference in output shows that only the latter is relative to the current node.
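The distinction between a document-wide path and a path relative to the current node can be seen on a much smaller document. This sketch uses Python's ElementTree with a prefix mapping that plays the role of setns; the sample document is invented and far smaller than the manual discussed above.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping, the counterpart of: setns x=http://docbook.org/ns/docbook
NS = {"x": "http://docbook.org/ns/docbook"}

# The xml: prefix is always bound to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented miniature document with two chapters.
DOC = """<book xmlns="http://docbook.org/ns/docbook">
  <chapter xml:id="apis">
    <section><refentry/><refentry/></section>
  </chapter>
  <chapter xml:id="structs">
    <section><refentry/></section>
  </chapter>
</book>"""

book = ET.fromstring(DOC)

# Equivalent of cd /x:book/x:chapter[@xml:id='apis']
apis = next(ch for ch in book.findall("x:chapter", NS) if ch.get(XML_ID) == "apis")

# Searched from the root: refentries in every chapter.
whole_doc = len(book.findall(".//x:section/x:refentry", NS))

# Searched relative to the apis chapter only.
from_apis = len(apis.findall("x:section/x:refentry", NS))

print(whole_doc, from_apis)  # 3 2
```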

When your XML file uses IDs, an easier way to navigate is to use the id function:

chapter > cd /
/ > xpath id('apis')
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT chapter
    ATTRIBUTE id
      TEXT
        content=apis
/ > cd id('apis')
chapter > xpath count(x:section/x:refentry)
Object is a number : 135
chapter > cd /
/ > xpath count(id('apis')/x:section/x:refentry)
Object is a number : 135

<root>
    <FIToFICstmrDrctDbt>
        <GrpHdr>
            <MsgId>A</MsgId>
            <CreDtTm>2001-12-17T09:30:47</CreDtTm>
            <NbOfTxs>0</NbOfTxs>
            <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
            <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
            <SttlmInf>
                <SttlmMtd>CLRG</SttlmMtd>
                <ClrSys>
                    <Prtry>xx</Prtry>
                </ClrSys>
            </SttlmInf>
            <InstgAgt>
                <FinInstnId>
                    <BIC>AAAAAAAAAAA</BIC>
                </FinInstnId>
            </InstgAgt>
        </GrpHdr>
    </FIToFICstmrDrctDbt>
</root>

$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13
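The one-liner above feeds a cat command to the xmllint shell on standard input and filters the prompt lines out of the result. The same value can be read in a script without the shell; this is a sketch using Python's ElementTree on a shortened copy of the document, not a replacement for the xmllint command.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document; only two leaf elements are kept.
DOC = """<root><FIToFICstmrDrctDbt><GrpHdr>
  <MsgId>A</MsgId>
  <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr></FIToFICstmrDrctDbt></root>"""

# .//IntrBkSttlmDt finds the element at any depth, like //IntrBkSttlmDt;
# findtext returns its text content, like selecting text().
dt = ET.fromstring(DOC).findtext(".//IntrBkSttlmDt")

print(dt)  # 1967-08-13
```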

