Basic XML Syntax: A Quick Introduction

Source: Internet. Editor: 程序博客网. Time: 2024/05/12 16:11

[1]   element   ::=   EmptyElemTag | STag content ETag

[2]   EmptyElemTag   ::=   '<' Name (S Attribute)* S? '/>'

[3]   STag   ::=   '<' Name (S Attribute)* S? '>'

[4]   ETag   ::=   '</' Name S? '>'

[5]   content   ::=   CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

[6]   Name   ::=   NameStartChar (NameChar)*

[7]   NameStartChar   ::=   ":" | [A-Z] | "_" | [a-z]

[8]   NameChar   ::=   NameStartChar | "-" | "." | [0-9]
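As an illustration, the simplified Name rule can be sketched as a regular expression. This is only a sketch of the ASCII-only rules listed here, not the full XML 1.0 Name production; the helper name `is_name` is hypothetical.

```python
import re

# Simplified Name: NameStartChar (NameChar)*, ASCII only.
# NameStartChar: ":" | [A-Z] | "_" | [a-z]
# NameChar:      NameStartChar | "-" | "." | [0-9]
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z_:.0-9-]*")


def is_name(text: str) -> bool:
    """Return True if the whole string matches the simplified Name rule."""
    return NAME_RE.fullmatch(text) is not None
```

For instance, `is_name("_a-1.b")` holds, while `is_name("1abc")` does not, because a digit is a NameChar but not a NameStartChar.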

 

[9]   S   ::=   (#x20 | #x9 | #xD | #xA)+

 

[10]   Attribute   ::=   Name Eq AttValue

[11]   Eq   ::=   S? '=' S?

[12]   AttValue   ::=   '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
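The Attribute rules can be checked with a regular expression built from the pieces they name. The sketch below assumes the simplified ASCII grammar in this document and inlines the Reference rule; `is_attribute` is a hypothetical helper name.

```python
import re

NAME = r"[A-Za-z_:][A-Za-z_:.0-9-]*"
# Reference: EntityRef | CharRef
REFERENCE = rf"(?:&{NAME};|&#[0-9]+;|&#x[0-9a-fA-F]+;)"
# AttValue: a double-quoted or single-quoted value without '<' or a bare '&'.
ATT_VALUE = rf"""(?:"(?:[^<&"]|{REFERENCE})*"|'(?:[^<&']|{REFERENCE})*')"""
# Attribute: Name Eq AttValue, where Eq is S? '=' S?
ATTRIBUTE_RE = re.compile(rf"{NAME}[ \t\r\n]*=[ \t\r\n]*{ATT_VALUE}")


def is_attribute(text: str) -> bool:
    """Return True if the whole string matches the simplified Attribute rule."""
    return ATTRIBUTE_RE.fullmatch(text) is not None
```

A double-quoted value may contain single quotes and vice versa, so `is_attribute('''q="it's"''')` holds, whereas `is_attribute('t="a & b"')` does not because the bare `&` is not a Reference.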

 

[13]   CharData   ::=   [^<&]* - ([^<&]* ']]>' [^<&]*)

[14]   Reference   ::=   EntityRef | CharRef

[15]   EntityRef   ::=   '&' Name ';'

[16]   CharRef   ::=   '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
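To make the two CharRef forms concrete, here is a small sketch that recognizes a decimal or hexadecimal character reference and returns the character it denotes. The function name `decode_char_ref` is hypothetical, and the sketch does not check that the result is a legal XML character.

```python
import re

# CharRef: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF_RE = re.compile(r"&#([0-9]+);|&#x([0-9a-fA-F]+);")


def decode_char_ref(ref: str) -> str:
    """Return the character denoted by a CharRef, or raise ValueError."""
    match = CHAR_REF_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a CharRef: {ref!r}")
    decimal, hexadecimal = match.groups()
    # Exactly one of the two groups is set, depending on which form matched.
    code_point = int(decimal) if decimal is not None else int(hexadecimal, 16)
    return chr(code_point)
```

Both `decode_char_ref("&#65;")` and `decode_char_ref("&#x41;")` denote the letter A; an EntityRef such as `&amp;` is rejected because it is a different production.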

 

[17]   CDSect   ::=   CDStart CData CDEnd

[18]   CDStart   ::=   '<![CDATA['

[19]   CData   ::=   (Char* - (Char* ']]>' Char*))

[20]   CDEnd   ::=   ']]>'

 

[21]   PI   ::=   '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

[22]   PITarget   ::=   Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
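The set difference in the PITarget rule (any Name except "xml" in any letter case) might be sketched as follows. The helper name `is_pi_target` is hypothetical, and Name is the simplified ASCII version used in this document.

```python
import re

NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z_:.0-9-]*")


def is_pi_target(text: str) -> bool:
    """Return True for a Name that is not 'xml' in any mix of upper and lower case."""
    return NAME_RE.fullmatch(text) is not None and text.lower() != "xml"
```

Only the exact three-letter name is excluded, so `is_pi_target("xml-stylesheet")` holds while `is_pi_target("XmL")` does not.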

 

[23]  Comment   ::=   '<!--'((Char - '-') | ('-' (Char - '-')))* '-->'

[24]  Char   ::=   #x9 | #xA | #xD | [#x20-#x7F]
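The Comment rule forbids "--" inside the body and a "-" directly before the closing "-->". A minimal sketch of that rule as a regular expression, assuming the simplified Char range given above (`is_comment` is a hypothetical helper name):

```python
import re

# Simplified Char with '-' (#x2D) removed: tab, LF, CR, and #x20-#x7F.
CHAR_NOT_DASH = r"[\t\n\r\x20-\x2C\x2E-\x7F]"
# Comment: '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT_RE = re.compile(rf"<!--(?:{CHAR_NOT_DASH}|-{CHAR_NOT_DASH})*-->")


def is_comment(text: str) -> bool:
    """Return True if the whole string matches the simplified Comment rule."""
    return COMMENT_RE.fullmatch(text) is not None
```

With this sketch, `is_comment("<!-- a-b -->")` holds because every "-" is followed by a non-dash character, while `is_comment("<!-- a--b -->")` and `is_comment("<!-- x --->")` do not.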

 

 

 

Abstract

1. This document contains a simplified version of the XML 1.0 specification and the dependences among its syntactic constructs.

2. Our goal is to parse an XML document with as much parallelism as possible on an FPGA. In parallel processing, multiple flat rules can be checked at the same time.

3. To recognize the characters that match each syntactic construct correctly, we must first consider the dependences among the rules above. If a character or string that matches one syntactic construct can also form part of another, we say the two constructs are dependent on each other. We therefore collect all such dependences and potential interrelations in order to improve parallel processing.

 

Dependences

4 key tips:

1) Tag ‘<’

It can be part of the text in a CDSect, PI or Comment.

It is the first character of an STag, ETag, CDSect, PI or Comment.

An “element” also begins with ‘<’.

 

2) CDSect, PI or Comment

These three constructs are special: their text can consist of any characters in Char, except the sequence that forms their own closing tag.

Illegal examples:

<![CDATA[ Hello ]]> world!]]>

<?xml version="1.0" ?>encoding="ISO-8859-1" ?>

<!--CDATA[ Hello --> world!-->

<!--CDATA[ Hello, world!--->
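One way to see why the first three examples are illegal is to cut each string at the first occurrence of its closing delimiter and look at what is left over. The sketch below does only that; the name `split_at_first_closer` and the `CLOSERS` table are hypothetical. The fourth example fails for a different reason: its body would end in "-", which the Comment rule does not allow directly before "-->".

```python
# Opening delimiter -> closing delimiter for the three special constructs.
CLOSERS = {"<![CDATA[": "]]>", "<!--": "-->", "<?": "?>"}


def split_at_first_closer(text: str):
    """Return (construct, leftover); construct is None if text is not one of the three."""
    for opener, closer in CLOSERS.items():
        if text.startswith(opener):
            # Search after the opener so that the opener's own characters are not reused.
            end = text.find(closer, len(opener))
            if end == -1:
                return None, text
            end += len(closer)
            return text[:end], text[end:]
    return None, text
```

For the first example this yields the construct `<![CDATA[ Hello ]]>` and the leftover ` world!]]>`; the leftover would have to be CharData, but it contains `]]>`, which CharData excludes.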

 

3) Tag ‘>’

It can be part of the text in a CDSect, PI or Comment.

It is the last character of an STag, ETag, CDSect, PI or Comment.

It can occur anywhere in the text of any construct except Name.

 

4) Tag ‘</’

It can be part of the text in a CDSect, PI or Comment.

It is the first two characters of an ETag.

 

Others

 

XML documents consist of many tags. The opening delimiters ‘<’, ‘<?’, ‘<!--’, ‘</’ and ‘<![CDATA[’ must be paired with the closing delimiters ‘>’, ‘?>’, ‘-->’, ‘>’ and ‘]]>’, respectively.
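Because every opening delimiter starts with ‘<’, a scanner has to try the longer delimiters first. The following sketch orders the delimiters so that the longest matching opener wins and reports the closing delimiter it must be paired with; the table `OPENERS` and the function `classify` are hypothetical names, not part of any library.

```python
# (opening delimiter, closing delimiter, construct); more specific openers come first.
OPENERS = [
    ("<![CDATA[", "]]>", "CDSect"),
    ("<!--", "-->", "Comment"),
    ("<?", "?>", "PI"),
    ("</", ">", "ETag"),
    ("<", ">", "STag or EmptyElemTag"),
]


def classify(text: str, pos: int = 0):
    """Return (construct, opener, closer) for the delimiter starting at pos, or None."""
    for opener, closer, kind in OPENERS:
        if text.startswith(opener, pos):
            return kind, opener, closer
    return None
```

For example, `classify("<!-- x -->")` reports a Comment paired with `-->`, even though the text also starts with the shorter opener `<`.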

 

5) Tag ‘<?’

It can be part of the text in a CDSect, PI or Comment.

When it forms the first two characters of a PI, it is paired with ‘?>’.

 

Tag ‘<!’

It can be part of the text in a CDSect, PI or Comment.

It is the first two characters of a CDSect and of a Comment.

 

6) Tag ‘<!--’

It can be part of the text in a CDSect, PI or Comment.

When it forms the first four characters of a Comment, it is paired with ‘-->’.

 

7) Tag ‘-->’

It can be part of the text in a CDSect or PI.

It can be the closing tag of a Comment.

 

8) ‘<![CDATA[’

It can be part of the text in a CDSect, PI or Comment.

When it forms the first nine characters of a CDSect, it is paired with ‘]]>’.

 

9) ‘]]>’

It can be part of the text in a PI or Comment.

It can be the closing tag of a CDSect.

It cannot appear inside a CDSect.

It cannot occur in CharData.
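The last restriction comes from the set difference in the CharData rule: any text without ‘<’ and ‘&’, minus any text that contains ‘]]>’. A minimal sketch of that check, with the hypothetical helper name `is_char_data`:

```python
def is_char_data(text: str) -> bool:
    """Return True if text has no '<', no '&', and does not contain ']]>'."""
    return "<" not in text and "&" not in text and "]]>" not in text
```

A lone ‘>’ is allowed, so `is_char_data("a > b")` holds, while `is_char_data("x ]]> y")` does not.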

 

10) Tag ‘/>’

It can be part of the text in a CDSect, PI or Comment.

It can be the closing tag of an EmptyElemTag.

 

CDSect, PI or Comment

These three can also appear inside one another.

11) CDSect and PI

When “'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'” is part of a CDSect, it is no longer a processing instruction and loses its PI function, because the text in a CDSect is not parsed by the parser.

e.g. <![CDATA[ HelloWorld!<?xml version="1.0"?>]]>

 

“CDStart CData CDEnd” can be part of a PI; it is still a CDSect inside the PI.

e.g. <?xml version="1.0"<![CDATA[SSPKU]]> ?>

 

12) PI and Comment

“'<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'” can be part of a PI; it is still a Comment inside the PI.

e.g. <?xml version="1.0" <!--encoding="ISO-8859-1-->"?>

 

But when “'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'” is part of a Comment, it is no longer a processing instruction and loses its PI function.

e.g. <!-- HelloWorld!<?xml version="1.0"?>-->

 

13) CDSect and Comment

When “CDStart CData CDEnd” is part of a Comment, it is no longer a CDATA section and loses its CDSect function.

e.g. <!-- HelloWorld!<![CDATA[SSPKU]]>  -->

 

When “'<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'” is part of a CDSect, it is not parsed by the parser.

e.g. <![CDATA[ HelloWorld!<!--xml --> ]]>

 

 

14) Attribute

“'"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"” can be part of an AttValue itself.

e.g. CarNo.="PKU99 CarNo.= 'PKU99'"

 

 

  

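‘<’ can appear in the content of a PI, CDSect or Comment,

or serve as the first character of an STag, ETag, PI, CDSect or Comment, and as the start of an element.

 

The content of a PI, CDSect or Comment can contain any characters except its own closing delimiter,

e.g. <, >, ...

 

‘>’ can appear in the content of a PI, CDSect or Comment.

It is the last character of an STag, ETag, PI, CDSect or Comment.

It can appear in the content of any construct except Name.

 

‘</’ can appear in the content of a PI, CDSect or Comment.

It is the first two characters of an ETag.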

 

 

 
