XML Schema

Source: Internet | Published by: cloud computing company | Editor: 程序博客网 | Time: 2024/04/24 11:18

Standards

"DTD" was the first formalized standard, but is rarely used anymore.
"XDR" was an early attempt by Microsoft to provide a more comprehensive standard than DTD. This standard has pretty much been abandoned now in favor of XSD.
"XSD" is currently the de facto standard for describing XML documents. Two versions are in use, 1.0 and 1.1, which are largely the same (you have to dig quite deep before you notice the difference). An XSD schema is itself an XML document; there is even an XSD schema that describes the XSD standard.
There are also a number of other standards, but their uptake has been patchy at best.

<xs:element/>

Syntax (default and fixed are mutually exclusive, so specify at most one of them):

<xs:element name="Customer_order" type="xs:string" minOccurs="0" maxOccurs="unbounded" default="unknown" | fixed="UK"/>
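
The syntax line above is a template rather than valid XML, so a concrete sketch may help. The element names Order, Note, Status and Country below are hypothetical and chosen only for illustration; the occurrence limits are placed on local elements inside a sequence.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- Optional and repeatable: may appear zero or more times -->
        <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- default: the value "unknown" is used when the element is present but empty -->
        <xs:element name="Status" type="xs:string" default="unknown"/>
        <!-- fixed: if the element has content, it must be exactly "UK" -->
        <xs:element name="Country" type="xs:string" fixed="UK"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```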

Complex type

<xs:element name="Customer">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Dob" type="xs:date" />
            <xs:element name="Address" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

XML sample

<Customer>
    <Dob>2000-01-12</Dob>
    <Address>34 thingy street, someplace, sometown, w1w8uu</Address>
</Customer>

There are 3 types of compositors: <xs:sequence>, <xs:choice> and <xs:all>. These compositors allow us to determine how the child elements within them appear within the XML document.

Compositor   Description
Sequence     The child elements in the XML document MUST appear in the order they are declared in the XSD schema.
Choice       Only one of the child elements described in the XSD schema can appear in the XML document.
All          The child elements described in the XSD schema can appear in the XML document in any order.
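
As a rough illustration of the two compositors not shown elsewhere in this article, the sketch below uses xs:choice and xs:all. The element names Contact, Email, Phone, Person, FirstName and LastName are made up for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:choice: exactly one of Email or Phone may appear inside Contact -->
  <xs:element name="Contact">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Email" type="xs:string"/>
        <xs:element name="Phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- xs:all: FirstName and LastName must both appear, in either order -->
  <xs:element name="Person">
    <xs:complexType>
      <xs:all>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```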

Notes

The compositors <xs:sequence> and <xs:choice> can be nested inside other compositors, and be given their own minOccurs and maxOccurs properties. This allows for quite complex combinations to be formed.
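
A minimal sketch of a nested compositor with its own occurrence limits, assuming hypothetical child elements Name, Email and Phone:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <!-- Nested choice with its own limits:
             between 1 and 3 contact entries, each either Email or Phone -->
        <xs:choice minOccurs="1" maxOccurs="3">
          <xs:element name="Email" type="xs:string"/>
          <xs:element name="Phone" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this declaration, a Customer containing a Name followed by an Email and two Phone elements would be accepted, while one with four contact entries would not.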

<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Dob" type="xs:date" />
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Line1" type="xs:string" />
            <xs:element name="Line2" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Re-use

<xs:complexType name="AddressType">
    <xs:sequence>
        <xs:element name="Line1" type="xs:string"/>
        <xs:element name="Line2" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

We have now defined an <xs:complexType name="AddressType"> that describes our representation of an address, so let's use it.

Remember when we started looking at elements and we said you could define your own type instead of using one of the standard ones (xs:string, xs:integer)? That is exactly what we are doing now.

<xs:element name="Customer">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Dob" type="xs:date"/>
            <xs:element name="Address" type="AddressType"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
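
For reference, a document that should validate against this Customer element (assuming the schema declares no target namespace, as in the snippets so far) might look like the following. The address values are placeholders.

```xml
<Customer>
  <Dob>2000-01-12</Dob>
  <!-- Address now has the structure defined by AddressType -->
  <Address>
    <Line1>34 thingy street</Line1>
    <Line2>someplace</Line2>
  </Address>
</Customer>
```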
Attributes
An attribute provides extra information within an element. Attributes are defined within an XSD as follows, having name and type properties. The use property is either optional or required, and default and fixed are mutually exclusive.
<xs:attribute name="x" type="y" use="optional|required" default="unknown" | fixed="UK"/>
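
A small sketch of attributes in context, assuming hypothetical attribute names Id and Country on the Customer element used earlier:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Dob" type="xs:date"/>
      </xs:sequence>
      <!-- Attribute declarations come after the compositor -->
      <xs:attribute name="Id" type="xs:integer" use="required"/>
      <!-- Optional attribute; "UK" is assumed when it is omitted -->
      <xs:attribute name="Country" type="xs:string" default="UK"/>
    </xs:complexType>
  </xs:element>
  <!-- A matching instance could look like:
       <Customer Id="42"><Dob>2000-01-12</Dob></Customer> -->
</xs:schema>
```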
Mixed Element Content
Extending an Existing ComplexType
It is possible to take an existing <xs:complexType> and extend it. Let's see how this may be useful with an example.
Looking at the AddressType that we defined earlier (in part 1), let's assume our company has now gone international and we need to capture country specific addresses. In this case we need specific information for UK addresses (County and Postcode), and for US addresses (State and ZipCode).
So we can take our existing definition of address and extend it as follows:
<xs:complexType name="UKAddressType">
    <xs:complexContent>
        <xs:extension base="AddressType">
            <xs:sequence>
                <xs:element name="County" type="xs:string"/>
                <xs:element name="Postcode" type="xs:string"/>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
<xs:complexType name="USAddressType">
    <xs:complexContent>
        <xs:extension base="AddressType">
            <xs:sequence>
                <xs:element name="State" type="xs:string"/>
                <xs:element name="Zipcode" type="xs:string"/>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
In other words, we are defining a new <xs:complexType> called "USAddressType" that extends the existing type "AddressType" and adds to it a sequence containing the elements "State" and "Zipcode".
There are 2 new things here: the <xs:extension> element and the <xs:complexContent> element; we'll get to these shortly.
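
To make the effect of the extension concrete, here is a sketch of an instance of "UKAddressType". The element name DeliveryAddress and the values are assumptions for illustration; the point is that the inherited elements come first, followed by the added ones.

```xml
<!-- Assumes a hypothetical declaration:
     <xs:element name="DeliveryAddress" type="UKAddressType"/> -->
<DeliveryAddress>
  <!-- Inherited from AddressType -->
  <Line1>34 thingy street</Line1>
  <Line2>someplace</Line2>
  <!-- Added by the UKAddressType extension -->
  <County>Somecounty</County>
  <Postcode>w1w8uu</Postcode>
</DeliveryAddress>
```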
Extending Simple Types
There are 3 ways in which a simpleType can be extended: Restriction, List or Union. The most common is Restriction, but we will cover the other 2 as well.
Restriction
Restriction is a way to constrain an existing type definition. We can apply a restriction to the built-in data types xs:string, xs:integer, xs:date, etc. or to ones we create ourselves.
Here we are defining a restriction on the existing type "string", and applying a regular expression to it to limit the values it can take.
<xs:simpleType name="LetterType">
    <xs:restriction base="xs:string">
        <xs:pattern value="[a-zA-Z]"/>
    </xs:restriction>
</xs:simpleType>
Let's go through this line by line:
1. A <simpleType> tag is used to define our new type. We must give the type a unique name, in this case "LetterType".
2. We are restricting an existing type, so the tag is <restriction> (you can also extend an existing type, but more about this later). We are basing our new type on a string, so base="xs:string".
3. We are applying a restriction in the form of a regular expression, specified using the <pattern> element. The regular expression means the data must contain a single lower or upper case letter, a through z.
4. Closing tag for the restriction.
5. Closing tag for the simple type.
Restrictions may also be referred to as "Facets". For a complete list, see the XSD Standard, but to give you an idea, here are a few to get you started.
Each entry below gives an overview, the syntax, and an explanation of the syntax.

Facet: minLength, maxLength
Overview: Specifies the minimum and maximum length allowed. Must be 0 or greater.
Syntax: <xs:minLength value="3"/> <xs:maxLength value="8"/>
Syntax explained: In this example the length must be between 3 and 8.

Facet: minInclusive, maxInclusive
Overview: The lower and upper range for numerical values. The value must be greater than or equal to the minimum and less than or equal to the maximum.
Syntax: <xs:minInclusive value="0"/> <xs:maxInclusive value="10"/>
Syntax explained: The value must be between 0 and 10, inclusive.

Facet: minExclusive, maxExclusive
Overview: The lower and upper range for numerical values. The value must be greater than the minimum and less than the maximum.
Syntax: <xs:minExclusive value="0"/> <xs:maxExclusive value="10"/>
Syntax explained: The value must be greater than 0 and less than 10 (for integers, between 1 and 9).

Facet: length
Overview: The exact number of characters allowed.
Syntax: <xs:length value="30"/>
Syntax explained: The length must be exactly 30.

Facet: totalDigits
Overview: The maximum number of digits allowed.
Syntax: <xs:totalDigits value="9"/>
Syntax explained: Cannot have more than 9 digits.

Facet: enumeration
Overview: A list of values allowed.
Syntax: <xs:enumeration value="Hippo"/> <xs:enumeration value="Zebra"/> <xs:enumeration value="Lion"/>
Syntax explained: The only permitted values are Hippo, Zebra or Lion.

Facet: fractionDigits
Overview: The maximum number of decimal places allowed (must be >= 0).
Syntax: <xs:fractionDigits value="2"/>
Syntax explained: The value can have at most 2 decimal places.

Facet: whiteSpace
Overview: Defines how whitespace will be handled. Whitespace is line feeds, carriage returns, tabs, spaces, etc.
Syntax: <xs:whiteSpace value="preserve"/> <xs:whiteSpace value="replace"/> <xs:whiteSpace value="collapse"/>
Syntax explained:
    preserve - Keeps whitespace as it is.
    replace - Replaces each whitespace character with a space.
    collapse - Replaces whitespace characters with a space, then reduces any run of multiple spaces to one space and removes leading and trailing spaces.

Facet: pattern
Overview: Determines what characters are allowed and in what order. These are regular expressions and there is a complete list at: http://www.w3.org/TR/xmlschema-2/#regexs
Syntax: <xs:pattern value="[0-999]"/>
Syntax explained:
    [0-999] - 1 digit between 0 and 9. A character class matches a single character, so the extra 9s are redundant and this is the same as [0-9].
    [0-99][0-99][0-99] - 3 digits, each between 0 and 9.
    [a-z][0-10][A-Z] - The 1st character has to be between a and z, the 2nd has to be 0 or 1 (the class [0-10] contains only the characters 0 and 1), and the 3rd has to be between A and Z. These are case sensitive.
    [a-zA-Z] - 1 character that can be either lower or uppercase a to z.
    [123] - 1 digit that has to be 1, 2 or 3.
    ([a-z])* - Zero or more occurrences of a to z.
    ([q][u])+ - One or more occurrences of the pair q followed by u, for example qu or ququ.
    ([a-z][0-999])+ - One or more pairs in which the 1st character is a lowercase letter a to z and the 2nd is a single digit, for example a1, c9 or z9f4.
    [a-z0-9]{8} - Must be exactly 8 characters in a row and they must be lowercase a to z or digits 0 to 9.

It is important to note that not all facets are valid for all data types. For example, maxInclusive has no meaning when applied to a string. For the combinations of facets that are valid for a given data type, refer to the XSD standard.
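
As a sketch of how these facets sit inside a type definition, the following schema defines three restricted types. The type names UserNameType, ScoreType and AnimalType are hypothetical; the facet values are the ones used in the descriptions above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A string between 3 and 8 characters long -->
  <xs:simpleType name="UserNameType">
    <xs:restriction base="xs:string">
      <xs:minLength value="3"/>
      <xs:maxLength value="8"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- An integer from 0 to 10 inclusive -->
  <xs:simpleType name="ScoreType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="10"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- One of three named values -->
  <xs:simpleType name="AnimalType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Hippo"/>
      <xs:enumeration value="Zebra"/>
      <xs:enumeration value="Lion"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```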
Union
A union is a mechanism for combining 2 or more different data types into one.
The following defines 2 simple types: "SizeByNumberType", all the positive integers up to 21 (e.g. 10, 12, 14), and "SizeByStringNameType", the values small, medium and large.
<xs:simpleType name="SizeByNumberType">
    <xs:restriction base="xs:positiveInteger">
        <xs:maxInclusive value="21"/>
    </xs:restriction>
</xs:simpleType>
<xs:simpleType name="SizeByStringNameType">
    <xs:restriction base="xs:string">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
    </xs:restriction>
</xs:simpleType>
We can then define a new type called "USClothingSizeType" as a union of the types "SizeByNumberType" and "SizeByStringNameType" (we can add any number of types, including the built-in types, separated by whitespace).
<xs:simpleType name="USClothingSizeType">
    <xs:union memberTypes="SizeByNumberType SizeByStringNameType" />
</xs:simpleType>
This means the type can contain any of the values that the 2 members can take (e.g. 1, 2, 3, ...., 20, 21, small, medium, large). This new type can then be used in the same way as any other <xs:simpleType>
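
To illustrate which values the union accepts, here is a sketch of instance data. The wrapper element Sizes and the element Size are hypothetical names, and Size is assumed to be declared with type "USClothingSizeType".

```xml
<!-- Assumes a hypothetical declaration:
     <xs:element name="Size" type="USClothingSizeType"/> -->
<Sizes>
  <Size>12</Size>      <!-- valid: matches SizeByNumberType -->
  <Size>medium</Size>  <!-- valid: matches SizeByStringNameType -->
  <Size>22</Size>      <!-- invalid: above 21 and not one of the named sizes -->
  <Size>tiny</Size>    <!-- invalid: matches neither member type -->
</Sizes>
```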
List
A list allows the value (in the XML document) to contain a number of valid values separated by whitespace.
A List is constructed in a similar way to a Union. The difference being that we can only specify a single type. This new type can contain a list of values that are defined by the itemType property. The values must be whitespace separated. So a valid value for this type would be "5 9 21".
<xs:simpleType name="SizesinStockType">
    <xs:list itemType="SizeByNumberType" />
</xs:simpleType>
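
The sketch below puts the list type to use. The type UpToThreeSizesType and the element SizesInStock are hypothetical additions; for a list type, the length facets count list items rather than characters, so maxLength here limits the list to three entries.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="SizeByNumberType">
    <xs:restriction base="xs:positiveInteger">
      <xs:maxInclusive value="21"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="SizesinStockType">
    <xs:list itemType="SizeByNumberType"/>
  </xs:simpleType>

  <!-- Hypothetical: the same list, limited to at most 3 items -->
  <xs:simpleType name="UpToThreeSizesType">
    <xs:restriction base="SizesinStockType">
      <xs:maxLength value="3"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A valid instance would be: <SizesInStock>5 9 21</SizesInStock> -->
  <xs:element name="SizesInStock" type="UpToThreeSizesType"/>
</xs:schema>
```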
Returning to mixed element content: so far we have seen how an element can contain data, other elements or attributes. Elements can also contain a combination of all of these. You can also mix elements and data. You can specify this in the XSD schema by setting the mixed property.
<xs:element name="MarkedUpDesc">
    <xs:complexType mixed="true">
        <xs:sequence>
            <xs:element name="Bold" type="xs:string" />
            <xs:element name="Italic" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
A sample XML document could look like this.
<MarkedUpDesc>
    This is an <Bold>Example</Bold> of <Italic>Mixed</Italic> Content,
    Note there are elements mixed in with the element's data.
</MarkedUpDesc>
Conventions
All elements and attributes should use upper camel case (UCC), e.g. PostalAddress; avoid hyphens, spaces or other syntax.
Readability is more important than tag length. There is always a line to draw between document size and readability, wherever possible favor readability.
Try to avoid abbreviations and acronyms for element, attribute, and type names. Exceptions should be well known within your business area, e.g. ID (Identifier) and POS (Point of Sale).
Postfix new types with the name 'Type', e.g. AddressType, USAddressType.
Enumerations should use names, not numbers, and the values should be upper camel case.
Names should not include the name of the containing structure, e.g. CustomerName should be Name within the sub-element Customer.
Only produce complexTypes or simpleTypes for types that are likely to be re-used. If the structure only exists in one place, define it inline with an anonymous complexType.
Avoid the use of mixed content.
Only define root level elements if the element is capable of being the root element in an XML document.
Use consistent namespace aliases:
    xml (defined in the XML standard)
    xmlns (defined in the Namespaces in XML standard)
    xs http://www.w3.org/2001/XMLSchema
    xsi http://www.w3.org/2001/XMLSchema-instance
Try to think about versioning early on in your schema design. If it is important for new versions of a schema to be backward compatible, then all additions to the schema should be optional. If it is important that existing products should be able to read newer versions of a given document, then consider adding any and anyAttribute entries to the end of your definitions. See Versioning recommendations.
Define a targetNamespace in your schema. This better identifies your schema and can make things easier to modularize and re-use.
Set elementFormDefault="qualified" in the schema element of your schema. This makes qualifying the namespaces in the resulting XML simpler (if more verbose).
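
Several of the conventions above can be seen together in one small schema. This is only a sketch: the namespace http://example.com/Contacts, the alias con, and the names ContactType, Contact, Name and Nickname are assumptions made up for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:con="http://example.com/Contacts"
           targetNamespace="http://example.com/Contacts"
           elementFormDefault="qualified">
  <!-- Named type, suffixed with "Type" because it is intended for re-use -->
  <xs:complexType name="ContactType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <!-- A later addition is optional, so older documents stay valid -->
      <xs:element name="Nickname" type="xs:string" minOccurs="0"/>
      <!-- Extension point at the end of the definition:
           accepts elements from other namespaces -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <!-- Likewise accepts attributes from other namespaces -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>

  <!-- Global element: capable of being the root of a document -->
  <xs:element name="Contact" type="con:ContactType"/>
</xs:schema>
```

Restricting the wildcard to ##other keeps it from overlapping with the optional Nickname element, which avoids an ambiguous content model.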
Namespaces
In this example, the schema is broken out into 4 files.
CommonTypes - this could contain all your basic types, AddressType, PriceType, PaymentMethodType etc.
CustomerTypes - this could contain all your definitions for your customers.
OrderTypes - this could contain all your definitions for orders.
Main - this would pull all the sub schemas together into a single schema, and define your main element/s.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="myNamespace">
    ...
</xs:schema>
The value of targetNamespace is just a unique identifier; typically companies use their URL followed by something to qualify it. In principle, the namespace has no meaning, but some companies have used the URL where the schema is stored as the targetNamespace, and so some XML parsers will use this as a hint path for the schema, e.g. targetNamespace="http://www.microsoft.com/CommonTypes.xsd". The following would be just as valid: targetNamespace="my-common-types".
Placing the targetNamespace attribute at the top of your XSD schema means that all entities defined in it are part of this namespace. So in our example above, each of the 4 schema files could have a distinct targetNamespace value.
The attribute xmlns:xs="http://www.w3.org/2001/XMLSchema" declares the namespace "http://www.w3.org/2001/XMLSchema", which defines the XSD vocabulary. "xmlns" gives the namespace an alias; in this example, "xs" is the alias for the namespace "http://www.w3.org/2001/XMLSchema". Because the "schema" element is defined in that namespace, it is written as "<xs:schema ...>". The targetNamespace attribute defines the namespace for the current schema file.
CommonTypes.xsd
<?xml version="1.0" encoding="utf-16"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<xs:schema targetNamespace="http://NamespaceTest.com/CommonTypes"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Line1" type="xs:string" />
      <xs:element name="Line2" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="PaymentMethodType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="VISA" />
      <xs:enumeration value="MasterCard" />
      <xs:enumeration value="Cash" />
      <xs:enumeration value="Amex" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
CustomerTypes.xsd
<?xml version="1.0" encoding="utf-16"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<xs:schema xmlns:cmn="http://NamespaceTest.com/CommonTypes"
           targetNamespace="http://NamespaceTest.com/CustomerTypes"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:import schemaLocation="CommonTypes.xsd"
             namespace="http://NamespaceTest.com/CommonTypes"/>
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="DeliveryAddress" type="cmn:AddressType" />
      <xs:element name="BillingAddress" type="cmn:AddressType" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
The <import> tag is used to import another XSD file; it is defined in the namespace "http://www.w3.org/2001/XMLSchema". The schemaLocation attribute indicates the location of the XSD file to import, and the namespace attribute names the namespace being imported.
In <xs:element name="DeliveryAddress" type="cmn:AddressType" />, the type AddressType is defined in the namespace "http://NamespaceTest.com/CommonTypes", so when we reference it we must prefix it with "cmn", the alias for that namespace.
OrderTypes.xsd
<?xml version="1.0" encoding="utf-16"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<xs:schema xmlns:cmn="http://NamespaceTest.com/CommonTypes"
           targetNamespace="http://NamespaceTest.com/OrderTypes"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:import namespace="http://NamespaceTest.com/CommonTypes"
             schemaLocation="CommonTypes.xsd" />
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" name="Item">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="ProductName" type="xs:string" />
            <xs:element name="Quantity" type="xs:int" />
            <xs:element name="UnitPrice" type="cmn:PriceType" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Main.xsd
<?xml version="1.0" encoding="utf-16"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<xs:schema xmlns:ord="http://NamespaceTest.com/OrderTypes"
           xmlns:pur="http://NamespaceTest.com/Purchase"
           xmlns:cmn="http://NamespaceTest.com/CommonTypes"
           xmlns:cust="http://NamespaceTest.com/CustomerTypes"
           targetNamespace="http://NamespaceTest.com/Purchase"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
    <xs:import schemaLocation="CommonTypes.xsd"
               namespace="http://NamespaceTest.com/CommonTypes" />
    <xs:import schemaLocation="CustomerTypes.xsd"
               namespace="http://NamespaceTest.com/CustomerTypes" />
    <xs:import schemaLocation="OrderTypes.xsd"
               namespace="http://NamespaceTest.com/OrderTypes" />
    <xs:element name="Purchase">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="OrderDetail" type="ord:OrderType" />
                <xs:element name="PaymentMethod" type="cmn:PaymentMethodType" />
                <xs:element ref="pur:CustomerDetails"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="CustomerDetails" type="cust:CustomerType"/>
</xs:schema>
Because the root element Purchase is in the namespace "http://NamespaceTest.com/Purchase", we must qualify the <Purchase> element within the resulting XML document. Let's look at an example:
<?xml version="1.0"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<p:Purchase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://NamespaceTest.com/Purchase Main.xsd"
            xmlns:p="http://NamespaceTest.com/Purchase"
            xmlns:o="http://NamespaceTest.com/OrderTypes"
            xmlns:c="http://NamespaceTest.com/CustomerTypes"
            xmlns:cmn="http://NamespaceTest.com/CommonTypes">
    <p:OrderDetail>
        <o:Item>
            <o:ProductName>Widget</o:ProductName>
            <o:Quantity>1</o:Quantity>
            <o:UnitPrice>3.42</o:UnitPrice>
        </o:Item>
    </p:OrderDetail>
    <p:PaymentMethod>VISA</p:PaymentMethod>
    <p:CustomerDetails>
        <c:Name>James</c:Name>
        <c:DeliveryAddress>
            <cmn:Line1>15 Some Road</cmn:Line1>
            <cmn:Line2>SomeTown</cmn:Line2>
        </c:DeliveryAddress>
        <c:BillingAddress>
            <cmn:Line1>15 Some Road</cmn:Line1>
            <cmn:Line2>SomeTown</cmn:Line2>
        </c:BillingAddress>
    </p:CustomerDetails>
</p:Purchase>
The first thing we see is the xsi:schemaLocation attribute in the root element. This tells the XML parser that the elements within the namespace "http://NamespaceTest.com/Purchase" can be found in the file "Main.xsd" (Note the namespace and URL are separated with whitespace - carriage return or space will do).
The next thing we do is define some aliases
"p" to mean the namespace "http://NamespaceTest.com/Purchase"
"c" to mean the namespace "http://NamespaceTest.com/CustomerTypes"
"o" to mean the namespace "http://NamespaceTest.com/OrderTypes"
"cmn" to mean the namespace "http://NamespaceTest.com/CommonTypes"
You have probably noticed that every element in the XML document is qualified with one of these aliases.
The general rules for this are:
The alias must refer to the target namespace in which the element is defined. It is important to note that this is where the element is defined, not where the complexType is defined.
So the element <OrderDetail> is actually defined in Main.xsd, so it is part of the namespace "http://NamespaceTest.com/Purchase", even though it uses the complexType "OrderType", which is defined in OrderTypes.xsd.
The contents of <OrderDetail> are defined within the complexType "OrderType", which is in the target namespace "http://NamespaceTest.com/OrderTypes", so the child element <Item> needs qualifying within the namespace "http://NamespaceTest.com/OrderTypes".
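
What matters in these rules is the namespace an alias maps to, not the alias text itself. As a sketch, the same purchase could be written with different aliases, here using a default namespace (an xmlns declaration without a prefix) for the Purchase namespace. The aliases ord and cust are arbitrary choices for this example, and the xsi:schemaLocation hint is left out for brevity.

```xml
<?xml version="1.0"?>
<!-- Unprefixed elements belong to the default namespace declared on the root -->
<Purchase xmlns="http://NamespaceTest.com/Purchase"
          xmlns:ord="http://NamespaceTest.com/OrderTypes"
          xmlns:cust="http://NamespaceTest.com/CustomerTypes"
          xmlns:cmn="http://NamespaceTest.com/CommonTypes">
    <OrderDetail>
        <ord:Item>
            <ord:ProductName>Widget</ord:ProductName>
            <ord:Quantity>1</ord:Quantity>
            <ord:UnitPrice>3.42</ord:UnitPrice>
        </ord:Item>
    </OrderDetail>
    <PaymentMethod>VISA</PaymentMethod>
    <CustomerDetails>
        <cust:Name>James</cust:Name>
        <cust:DeliveryAddress>
            <cmn:Line1>15 Some Road</cmn:Line1>
            <cmn:Line2>SomeTown</cmn:Line2>
        </cust:DeliveryAddress>
        <cust:BillingAddress>
            <cmn:Line1>15 Some Road</cmn:Line1>
            <cmn:Line2>SomeTown</cmn:Line2>
        </cust:BillingAddress>
    </CustomerDetails>
</Purchase>
```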
The Effect of elementFormDefault
You may have noticed that each schema contained an attribute elementFormDefault="qualified". This attribute has 2 possible values, qualified and unqualified; the default is unqualified. It changes the namespacing rules considerably. It is normally easier to set it to qualified.
So to see the effects of this property, if we set it to be unqualified in all of our schemas, the resulting XML would look like this:
<?xml version="1.0"?>
<!-- Created with Liquid XML Studio 0.9.8.0 (http://www.liquid-technologies.com) -->
<p:Purchase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://NamespaceTest.com/Purchase Main.xsd"
            xmlns:p="http://NamespaceTest.com/Purchase">
    <OrderDetail>
        <Item>
            <ProductName>Widget</ProductName>
            <Quantity>1</Quantity>
            <UnitPrice>3.42</UnitPrice>
        </Item>
    </OrderDetail>
    <PaymentMethod>VISA</PaymentMethod>
    <p:CustomerDetails>
        <Name>James</Name>
        <DeliveryAddress>
            <Line1>15 Some Road</Line1>
            <Line2>SomeTown</Line2>
        </DeliveryAddress>
        <BillingAddress>
            <Line1>15 Some Road</Line1>
            <Line2>SomeTown</Line2>
        </BillingAddress>
    </p:CustomerDetails>
</p:Purchase>
This is considerably different from the previous XML document.
These general rules now apply:
Only global elements (those declared at the root level of a schema) need qualifying with a namespace.
All elements that are defined inline do NOT need to be qualified.
The first element is Purchase. This is defined globally in the Main.xsd schema, and therefore needs qualifying within the schema's target namespace "http://NamespaceTest.com/Purchase".
The first child element is <OrderDetail>, which is defined inline in Main.xsd->Purchase, so it does not need to be aliased.
The same is true for all the child elements: they are all defined inline, so they do not need qualifying with a namespace.
The final child element <CustomerDetails> is a little different. As you can see, we have defined this as a global element within the targetNamespace "http://NamespaceTest.com/Purchase". In the element "Purchase" we just reference it. Because we are using a reference to an element, we must take into account its namespace, thus we alias it <p:CustomerDetails>.
Namespace Review
XSD - important point: the xs alias
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="" elementFormDefault="qualified">
    <xs:import schemaLocation="" namespace=""/>
</xs:schema>
XML - important point: the xsi alias
<rootElementName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="">
    ...
</rootElementName>