XML Schema: Understanding Structures

Developer: XML

by Rahul Srivastava

Learn how to use XML Schema constructs to declare, extend, and restrict the structure of your XML.


Downloads for this article:

Oracle XML Developer's Kit

Oracle JDeveloper 10g (includes visual XML Schema editor)

A grammar defines the structure and semantics of a language, enforces constraints, and ensures the validity of the instance (the actual data). English, like any other language, has a grammar that defines the rules for composing a sentence, and given an English sentence, that grammar can be used to check whether the sentence is valid. In the same way, a grammar for an XML instance document both defines the structure and content of that document and ensures their validity.

The W3C XML Schema definition (WXS) expresses the Abstract Data Model of W3C XML Schema in XML. Because the schema is defined in terms of an Abstract Data Model, W3C XML Schema is agnostic about the language used to represent that model. The XML representation is the formal representation specified by WXS, but you are free to represent the Abstract Data Model any way you want and use it for validation. For example, you can create an in-memory schema directly, using any data structure that adheres to the Abstract Data Model. This encourages vendors that develop W3C Schema validators to provide an API that you can use to create an in-memory schema directly.

There are numerous grammars available for validating XML instance documents. Some became obsolete almost immediately, while others, such as the DTD, which is part of the W3C XML 1.0 Recommendation, have passed the test of time. Of the grammars in use, XML Schema is the most popular among XML developers because:

  • 1. It uses XML as the language to define the schema.
  • 2. It has 44 built-in datatypes, and each of these datatypes can be further refined for fine-grained validation of the character data in XML.
  • 3. The cardinality of the elements can be defined in a fine-grained manner using the minOccurs and maxOccurs attributes.
  • 4. It supports modularity and re-usability by extension, restriction, import, include, and redefine constructs.
  • 5. It supports identity constraints, which ensure that a value is unique within a specified set in an XML document.
  • 6. It has an Abstract Data Model and therefore is not bound to the XML representation only.

Here's an example of how you would validate an XML instance against an externally specified schema:


 
 
 import java.io.FileInputStream;
 import oracle.xml.parser.v2.XMLError;
 import oracle.xml.parser.schema.XMLSchema;
 import oracle.xml.parser.schema.XSDBuilder;
 import oracle.xml.schemavalidator.XSDValidator;
 ...
 // load the XML Schema
 XSDBuilder schemaBuilder = new XSDBuilder();
 XMLSchema schema = schemaBuilder.build(new FileInputStream("myschema.xsd"), null);

 // set the loaded XML Schema on the XSDValidator
 XSDValidator validator = new XSDValidator();
 validator.setSchema(schema);

 // validate the XML instance against the supplied XML Schema
 validator.validate(new FileInputStream("data.xml"));

 // check for errors
 XMLError error = validator.getError();
 if (error.getNumMessages() > 0) {
     System.out.println("XML-instance is invalid.");
     error.flushErrors();
 }
 else {
     System.out.println("XML-instance is valid.");
 }
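For comparison, the same load-then-validate flow can be written with the standard JAXP validation API (`javax.xml.validation`) that ships with the JDK. This is useful when the XDK is not on the classpath. The sketch below is not the XDK API. The class name `SchemaCheck` is hypothetical, and the schema and instance are passed as strings rather than files so that the example is self-contained. Like the XDK example, it collects the validation errors and then reports whether the instance is valid.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical JAXP counterpart of the XDK example: load a schema, validate, collect errors.
final class SchemaCheck {

    // Returns the validation errors; an empty list means the instance is valid.
    static List<String> validate(String xsd, String xml) {
        Schema schema;
        try {
            // Load the XML Schema; a broken schema is reported as an exception, not as an instance error
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema: " + e.getMessage(), e);
        }

        List<String> errors = new ArrayList<>();
        Validator validator = schema.newValidator();
        // Record every validation error instead of stopping at the first one
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // fatal error, e.g. the instance is not well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                   + "<xsd:element name='TextOnly' type='xsd:string'/>"
                   + "</xsd:schema>";
        List<String> errors = validate(xsd, "<TextOnly>some character data</TextOnly>");
        if (errors.isEmpty()) {
            System.out.println("XML-instance is valid.");
        } else {
            System.out.println("XML-instance is invalid.");
            errors.forEach(System.out::println);
        }
    }
}
```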
 
 
 

Of course, XML Schema has limitations as well:

  • 1. It doesn't support rule-based validation. An example of a rule-based validation would be: if the value of the attribute "score" is greater than 80, then the element "distinction" must exist in the XML instance; otherwise, it must not.
  • 2. The Unique Particle Attribution (UPA) constraint is too strict to define grammars for all types of XML documents. (See the "UPA Constraint" section for details.)
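Because the schema cannot express the first limitation (a co-occurrence rule), one option is to check the rule in application code after schema validation. The following is a minimal DOM-based sketch of the "score"/"distinction" rule. It assumes a hypothetical root element `result` that carries an integer `score` attribute; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical co-occurrence check that XML Schema cannot express:
// if @score > 80, a <distinction> element must appear; otherwise it must not.
final class ScoreRule {

    static boolean satisfiesRule(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        // Assumes the score attribute is present on the root and holds an integer
        int score = Integer.parseInt(root.getAttribute("score"));
        boolean hasDistinction = root.getElementsByTagName("distinction").getLength() > 0;
        return score > 80 ? hasDistinction : !hasDistinction;
    }

    public static void main(String[] args) {
        System.out.println(satisfiesRule("<result score='92'><distinction/></result>")); // true
        System.out.println(satisfiesRule("<result score='70'><distinction/></result>")); // false
    }
}
```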

In my previous articles, I discussed namespaces, a concept you need to understand before you dive into XML Schema. I also discussed the datatypes supported in XML Schema, along with the simpleType construct used to further constrain and use those datatypes.

In this article, I will explain the schema constructs used to declare, extend, and restrict the structure of XML. You will also learn about the model groups, particles, and other constraints provided by XML Schema.

Oracle XML Developer's Kit (XDK) includes a W3C-compliant XML Schema processor, as well as several utilities. These include utilities for creating schema datatypes and restricting them programmatically through the APIs, parsing and validating the XML Schema structure itself, and traversing the Abstract Data Model of an XML Schema. Check out the oracle.xml.parser.schema and oracle.xml.schemavalidator packages.

The Content and Model

Element Content

In an XML document, the content of an element is the content enclosed between its <opening> and </closing> tag. An element can have only four types of content: TextOnly, ElementOnly, Mixed, and Empty. Attributes declared on an element are not considered to contribute to the content of an element. They are just part of the element on which they are declared, and contribute to the structure of XML.

TextOnly

The content of an element is said to be TextOnly when that element has only character data (also called text data) between its <opening> and </closing> tags; in other words, when that element has no child elements. For example:


 
 <TextOnly>some character data</TextOnly>
 

ElementOnly

The content of an element is said to be ElementOnly when that element has only child elements between its <opening> and </closing> tags, optionally separated by whitespace (spaces, tabs, newlines, carriage returns). This whitespace is called ignorable whitespace and is often used for indenting the XML. Therefore the following:

ElementOnly content without whitespaces


 
 
 <ElementOnly><child1 .../><child2 .../></ElementOnly> 
 
 

is the same as:

ElementOnly content with whitespaces


 
 <ElementOnly>
 <child1 .../>
 <child2 .../>
 </ElementOnly>
 
 

Mixed

The content of an element is said to be Mixed when that element has character data interspersed with child elements between its <opening> and </closing> tags. (In other words, its content has both character data and child elements.) When the content is mixed, so-called ignorable whitespace is no longer ignorable. Therefore, the following:


 
 <Mixed><child1 .../>some character data<child2 .../></Mixed>
 

is different from:


 
 
 <Mixed>
 <child1 .../>
 some character data
 <child2 .../>
 </Mixed>
 
 

Empty

The content of an element is said to be Empty when that element has absolutely nothing between the <opening> and </closing> tags, not even whitespace. For example:


 
 
 <Empty></Empty>
 
 

For ease of use and clarity, an element with empty content can also be represented by a single empty-element tag, as follows:


 
 
 <Empty /> 
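To make the four categories concrete, here is a small sketch that parses an instance with the JDK's DOM parser and reports which kind of content its document element has. It follows the rules above. Whitespace between child elements is treated as ignorable. An element counts as Empty only if it has no children at all, not even whitespace. The class and method names are hypothetical, and the sketch inspects only the instance, not any schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical classifier for the content of the document element.
final class ContentKind {

    static String classify(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        boolean hasElements = false;
        boolean hasText = false;          // any character data, including whitespace
        boolean hasNonBlankText = false;  // character data other than whitespace
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElements = true;
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                hasText = true;
                if (!child.getNodeValue().trim().isEmpty()) {
                    hasNonBlankText = true;
                }
            }
        }
        if (hasElements) {
            // Whitespace between child elements is ignorable; real text makes the content Mixed
            return hasNonBlankText ? "Mixed" : "ElementOnly";
        }
        // No child elements: even whitespace is character data, so only a truly empty element is Empty
        return hasText ? "TextOnly" : "Empty";
    }

    public static void main(String[] args) {
        System.out.println(classify("<TextOnly>some character data</TextOnly>"));               // TextOnly
        System.out.println(classify("<ElementOnly>\n  <child1/>\n  <child2/>\n</ElementOnly>")); // ElementOnly
        System.out.println(classify("<Mixed><child1/>some character data<child2/></Mixed>"));   // Mixed
        System.out.println(classify("<Empty></Empty>"));                                        // Empty
    }
}
```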
 

Content Models

In an XML grammar, one declares the content model of an element to specify the type of element content in the corresponding XML instance document. Therefore, a content model is the definition of the element content.

The figure below illustrates how to declare content models in an XML Schema. Trace the paths in this figure, starting from <schema>, to see how to declare the content model for each of the four types of element content, with and without attribute declarations. Let's examine each one briefly.

Figure 1. Declare the content models in an XML Schema

TextOnly

In the illustration above, trace the path until simpleType-1 to declare an element with TextOnly content model:


 
 
 <xsd:element name="TextOnly">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string" />
   </xsd:simpleType>
 </xsd:element>

 <!-- or, equivalently: -->

 <xsd:element name="TextOnly" type="xsd:string" />
 
 
 

The above schema declares an element named "TextOnly" (the name can be anything) with the TextOnly content model, whose content must be a string in the corresponding XML instance. When the content model of an element is TextOnly, there is always a simpleType associated with it that indicates the datatype of that element. In this case, the datatype of element TextOnly is string. See the corresponding XML instance for this schema in the previous section.

As mentioned previously, attributes don't contribute to the element content. Therefore, another example of an XML instance with TextOnly content, this time with attributes, is:


 
 
 <TextOnly att="val">some character data</TextOnly>
 

Now trace the path in Figure 1 until simpleContent-3 to declare an element with TextOnly content model, and with attributes:


 
 <xsd:element name="TextOnly">
   <xsd:complexType>
     <xsd:simpleContent>
       <xsd:extension base="xsd:string">
         <xsd:attribute name="att" type="xsd:string" use="required" />
       </xsd:extension>
     </xsd:simpleContent>
   </xsd:complexType>
 </xsd:element>
 

The above schema declares an element named "TextOnly" with the TextOnly content model. In the corresponding XML instance, its content must be a string and it must have an attribute named "att".
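One quick way to see the difference between the two TextOnly declarations is to validate the instances above against each of them. The following sketch uses the JDK's JAXP validation API, not the Oracle XDK, and the class name `TextOnlyDemo` and its `isValid` helper are hypothetical. The short form (`type="xsd:string"`) allows no attributes, while the simpleContent extension requires `att`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: TextOnly content with and without attributes.
final class TextOnlyDemo {

    // TextOnly content model, no attributes (short form)
    static final String SIMPLE_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='TextOnly' type='xsd:string'/>"
        + "</xsd:schema>";

    // TextOnly content model plus a required attribute 'att'
    static final String WITH_ATT_XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='TextOnly'><xsd:complexType><xsd:simpleContent>"
        + "<xsd:extension base='xsd:string'>"
        + "<xsd:attribute name='att' type='xsd:string' use='required'/>"
        + "</xsd:extension></xsd:simpleContent></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first validation error
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<TextOnly>some character data</TextOnly>";
        String withAtt = "<TextOnly att='val'>some character data</TextOnly>";
        System.out.println(isValid(SIMPLE_XSD, plain));     // true
        System.out.println(isValid(SIMPLE_XSD, withAtt));   // false: a simple type allows no attributes
        System.out.println(isValid(WITH_ATT_XSD, withAtt)); // true
        System.out.println(isValid(WITH_ATT_XSD, plain));   // false: 'att' is required
    }
}
```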

ElementOnly

Trace the path in Figure 1 until sequence-5, choice-6, or all-7 to declare an element with ElementOnly content model:


 
 
 <xsd:element name="ElementOnly">
   <xsd:complexType>
     <xsd:sequence>
       <!-- could have used choice or all instead -->
       <xsd:element name="child1" type="xsd:string" />
       <xsd:element name="child2" type="xsd:string" />
     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>
 
 

The above schema declares an element named "ElementOnly" with ElementOnly content model. The element "ElementOnly" must have the child elements "child1" and "child2" in the corresponding XML instance document. See the corresponding XML instance for this schema in the previous section.
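The following sketch checks this ElementOnly declaration with the JDK's JAXP validator. Whitespace between the children is accepted, while the wrong order or text between the children is rejected. The class `ElementOnlyDemo` and its `isValid` helper are hypothetical, and `<child1/>` stands in for the `<child1 .../>` placeholders used in the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of the ElementOnly content model declared above.
final class ElementOnlyDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='ElementOnly'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='child1' type='xsd:string'/>"
        + "<xsd:element name='child2' type='xsd:string'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Ignorable whitespace between the children is allowed
        System.out.println(isValid(XSD, "<ElementOnly>\n  <child1/>\n  <child2/>\n</ElementOnly>")); // true
        // The sequence fixes the order
        System.out.println(isValid(XSD, "<ElementOnly><child2/><child1/></ElementOnly>"));           // false
        // Character data between the children is not allowed
        System.out.println(isValid(XSD, "<ElementOnly><child1/>text<child2/></ElementOnly>"));       // false
    }
}
```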

Another XML instance with ElementOnly element content and with attributes looks like:


 
 
 <ElementOnly att="val">
   <child1 .../>
   <child2 .../>
 </ElementOnly>
 
 

Mixed

To declare an element with Mixed content model, trace the path in Figure 1 until sequence-5, choice-6, or all-7, exactly as you would for the ElementOnly content model. This time, however, set the mixed attribute of the complexType to true, as follows:


 
 <xsd:element name="Mixed">
   <xsd:complexType mixed="true">
     <xsd:sequence>
       <xsd:element name="child1" type="xsd:string" />
       <xsd:element name="child2" type="xsd:string" />
     </xsd:sequence>
     <xsd:attribute name="att" type="xsd:string" use="required" />
   </xsd:complexType>
 </xsd:element>
 

Attributes are declared the same way for the ElementOnly and Mixed content models. The path in Figure 1 is the same as for the content model itself, and the attributes are declared within the complexType, after the model group. For example, an element with ElementOnly content model and with attributes is declared as follows:


 
 
 <xsd:element name="ElementOnly">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element name="child1" type="xsd:string" />
       <xsd:element name="child2" type="xsd:string" />
     </xsd:sequence>
     <xsd:attribute name="att" type="xsd:string" use="required" />
   </xsd:complexType>
 </xsd:element>
 
 

An XML instance corresponding to the Mixed schema at the beginning of this section looks like:


 
 
 <Mixed att="val">
   <child1 .../>
   some character data
   <child2 .../>
 </Mixed>
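Setting `mixed="true"` allows character data between the child elements, but the sequence still fixes the order of those children, and the required attribute is still enforced. The sketch below shows this with the JDK's JAXP validator. The class `MixedDemo` and its helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of the Mixed content model declared above.
final class MixedDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='Mixed'><xsd:complexType mixed='true'>"
        + "<xsd:sequence>"
        + "<xsd:element name='child1' type='xsd:string'/>"
        + "<xsd:element name='child2' type='xsd:string'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='att' type='xsd:string' use='required'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Text is allowed around and between the children
        System.out.println(isValid(XSD, "<Mixed att='val'><child1/>some character data<child2/></Mixed>")); // true
        // ...but the order of the children is still enforced
        System.out.println(isValid(XSD, "<Mixed att='val'><child2/>some character data<child1/></Mixed>")); // false
        // ...and so is the required attribute
        System.out.println(isValid(XSD, "<Mixed><child1/>some character data<child2/></Mixed>"));           // false
    }
}
```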
 

Empty

Trace the path in Figure 1 until complexType-2 to declare an element with Empty content model, with or without attributes:


 
 
 <xsd:element name="EmptyContentModels">
   <xsd:complexType>
     <xsd:sequence>

       <xsd:element name="Empty1">
         <xsd:complexType />
       </xsd:element>

       <xsd:element name="Empty2">
         <xsd:complexType>
           <xsd:attribute name="att" type="xsd:string" use="required" />
         </xsd:complexType>
       </xsd:element>

     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>
 
 

The corresponding XML instance for the above schema looks like:


 
 
 <EmptyContentModels>
 <Empty1 />
 <Empty2 att="val" />
 </EmptyContentModels>
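The Empty declarations can be exercised the same way. In the sketch below, which uses the JDK's JAXP validator, both the empty-element tag and an empty start/end tag pair are accepted. Text content and a missing required attribute are rejected. The class `EmptyDemo` and its helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of the two Empty content models declared above.
final class EmptyDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='EmptyContentModels'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Empty1'><xsd:complexType/></xsd:element>"
        + "<xsd:element name='Empty2'><xsd:complexType>"
        + "<xsd:attribute name='att' type='xsd:string' use='required'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD,
            "<EmptyContentModels><Empty1/><Empty2 att='val'/></EmptyContentModels>"));               // true
        System.out.println(isValid(XSD,
            "<EmptyContentModels><Empty1>text</Empty1><Empty2 att='val'/></EmptyContentModels>"));   // false
        System.out.println(isValid(XSD,
            "<EmptyContentModels><Empty1/><Empty2/></EmptyContentModels>"));                         // false
    }
}
```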
 
 

Model Groups

When the content model of an element is declared to be ElementOnly (or Mixed), the element has child elements. You can then use model groups to specify the order and occurrence of those child elements in more detail. A model group consists of particles; a particle can be an element declaration or another model group. A model group itself can have a cardinality, which can be refined using the minOccurs and maxOccurs attributes. These characteristics make model groups quite powerful.

The three model groups supported by XML Schema are:

  • Sequence - (a , b)* - means that the child elements declared within the sequence model group must occur in the corresponding XML instance in the same order as defined in the schema. The cardinality of a sequence model group can range from 0 to unbounded. A sequence model group can further contain a sequence or a choice model group recursively.
  • Choice - (a | b)* - means that, each time the choice occurs, exactly one element from the set of child elements declared within the choice model group must occur in the corresponding XML instance. The cardinality of a choice model group can range from 0 to unbounded. A choice model group can further contain a sequence or a choice model group recursively.
  • All - {a , b}? - means that the entire set of child elements declared within the all model group must occur in the corresponding XML instance. Unlike in the sequence model group, the order is not important, so the child elements can occur in any order. The cardinality of an all model group can only be either 0 or 1. An all model group can only contain element declarations, not any other model group.
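The three model groups can be compared side by side by declaring one element per group and validating a few instances with the JDK's JAXP validator. In the sketch below, the element names `Seq`, `Cho`, and `All` are hypothetical. Each element mirrors the notation used in the list above: (a , b)*, (a | b)*, and {a , b}?.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of the sequence, choice, and all model groups.
final class ModelGroupsDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // (a , b)*
        + "<xsd:element name='Seq'><xsd:complexType>"
        + "<xsd:sequence minOccurs='0' maxOccurs='unbounded'>"
        + "<xsd:element name='a' type='xsd:string'/><xsd:element name='b' type='xsd:string'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        // (a | b)*
        + "<xsd:element name='Cho'><xsd:complexType>"
        + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
        + "<xsd:element name='a' type='xsd:string'/><xsd:element name='b' type='xsd:string'/>"
        + "</xsd:choice></xsd:complexType></xsd:element>"
        // {a , b}?
        + "<xsd:element name='All'><xsd:complexType>"
        + "<xsd:all minOccurs='0'>"
        + "<xsd:element name='a' type='xsd:string'/><xsd:element name='b' type='xsd:string'/>"
        + "</xsd:all></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<Seq><a/><b/><a/><b/></Seq>")); // true: repeated, in order
        System.out.println(isValid(XSD, "<Seq><b/><a/></Seq>"));         // false: order matters
        System.out.println(isValid(XSD, "<Cho><b/><a/><a/></Cho>"));     // true: one of a|b, repeated
        System.out.println(isValid(XSD, "<All><b/><a/></All>"));         // true: any order
        System.out.println(isValid(XSD, "<All><a/></All>"));             // false: b is missing
    }
}
```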

These model groups can be declared either in-line or globally (as an immediate child of the <schema> construct, with a name, for reusability). A global model group must be declared within the <group> construct, which you can later refer to by its name. Unlike in-line model groups, however, globally declared model groups cannot carry the minOccurs/maxOccurs attributes. When required, you can use the minOccurs/maxOccurs attributes when referencing the globally declared model group. For example:


 
 
 <xsd:group name="globalDecl">
 <xsd:sequence>
 <xsd:element name="child1" type="xsd:string" />
 <xsd:element name="child2" type="xsd:string" />
 </xsd:sequence>
 </xsd:group>
 

Subsequently, you can reference the globally declared model group using the group construct along with the minOccurs/maxOccurs attributes, if required, as follows:


 
 
 <xsd:group ref="globalDecl" maxOccurs="unbounded" />
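To see the group reference in context, the sketch below places it inside a complexType of a hypothetical element `UsesGroup` and validates it with the JDK's JAXP validator. The cardinality comes from the reference (`maxOccurs="unbounded"`), so the child1/child2 pair may repeat, but each repetition must be complete.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: a global model group referenced with maxOccurs on the reference.
final class GroupRefDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:group name='globalDecl'><xsd:sequence>"
        + "<xsd:element name='child1' type='xsd:string'/>"
        + "<xsd:element name='child2' type='xsd:string'/>"
        + "</xsd:sequence></xsd:group>"
        // hypothetical element that references the global group
        + "<xsd:element name='UsesGroup'><xsd:complexType>"
        + "<xsd:group ref='globalDecl' maxOccurs='unbounded'/>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<UsesGroup><child1/><child2/><child1/><child2/></UsesGroup>")); // true
        System.out.println(isValid(XSD, "<UsesGroup><child1/><child2/><child1/></UsesGroup>"));          // false
    }
}
```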
 

Here is a more complex example that shows how model groups combine:


 
 <!-- ((a | b)* , c+)? -->
 <xsd:element name="complexModelGroup">
   <xsd:complexType>
     <xsd:sequence minOccurs="0" maxOccurs="1">
       <xsd:choice minOccurs="0" maxOccurs="unbounded">
         <xsd:element name="a" type="xsd:string" />
         <xsd:element name="b" type="xsd:string" />
       </xsd:choice>
       <xsd:element name="c" type="xsd:string" minOccurs="1" maxOccurs="unbounded" />
     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>
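The following sketch validates a few instances against the ((a | b)* , c+)? declaration with the JDK's JAXP validator. The class and helper names are hypothetical. The whole sequence is optional, so an empty element is valid; once the sequence is present, at least one c must follow any a and b elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of the ((a | b)* , c+)? content model.
final class ComplexGroupDemo {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='complexModelGroup'><xsd:complexType>"
        + "<xsd:sequence minOccurs='0' maxOccurs='1'>"
        + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
        + "<xsd:element name='a' type='xsd:string'/>"
        + "<xsd:element name='b' type='xsd:string'/>"
        + "</xsd:choice>"
        + "<xsd:element name='c' type='xsd:string' minOccurs='1' maxOccurs='unbounded'/>"
        + "</xsd:sequence>"
        + "</xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<complexModelGroup><a/><b/><a/><c/><c/></complexModelGroup>")); // true
        System.out.println(isValid(XSD, "<complexModelGroup/>"));                                        // true
        System.out.println(isValid(XSD, "<complexModelGroup><a/></complexModelGroup>"));                 // false
        System.out.println(isValid(XSD, "<complexModelGroup><c/><a/></complexModelGroup>"));             // false
    }
}
```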
 

The complexType story

You now have enough information to write a simple schema for an XML document. But many advanced concepts in XML Schema remain to be addressed.

complexType is another of the most powerful constructs in XML Schema. Apart from allowing you to declare all four content models with or without attributes, it lets you derive a new complexType by inheriting from an already declared complexType. The derived complexType can then either add declarations to the ones inherited from the base complexType (using extension) or restrict the declarations from the base complexType (using restriction).

A complexType can be extended or restricted using either simpleContent or complexContent. A complexType with simpleContent declares a TextOnly content model, with or without attributes. A complexType with complexContent can be used to declare the remaining three content models (ElementOnly, Mixed, or Empty), with or without attributes.

Extending a complexType

simpleContent

Figure 2. A complexType with simpleContent can only be extended to add attributes.

A complexType with simpleContent can extend either a simpleType or a complexType with simpleContent. As illustrated in Figure 2, the only thing the derived complexType is allowed to add is attributes. For example:


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <xsd:complexType name="DerivedType1">
     <xsd:simpleContent>
       <xsd:extension base="xsd:string">
         <xsd:attribute name="att1" type="xsd:string" use="required" />
       </xsd:extension>
     </xsd:simpleContent>
   </xsd:complexType>

   <xsd:complexType name="DerivedType2">
     <xsd:simpleContent>
       <xsd:extension base="tns:DerivedType1">
         <xsd:attribute name="att2" type="xsd:string" use="required" />
       </xsd:extension>
     </xsd:simpleContent>
   </xsd:complexType>

   <xsd:element name="SCExtension">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Derived1" type="tns:DerivedType1" />
         <xsd:element name="Derived2" type="tns:DerivedType2" />
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

 </xsd:schema>
 
 

In the above schema:

  • 1. DerivedType1 extends the built-in simpleType string and adds an attribute att1.
  • 2. DerivedType2 inherits attribute att1 from the base DerivedType1, which is a "complexType with simpleContent," and adds an attribute att2.

An XML instance corresponding to the above schema looks like:


 
 <SCExtension xmlns="http://inheritance-ext-res"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://inheritance-ext-res CTSCExt.xsd">

   <Derived1 att1="val">abc</Derived1>
   <Derived2 att1="val" att2="val">def</Derived2>

 </SCExtension>
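As a sanity check, the schema above can be compiled and the instance validated with the JDK's JAXP validator. The sketch below drops `xsi:schemaLocation` from the instances because it hands the schema to the validator directly. The class `SCExtensionDemo` and the `doc` and `isValid` helpers are hypothetical. It also shows that att1, inherited by DerivedType2, remains required.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of extending a complexType with simpleContent.
final class SCExtensionDemo {

    static final String XSD =
          "<xsd:schema targetNamespace='http://inheritance-ext-res'"
        + " xmlns:tns='http://inheritance-ext-res'"
        + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<xsd:complexType name='DerivedType1'><xsd:simpleContent>"
        + "<xsd:extension base='xsd:string'>"
        + "<xsd:attribute name='att1' type='xsd:string' use='required'/>"
        + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
        + "<xsd:complexType name='DerivedType2'><xsd:simpleContent>"
        + "<xsd:extension base='tns:DerivedType1'>"
        + "<xsd:attribute name='att2' type='xsd:string' use='required'/>"
        + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
        + "<xsd:element name='SCExtension'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Derived1' type='tns:DerivedType1'/>"
        + "<xsd:element name='Derived2' type='tns:DerivedType2'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Wraps the two child elements in the namespaced root element
    static String doc(String derived1, String derived2) {
        return "<SCExtension xmlns='http://inheritance-ext-res'>" + derived1 + derived2 + "</SCExtension>";
    }

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, doc("<Derived1 att1='val'>abc</Derived1>",
                                            "<Derived2 att1='val' att2='val'>def</Derived2>"))); // true
        // att1 is inherited by DerivedType2 and still required
        System.out.println(isValid(XSD, doc("<Derived1 att1='val'>abc</Derived1>",
                                            "<Derived2 att2='val'>def</Derived2>")));            // false
    }
}
```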
 
 

complexContent

Figure 3. A complexType with complexContent can be used to extend the model group as well as add attributes.

A complexType with complexContent can extend either a complexType or a complexType with complexContent. As illustrated in Figure 3, the derived complexType is allowed to add attributes as well as extend the model group. For example:


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <!-- (child1)+ -->
   <xsd:complexType name="BaseType">
     <xsd:sequence maxOccurs="unbounded">
       <xsd:element name="child1" type="xsd:string" />
     </xsd:sequence>
     <xsd:attribute name="att1" type="xsd:string" use="required" />
   </xsd:complexType>

   <!-- ((child1)+ , (child2 | child3)) -->
   <xsd:complexType name="DerivedType">
     <xsd:complexContent>
       <xsd:extension base="tns:BaseType">
         <xsd:choice>
           <xsd:element name="child2" type="xsd:string" />
           <xsd:element name="child3" type="xsd:string" />
         </xsd:choice>
         <xsd:attribute name="att2" type="xsd:string" use="required" />
       </xsd:extension>
     </xsd:complexContent>
   </xsd:complexType>

   <xsd:element name="CCExtension">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Base" type="tns:BaseType" />
         <xsd:element name="Derived" type="tns:DerivedType" />
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

 </xsd:schema>
 
 
 

In the above schema:

  • 1. The DerivedType inherits the sequence model group from the base complexType and adds a choice model group, making the final content model of the derived complexType ((child1)+ , (child2 | child3)).
  • 2. The DerivedType inherits attribute att1 from the BaseType and adds attribute att2.

An XML instance corresponding to the above schema looks like:


 
 
 <CCExtension xmlns="http://inheritance-ext-res"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://inheritance-ext-res CTCCExt.xsd">

   <Base att1="val">
     <child1>This is base</child1>
     <child1>This is base</child1>
   </Base>

   <Derived att1="val" att2="val">
     <child1>This is inherited from base</child1>
     <child1>This is inherited from base</child1>
     <child1>This is inherited from base</child1>
     <child3>This is added in the derived</child3>
   </Derived>

 </CCExtension>
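The sketch below checks the extended content model ((child1)+ , (child2 | child3)) with the JDK's JAXP validator. The inherited child1 elements must come first, and exactly one of the added elements must follow them. The class name and the `doc`/`isValid` helpers are hypothetical, and `xsi:schemaLocation` is omitted because the schema is supplied directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of extending a complexType with complexContent.
final class CCExtensionDemo {

    static final String XSD =
          "<xsd:schema targetNamespace='http://inheritance-ext-res'"
        + " xmlns:tns='http://inheritance-ext-res'"
        + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<xsd:complexType name='BaseType'>"
        + "<xsd:sequence maxOccurs='unbounded'><xsd:element name='child1' type='xsd:string'/></xsd:sequence>"
        + "<xsd:attribute name='att1' type='xsd:string' use='required'/>"
        + "</xsd:complexType>"
        + "<xsd:complexType name='DerivedType'><xsd:complexContent>"
        + "<xsd:extension base='tns:BaseType'><xsd:choice>"
        + "<xsd:element name='child2' type='xsd:string'/>"
        + "<xsd:element name='child3' type='xsd:string'/>"
        + "</xsd:choice>"
        + "<xsd:attribute name='att2' type='xsd:string' use='required'/>"
        + "</xsd:extension></xsd:complexContent></xsd:complexType>"
        + "<xsd:element name='CCExtension'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Base' type='tns:BaseType'/>"
        + "<xsd:element name='Derived' type='tns:DerivedType'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Wraps Base and Derived in the namespaced root element
    static String doc(String base, String derived) {
        return "<CCExtension xmlns='http://inheritance-ext-res'>" + base + derived + "</CCExtension>";
    }

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String base = "<Base att1='val'><child1/><child1/></Base>";
        System.out.println(isValid(XSD, doc(base,
            "<Derived att1='val' att2='val'><child1/><child1/><child1/><child3/></Derived>"))); // true
        // The choice added in the derived type is mandatory
        System.out.println(isValid(XSD, doc(base,
            "<Derived att1='val' att2='val'><child1/></Derived>")));                          // false
    }
}
```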
 

Restricting a complexType

simpleContent

Figure 4. A complexType with simpleContent can be used to restrict the datatype and attributes.

A complexType with simpleContent can only restrict a complexType with simpleContent. As illustrated in Figure 4, the derived complexType can restrict the simpleType of the base. It can also restrict the type and use (optional, required, or prohibited) of the attributes from the base. For example:


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <xsd:complexType name="BaseType">
     <xsd:simpleContent>
       <xsd:extension base="xsd:string">
         <xsd:attribute name="att1" type="xsd:string" use="optional" />
         <xsd:attribute name="att2" type="xsd:integer" use="optional" />
       </xsd:extension>
     </xsd:simpleContent>
   </xsd:complexType>

   <xsd:complexType name="DerivedType">
     <xsd:simpleContent>
       <xsd:restriction base="tns:BaseType">
         <xsd:maxLength value="35" />
         <xsd:attribute name="att1" use="prohibited" />
         <xsd:attribute name="att2" use="required">
           <xsd:simpleType>
             <xsd:restriction base="xsd:integer">
               <xsd:totalDigits value="2" />
             </xsd:restriction>
           </xsd:simpleType>
         </xsd:attribute>
       </xsd:restriction>
     </xsd:simpleContent>
   </xsd:complexType>

   <xsd:element name="SCRestriction">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Base" type="tns:BaseType" />
         <xsd:element name="Derived" type="tns:DerivedType" />
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

 </xsd:schema>
 
 

In the above schema:

  • 1. You restricted the simpleType content of the base (of type string) to a string of at most 35 characters in the derived type.
  • 2. You prohibited the attribute att1 from being inherited from the base.
  • 3. You restricted the type of the attribute att2 to an integer of at most 2 digits, and changed it from optional to mandatory.

An XML instance corresponding to the above schema looks like:


 
 
 <SCRestriction xmlns="http://inheritance-ext-res"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://inheritance-ext-res CTSCRes.xsd">

   <Base att1="val">This is base type</Base>
   <Derived att2="12">This is restricted in the derived</Derived>

 </SCRestriction>
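Each restriction in the derived type can be observed by validating slightly altered instances with the JDK's JAXP validator. The sketch below tries a prohibited att1, a three-digit att2, and text longer than 35 characters. The class name and the `doc`/`isValid` helpers are hypothetical, and the instances omit `xsi:schemaLocation` because the schema is supplied directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of restricting a complexType with simpleContent.
final class SCRestrictionDemo {

    static final String XSD =
          "<xsd:schema targetNamespace='http://inheritance-ext-res'"
        + " xmlns:tns='http://inheritance-ext-res'"
        + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<xsd:complexType name='BaseType'><xsd:simpleContent>"
        + "<xsd:extension base='xsd:string'>"
        + "<xsd:attribute name='att1' type='xsd:string' use='optional'/>"
        + "<xsd:attribute name='att2' type='xsd:integer' use='optional'/>"
        + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
        + "<xsd:complexType name='DerivedType'><xsd:simpleContent>"
        + "<xsd:restriction base='tns:BaseType'>"
        + "<xsd:maxLength value='35'/>"
        + "<xsd:attribute name='att1' use='prohibited'/>"
        + "<xsd:attribute name='att2' use='required'><xsd:simpleType>"
        + "<xsd:restriction base='xsd:integer'><xsd:totalDigits value='2'/></xsd:restriction>"
        + "</xsd:simpleType></xsd:attribute>"
        + "</xsd:restriction></xsd:simpleContent></xsd:complexType>"
        + "<xsd:element name='SCRestriction'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Base' type='tns:BaseType'/>"
        + "<xsd:element name='Derived' type='tns:DerivedType'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Wraps Base and Derived in the namespaced root element
    static String doc(String base, String derived) {
        return "<SCRestriction xmlns='http://inheritance-ext-res'>" + base + derived + "</SCRestriction>";
    }

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String base = "<Base att1='val'>This is base type</Base>";
        System.out.println(isValid(XSD, doc(base, "<Derived att2='12'>This is restricted in the derived</Derived>")));  // true
        System.out.println(isValid(XSD, doc(base, "<Derived att1='val' att2='12'>text</Derived>")));                 // false: att1 prohibited
        System.out.println(isValid(XSD, doc(base, "<Derived att2='123'>text</Derived>")));                           // false: more than 2 digits
    }
}
```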
 

complexContent

Figure 5. A complexType with complexContent can be used to restrict the model group as well as the attributes.

A complexType with complexContent can restrict either a complexType or a complexType with complexContent. As illustrated in Figure 5, the derived complexType must repeat the entire content model of the base, restricting it as needed. You can restrict the attributes the same way you would when restricting a simpleContent. For example:


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <xsd:complexType name="BaseType">
     <xsd:sequence>
       <xsd:element name="child1" type="xsd:string" maxOccurs="unbounded" />
       <xsd:element name="child2" type="xsd:string" />
     </xsd:sequence>
     <xsd:attribute name="att1" type="xsd:string" use="optional" />
   </xsd:complexType>

   <xsd:complexType name="DerivedType">
     <xsd:complexContent>
       <xsd:restriction base="tns:BaseType">
         <xsd:sequence>
           <xsd:element name="child1" type="xsd:string" maxOccurs="4" />
           <xsd:element name="child2">
             <xsd:simpleType>
               <xsd:restriction base="xsd:string">
                 <xsd:maxLength value="35" />
               </xsd:restriction>
             </xsd:simpleType>
           </xsd:element>
         </xsd:sequence>
         <xsd:attribute name="att1" type="xsd:string" use="prohibited" />
       </xsd:restriction>
     </xsd:complexContent>
   </xsd:complexType>

   <xsd:element name="CCRestriction">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Base" type="tns:BaseType" />
         <xsd:element name="Derived" type="tns:DerivedType" />
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

 </xsd:schema>
 

In the above schema:

  • 1. You restricted the cardinality of child1 in the DerivedType, inherited from the BaseType, from unbounded to 4.
  • 2. You restricted the type of child2 in the DerivedType, inherited from the BaseType, to a string of at most 35 characters.
  • 3. You prohibited the attribute att1 from being inherited from the BaseType.

An XML instance corresponding to the above schema looks like:


 
 <CCRestriction xmlns="http://inheritance-ext-res"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://inheritance-ext-res CTCCRes.xsd">

   <Base att1="val">
     <child1>This is base type</child1>
     <child2>This is base type</child2>
   </Base>

   <Derived>
     <child1>This is restricted in the derived</child1>
     <child2>This is restricted in the derived</child2>
   </Derived>

 </CCRestriction>
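The following sketch shows the restricted cardinality in action with the JDK's JAXP validator. Five child1 elements are fine in Base but not in Derived, and att1 is rejected on Derived. The class name and the `doc`/`isValid` helpers are hypothetical, and the instances omit `xsi:schemaLocation` because the schema is supplied directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo of restricting a complexType with complexContent.
final class CCRestrictionDemo {

    static final String XSD =
          "<xsd:schema targetNamespace='http://inheritance-ext-res'"
        + " xmlns:tns='http://inheritance-ext-res'"
        + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<xsd:complexType name='BaseType'><xsd:sequence>"
        + "<xsd:element name='child1' type='xsd:string' maxOccurs='unbounded'/>"
        + "<xsd:element name='child2' type='xsd:string'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='att1' type='xsd:string' use='optional'/>"
        + "</xsd:complexType>"
        + "<xsd:complexType name='DerivedType'><xsd:complexContent>"
        + "<xsd:restriction base='tns:BaseType'><xsd:sequence>"
        + "<xsd:element name='child1' type='xsd:string' maxOccurs='4'/>"
        + "<xsd:element name='child2'><xsd:simpleType>"
        + "<xsd:restriction base='xsd:string'><xsd:maxLength value='35'/></xsd:restriction>"
        + "</xsd:simpleType></xsd:element>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='att1' type='xsd:string' use='prohibited'/>"
        + "</xsd:restriction></xsd:complexContent></xsd:complexType>"
        + "<xsd:element name='CCRestriction'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='Base' type='tns:BaseType'/>"
        + "<xsd:element name='Derived' type='tns:DerivedType'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Wraps Base and Derived in the namespaced root element
    static String doc(String base, String derived) {
        return "<CCRestriction xmlns='http://inheritance-ext-res'>" + base + derived + "</CCRestriction>";
    }

    // Compiles the schema (schema errors are rethrown) and reports whether xml is valid against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String five = "<child1/><child1/><child1/><child1/><child1/><child2/>";
        String small = "<child1/><child2/>";
        System.out.println(isValid(XSD, doc("<Base>" + five + "</Base>", "<Derived>" + small + "</Derived>"))); // true
        System.out.println(isValid(XSD, doc("<Base>" + small + "</Base>", "<Derived>" + five + "</Derived>"))); // false
    }
}
```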
 
 

Assembling Schemas

Imports, includes, and chameleon effects

Many Java projects are split into multiple classes and packages instead of a single, huge Java file, because modularization makes the code easier to reuse, read, and maintain. You then have to add the necessary import statements to a class before you can use the other classes. Similarly, in XML Schema you have to manage multiple schemas from various namespaces, and you need to add the necessary imports to a schema before you use the other schemas.

XML Schemas can be assembled using the <import/> and <include/> schema constructs. These must appear at the top of the schema, before any other declarations:


 
 
 <schema>
 <import namespace="foo" schemaLocation="bar.xsd" />
 <include schemaLocation="baz.xsd" />
 ...
 </schema>
 

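In general, <import /> is used when the schema being imported has a targetNamespace that differs from that of the importing schema. <include /> is used when the schema being included has the same targetNamespace or no targetNamespace at all. The cases below spell out the rules.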

Let's look at an example involving two schemas, A and B, with A referring to items declared in B.

Case I: When both schemas have a targetNamespace and the targetNamespace of schema A (tnsA) is different from the targetNamespace of schema B (tnsB), A must import B.


 
 <import namespace="tnsB" schemaLocation="B.xsd" />
 

However, it is an error for A to import B without specifying the namespace, and it is also an error for A to include B.

Case II: When both schemas have a targetNamespace and the targetNamespace of schema A (tnsAB) is the same as the targetNamespace of schema B (tnsAB), A must include B.


 
 
 <include schemaLocation="B.xsd" />
 

It is an error for A to import B.

Case III: When neither schema A nor schema B has a targetNamespace, A must include B.


 
 
 <include schemaLocation="B.xsd" />
 
 

Case IV: When schema A has no targetNamespace and schema B has a targetNamespace (tnsB), A must import B.


 
 
 <import namespace="tnsB" schemaLocation="B.xsd" /> 
 
 

It is an error for A to include B because B has a targetNamespace.

Case V: When schema A has a targetNamespace (tnsA) and schema B has no targetNamespace, then...? Loudly, please! A should include B. But what if I say that in this case A should import B? Actually, in this case A can either import or include B. Both are legal, though the effects are different.

When A includes B, all the included items from B get the namespace of A. Such an include is known as a chameleon include.

When you don't want such a chameleon effect to take place, you must use an import without specifying the namespace. An import without the namespace attribute allows unqualified reference to components with no target namespace.


 
 
 <import schemaLocation="B.xsd" />
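The difference between a chameleon include and a namespace-less import can be tried out with the JDK's JAXP API. The sketch below builds both variants of schema A around a schema B that has no targetNamespace and defines a simple type `Code`. The namespace `urn:example:tnsA`, the type `Code`, the element `code`, and the class and method names are all made up for illustration. The schemas are written to a temporary directory so that `schemaLocation="B.xsd"` can be resolved. With the include, A refers to the type as `tns:Code`, because the type has taken on A's namespace. With the import, A refers to it unqualified as `Code`, because it keeps no namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical schemas: B.xsd has no targetNamespace; A.xsd includes or imports it.
final class ChameleonDemo {

    static final String B =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='Code'><xsd:restriction base='xsd:string'>"
        + "<xsd:length value='3'/></xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    // Chameleon include: Code takes on A's namespace, so it is referenced as tns:Code
    static final String A_INCLUDE =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:tnsA' xmlns:tns='urn:example:tnsA'>"
        + "<xsd:include schemaLocation='B.xsd'/>"
        + "<xsd:element name='code' type='tns:Code'/>"
        + "</xsd:schema>";

    // Import without a namespace attribute: Code stays in no namespace and is referenced unqualified
    static final String A_IMPORT =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:tnsA'>"
        + "<xsd:import schemaLocation='B.xsd'/>"
        + "<xsd:element name='code' type='Code'/>"
        + "</xsd:schema>";

    // Writes A.xsd and B.xsd to a temporary directory (left in place) and compiles A.xsd
    static Schema load(String schemaA) {
        try {
            Path dir = Files.createTempDirectory("chameleon-demo");
            Files.write(dir.resolve("B.xsd"), B.getBytes(StandardCharsets.UTF_8));
            Path a = Files.write(dir.resolve("A.xsd"), schemaA.getBytes(StandardCharsets.UTF_8));
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(a.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<code xmlns='urn:example:tnsA'>ABC</code>";
        System.out.println(isValid(load(A_INCLUDE), ok)); // true
        System.out.println(isValid(load(A_IMPORT), ok));  // true
    }
}
```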
 
 

Importing or including a schema multiple times is not an error, because schema processors can detect such a scenario and avoid loading a schema that is already loaded. Therefore, it is not an error if A.xsd imports B.xsd and C.xsd, and both B.xsd and C.xsd in turn import A.xsd. Such circular references are not errors either, but they are strongly discouraged.

By the way, a bare import such as <import /> is legal as well. It simply allows unqualified references to foreign components with no target namespace, without giving any hint as to where to find them. It is up to the schema processor either to throw an error or to look up the unknown items using some mechanism, and this behavior may vary from one schema processor to another. A bare <include />, however, is illegal, because include requires a schemaLocation.

Rules of thumb:

  • 1. <include/> is as good as saying that the <include/>d schema is defined in-line in the including schema.
  • 2. <import/> is always used when the <import/>ed schema has a targetNamespace that is different from the targetNamespace of the importing schema.

Redefining Schemas

You may not always want to assemble schemas in their original form. For example, you may want to modify the components brought in from another schema. When you want to redefine a declaration without changing its name, use the redefine component. The constraint is that the schema to be redefined must either (a) have the same targetNamespace as the <redefine>ing schema document, or (b) have no targetNamespace at all. In case (b), the <redefine>d schema document is converted to the <redefine>ing schema document's targetNamespace.

For example:

actual.xsd


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <xsd:complexType name="BaseType">
     <xsd:sequence>
       <xsd:element name="child1" type="xsd:string" />
     </xsd:sequence>
     <xsd:attribute name="att1" type="xsd:string" use="required" />
   </xsd:complexType>

   <xsd:complexType name="DerivedType">
     <xsd:complexContent>
       <xsd:extension base="tns:BaseType">
         <xsd:choice>
           <xsd:element name="child2" type="xsd:string" />
           <xsd:element name="child3" type="xsd:string" />
         </xsd:choice>
         <xsd:attribute name="att2" type="xsd:string" use="required" />
       </xsd:extension>
     </xsd:complexContent>
   </xsd:complexType>

 </xsd:schema>
 

redefine.xsd


 
 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://inheritance-ext-res"
             xmlns:tns="http://inheritance-ext-res"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             elementFormDefault="qualified"
             attributeFormDefault="unqualified">

   <xsd:redefine schemaLocation="actual.xsd">
     <xsd:complexType name="DerivedType">
       <xsd:complexContent>
         <xsd:extension base="tns:DerivedType">
           <xsd:sequence>
             <xsd:element name="child4" type="xsd:string" />
           </xsd:sequence>
         </xsd:extension>
       </xsd:complexContent>
     </xsd:complexType>
   </xsd:redefine>

   <xsd:element name="Redefine">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Base" type="tns:BaseType" />
         <xsd:element name="Derived" type="tns:DerivedType" />
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

 </xsd:schema>
 

In the above schema:

  • 1. You redefined the DerivedType complexType by adding one more element to its content model, without changing its name.
  • 2. Because BaseType is not redefined in the redefining schema, it is brought in as is.

Note that the name of a type is not changed when redefining it. Therefore, redefined types use themselves as their base types.

In the above example, we redefine a complexType named DerivedType without changing its name. Within the redefinition, any reference to DerivedType (for example, base="tns:DerivedType") refers to the original DerivedType. After the type is redefined, any reference to DerivedType refers to the redefined type.

An XML instance that conforms to the redefined schema looks like this:


 
 <Redefine xmlns="http://inheritance-ext-res"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://inheritance-ext-res redefine.xsd">
 
 <Base att1="val">
 <child1>This is base type</child1>
 </Base>
 
 <Derived att1="val" att2="val">
 <child1>This is inherited from the base as is</child1>
 <child2>This is added in the derived</child2>
 <child4>This is added when redefining</child4>
 </Derived>
 
 </Redefine>
 

Constraints

Identity Constraint

XML Schema allows you to enforce uniqueness constraints on the content of elements and attributes, which guarantees that the values of the specified elements or attributes are unique in the instance document. To enforce uniqueness, you first identify the item whose value is to be checked for uniqueness (an ISBN, for example). You then identify the set within which the values of the selected items must be unique (a set of books, for example).

XML Schema provides two constructs, unique and key, to enforce uniqueness constraints. The unique construct ensures that, wherever the specified values are present (not null), they are unique within the defined set. The key construct ensures that the specified values are never null and are unique within the defined set.

There is one more construct, keyref, which refers to a key (or unique) constraint that is already defined. A keyref ensures that the value of the item specified within it exists in the set of values of the key it refers to.
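The difference between the three constructs can be summarized in a few lines of Python. In this sketch the helper names are hypothetical. Each item being checked contributes one tuple of values, with one entry per value that makes up the item, so a composite item gives a longer tuple. None stands in for a value that is missing (null). As a simplification, values are compared as plain strings, whereas a schema processor compares them as typed values of their declared types.

```python
from typing import Optional, Sequence, Tuple

# One tuple per checked item: one entry per value making up the item,
# with None standing in for a missing ("null") value.
KeySeq = Tuple[Optional[str], ...]


def check_unique(seqs: Sequence[KeySeq]) -> bool:
    """unique: among tuples whose values are all present, no tuple repeats."""
    present = [s for s in seqs if None not in s]
    return len(present) == len(set(present))


def check_key(seqs: Sequence[KeySeq]) -> bool:
    """key: every value must be present, and no tuple repeats."""
    return all(None not in s for s in seqs) and check_unique(seqs)


def check_keyref(refs: Sequence[KeySeq], keys: Sequence[KeySeq]) -> bool:
    """keyref: every fully present reference must equal some key value."""
    key_values = set(keys)
    return all(r in key_values for r in refs if None not in r)


if __name__ == "__main__":
    books = [("0-596", "1.5"), ("0-13", "2.0")]
    print(check_key(books))                         # True
    print(check_key(books + [("0-596", "1.5")]))    # False: duplicate value
    print(check_key(books + [("0-201", None)]))     # False: missing value
    print(check_unique(books + [("0-201", None)]))  # True: incomplete items are skipped
    print(check_keyref([("0-13", "2.0")], books))   # True
    print(check_keyref([("0-999", "9.9")], books))  # False
```

The only difference between unique and key in this sketch is whether missing values are skipped or rejected. keyref never checks uniqueness itself; it only looks values up in the referenced set.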

All three constructs share the same syntax (each uses a selector and one or more fields) but have different meanings. The selector defines the set within which uniqueness is enforced. The field defines the item whose value is checked for uniqueness, and multiple fields define a composite item. The values of both selector and field are XPath expressions, written in a restricted subset of XPath. These XPath expressions ignore the default namespace: an unprefixed name matches only an element or attribute that is in no namespace. If the elements or attributes are in a namespace, you must therefore make the XPath expressions namespace aware by explicitly using prefixes bound to the appropriate namespace. For example:


 
 
 <?xml version="1.0" ?>
 <xsd:schema targetNamespace="http://identity-constraint"
 xmlns:tns="http://identity-constraint"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified"
 attributeFormDefault="unqualified">
 
 
 <xsd:complexType name="BookType">
 <xsd:sequence>
 <xsd:element name="title" type="xsd:string" />
 <xsd:element name="half-isbn" type="xsd:string" />
 <xsd:element name="other-half-isbn" type="xsd:float" />
 </xsd:sequence>
 </xsd:complexType>
 
 <xsd:element name="Books">
 <xsd:complexType>
 <xsd:sequence>
 <xsd:element name="Book" type="tns:BookType" maxOccurs="unbounded" />
 </xsd:sequence>
 </xsd:complexType>
 
 <xsd:key name="isbn">
 <xsd:selector xpath=".//tns:Book" />
 <xsd:field xpath="tns:half-isbn" />
 <xsd:field xpath="tns:other-half-isbn" />
 </xsd:key>
 
 </xsd:element>
 
 </xsd:schema>
 

In the above schema, we declared a key named "isbn" that says: "The composite value (half-isbn + other-half-isbn) specified by the fields must not be null and must be unique within the set of books selected by the selector."
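Applied to the Books schema, the key can be approximated with Python's xml.etree.ElementTree, whose path syntax likewise needs an explicit prefix mapping for namespaced elements. The sketch below is hypothetical (isbn_key_errors is not part of any library) and assumes the document already conforms to BookType. It follows the selector and the two fields and reports missing, repeated, or duplicate key values. It compares other-half-isbn as a number because it is declared as xsd:float, using Python's double-precision float as a stand-in for xsd:float.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# The prefix must be bound explicitly, as in the schema; an unprefixed name
# in the path would only match elements that are in no namespace.
NS = {"tns": "http://identity-constraint"}


def isbn_key_errors(books_xml: str) -> List[str]:
    """Approximate the 'isbn' key from the Books schema (not a schema validator).

    Assumes the document already conforms to BookType, so that
    other-half-isbn holds a valid float literal.
    """
    root = ET.fromstring(books_xml)
    first_seen: Dict[Tuple[str, float], int] = {}
    errors: List[str] = []
    # selector xpath=".//tns:Book"
    for i, book in enumerate(root.findall(".//tns:Book", NS), start=1):
        # field xpath="tns:half-isbn" and field xpath="tns:other-half-isbn"
        half = book.findall("tns:half-isbn", NS)
        other = book.findall("tns:other-half-isbn", NS)
        if len(half) != 1 or len(other) != 1:
            errors.append(f"Book {i}: a key field is missing or repeated")
            continue
        # Compare typed values: half-isbn is xsd:string, other-half-isbn is
        # xsd:float, so "1.50" and "1.5" count as the same value.
        value = (half[0].text or "", float(other[0].text))
        if value in first_seen:
            errors.append(f"Book {i}: duplicate key {value}, first used by Book {first_seen[value]}")
        else:
            first_seen[value] = i
    return errors


if __name__ == "__main__":
    doc = """<Books xmlns="http://identity-constraint">
      <Book><title>A</title><half-isbn>0-596</half-isbn><other-half-isbn>1.50</other-half-isbn></Book>
      <Book><title>B</title><half-isbn>0-596</half-isbn><other-half-isbn>1.5</other-half-isbn></Book>
    </Books>"""
    print(isbn_key_errors(doc))  # one duplicate-key error, reported for Book 2
    # An unprefixed path finds nothing, because Book is in a namespace:
    print(ET.fromstring(doc).findall(".//Book"))  # []
```

The last line shows the same pitfall the schema avoids by writing tns:Book instead of Book.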

Unique Particle Attribution (UPA) Constraint

The UPA constraint requires that the content model of every element be specified so that there is no ambiguity while an XML instance is validated. The validator must be able to determine, deterministically and without looking ahead, which particle (here, which element declaration) each element in the instance corresponds to. For example, the following schema violates the UPA constraint:


 
 <xsd:element name="upa">
 <xsd:complexType> 
 <xsd:sequence>
 <xsd:element name="a" minOccurs="0"/>
 <xsd:element name="b" minOccurs="0"/>
 <xsd:element name="a" minOccurs="0"/>
 </xsd:sequence>
 </xsd:complexType>
 </xsd:element>
 

...because in the corresponding XML instance for the above schema:


 
 
 <upa> 
 <a/>
 </upa>
 

It cannot be determined which element declaration in the schema the element "a" in the XML instance corresponds to. It could be the declaration for "a" that comes before the declaration for "b", or the declaration for "a" that comes after it. This restriction limits how you can write a schema for XML instances like the one you just saw. In this case, however, if you set minOccurs of element "b" to any value greater than 0, the UPA constraint is no longer violated.

The following, then, is a valid schema:


 
 
 <xsd:element name="upa">
 <xsd:complexType>
 <xsd:sequence>
 <xsd:element name="a" minOccurs="0"/>
 <xsd:element name="b" minOccurs="1"/>
 <xsd:element name="a" minOccurs="0"/>
 </xsd:sequence>
 </xsd:complexType>
 </xsd:element>
 

...because in the corresponding XML instance for the above schema:


 
 
 <upa> 
 <a/> 
 <b/> 
 </upa>
 

It is clear that the element "a" in the XML instance is an instance of the declaration for "a" that comes before the declaration for "b" in the schema. Because "b" is now required, an "a" that appears before any "b" can only match the first declaration.
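The reasoning in the two examples can be expressed as a small check. The following Python sketch handles only the simple case shown here: a flat xsd:sequence of element particles, each with maxOccurs="1". At every point during validation, it collects the particles that could match the next element, namely the following particles up to and including the first required one. If two of them declare the same element name, the content model violates UPA. The function upa_violations is hypothetical. A real schema processor typically checks determinism on an automaton built from the full content model, including nested groups, choices, and repetition.

```python
from typing import List, Set, Tuple

# (element name, minOccurs); maxOccurs is assumed to be 1 for every particle.
Particle = Tuple[str, int]


def upa_violations(sequence: List[Particle]) -> List[Tuple[int, str]]:
    """Simplified UPA check for a flat sequence of element particles.

    Returns (state, name) pairs, where state -1 means "before the first
    element" and state j means "just after matching particle j".
    """
    violations = []
    for state in range(-1, len(sequence)):
        names_seen: Set[str] = set()
        # Candidates for the next element: following particles up to and
        # including the first required one.
        for name, min_occurs in sequence[state + 1:]:
            if name in names_seen:
                violations.append((state, name))  # two candidates, same name
                break
            names_seen.add(name)
            if min_occurs > 0:
                break  # a required particle: nothing after it can match next
    return violations


if __name__ == "__main__":
    print(upa_violations([("a", 0), ("b", 0), ("a", 0)]))  # [(-1, 'a')]
    print(upa_violations([("a", 0), ("b", 1), ("a", 0)]))  # []
```

The first call reports the ambiguity at the start of the content, exactly where the instance <upa><a/></upa> caused trouble. Making "b" required removes the second "a" from the set of candidates.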

Conclusion

Now that you have completed this series, you should understand:

  • 1. The concept of namespaces in XML and XML Schema
  • 2. The scalar datatypes supported in XML Schema, and how to further restrict them using simpleType
  • 3. Element content, content models, model groups, particles, extending and restricting a complexType, assembling schemas, identity constraints, and UPA, all of which allow you to define and constrain the structure of XML.

You should have a pretty good grasp of XML Schema by now.

Rahul Srivastava (rahuls@apache.org) is a senior member of the Oracle Application Server development team at Oracle and is presently working in the EAI space. He has contributed to the development of Apache Xerces2-J, the open-source, W3C-compliant validating XML parser, primarily in the area of W3C XML Schema. Rahul was also a contributor to JAXP and JSR-173 while working with Sun Microsystems as part of the Web services team.