XMLBeans 2.0 - A Java Developer's Perspective

Abstract

With the emergence of service-oriented architecture, most of you have had to work with XML in your applications. In doing so, you may have noticed that various models are available to parse and handle XML, both open source and proprietary. Each of these models has its advantages and disadvantages. Choosing the wrong model for your business needs can result in wasted development time and resources. Apache XMLBeans is a valuable tool that provides an easy way to work with XML from within Java. In this article, we introduce XMLBeans and some of the features available in XMLBeans 2.0.

Introduction to XMLBeans

A W3C XML Schema is an XML document that defines a set of rules other XML documents must conform to in order to be valid. W3C XML Schema has several advantages over earlier XML schema languages, such as document type definition (DTD) or Schema for Object-Oriented XML (SOX), and provides a rich collection of features you can use in different ways.

XMLBeans is a 100-percent schema-compliant XML-Java binding tool you can use to access the full power of XML in a Java-friendly way. The XMLBeans solution is unique because it provides a dual view of the XML data: XMLBeans keeps the original XML document with no change in information or structure, and also provides a Java-based view of the same data.

We will now demonstrate some of the features in XMLBeans 2.0 with several code samples. In each sample we provide the schema and some Java code that manipulates the XMLBeans representation of the schema. The schema and Java samples will also be available for download.

Let's consider the following schema snippet:

<xs:element name="order">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="orderNo" type="xs:string"/>
      <xs:element name="item" nillable="true"
                  maxOccurs="unbounded" type="tns:itemType"/>
      <xs:element name="address" type="tns:addressType"/>
      <xs:element name="quantity" type="tns:quantityType"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

To generate XMLBeans classes, the schema needs to be compiled. This can be easily done using the scomp utility, which will generate interfaces for all simple and complex types. The package name for all classes and interfaces is derived from the targetNamespace value specified in the schema. For a detailed look, read Configuring XMLBeans by Hetal Shah (Dev2Dev, November 2004).
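To make the targetNamespace-to-package idea concrete, here is a rough sketch of the reversed-domain convention commonly used for this kind of mapping: reverse the host name, append the path segments, and lowercase the result. It is an assumption for illustration, not XMLBeans' own code, and the mapping XMLBeans actually uses can be overridden through configuration (the subject of the article cited above). Under these assumed rules, http://temp.openuri.org/Sample maps to org.openuri.temp.sample. NamespacePackageSketch and toPackage are hypothetical names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class NamespacePackageSketch {
    // Hypothetical helper: approximates a reversed-domain mapping for
    // http-style namespace URIs. It is NOT XMLBeans' implementation.
    static String toPackage(String namespaceUri) {
        URI uri = URI.create(namespaceUri);
        List<String> parts = new ArrayList<>();

        // Host labels in reverse order, without a leading "www"
        String host = uri.getHost();
        if (host != null) {
            List<String> labels = new ArrayList<>();
            Collections.addAll(labels, host.split("\\."));
            if (labels.get(0).equals("www")) {
                labels.remove(0);
            }
            Collections.reverse(labels);
            parts.addAll(labels);
        }

        // Path segments in their original order
        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    parts.add(segment);
                }
            }
        }

        // Lowercase each part and replace characters not allowed in Java identifiers
        StringBuilder pkg = new StringBuilder();
        for (String part : parts) {
            String id = part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
            if (Character.isDigit(id.charAt(0))) {
                id = "_" + id;
            }
            if (pkg.length() > 0) {
                pkg.append('.');
            }
            pkg.append(id);
        }
        return pkg.toString();
    }
}
```

A real mapping also has to escape Java keywords (a path segment named "int", for instance) and handle non-HTTP URIs such as URNs; the sketch leaves both out.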

Let's now look at how to generate an instance document, check the validity of the document against the schema, and save the instance to the file system.

The OrderDocument interface used below is an example of the special "document" types that XMLBeans generates for global elements.

AddressType and ItemType are the interfaces generated for the global complex types addressType and itemType:

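import java.io.File;
import org.apache.xmlbeans.XmlOptions;
// OrderDocument, AddressType and ItemType are generated by scomp from the Sample schema.

// Options used for pretty-printed output when printing and saving
XmlOptions xopt = new XmlOptions();
xopt.setSavePrettyPrint();

OrderDocument orderDoc = OrderDocument.Factory.newInstance();
OrderDocument.Order order = orderDoc.addNewOrder();
order.setOrderNo("ORD1234");
order.setQuantity(4);

AddressType aType = order.addNewAddress();
aType.setCity("Kirkland");       // Name, Street, Zip, ... are set the same way

ItemType iType = order.addNewItem();
iType.setId("ITEM003");          // description and size are set the same way

// Check the instance against the compiled schema
boolean isValid = orderDoc.validate(xopt);
System.out.println(orderDoc.xmlText(xopt));
System.out.println("Valid: " + isValid);

// Write the instance to disk (throws IOException)
orderDoc.save(new File("sample.xml"), xopt);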

Running this sample constructs an instance document, validates it, and saves it to the local file system as sample.xml. The program also prints the instance document and the result of the validation test to the command prompt or Unix shell. (The output also contains item and address fields whose setters were omitted from the listing for brevity.)

<sam:order xmlns:sam="http://temp.openuri.org/Sample">
  <sam:orderNo>ORD1234</sam:orderNo>
  <sam:item>
    <sam:id>ITEM003</sam:id>
    <sam:description>Latest Item</sam:description>
    <sam:size>Large</sam:size>
  </sam:item>
  <sam:address>
    <sam:Name>BEA Systems, Inc</sam:Name>
    <sam:Street>10230 NE Points Drive, Ste 300</sam:Street>
    <sam:City>Kirkland</sam:City>
    <sam:Zip>98033</sam:Zip>
    <sam:State>WA</sam:State>
    <sam:Country>USA</sam:Country>
  </sam:address>
  <sam:quantity>4</sam:quantity>
</sam:order>

This is a valid instance document. When a schema is compiled, the generated API is integrated with the XMLBeans type system, which represents the underlying XML schema. Schema-related information can be obtained through the schema type system API.

In the next sample, we show how to programmatically access the various enumeration values for a certain schema type using the getEnumerationValues() method. The schema type we are using is sizeType, which is an enumeration with three possible values. The schema snippet is listed below:

<xs:simpleType name="sizeType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="Small"/>
    <xs:enumeration value="Medium"/>
    <xs:enumeration value="Large"/>
  </xs:restriction>
</xs:simpleType>

SizeType is the interface generated for the sizeType simple type; its schemaType() method returns the SchemaType object that describes the type, including its enumeration facets:

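import org.apache.xmlbeans.SchemaType;
import org.apache.xmlbeans.XmlAnySimpleType;

// SizeType is generated by scomp from the sizeType simple type
SizeType sType = SizeType.Factory.newInstance();
SchemaType schType = sType.schemaType();

// One XmlAnySimpleType per <xs:enumeration> facet
XmlAnySimpleType[] xmlarray = schType.getEnumerationValues();

for (int i = 0; i < xmlarray.length; i++) {
    System.out.println(" " + xmlarray[i].getStringValue());
}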

Running this code sample (EnumerationSample.java) obtains the enumeration values programmatically and prints them to System.out:

Enumeration values for SizeType :
 Small
 Medium
 Large
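Because sizeType restricts xs:token, a schema validator collapses whitespace in a value before comparing it with the enumeration, while the comparison itself stays case-sensitive: a padded value such as "  Large " passes, but "large" does not. The sketch below checks this with the JDK's javax.xml.validation API (not XMLBeans), compiling the sizeType definition above together with a hypothetical global size element; SizeTypeCheck and isValidSize are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SizeTypeCheck {
    // sizeType as listed above, plus a hypothetical global "size" element to validate against
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='sizeType'><xs:restriction base='xs:token'>"
        + "<xs:enumeration value='Small'/><xs:enumeration value='Medium'/>"
        + "<xs:enumeration value='Large'/></xs:restriction></xs:simpleType>"
        + "<xs:element name='size' type='sizeType'/></xs:schema>";

    static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("sketch schema failed to compile", e);
        }
    }

    // True if <size>value</size> is valid; the xs:token base collapses whitespace first
    static boolean isValidSize(String value) {
        String xml = "<size>" + value + "</size>";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;   // not one of the enumerated values
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```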

XmlCursors are an interesting feature in XMLBeans: they provide an intuitive way to navigate and manipulate an XML instance document, and they also let you execute XQuery expressions. Once you load an XML document, you can create a cursor that represents a specific place in the XML. Because a cursor works with or without a schema corresponding to the XML, cursors are a flexible way to handle XML.

The next sample demonstrates the use of a cursor to manipulate an XMLBean instance. This sample parses the sample.xml created in the first sample. Once the file is in memory, the XmlCursor API is used to navigate to the quantity element and change the value to 104:

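import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlError;
import org.apache.xmlbeans.XmlOptions;

OrderDocument orderDoc = OrderDocument.Factory.parse(new File("sample.xml"));

XmlCursor xcursor = orderDoc.newCursor();
xcursor.toFirstChild();       // the order element
xcursor.toChild(3);           // fourth child element: quantity
xcursor.toEndToken();         // end of the quantity element
xcursor.toPrevChar(1);        // back one character, just before "4"
xcursor.insertChars("10");    // the value is now 104
xcursor.dispose();

// Validate and report each error
ArrayList errors = new ArrayList();
XmlOptions opts = new XmlOptions();
opts.setErrorListener(errors);
if (!orderDoc.validate(opts)) {
    for (Iterator it = errors.iterator(); it.hasNext();) {
        XmlError error = (XmlError) it.next();
        System.out.println("Message: " + error.getMessage());
        System.out.println("Location of invalid XML:");
        System.out.println(error.getCursorLocation().xmlText());
    }
}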

Running this sample generates the following output that displays the reason why the modified XMLBean document isn't valid:

Message: decimal value (104) is greater than maxInclusive facet (5) for
 quantityType in namespace http://temp.openuri.org/Sample
Location of invalid XML:
<xml-fragment xmlns:sam="http://temp.openuri.org/Sample"/>
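For readers more familiar with the JDK's DOM API, the sketch below mirrors the cursor moves in the sample: it finds the fourth child element of order and inserts "10" in front of the last character of its text, then checks the result against a cut-down quantityType. This uses javax.xml, not XMLBeans. Only the maxInclusive value of 5 comes from the error message above; the xs:integer base type, the global quantity element, and the QuantityEditSketch names are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class QuantityEditSketch {
    static final String NS = "http://temp.openuri.org/Sample";

    // Hypothetical cut-down schema: only maxInclusive=5 comes from the error message;
    // the xs:integer base and the global quantity element are assumptions.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:tns='" + NS + "' targetNamespace='" + NS + "'>"
        + "<xs:simpleType name='quantityType'><xs:restriction base='xs:integer'>"
        + "<xs:maxInclusive value='5'/></xs:restriction></xs:simpleType>"
        + "<xs:element name='quantity' type='tns:quantityType'/></xs:schema>";

    // Returns the n-th (zero-based) child element, skipping non-element nodes
    static Element childElement(Node parent, int n) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && n-- == 0) {
                return (Element) c;
            }
        }
        return null;
    }

    // DOM counterpart of the cursor moves: locate quantity, insert "10" before its last character
    static String editQuantity(String orderXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(orderXml)));
            Element order = doc.getDocumentElement();     // like toFirstChild()
            Element quantity = childElement(order, 3);    // like toChild(3)
            Text text = (Text) quantity.getLastChild();   // text just before the end token
            text.insertData(text.getLength() - 1, "10");  // like toPrevChar(1) + insertChars("10")
            return quantity.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not edit order document", e);
        }
    }

    // Validates a single quantity value against the hypothetical quantityType
    static boolean isValidQuantity(String value) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("sketch schema failed to compile", e);
        }
        String xml = "<sam:quantity xmlns:sam='" + NS + "'>" + value + "</sam:quantity>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;   // e.g. a maxInclusive facet violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The JDK validator reports a facet violation by throwing a SAXException; the sketch simply turns that into false, whereas XMLBeans collects XmlError objects through the error listener.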

Now that we've taken a brief look at XMLBeans, let's look at what's new in version 2.0.

New Features in XMLBeans 2.0

It is often easier to get a feel for a product's new features by seeing them in action, so we'd like to introduce some of the new features of XMLBeans through one of our own projects that uses them. As you may already know, because XMLBeans is an Apache project, it tracks bugs, features, and other issues in Atlassian's Jira issue-tracking and project management application. BEA has an investment in XMLBeans as well as a standard of shipping high-quality software, so it has an interest in the quality of projects like XMLBeans. Because XMLBeans is an open-source project that uses common Apache tools such as Jira, the question becomes how BEA can track quality metrics for XMLBeans.

The project through which we explored some of the new features in XMLBeans 2.0 was a response to the question: How can we easily gather quality metrics from Jira?

The following screen shot shows the main project page for XMLBeans. If you look on the right-hand side, under the Project Summary section, you have options for seeing issues related to some of the quality metrics we are concerned about.

Figure 1: XMLBeans Jira Project Page

What is nice about Jira is that it provides different views of the issue data. In the following screenshot, look at the heading titled Current View: the Browser view is currently selected, but other options include a print view, an XML view, and even an Excel spreadsheet view:

Figure 2: XMLBeans Jira Issue Navigator

Once we were familiar with Jira and with how XMLBeans tracks quality metrics, we had several options for gathering the metrics: screen scraping the HTML, parsing a spreadsheet, or getting XML from a URL. We decided it made the most sense to use the XML view, available from the URL behind the XML link on the Issue Navigator page. The contents of that URL look something like the XML document below:

<?xml version="1.0" encoding="utf-8" ?>
<!--  RSS generated by JIRA 98 at Sun Dec 04 18:08:34 CET 2005
-->
<rss version="0.92">
  <channel>
    <title>ASF JIRA</title>
    <link>http://issues.apache.org/jira</link>
    <description>This file is an XML representation of some issues</description>
    <language>en</language>
    <item>
      <title>[XMLBEANS-232] Fast Xml Infoset</title>
      <link>http://issues.apache.org/jira/browse/x</link>
      <description>
      <!-- left out for brevity -->
      </description>
      <environment><![CDATA[]]></environment>
      <key id="12326193">XMLBEANS-232</key>
      <summary>Fast Xml Infoset</summary>
      <type id="4">Improvement</type>
      <priority id="3">Major</priority>
      <status id="1">Open</status>
      <resolution>Unresolved</resolution>
      <assignee>Unassigned</assignee>
      <reporter username="rrusin">Rafal Rusin</reporter>
      <created>Wed, 30 Nov 2005 13:29:44 +0100 (CET)</created>
      <updated>Sat, 3 Dec 2005 18:15:10 +0100 (CET)</updated>
      <version>unspecified</version>
      <fixVersion>unspecified</fixVersion>
      <component>XmlObject</component>
      <due></due>
      <votes>0</votes>
      <comments>
        <comment author="dandiep" created="Sat, 3 Dec 2005 18:15:10 +0100 (CET)" level="">
          <!-- ... -->
        </comment>
      </comments>
      <customfields>
      </customfields>
    </item>
    <item>
      <!-- left out for brevity -->
    </item>
  </channel>
</rss>

If we look at the snippet from the XML feed above, we see it's defined as an RSS feed. The first step we took was to find an XML Schema for RSS version 0.92 so we could compile it and parse the feed through XMLBeans' simple JavaBeans-like API. We never found an official schema, but we did find a specification and began creating a schema from it. Along the way, we found that the schema we created from the specification did not match the RSS feed we got from Jira. What were we to do? Our only real option was to create a schema just for this RSS feed, but writing one by hand would be time consuming and error prone. After a little more investigation, we stumbled upon the new inst2xsd feature.

Schema to Instance to Schema

The inst2xsd tool is available as a command-line utility, but you can also use its APIs programmatically. Its purpose is to take an XML instance and create a valid set of schemas for it. The tool is also highly configurable and provides options for choosing a schema design pattern (Russian Doll, Salami Slice, or Venetian Blind; look at this set of Schema Design Guidelines for more information). It can also turn repeated values into enumerations and create types based on the lowest common denominator of data types.

As an example of creating lowest common denominator types, consider the value lcd:val. This text could be represented by several built-in XML Schema datatypes, including several string-derived types (xsd:string, xsd:normalizedString, xsd:token, and so on) as well as the QName type. In this case, inst2xsd determines the type by looking for a namespace declaration with the prefix lcd. If such a declaration is found, the type will be QName rather than one of the string-based types.
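A toy version of this kind of inference, written only to illustrate the idea, might guess the narrowest type for each observed value and then widen the guesses to a type shared by every occurrence of the element. The rules below (integers narrowed to xs:byte, xs:short, xs:int, or xs:long; the lcd:val-style QName check against declared prefixes; xs:string as the fallback) are assumptions, not inst2xsd's actual algorithm, and LcdTypeSketch is a hypothetical class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class LcdTypeSketch {
    // Assumed integer types, ordered from narrowest to widest, with their bounds
    static final List<String> INT_TYPES = Arrays.asList("xs:byte", "xs:short", "xs:int", "xs:long");
    static final long[] MIN = {Byte.MIN_VALUE, Short.MIN_VALUE, Integer.MIN_VALUE, Long.MIN_VALUE};
    static final long[] MAX = {Byte.MAX_VALUE, Short.MAX_VALUE, Integer.MAX_VALUE, Long.MAX_VALUE};

    // Guesses the narrowest type for one leaf value, given the in-scope namespace prefixes
    static String guessType(String text, Set<String> declaredPrefixes) {
        String v = text.trim();
        try {
            long n = Long.parseLong(v);
            for (int i = 0; i < INT_TYPES.size(); i++) {
                if (n >= MIN[i] && n <= MAX[i]) {
                    return INT_TYPES.get(i);
                }
            }
        } catch (NumberFormatException notAnInteger) {
            // fall through to the QName and string checks
        }
        int colon = v.indexOf(':');
        if (colon > 0 && declaredPrefixes.contains(v.substring(0, colon))) {
            return "xs:QName";   // e.g. lcd:val when xmlns:lcd is declared
        }
        return "xs:string";
    }

    // Combines the guesses for two occurrences of the same element
    static String commonType(String a, String b) {
        if (a.equals(b)) {
            return a;
        }
        int ia = INT_TYPES.indexOf(a);
        int ib = INT_TYPES.indexOf(b);
        if (ia >= 0 && ib >= 0) {
            return INT_TYPES.get(Math.max(ia, ib));   // the wider integer type covers both
        }
        return "xs:string";   // anything else falls back to the most general type
    }
}
```

With xmlns:lcd declared, guessType("lcd:val", ...) yields xs:QName; without the declaration the same text falls back to xs:string, which matches the rule described above.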

Let's look at the results for the RSS feed we received from Jira. If we had saved the feed to a file named jiraRssFeed.xml and placed XMLBEANS_HOME\bin on our path, our workflow might have looked like the following:

/home/user>inst2xsd
Generates XMLSchema from instance xml documents.
Usage: inst2xsd [opts] [instance.xml]*
Options include:
    -design [rd|ss|vb] - XMLSchema design type
        rd  - Russian Doll Design - local elements and local types
        ss  - Salami Slice Design - global elements and local
              types
        vb  - Venetian Blind Design (default) - local elements and
              global complex types
    -simple-content-types [smart|string] - Simple content types
                       detection (leaf text). Smart is the default
    -enumerations [never|NUMBER] - Use enumerations. Default 
                                   value is 10.
    -outDir [dir] - Directory for output files. Default is '.'
    -outPrefix [file_name_prefix] - Prefix for output file names.
                                    Default is 'schema'
    -validate - Validates input instances against generated
                schemas.
    -verbose - print more informational messages
    -license - print license information
    -help - help information

/home/user>inst2xsd jiraRssFeed.xml -enumerations never -design rd -verbose -validate
# this generates a schema named schema0.xsd

This produces a file named schema0.xsd (the prefix can be changed with -outPrefix), and the schema will look similar to the snippet below:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="rss">
    <xs:annotation>
      <xs:documentation>RSS generated by JIRA 98...</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="channel">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="title"/>
              <xs:element type="xs:anyURI" name="link"/>
              <xs:element type="xs:string" name="description"/>
              <xs:element type="xs:string" name="language"/>
              <xs:element name="item" maxOccurs="unbounded" minOccurs="0">

From the snippet, we see that all of the elements we need for the Jira RSS feed have been defined.

If you want to work the other way, starting from an XML Schema, XMLBeans 2.0 supports that too. The xsd2inst tool creates a sample document from a schema and the name of a global element; the instance contains sample values for simple types. Together, these two tools make working with XML instances and schemas much simpler.

At this point in our project, we have a schema we can use along with the scomp utility to create an XMLBeans type jar and get started working on our business logic and the quality metrics we have been trying to gather.

We know from looking at the Jira RSS feed instance that the bug details we care about are in an element named "item," and the generated schema declares item with maxOccurs="unbounded," which XMLBeans exposes as an array. This means that to find information that may appear in any item, we need to iterate through all of the items. Let's look at how we could do this in code. The following method returns all the issues reported by the user whose username is passed as a parameter:

import java.io.IOException;
import java.net.URL;
import java.util.Vector;
import org.apache.xmlbeans.XmlException;
import noNamespace.RssDocument;  // generated by scomp from schema0.xsd (no targetNamespace)

public Vector getItemsFromReporter(String reporter)
    throws IOException, XmlException {

  // Get the Jira RSS feed instance from a URL
  URL jiraFeedUrl = new URL("<JiraFeedURL>");

  // Get instance objects
  RssDocument rssDoc = RssDocument.Factory.parse(jiraFeedUrl);
  RssDocument.Rss rss = rssDoc.getRss();
  RssDocument.Rss.Channel channel = rss.getChannel();

  // We will use this array to get most of our data
  RssDocument.Rss.Channel.Item[] items = channel.getItemArray();

  // We will store all of the matching results in a vector
  Vector results = new Vector();

  for (int i = 0; i < items.length; i++) {
    RssDocument.Rss.Channel.Item item = items[i];

    // Add the item to the results when its reporter username matches
    if (reporter.equals(item.getReporter().getUsername())) {
      results.add(item);
    }
  }

  return results;
}

As you can see, this is very clean Java. However, because every item is pulled into an array and checked one by one in Java code, this approach can become slow as the number of items grows large. XMLBeans 2.0 adds two features that help with just these kinds of issues: support for JDK 5.0 generics and better support for XPath and XQuery. Let's look at how we can use generics with XMLBeans.
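The idea behind the XPath half of that pair, letting a path expression do the filtering instead of a hand-written loop, can be previewed without XMLBeans. The sketch below uses only the JDK's javax.xml.xpath API, not XMLBeans' own XPath support, and returns the key of every item whose reporter has the given username. ReporterFilterSketch and keysFromReporter are hypothetical names; the username is bound as the XPath variable $user rather than spliced into the expression string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReporterFilterSketch {
    // Returns the <key> text of every item whose reporter has the given username
    static List<String> keysFromReporter(String rssXml, String reporter) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $user to the username instead of concatenating it into the expression
        xpath.setXPathVariableResolver(
            name -> "user".equals(name.getLocalPart()) ? reporter : null);
        String expr = "/rss/channel/item[reporter/@username=$user]/key";
        try {
            NodeList keys = (NodeList) xpath.evaluate(expr,
                new InputSource(new StringReader(rssXml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < keys.getLength(); i++) {
                result.add(keys.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate feed", e);
        }
    }
}
```

Applied to the feed shown earlier, keysFromReporter(feed, "rrusin") would include XMLBEANS-232; the selection logic lives entirely in the predicate.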