Frequently Asked Questions About Berkeley DB XML

    General

  • Can DB XML parse my unusually encoded XML document?

    DB XML uses XQilla, an XQuery and XPath 2 library built on top of Xerces-C, which handles much of the XML parsing. Out of the box, Xerces-C can parse XML documents in a number of well-known encodings, including (but not limited to) UTF-8, UTF-16 and ISO-8859-1. If your documents use an unsupported encoding (Big-5, for instance), there is still a solution: compile the Xerces-C library with ICU support, which allows BDB XML to transcode and parse over 500 different character encodings. One way to do this is to pass the following option to the buildall.sh script that comes with BDB XML:

    
    ./buildall.sh --with-xerces-conf="-t icu"
    
  • What language interfaces (APIs) does Berkeley DB XML support?
    • C++ is the native interface
    • Java is supported via SWIG plus custom Java code
    • Perl is supported via a custom interface
    • PHP is supported via a custom interface
    • Python is supported via SWIG plus custom Python code
    • Tcl is supported via SWIG
    • C#/.NET is not supported internally but is available from third parties, including JanaBiz
    • There is a Ruby API available
  • What APIs are supported by the pre-compiled Windows binaries?

    C++, Java, Perl, Python, and PHP. The binaries are compiled against specific versions of the scripting languages and will only work in those environments. See the Windows binaries documentation for details.

  • Do I need to have XML as a string or file to insert it into Berkeley DB XML?

    No. If you have XML that is not a string (e.g. DOM or other object format) it can be inserted by using the XmlEventWriter class to generate XML "events" that directly insert content into a document. This is the most efficient way to get content into BDB XML.

  • Do I need to serialize XML query results to process them?

    No. You can use the methods XmlDocument.getContentAsEventReader() and XmlValue.asEventReader() to return an XmlEventReader object that can be used to directly access the raw information. This is the most efficient mechanism for content handling provided by BDB XML.

  • Does Berkeley DB XML support XSLT?

    BDB XML does not provide an implementation of XSLT, but it's easy to integrate BDB XML with a third-party XSLT engine like Xalan or Saxon. Alternatively, you may find that the built-in XQuery engine can satisfy your XML transformation needs.

  • Does Berkeley DB XML support the UNICODE standard for text encoding?

    Yes. XML content can be provided in any UNICODE encoding that Xerces supports. However, keep in mind the interfaces that accept strings for XQuery expressions, document names, indexes, etc. only support the UTF-8 encoding.

  • How can I tell if a document exists in a container?

    There is no direct "does this document exist" interface. Instead, there are several ways to do this:

    • Use the XmlContainer::getDocument() interface. E.g. in C++:
      
      bool existsDoc(const std::string &docname, XmlContainer &cont) {
           bool ret;
           try {
               XmlDocument doc = cont.getDocument(docname, DBXML_LAZY_DOCS);
               ret = true;
           } catch (XmlException &e) {
               if (e.getExceptionCode() == XmlException::DOCUMENT_NOT_FOUND)
                   ret = false;
               else
                   throw; // unknown error
           }
           return ret;
      }
      
    • Use the XmlIndexLookup object and the built-in index on document names. This example is in Java:
      
      public boolean existsDoc(String docname, XmlContainer cont,
                XmlManager mgr) throws XmlException {
           boolean ret = false;
           XmlResults res = null;
           XmlQueryContext context = null;
           try {
               XmlIndexLookup il = mgr.createIndexLookup(cont,
                                  XmlManager.metaDataNamespace_uri,
                                        XmlManager.metaDataName_name,
                                        "metadata-equality-string",
                                        new XmlValue(docname));
               context = mgr.createQueryContext();
               res = il.execute(context, new XmlDocumentConfig().setLazyDocs(true));
               if (res.size() != 0)
                   ret = true;
           } catch (XmlException e) {
               if (e.getErrorCode() != XmlException.DOCUMENT_NOT_FOUND)
                   throw(e);
           } finally {
               if (res != null) res.delete();
           }
           return ret;
      }
      
    • If that isn't enough options, it is also possible to do this using an XQuery expression. For example, this expression will return the value of true or false depending on existence of the named document: exists(doc('mycollection.dbxml/mydocname'))
  • How can I validate my XML documents?

    By default, BDB XML does not perform any DTD or schema validation of documents. However, by specifying the DBXML_ALLOW_VALIDATION flag when you create or open a container, you can ask BDB XML to make sure that any document you add to the container is valid according to the DTD or schema referenced within the document. If BDB XML cannot locate the referenced DTD or schema, or if it is on the local file system and DBXML_ALLOW_EXTERNAL_ACCESS was not specified when creating your XmlManager, validation will not be performed and invalid (but well-formed) documents may be inserted. You should test that validation is happening as expected. Alternatively, you can use whatever C++ validation tools you would normally use and connect them to DB XML's event API to efficiently store and retrieve XML.

  • How should I store and query very large documents?

    Large documents are those which are of an unwieldy size. There is no specific size that means "large" but a 50MiB document can certainly be called large. The first thing to consider is whether such documents can be split up to make them more flexible to manage. For this discussion, let's assume not. Large documents should be stored:

    • in node storage containers. This allows them to be streamed in and out without using an excess of memory. It also allows them to be queried without materializing the entire document.
    • with node indexes. Node indexes are what allow an index to help a query locate content within a large document, which is necessary for performance; without them, indexes only identify the document as a whole.
    • with XmlInputStream. This prevents an excess of memory usage.
    • with autoIndexing off (unless you know precisely what you want from it). AutoIndexing can have the side effect of reindexing the entire container. For this reason auto-indexing is not recommended for containers of heterogeneous documents.

    Another consideration is output processing. Retrieving small parts of a large document is reasonably efficient, but serializing the entire document, or performing a query which must iterate the entire document is simply going to be slow. Such operations cannot be optimized.

  • Which objects can be shared among threads and which cannot?

    XmlManager, XmlContainer, and XmlQueryExpression are all completely thread-safe, and not only may be shared among threads, they should be shared for performance and memory management reasons. XmlModify (deprecated in 2.5) and XmlIndexLookup are both thread safe after construction - that is, they have get and set methods on them that aren't thread safe, but once these methods have been used to set up the object as required, they can be executed by multiple threads safely. The remaining objects are not thread safe and should only be used by a single thread at any given time. When using threads and any API other than Java be sure to specify the DB_THREAD flag when opening your Berkeley DB environment object. Threading is on by default in Java.

  • What are the advantages of an XML Database compared to RDBMS storage?

    The advantages of using an XML repository over an RDBMS come from the reduced mismatch between the application-programming model and the data storage model. In particular applications that deal with document content or non-tabular information benefit from using an XML database. Any information that has no schema, conforms only loosely to a schema, or conforms to a schema that is extensible, or changes frequently is a good candidate. Ronald Bourret writes a lot on this topic.

  • What are the delete() methods for in the Java API?

    The delete() methods in the Java API are needed to free the C++ resources associated via JNI with the Java objects. Most Java DB XML objects have a corresponding C++ object attached over JNI. The delete() method is called by an object's finalize() method, with all the caveats that Java places on finalize(): in short, Java makes no guarantee that finalize() will be called on any object. Java will run garbage collection when it appears to be running out of memory, but it does not know that these objects are much larger than they seem because of the attached C++ object. Therefore it is always a good idea to call the delete() method. Several objects use delete() like a close(), similar to closing a file. These include XmlManager, XmlContainer, and XmlResults. It is necessary to explicitly delete() these to release locks and other database resources they may hold. As of release 2.4.x it is no longer necessary to call delete() on these objects:

    • XmlValue
    • XmlDocument
    • XmlQueryContext
    • XmlUpdateContext
    • XmlMetaData
    • XmlMetaDataIterator
    • XmlTransaction -- this is deleted if you commit() or abort()
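
    For the objects that still need an explicit delete(), a try/finally block is one way to make sure the call always happens. The sketch below shows only the pattern: FakeResults is a hypothetical stand-in so the example runs without the BDB XML library, and it is not part of the real API. In real code the same shape would wrap an XmlResults or similar object.

```java
// Sketch of the "always call delete()" pattern.
// FakeResults is a hypothetical stand-in for a JNI-backed class such as
// XmlResults; it is NOT part of the BDB XML API.
final class NativeHandleDemo {
    // Counts stand-in "native" resources that have not been released yet.
    static int openHandles = 0;

    static final class FakeResults {
        FakeResults() { openHandles++; }
        int size() { return 3; }
        void delete() { openHandles--; }  // releases the stand-in native resource
    }

    // Releases the handle in finally, so it is freed even if processing throws.
    static int countResults() {
        FakeResults res = null;
        try {
            res = new FakeResults();
            return res.size();
        } finally {
            if (res != null) res.delete();
        }
    }

    public static void main(String[] args) {
        System.out.println(countResults() + " results, open handles: " + openHandles);
    }
}
```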
  • What are the physical limitations on a container?

    Limitations can be expressed in several dimensions:

    • Container Size Containers are files in a file system and are limited by the maximum file size allowed on a given platform.
    • Number of Containers BDB XML does not limit the number of containers, but as files in a file system, there is a limit on the number of open files.
    • Number of Documents in a Container Internal document IDs are 64-bit integers, and must be unique within a container, which limits the number of documents in a container.
  • What are the restrictions on document names?

    BDB XML treats document names as a simple, null-terminated string, of any length. There are no restrictions imposed by BDB XML in general, however, there can be problems accessing a document by name in an XQuery expression. In an XQuery expression, a document may be named using fn:doc(), which takes a URI argument. By default, the URI looks like "dbxml:/<container_name_or_alias>/<document_name>". The "dbxml:" scheme is implicit and is the default Base URI for resolution. Because the '/' character is used as a pathname delimiter in the URI, using it in a document name creates a problem -- BDB XML does not know which part of the string is container name and which part is document name. It is possible to handle odd characters in a container name by using the XmlContainer::addAlias() method to create a "safe" alias to use for the container. There is no corresponding interface for document names. Consider an application that wants to name documents using URIs, such as "myscheme://a/b/c". BDB XML is perfectly happy to create such a document. So, how does an application reference that name in an XQuery expression? The answer involves keeping '/' characters out of the argument to fn:doc() and/or fn:collection(). The simplest way is to avoid using fn:doc() and instead use fn:collection() along with a predicate. For example, instead of a query like:

    
    exists(doc('mycol/myscheme://a/b/c'))
    

    use:

    
    exists(collection('mycol')/*[dbxml:metadata('dbxml:name')='myscheme://a/b/c'])
    

    Another, obvious option is to escape the problematic '/' characters in document names to avoid the problem altogether, but that can be cumbersome, as well.
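
    If you do choose to escape names, percent-encoding is one possible scheme. The following is a minimal sketch using only the JDK (Java 10 or later); the class and method names are hypothetical and not part of BDB XML. It assumes the application stores documents under the escaped name and unescapes when it needs the original. Whether an escaped name passes through fn:doc() unchanged is an assumption you should verify in your own environment.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a logical document name to a name without '/'.
final class DocNameEscaper {
    // Percent-encodes the name, so '/' becomes %2F and ':' becomes %3A.
    static String escape(String logicalName) {
        return URLEncoder.encode(logicalName, StandardCharsets.UTF_8);
    }

    // Reverses escape() to recover the original logical name.
    static String unescape(String storedName) {
        return URLDecoder.decode(storedName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = escape("myscheme://a/b/c");
        System.out.println(stored);
        System.out.println(unescape(stored));
    }
}
```

    The cost of this approach is the one noted above: every place that creates or looks up a document by name has to apply the same escaping.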

  • What development tools work with Berkeley DB XML?

    The oXygen XML Editor has built in support for Berkeley DB XML.

  • What is the difference between delete() and close() in the Java API?

    There is no longer any difference between the delete() and close() methods in the Java API for the classes that have both.

    Installation and Build

  • How do I build BDB XML against a version of Berkeley DB that is not bundled?
    • If you have an already-built, compatible Berkeley DB installation use it. If not, download the version you want and build and install it where you'd like.
    • Build BDB XML this way: buildall.sh --with-berkeleydb=path_to_db_installation

    Windows The BDB XML project files directly reference the Berkeley DB libraries as well as the location of Berkeley DB header files. These locations need to change to reflect your Berkeley DB version and location.

    1. Modify the project files (*.dsp or *.vcproj) in dbxml-2.x.x/dbxml/build_windows to point to the appropriate locations for Berkeley DB:
      1. The location of include files for DB
      2. The location and name of libraries for DB (e.g. libdb46* -> libdb47*)
      3. The simplest way to change these is using an editor, but the per-project properties can also be changed using Visual Studio.
    2. You will probably need to build individual projects rather than building the entire solution because the solution file refers to Berkeley DB project files.
    3. Be sure to change your PATH environment variable to point to the locations of all required DLLs.
  • How can I install the Berkeley DB XML PHP module on Linux?

    See the instructions in the source code in dbxml-2.x.y/dbxml/src/php/README.

  • How do I solve an unsatisfied link error when using DB XML's Java API?

    Often, DB XML's Java users come up with an issue like this:

    
    java.lang.UnsatisfiedLinkError: no libdb_java43 in java.library.path
    

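    This is normally due to an improperly configured java.library.path, which must include the directory containing the Berkeley DB and BDB XML native libraries. For example, you can set it on the command line: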

    
    java -Djava.library.path="/home/jpcs/dbxml-2.4.8/install/lib/" MyClass
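
    When the error persists, it can help to check what the JVM actually sees. This is a small diagnostic sketch using only the JDK; the class name is hypothetical, and the default library name db_java43 is simply taken from the error message above. It reports which directories on java.library.path contain the platform-specific library file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: looks for a native library on a library path.
final class LibraryPathCheck {
    // Returns every directory on libraryPath that contains the mapped library file.
    static List<String> findLibrary(String libraryPath, String libName) {
        // Maps e.g. "db_java43" to the platform file name (libdb_java43.so on Linux).
        String fileName = System.mapLibraryName(libName);
        List<String> hits = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "db_java43";
        String path = System.getProperty("java.library.path", "");
        List<String> hits = findLibrary(path, lib);
        System.out.println(System.mapLibraryName(lib)
                + (hits.isEmpty() ? " not found on " + path : " found in " + hits));
    }
}
```

    Run it with the same -Djava.library.path setting as your application so that both see the same path.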
    
  • On SUSE 10.0, why can't I get any Java code (samples or my own code) that calls Berkeley DB to run?

    The version of GCC shipped with SUSE 10.0 produces incorrect code at optimization level '-O2', which is now the default when DB is configured. If you change the Makefile between running configure and make, with:

    
    sed -i 's/O2/O/g' Makefile
    

    the error no longer occurs. The version of gcc that ships with SUSE 10 is:

    
    gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)
    
  • What file systems commonly available on Linux systems are optimal for Berkeley DB database storage?

    A common question we get is what file system to use under Linux? Currently, the best information we have is that ext2 is the best performing Linux file system for TP applications (but as it lacks ordered data mode, it's likely not to be reliable). Second best is ext3, and ReiserFS is last. We don't have performance measurement information for XFS, but we've seen failures in the field (XFS has problems with applications which repeatedly extend files, and that is a common usage pattern in Berkeley DB databases).

  • Why do I get a compilation error on Windows when I create a project using MFC or anything that includes oledb.h?

    Berkeley DB's header file db.h and Microsoft's header file oledb.h both define the symbol DBTYPE. Unfortunately, changing either use of this symbol would break existing code. The first and simplest solution to this problem is to organize your source code so that only one of these two header files is needed in any of your sources. In other words, separate the uses of Berkeley DB and the uses of Microsoft's OLE DB library so that they are not mixed in your code. Then, just choose either db.h or oledb.h, but do not mix them in one source file. If that is not possible, and you have to mix both headers, wrap one of the #include lines as follows. Find the line where oledb.h is included in your source code. This may be in the automatically-generated stdafx.h include file. Decide whether that header file is really needed. If it is, change the include line from this:

    
    #include <oledb.h>
    

    to

    
    /* Work around DBTYPE name conflict with Berkeley DB */
    #define DBTYPE MS_DBTYPE
    #include <oledb.h>
    #undef DBTYPE
    

    Troubleshooting

  • What do I do when I run out of lockers, locks, or lock objects?

    The Berkeley DB environment keeps memory for a fixed number of lockers, locks and lock objects -- so it is always possible to run out of these resources. The maximum amount of lock resources to be allocated is set when the database environment is created, so to change this number, you will need to increase the number of lockers, locks and/or lock objects and re-create your environment.

  • I'm seeing database corruption when I run out of disk space.

    Berkeley DB can continue to run when out-of-disk-space errors occur, but it requires the application to be transaction protected. Applications which do not enclose update operations in transactions cannot recover from out-of-disk-space errors, and the result of running out of disk space may be database corruption.

  • On FreeBSD 5.4, I'm seeing the error, "Fatal error 'Spinlock called when not threaded.'" What do I do?

    On some installations of FreeBSD 5.4, you may see this runtime error:

    
    Fatal error 'Spinlock called when not threaded.' at line 87 in file /usr/src/lib/libpthread/thread/thr_spinlock.c (errno = 0)
    

    If this error is seen, it is necessary to create an /etc/libmap.conf file to map libc_r to libpthread. See "man 4 libmap.conf" for details on how to do this.

  • What do I do in Berkeley DB XML when I get an error such as "Lock table is out of available locker entries," or "Lock table is out of locks?"

    In Berkeley DB terminology, a "locker" is something like a database, a transaction or a cursor. Berkeley DB "locks" are owned by a "locker" and generally lock pages of a database. There are other kinds of locks. Translating to BDB XML, lockers are associated with Containers, which own database handles, as well as documents, which may own cursors. Lockers and locks only exist if you are creating and using a Berkeley DB environment, and they exist even if your application does not use transactions, in order to support some level of concurrent access. You can change the number of lockers by (re)creating the environment, using the proper arguments:

    C++:

    
    DbEnv::set_lk_max_lockers(u_int32_t)
    DbEnv::set_lk_max_locks(u_int32_t)
    

    Java:

    
    EnvironmentConfig.setMaxLockers(int)
    EnvironmentConfig.setMaxLocks(int)
    

    These must be set *before* you create your environment using the open method. The obvious question is "what number should I use?" For lockers, you should use at least the maximum number of simultaneously referenced pages you may have. For a large, inefficient query it is possible to touch all pages in a container so the size of the container divided by its page size should be large enough. If you still run out you can resize again. If you want to change lock or locker parameters in an existing environment, it must be removed, then re-created. This does not affect the existing BDB XML containers or database files in that environment, as long as it's done properly. To remove an environment safely, do one of these two things:

    1. Call: Environment.remove() (Java) or DbEnv::remove() (C++) before you create it. When you open the environment later, make sure that environmentConfig.setAllowCreate(true) (Java) or DB_CREATE (C++) is set.
    2. Remove the __db.* files from your environment directory after you have shut down your application and before you re-start it. Again, be sure that your code will create a new environment upon restart.
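
    The sizing rule of thumb above (container size divided by page size) can be written as a small calculation. This is only a sketch of that arithmetic: the class name is hypothetical, and the 50 MiB container and 8 KiB page size are example values, not recommendations. The result is a starting point for the limits you pass to the environment configuration before opening it.

```java
// Hypothetical helper for the "container size / page size" rule of thumb.
final class LockSizing {
    // Upper bound on the pages in a container: file size divided by page size, rounded up.
    static long estimatePages(long containerBytes, int pageSizeBytes) {
        if (containerBytes < 0 || pageSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (containerBytes + pageSizeBytes - 1) / pageSizeBytes;
    }

    public static void main(String[] args) {
        long containerBytes = 50L * 1024 * 1024; // example: a 50 MiB container file
        int pageSizeBytes = 8 * 1024;            // example: 8 KiB pages (assumed value)
        System.out.println(estimatePages(containerBytes, pageSizeBytes)); // prints 6400
    }
}
```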
  • Why do I get "Uncaught exception from C++ API" or "Out of memory" in my Java application?

    There could be a number of reasons for getting these exceptions, however, if you are using Java, it is probably because you have forgotten to call the delete() method on some of your DB XML objects. As of release 2.4.x this error will be far less likely. The likely cause is not deleting XmlResults objects. Underneath every DB XML Java object is a corresponding C++ object. At some point the Java garbage collector will delete the Java object which in turn deletes the C++ object. However, since you have no control over when Java will garbage collect your objects, it is always a wise idea and in some cases necessary to use the delete() method on DB XML objects when you have finished with them.

    • As of 2.4.x, the following classes should be deleted/closed: XmlContainer, XmlEventReader, XmlEventReaderToWriter, XmlEventWriter, XmlIndexLookup, XmlIndexSpecification, XmlInputStream, XmlManager, XmlModify, XmlQueryExpression, XmlResolver, XmlResults, and XmlStatistics.
    • As of 2.4.x the following classes do not need to be deleted: XmlContainerConfig, XmlDocument, XmlDocumentConfig, XmlException, XmlIndexDeclaration, XmlManagerConfig, XmlMetaData, XmlMetaDataIterator, XmlQueryContext, XmlTransaction, XmlUpdateContext, and XmlValue.
  • Why do I get reports of uninitialized memory reads and writes when running software analysis tools (for example, Rational Software Corp.'s Purify tool)?

    For performance reasons, Berkeley DB does not write the unused portions of database pages or fill in unused structure fields.

    Access Methods

  • How do I efficiently look up all documents in a container?

    Use the call, XmlContainer::getAllDocuments(0); This is the same as querying for all documents in the name index. You can do anything with the result set you would do with query results, including using them as context items for an additional query.

    Transactions

  • Do I need to use transactions in my application?

    There are several reasons for using transactions. Any one of these is sufficient to require transactions:

    • You require read/write access to data and the CDS configuration (multiple readers, single writer) is not sufficiently concurrent
    • You need relatively fine-grained disaster recovery
    • You need a hot backup solution
    • You need replication for availability and/or performance
    • You need atomic, recoverable updates

    Whereas you may not need transactions (the Transactional Data Store, TDS) if:

    • Your application is read-only after the data is loaded
    • Your application's data can be recreated
    • You don't have a need for transactional backup
    • Your application is single-threaded (not concurrent)
    • Your application doesn't need recoverability provided by transactions
  • How can I share a transaction between Berkeley DB and BDB XML?

    The transacted methods in BDB XML all take an XmlTransaction object. This object can be constructed using an already-created Berkeley DB transaction object. Another mechanism is to get the Berkeley DB object from the BDB XML XmlTransaction object. The thing to avoid is creating two unrelated transaction objects that may conflict with one another. In C++, to construct an XmlTransaction from DbTxn:

    
    // use the Berkeley DB environment to begin a transaction
    DbTxn *dbtxn = 0;
    env.txn_begin(0, &dbtxn, 0);
    // create an XmlTransaction from the DbTxn
    XmlTransaction xmltxn = manager.createTransaction(dbtxn);
    

    In C++, to obtain a DbTxn from XmlTransaction:

    
    XmlTransaction xmltxn = manager.createTransaction();
    // get the DbTxn
    DbTxn *dbtxn = xmltxn.getDbTxn();
    

    When using DbTxn and XmlTransaction together, a couple of rules are important. Once a transaction is committed or aborted, the underlying DbTxn object will have been deleted, and must not be referenced again. Also, a DbTxn object may only be referenced by one XmlTransaction object at a time.

    As of release 2.5, BDB XML uses the Berkeley DB C API instead of the C++ API, so you should use DB_TXN and getDB_TXN() in place of DbTxn and getDbTxn() respectively.

    In Java, to construct an XmlTransaction from a Transaction:

    
    Transaction dbtxn = env.beginTransaction(null,null);
    XmlTransaction xmltxn = manager.createTransaction(dbtxn);
    

    and to obtain a Transaction from an XmlTransaction:

    
    XmlTransaction xmltxn = manager.createTransaction();
    Transaction dbtxn = xmltxn.getTransaction();
    
  • Do I have to run deadlock detection?

    Deadlock handling is necessary in all applications that use transactions and have concurrent read/write or write/write access to BDB XML. The Berkeley DB reference guide has more to say on the subject. In Java, one strategy for dealing with deadlocks is as follows:

    • Interface (to be implemented by method-local anonymous classes for the most part):
      
      public interface SleepycatXmlTransactionWrapper<T> {
               public T run() throws Exception;
      }
      
    • Class with static method and static ThreadLocal property which actually does the transaction + deadlock retries:
      
      public class SleepycatXmlTransaction {
        public static final int DEADLOCK_RETRIES = 3;
        private static ThreadLocal<XmlTransaction> tx = new ThreadLocal<XmlTransaction>();
        // static methods use ThreadLocal XmlTransaction
        private static void createTransaction() {/* etc. */}
        private static void commitTransaction() {/* etc. */}
        private static void abortTransaction() {/* etc. */}
          public static <T> T wrapWithDeadlockRetry(
          SleepycatXmlTransactionWrapper<T> txw) throws Exception {
          T returnValue = null;
          for(int i = 0; i < DEADLOCK_RETRIES; i++) {
                 try {
                  createTransaction();
                  // Do the work, return whatever's needed
                  returnValue = txw.run();
                  commitTransaction();
                  return returnValue;
                 }
                 catch(Exception e) {
                  // Abort, but continue in the loop if our
                   // transaction was deadlocked
                  abortTransaction();
                  if(isDeadlock(e)) {
                   Loggers.XML_RUNTIME.info("Transaction deadlock " + (i + 1));
                   // Give other threads a chance
                   Thread.yield();
                   continue;
                  }
                  else
                   throw e;
                 }
                }
                // We're out of retries - too bad
                throw new OutOfRetriesException("Transaction failed after " +
                    DEADLOCK_RETRIES + " retries");
        }
      }
      
    • Use by creating a method-local class implementing SleepycatXmlTransactionWrapper and passing it to the static SleepycatXmlTransaction.wrapWithDeadlockRetry(...) method:
      
      public String myMethod(final String bar) throws SomeException {
        String result = null;
        try {
                result = SleepycatXmlTransaction.wrapWithDeadlockRetry(
                 new SleepycatXmlTransactionWrapper<String>() {
                 public String run() throws Exception {
                  //...all your code...
                 }
                });
        }
        catch(Exception any) {
        }
        return result;
      }
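
    The listing above leaves several helpers undefined, so here is a self-contained sketch of just the retry loop that can be run on its own. Everything in it is hypothetical: DeadlockException is a stand-in for however your code detects a deadlock, and the begin, commit and abort calls are reduced to comments. It is not the BDB XML API, only the control flow.

```java
import java.util.function.Supplier;

// Hypothetical, library-free sketch of a deadlock retry loop.
final class DeadlockRetry {
    // Stand-in for the library's deadlock signal; not a BDB XML class.
    static final class DeadlockException extends RuntimeException {
        DeadlockException(String msg) { super(msg); }
    }

    // Runs work up to maxAttempts times, retrying only when it reports a deadlock.
    static <T> T run(int maxAttempts, Supplier<T> work) {
        DeadlockException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A real version would begin a transaction here and commit after get().
                return work.get();
            } catch (DeadlockException e) {
                // A real version would abort the transaction here.
                last = e;
                Thread.yield(); // give other threads a chance before retrying
            }
        }
        throw new IllegalStateException(
                "Transaction failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = run(3, () -> {
            if (++calls[0] < 3) throw new DeadlockException("simulated deadlock");
            return "committed on attempt " + calls[0];
        });
        System.out.println(result);
    }
}
```

    Any other exception propagates immediately, which matches the listing above: only deadlocks are worth retrying.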
      

    Querying

  • Can I write my own XQuery functions?

    Yes. XQuery functions are essential for performing more complicated recursive queries. The syntax to declare one is this:

    
    declare function local:times_two($value) { $value * 2 };
    

    XQuery specifies that function declarations must go in the query prolog, at the start of the query.

  • How can I address multiple containers in one query?

    You can use multiple fn:collection() functions in your XQuery expression. For instance, you can use the union operator ("|") to combine the results of different calls to fn:collection(), like this:

    
    (collection("A") | collection("B") | collection("C"))/foo/bar
    
  • How can I query a document without putting it in the database?

    There are various methods of achieving this:

    • Specify the DBXML_ALLOW_EXTERNAL_ACCESS flag to the XmlManager when you create it. Then reference "file://" or "http://" URLs when using the XQuery function fn:doc().
    • Create an XmlDocument object using the XML that you wish to query. Then create an XmlValue from that XmlDocument, and specify this value as the contextItem parameter to XmlQueryExpression::execute() when you execute your query:
      
      XmlDocument doc = manager.createDocument();
      doc.setContent(myXmlContent);
      XmlValue val(doc);
      expression.execute(val, queryContext);
      

    This allows you to reference your XML document as the context item (".") in your query.

    • You can also derive a class from XmlResolver (in C++, Java and Python), and resolve any URI to the correct location (on disk, in memory, etc.). This is achieved by registering your custom XmlResolver with the XmlManager object.
  • How can I reference an XQuery module in an XQuery expression?

    An XQuery module import statement looks something like this:

    
    import module namespace tm='test-module' at 'test-module.xq';
    

    In BDB XML the default module resolution treats "test-module.xq" as a path name in the file system. For example, the above statement would look for the file test-module.xq in the current directory. The resolution also pays attention to the base URI set in the XmlQueryContext object used for the query. For example, if the base URI is "file://tmp/", the resolution will look for the file "/tmp/test-module.xq".

    Yet another way to handle module import is to implement your own instance of the XmlResolver class and register it using the method XmlManager::registerResolver(). Module imports will call the XmlResolver::resolveEntity() method. This allows you to entirely control the location of modules, and place modules in the file system, in a Berkeley DB database, or in a BDB XML metadata item, or even construct them in code.

    
    class testResolver extends XmlResolver {
      public testResolver() throws XmlException {}
      public boolean resolveDocument(XmlTransaction txn, XmlManager mgr,
                 String uri, XmlValue val)
          throws XmlException {
          return false;
      }
      public boolean resolveCollection(XmlTransaction txn, XmlManager mgr,
                   String uri, XmlResults res)
          throws XmlException {
          return false;
      }
      public XmlInputStream resolveSchema(XmlTransaction txn, XmlManager mgr,
                 String location, String nameSpace)
          throws XmlException {
          return null;
      }
      public XmlInputStream resolveEntity(XmlTransaction txn, XmlManager mgr,
                 String systemId, String publicId)
          throws XmlException {
          return null;
      }
      public boolean resolveModuleLocation(XmlTransaction txn, XmlManager mgr,
                  String nameSpace, XmlResults result)
          throws XmlException {
             return false;
      }
      public XmlInputStream resolveModule(XmlTransaction txn, XmlManager mgr,
                 String moduleLocation, String nameSpace)
          throws XmlException {
             return null;
      }
    }
    

    If you want to return a specific XQuery module in your resolveModule() method, you can create an XmlInputStream object from a file or even from a java.io.InputStream object using XmlManager.createInputStream(). The C++ code to do the same thing is similar.

  • How do I get the value of a node?

    Many people use this syntax: /foo/bar/text(). In the majority of cases, this is incorrect! The explanation follows. Consider this document:

    
    <foo>
      <bar>hello <baz>you</baz>there</bar>
    </foo>
    

    In XQuery, text() is a text node test. This means that in the example /foo/bar/text(), text() is short for child::text(), which tells you a little more about what it does. This expression returns all children of the current context that are text nodes. So in this example, the expression will return two text nodes, one with the value "hello " and the second with the value "there".

    What's important to note here is that not only are you getting text nodes returned, rather than a string, but the text nodes' combined value does not equal the value of the bar element! The XQuery specification defines the string value of this element as "hello youthere": in other words, the concatenation of the values of all descendant text nodes.

    Another important issue is that attribute nodes don't have any text node children. So if you wrote /foo/@bar/text() expecting to get the attribute's value, you might be very surprised when the query engine quite rightly returned an empty sequence.

    Thirdly, BDB XML's query planner is not going to optimize any use of text(). It can't, as the BDB XML indexes deal with the value of elements and attributes, not their text node children. So you will lose out on valuable optimization if you use text().

    Enough of why it's wrong. How do you get the value of a node? Here are some methods, and the differences between them:

    1. Use the fn:string() function. This returns the string value of the node - so no schema type information.
    2. Use the fn:data() function. This returns a typed value from the node - in other words, if there is a schema for the document you will get a value of the type the schema says it should be. If there isn't a schema, you will get a value of type xs:untypedAtomic (named xdt:untypedAtomic in older drafts of the specification).
    3. Use casting: /foo/bar cast as xs:decimal or xs:date(/foo/bar). This can be used to get a value of a specific type.

    Whatever you do, try to get out of the habit of using text() unless you know precisely what you want from it.
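
    The difference between text() and the string value can be seen outside BDB XML as well. The following is a minimal sketch using the XPath 1.0 engine that ships with the JDK (javax.xml.xpath), not the BDB XML query engine; string() here plays the role of fn:string(). The class name TextNodeDemo and the "kind" attribute added to the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeDemo {
    // The FAQ's sample document; the "kind" attribute is a hypothetical addition.
    static final String XML =
        "<foo><bar kind=\"greeting\">hello <baz>you</baz>there</bar></foo>";

    // Number of nodes selected by an XPath expression.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Result of an XPath expression converted to a string.
    static String value(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two separate text node children, not one string.
        System.out.println(count("/foo/bar/text()"));
        // String value: all descendant text concatenated.
        System.out.println(value("string(/foo/bar)"));
        // Attributes have no text node children.
        System.out.println(count("/foo/bar/@kind/text()"));
        // The attribute's string value is still available.
        System.out.println(value("string(/foo/bar/@kind)"));
    }
}
```

    The two counts show why text() surprises people: the element yields two text nodes and the attribute yields none, while string() gives one value in both cases.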

  • How do I include a document name in a query?

    There are two ways to do this:

    1. Use the doc() function, with the "dbxml:" URI scheme. The dbxml URI scheme uses the format "dbxml:/<container_name_or_alias>(/<document_name>)?", and would typically be used like this:
      
      doc("dbxml:/myContainer/myDocument")
      

    Note that the default base URI is "dbxml:/", which means that you can often leave off the scheme specifier in the URI: doc("myContainer/myDocument")

    2. You can access the document name through the dbxml:metadata() function, since the name is stored as document metadata, for example:
      
      for $a in collection() return dbxml:metadata("dbxml:name", $a)
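
    The remark about the default base URI in the first option is ordinary relative-URI resolution. The following is a small sketch using java.net.URI from the JDK as a stand-in; it is not BDB XML's own resolver, and the class name DbxmlUriDemo is hypothetical.

```java
import java.net.URI;

class DbxmlUriDemo {
    // The default base URI described above.
    static final URI BASE = URI.create("dbxml:/");

    // Resolve a doc() argument against the base URI, as a generic URI library would.
    static String resolve(String reference) {
        return BASE.resolve(reference).toString();
    }

    public static void main(String[] args) {
        // A relative reference picks up the scheme from the base URI.
        System.out.println(resolve("myContainer/myDocument"));
        // A reference that already carries the scheme is returned unchanged.
        System.out.println(resolve("dbxml:/myContainer/myDocument"));
    }
}
```

    Both calls produce the same absolute URI, which is why the scheme specifier can usually be left off.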
      
  • How do I query for metadata?

    Document metadata is available in a query by using the dbxml:metadata() function. This function takes the name of the metadata as a string, and an optional second argument: a node from the document whose metadata should be looked up. If the second argument is not specified, the context item is used instead. The dbxml:metadata() function returns the typed metadata from the document, treating the first argument as a QName and resolving its prefix to a namespace URI.

    Some examples of usage follow. Return the names of all documents in the default collection:

    
    for $a in collection() return dbxml:metadata("dbxml:name", $a)
    

    Return documents with {http://timestamp.org}timestamp metadata less than 5:

    
    declare namespace ts = "http://timestamp.org";
    collection()[dbxml:metadata("ts:timestamp") < 5]
    
  • How do I use values of type xs:QName?

    There are a couple of ways to create an xs:QName item in a query:

    1. You can use the xs:QName() constructor function to create a QName in your query. In this case you will need to make sure you have bound a namespace URI for the prefix that you use:
      
      declare namespace foo="http://foo"; xs:QName("foo:bar")
      
    2. You can use the fn:QName() function, which takes two arguments (the xs:QName() constructor itself only takes one). This allows you to specify the namespace URI for the QName, as well as the prefix and localname:
      
      fn:QName("http://foo", "foo:bar")
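
    A QName is a namespace URI plus a local name, with the prefix carried along for serialization only. As a rough analogy (not the XQuery type itself), the JDK's javax.xml.namespace.QName models the same triple; the helper make() and the class name QNameDemo below are hypothetical.

```java
import javax.xml.namespace.QName;

class QNameDemo {
    // Build a QName from a namespace URI and a lexical "prefix:local" name,
    // mirroring the two pieces of information given in the query examples above.
    static QName make(String namespaceUri, String lexical) {
        int colon = lexical.indexOf(':');
        String prefix = colon < 0 ? "" : lexical.substring(0, colon);
        String local = lexical.substring(colon + 1);
        return new QName(namespaceUri, local, prefix);
    }

    public static void main(String[] args) {
        QName q = make("http://foo", "foo:bar");
        System.out.println(q.getNamespaceURI() + " | " + q.getPrefix() + " | " + q.getLocalPart());
        // Equality compares namespace URI and local name; the prefix is ignored.
        System.out.println(q.equals(make("http://foo", "other:bar")));
    }
}
```

    This is why the prefix must be bound to a namespace URI one way or another: the URI, not the prefix, is what identifies the name.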
      

    Updating

  • How can I update or modify my documents?

    There are several methods of doing this:

    1. Modify a document's content using XmlDocument::setContent(), or XmlDocument::setContentAsXmlInputStream(). Then put the changes back into the XmlContainer using XmlContainer::updateDocument().
    2. Use an XQuery Update query to specify the changes required.
    3. The deprecated XmlModify class can also be used to specify updates, but XQuery Update should be used in preference to it.
  • Do you support the XML:DB Initiative's XUpdate specification?

    No. The XUpdate specification is not complete, and does not include support for important concepts upon which BDB XML relies, such as transactions. It has also remained unchanged, as a working draft, since September 14, 2000. Since version 2.4, DB XML has supported the W3C's XQuery Update language, which is far more suitable for updating persistent XML documents.

  • Why does it take as long to delete a document as it does to insert it?

    The BDB XML data model saves database space (and time during updates) by not storing a mapping from document to index entries. This means that the document needs to be re-parsed during a delete, in order to calculate which index entries to remove.

  • XmlModify gives me the error "Cannot perform a modification on an XmlValue that isn't either Node or Document". What is the problem?

    This error usually means that the value you have selected to perform a modification on does not come from the exact same XmlDocument as the one given to XmlModify::execute(). The check is based on reference equality, so the same document obtained from the database in two different ways counts as two different documents. For instance, this will give you an error:

    
    modify.addRemoveStep(manager.prepare("doc('container/foo')//bar", qc));
    XmlDocument foo = container.getDocument("foo");
    modify.execute(XmlValue(foo), qc, uc);
    

    whereas this will not:

    
    modify.addRemoveStep(manager.prepare(".//bar", qc));
    XmlDocument foo = container.getDocument("foo");
    modify.execute(XmlValue(foo), qc, uc);
    

    The simple rule of thumb is that the modify step's query should always be relative - that is, it should navigate from the context item ("."), rather than using the fn:doc() or fn:collection() functions.

    BDB XML Shell

  • How can I get command line editing for the DB XML shell?

    You can use a program called rlwrap to do this. It gives you command line history and bash-style command line editing. Use it like this:

    
    rlwrap dbxml [command line args here]
    

    Indexing

  • How and when should substring indexing be used?

    Consider a document set marked up using a complex DTD with deep nesting (14 or more levels deep in some cases). A typical "simple" search from a user's perspective is: find me the word FOO anywhere in the corpus. The most intuitive query for this is: collection()//*[contains(., 'FOO')]

    Is there an index type that optimizes this case? BDB XML cannot currently optimize a query of this sort, since it only indexes values for named nodes. One workaround is to add a substring index on the document element, so that the index can narrow down the candidate documents, and write a composite query like this:

    
    collection()/docElem[contains(., 'FOO')]//*[contains(., 'FOO')]
    
  • How can I see what is in a BDB XML index?

    BDB XML provides direct access to indexes through the XmlIndexLookup object and its methods. Using an XmlIndexLookup object, it is possible to:

    1. Enumerate all values in an index
    2. Lookup a specific value in an index
    3. Perform an inequality lookup (e.g. all values GE X) in an index
    4. Perform range lookups in an index (e.g. all values GE X and LT Y)
    5. Get index lookup results in forward or reverse index order

    Here's a simple equality lookup example in C++:

    
    // assume cont and mgr are open container and manager...
    // perform lookup in a decimal equality index on the "age" element
    // look for all entries with age equal to 27.  The operation
    // is left off -- it defaults to equality if a value is provided.
    XmlIndexLookup il = mgr.createIndexLookup(cont, "", "age",
          "node-element-equality-decimal",
          XmlValue(XmlValue::DECIMAL, "27"));
    // the omnipresent query context
    XmlQueryContext qc = mgr.createQueryContext();
    // execute the lookup
    XmlResults res = il.execute(qc);
    // handle results ...
    

    Here's a more complex example, which uses the same index as above (decimal index on "age"), but finds all entries greater than 12 and less than or equal to 35, and returns them in reverse sort order (highest first):

    
    XmlQueryContext qc = mgr.createQueryContext();
    // create the basic XmlIndexLookup object, which sets
    // the lower-bound operation and value (gt, 12)
    XmlIndexLookup il = mgr.createIndexLookup(cont, "", "age",
          "node-element-equality-decimal",
          XmlValue(XmlValue::DECIMAL, "12"),
          XmlIndexLookup::GT);
    // now, set the upper bound and operation (35, LTE)
    il.setHighBound(XmlValue(XmlValue::DECIMAL, "35"),
         XmlIndexLookup::LTE);
    // perform the operation, setting the reverse order flag
    XmlResults res = il.execute(qc, DBXML_REVERSE_ORDER);
    // handle results ...
    

    This interface is useful for sophisticated applications that can benefit from direct index lookup, with or without also using the XQuery interface.
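
    The bound semantics used in the examples above (equality, an exclusive lower bound, an inclusive upper bound, reverse order) can be sketched with a sorted map from the JDK. This is only an analogy for how an ordered index answers such lookups; it does not use the BDB XML API, and the class name, the integer keys and the document names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeLookupDemo {
    // Stand-in for an index on "age": indexed value -> documents containing it.
    static final NavigableMap<Integer, List<String>> INDEX = new TreeMap<>();

    static {
        put(12, "a.xml");
        put(27, "b.xml");
        put(27, "c.xml");
        put(35, "d.xml");
        put(40, "e.xml");
    }

    static void put(int age, String doc) {
        INDEX.computeIfAbsent(age, k -> new ArrayList<>()).add(doc);
    }

    // Equality lookup: all entries whose value equals the key.
    static List<String> equality(int value) {
        return INDEX.getOrDefault(value, Collections.emptyList());
    }

    // Range lookup: value > low and value <= high, highest value first.
    static List<String> rangeReverse(int low, int high) {
        List<String> out = new ArrayList<>();
        for (List<String> docs : INDEX.subMap(low, false, high, true).descendingMap().values()) {
            out.addAll(docs);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(equality(27));
        System.out.println(rangeReverse(12, 35));
    }
}
```

    Note that the entry with value 12 is excluded by the greater-than bound, while the entry with value 35 is included by the less-than-or-equal bound.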

  • How do I specify both names for an edge index?

    A lot of people try to specify the "parent/child" name combination for an edge index - in DB XML this is not possible. An edge index just records what the parent name is - you cannot specify that DB XML only make index entries for nodes with a specific named parent.

  • What is the best way to add indexes?

    In Berkeley DB XML it is best to specify indexes on a container before documents are inserted. When indexes in a populated container are changed, all documents in that container must be reindexed, and this can take a long time. If indexes must be changed after the fact, it is best to add/remove several at once, to amortize the cost of the reindex. This means using code that looks like this:

    
    // get the index specification
    XmlIndexSpecification ispec = container.getIndexSpecification();
    // modify the index specification, as desired ...
    // reset the index specification
    container.setIndexSpecification(ispec, ...);
    

    It is very simple to use the dbxml shell program to add or remove a single index, but because the shell can only manipulate one index at a time, using the shell to manage indexes on a populated container may be less efficient than using code.

    Development Tools

  • How can I develop in Java with Eclipse 3.3 and BDB XML?

    After you install the Java API and create a Java project in Eclipse you must add db.jar and dbxml.jar to the Build Path of your project, which is done as follows:

    1. Right click your project and select Build Path->Configure Build Path.
    2. Select Java Build Path then select the Libraries tab.
    3. Click the Add External JARs button, then add db.jar and dbxml.jar, which should be located in the folder (dbxml-root)/jar.
  • How can I debug my BDB XML Java application with Eclipse 3.3?

    In order to step into the BDB XML Java libraries while debugging, you must link to the BDB XML Java source as follows:

    1. Right click your project and select Build Path->Link Source.
    2. Click the Browse button and select (BDB XML-root)/dbxml/src/java.
    3. Press the Finish button.
  • When debugging my BDB XML Java application in Eclipse, how can I step into the native C++ libraries?

    Doing this requires a source installation of BDB XML, Windows, and Visual Studio. The description below specifically uses Visual Studio 2005 Express Edition. There is a write-up and video of this process at this blog, but you have to do some extra steps to get the process to work with BDB XML.

    The first step to getting Visual Studio and Eclipse to run together is to build the C++ libraries with the correct arguments. Set the build mode to Debug for the entire Solution. Then right click the dbxml_java project and select Properties from the menu. In the Properties window select Linker->Debugging, then set Generate Map File to Yes(/MAP) and Map Exports to Yes(/MAPINFO:EXPORTS). Repeat this process for the projects XercesLib, xqilla, dbxml, and db_java. Then rebuild all of these libraries.

    Next you need to remove all release .dll files for the BDB XML libraries from your computer's PATH (for example, you would want to remove libdbxml24.dll, but keep libdbxml24d.dll). For example, I keep my release .dll files in dbxml/bin, so I moved all those files into a folder named HideDll. Remember, every time you build the dbxml project it copies the release .dll files back into the dbxml/bin directory. If release .dll files are left in the PATH then you will get illegal memory access exceptions, and exceptions for deleting illegal memory in the heap. This is because both release and debug libraries have been loaded, and they treat memory in different ways that cannot be combined. To be safe, you might want to delete the dbxml/build_windows/Release directory, which keeps copies of all the release libraries.

    Now set Visual Studio up to run your Java programs as a server. First, right click the dbxml_java project and select Properties from the menu. In the Properties window select Configuration Properties -> Debugging. Set Command to the path of your java executable; for example, I set it to C:\jdk1.5.0\bin\java.exe. Then put the following in the Command Arguments section:

    
       -Xmx400m -Xms400m -Xdebug -Xnoagent \
       -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \
       -classpath <paths-to-jar-files> <java-program-to-run>
    

    For example, to run the test program AutoOpenTest.java on my computer I put:

    
       -Xmx400m -Xms400m -Xdebug -Xnoagent \
       -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \
       -classpath C:\Sleepycat\jar\junit-4.4.jar;C:\Sleepycat\jar\dbxmltest.jar;\
       C:\Sleepycat\jar\dbxml.jar;C:\Sleepycat\jar\db.jar \
       dbxmltest.XmlTestRunner
    

    Then set Working Directory to the directory you want as a working directory. Hit OK to save these settings. Then right click on dbxml_java and select Debug->Start New Instance. This will start your Java program running as a server, but the program will wait for Eclipse to attach to it before it begins executing.

    Next go to Eclipse and set whatever breakpoints you want in your Java program. Then go to Run->Open Debug Dialog. Select Remote Java Application and choose to create a new one. Set Connection Type to Standard(Socket Attach), Host to localhost, and Port to 8000. Next hit Debug to attach the debugger to the waiting Java server. Now the program should stop in Eclipse if you set any breakpoints in the Java code, and it should stop in Visual Studio if you set any breakpoints in the C++ code.

    Unfortunately you cannot step directly from Java to C++, or from C++ to Java. If you want to enter XmlContainer_putDocument__SWIG_1 from Java, go to dbxml_java_wrap.cpp in Visual Studio and set a breakpoint in the function Java_com_sleepycat_dbxml_dbxml_1javaJNI_XmlContainer_1putDocument_1_1SWIG_10 (or whatever function SWIG made to answer a call to XmlContainer_putDocument__SWIG_1).

    If you get an error message like Cannot Find file MSVCP80D.dll or something similar when you start Eclipse, this is not because Windows cannot find that file, but has to do with how .dll files are configured. In this case try deleting your dbxml/bin/debug directory and your dbxml/build_windows/Win32/Debug directory and rebuilding, so that the .dll files can be built and configured correctly.