DB XML uses XQilla, an XQuery and XPath 2 library implemented on top of the Xerces-C library, for much of its XML parsing. Out of the box, Xerces-C can parse XML documents in a number of well-known encodings, including (but not limited to) UTF-8, UTF-16 and ISO-8859-1. If you have documents that use an unsupported encoding (Big-5, for instance), there is still a solution: you can compile the Xerces-C library with ICU support, which allows BDB XML to transcode and parse over 500 different character encodings. One way to do this is to pass the following option to the buildall.sh script that comes with BDB XML:
./buildall.sh --with-xerces-conf="-t icu"
C++, Java, Perl, Python, and PHP. The binaries are compiled against specific versions of the scripting languages and will only work in those environments. See the Windows binaries documentation for details.
How can I tell if a document exists in a container?
There is no direct "does this document exist" interface. Instead there are several ways to do this:
Use the XmlContainer::getDocument() interface. E.g. in C++:
bool existsDoc(const std::string &docname, XmlContainer &cont) {
    bool ret;
    try {
        XmlDocument doc = cont.getDocument(docname, DBXML_LAZY_DOCS);
        ret = true;
    } catch (XmlException &e) {
        if (e.getExceptionCode() == XmlException::DOCUMENT_NOT_FOUND)
            ret = false;
        else
            throw; // unknown error
    }
    return ret;
}
Use an XmlIndexLookup object and the built-in index on document names. This example is in Java:
public boolean existsDoc(String docname, XmlContainer cont,
                         XmlManager mgr) throws XmlException {
    boolean ret = false;
    XmlResults res = null;
    XmlQueryContext context = null;
    try {
        XmlIndexLookup il = mgr.createIndexLookup(cont,
            XmlManager.metaDataNamespace_uri,
            XmlManager.metaDataName_name,
            "metadata-equality-string",
            new XmlValue(docname));
        context = mgr.createQueryContext();
        res = il.execute(context, new XmlDocumentConfig().setLazyDocs(true));
        if (res.size() != 0)
            ret = true;
    } catch (XmlException e) {
        if (e.getErrorCode() != XmlException.DOCUMENT_NOT_FOUND)
            throw(e);
    } finally {
        // XmlResults holds database resources and must be deleted explicitly
        if (res != null) res.delete();
    }
    return ret;
}
exists(doc('mycollection.dbxml/mydocname'))
By default, BDB XML does not perform any DTD or schema validation of documents. However, by specifying the DBXML_ALLOW_VALIDATION flag when you create or open a container, you can ask BDB XML to make sure that any document you add to the container is valid against the DTD or schema referenced within the document. If BDB XML cannot locate the referenced DTD, or it is on the local file system and DBXML_ALLOW_EXTERNAL_ACCESS was not specified when creating your XmlManager, validation will not be performed and invalid (but well-formed) documents may be inserted. You should test that validation is happening as expected. Alternatively, you can use whatever C++ validation tools you would normally use, and interface them to DB XML's event API to efficiently store and retrieve XML.
Large documents are those which are of an unwieldy size. There is no specific size that means "large" but a 50MiB document can certainly be called large. The first thing to consider is whether such documents can be split up to make them more flexible to manage. For this discussion, let's assume not. Large documents should be stored:
Another consideration is output processing. Retrieving small parts of a large document is reasonably efficient, but serializing the entire document, or performing a query which must iterate the entire document is simply going to be slow. Such operations cannot be optimized.
XmlManager, XmlContainer, and XmlQueryExpression are all completely thread-safe; not only may they be shared among threads, they should be shared for performance and memory management reasons. XmlModify (deprecated in 2.5) and XmlIndexLookup are both thread-safe after construction. That is, their get and set methods are not thread-safe, but once these methods have been used to set up the object as required, it can be executed by multiple threads safely. The remaining objects are not thread-safe and should only be used by a single thread at any given time. When using threads with any API other than Java, be sure to specify the DB_THREAD flag when opening your Berkeley DB environment object. Threading is on by default in Java.
The advantages of using an XML repository over an RDBMS come from the reduced mismatch between the application-programming model and the data storage model. In particular applications that deal with document content or non-tabular information benefit from using an XML database. Any information that has no schema, conforms only loosely to a schema, or conforms to a schema that is extensible, or changes frequently is a good candidate. Ronald Bourret writes a lot on this topic.
The delete() methods in the Java API are needed to free the C++ resources associated via JNI with the Java objects. The delete() method is called by an object's finalize() method, with all the caveats that Java places on the calling of finalize(). In short, Java makes no guarantees that finalize() will be called on any object. Most Java DB XML objects have a corresponding C++ object attached over JNI. Java is quite capable of running garbage collection when it looks like it is running out of memory, but it doesn't know that these objects are much larger than they seem because of the attached C++ object. Therefore it is always a good idea to call the delete() method. Several objects use the delete() method like a close(), similar to closing a file. These include XmlManager, XmlContainer, and XmlResults. It is necessary to explicitly delete() these to release locks and other database resources they may hold. As of release 2.4.x it is no longer necessary to call delete() on these objects:
XmlValue
XmlDocument
XmlQueryContext
XmlUpdateContext
XmlMetaData
XmlMetaDataIterator
XmlTransaction (this is deleted if you commit() or abort())
Limitations can be expressed in several dimensions:
BDB XML treats a document name as a simple, null-terminated string of any length. BDB XML imposes no restrictions in general; however, there can be problems accessing a document by name in an XQuery expression. In an XQuery expression, a document may be named using fn:doc(), which takes a URI argument. By default, the URI looks like "dbxml:/<container_name_or_alias>/<document_name>". The "dbxml:" scheme is implicit and is the default base URI for resolution. Because the '/' character is used as a pathname delimiter in the URI, using it in a document name creates a problem: BDB XML does not know which part of the string is the container name and which part is the document name. It is possible to handle odd characters in a container name by using the XmlContainer::addAlias() method to create a "safe" alias for the container. There is no corresponding interface for document names.
Consider an application that wants to name documents using URIs, such as "myscheme://a/b/c". BDB XML is perfectly happy to create such a document. So, how does an application reference that name in an XQuery expression? The answer involves keeping '/' characters out of the argument to fn:doc() and/or fn:collection(). The simplest way is to avoid using fn:doc() and instead use fn:collection() along with a predicate. For example, instead of a query like:
exists(doc('mycol/myscheme://a/b/c'))
use:
exists(collection('mycol')/*[dbxml:metadata('dbxml:name')='myscheme://a/b/c'])
Another obvious option is to escape the problematic '/' characters in document names, which avoids the problem altogether but can be cumbersome.
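If escaping is the route chosen, the application has to apply the same reversible transformation both when it stores a document and when it builds the argument to fn:doc(). The following is a minimal sketch of one such scheme, not something BDB XML provides: the class DocNameEscaper, its methods, and the choice of '~' as the escape character are all assumptions made for illustration. It assumes, as described above, that '/' is the only character that causes trouble.

```java
// Hypothetical helper, not part of the BDB XML API.
public final class DocNameEscaper {
    // Escape character (an arbitrary choice for this sketch).
    private static final char ESC = '~';

    // Replace every '/' (and the escape character itself) with
    // '~' followed by the character's two-digit hex code.
    public static String escape(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '/' || c == ESC) {
                sb.append(ESC).append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(). Expects input produced by escape().
    public static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder(escaped.length());
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == ESC && i + 2 < escaped.length()) {
                String hex = escaped.substring(i + 1, i + 3);
                sb.append((char) Integer.parseInt(hex, 16));
                i += 3;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    // Build the fn:doc() argument from a container alias and a raw name.
    public static String docUri(String containerAlias, String rawName) {
        return containerAlias + "/" + escape(rawName);
    }

    public static void main(String[] args) {
        String raw = "myscheme://a/b/c";
        System.out.println(escape(raw));           // myscheme:~2F~2Fa~2Fb~2Fc
        System.out.println(docUri("mycol", raw));  // mycol/myscheme:~2F~2Fa~2Fb~2Fc
        System.out.println(unescape(escape(raw))); // myscheme://a/b/c
    }
}
```

The document would then be stored under the escaped name, so that the only '/' left in the fn:doc() argument is the one separating container from document.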
The oXygen XML Editor has built in support for Berkeley DB XML.
buildall.sh --with-berkeleydb=path_to_db_installation
Windows
The BDB XML project files directly reference the Berkeley DB libraries as well as the location of the Berkeley DB header files. These locations need to change to reflect your Berkeley DB version and location.
*.dsp or *.vcproj) in dbxml-2.x.x/dbxml/build_windows to point to the appropriate locations for Berkeley DB:
PATH environment variable to point to the locations of all required DLLs
The version of GCC shipped with SUSE 10 produces incorrect code when run with optimization level '-O2', which is now the default when DB is configured. If you change the Makefile between running configure and make, with:
sed -i 's/O2/O/g' Makefile
the error no longer occurs. The version of gcc that ships with SUSE 10 is:
gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)
A common question is which file system to use under Linux. Currently, the best information we have is that ext2 is the best-performing Linux file system for TP applications (but as it lacks an ordered data mode, it is unlikely to be reliable). Second best is ext3, and ReiserFS is last. We don't have performance measurements for XFS, but we've seen failures in the field (XFS has problems with applications that repeatedly extend files, and that is a common usage pattern in Berkeley DB databases).
Berkeley DB's header file db.h and Microsoft's header file oledb.h both define the symbol DBTYPE. Unfortunately, changing either use of this symbol would break existing code. The first and simplest solution to this problem is to organize your source code so that only one of these two header files is needed in any of your sources. In other words, separate the uses of Berkeley DB and the uses of Microsoft's OLE DB library so that they are not mixed in your code. Then, just choose either db.h or oledb.h, but do not mix them in one source file. If that is not possible, and you have to mix both headers, wrap one of the #include lines as follows. Find the line where oledb.h is included in your source code. This may be in the automatically generated stdafx.h include file. Decide whether that header file is really needed. If it is, change the include line from this:
#include <oledb.h>
to
/* Work around DBTYPE name conflict with Berkeley DB */
#define DBTYPE MS_DBTYPE
#include <oledb.h>
#undef DBTYPE
db_cxx.h
oledb.h
db_cxx.h
oledb.h
The Berkeley DB environment keeps memory for a fixed number of lockers, locks and lock objects, so it is always possible to run out of these resources. The maximum amount of lock resources to be allocated is set when the database environment is created, so to change this number, you will need to increase the number of lockers, locks and/or lock objects and re-create your environment.
Berkeley DB can continue to run when out-of-disk-space errors occur, but it requires the application to be transaction protected. Applications which do not enclose update operations in transactions cannot recover from out-of-disk-space errors, and the result of running out of disk space may be database corruption.
On some installations of FreeBSD 5.4, you may see this runtime error:
Fatal error 'Spinlock called when not threaded.' at line 87 in file /usr/src/lib/libpthread/thread/thr_spinlock.c (errno = 0)
If this error is seen, it is necessary to create an /etc/libmap.conf file to map libc_r to libpthread. See "man 5 libmap.conf" for details on how to do this.
In Berkeley DB terminology, a "locker" is something like a database, a transaction or a cursor. Berkeley DB "locks" are owned by a "locker" and generally lock pages of a database. There are other kinds of locks. Translating to BDB XML, lockers are associated with containers, which own database handles, as well as documents, which may own cursors. Lockers and locks only exist if you are creating and using a Berkeley DB environment, and they exist even if your application does not use transactions, in order to support some level of concurrent access. You can change the number of lockers by (re)creating the environment, using the proper arguments:
C++:
DbEnv::set_lk_max_lockers(u_int32_t)
DbEnv::set_lk_max_locks(u_int32_t)
Java:
EnvironmentConfig.setMaxLockers(int)
EnvironmentConfig.setMaxLocks(int)
These must be set *before* you create your environment using the open method. The obvious question is "what number should I use?" For lockers, you should use at least the maximum number of simultaneously referenced pages you may have. For a large, inefficient query it is possible to touch all pages in a container so the size of the container divided by its page size should be large enough. If you still run out you can resize again. If you want to change lock or locker parameters in an existing environment, it must be removed, then re-created. This does not affect the existing BDB XML containers or database files in that environment, as long as it's done properly. To remove an environment safely, do one of these two things:
Call Environment.remove() (Java) or DbEnv::remove() (C++) before you create it. When you open the environment later, make sure that environmentConfig.setAllowCreate(true) (Java) or DB_CREATE (C++) is set.
Remove the __db.* files from your environment directory after you have shut down your application and before you restart it. Again, be sure that your code will create a new environment upon restart.
There could be a number of reasons for getting these exceptions. However, if you are using Java, it is probably because you have forgotten to call the delete() method on some of your DB XML objects. As of release 2.4.x this error is far less likely. The likely cause is not deleting XmlResults objects. Underneath every DB XML Java object is a corresponding C++ object. At some point the Java garbage collector will delete the Java object, which in turn deletes the C++ object. However, since you have no control over when Java will garbage collect your objects, it is always wise, and in some cases necessary, to call the delete() method on DB XML objects when you have finished with them.
There are several reasons for using transactions. Any one of these is sufficient to require transactions:
Whereas you may not need TDS if:
The transacted methods in BDB XML all take an XmlTransaction object. This object can be constructed using an already-created Berkeley DB transaction object. Another mechanism is to get the Berkeley DB object from the BDB XML XmlTransaction object. The thing to avoid is creating two unrelated transaction objects that may conflict with one another. In C++, to construct an XmlTransaction from a DbTxn:
DbTxn *dbtxn = 0;
// use Berkeley DB environment to begin a transaction
env.txn_begin(0, &dbtxn, 0);
// create XmlTransaction from DbTxn
XmlTransaction xmltxn = manager.createTransaction(dbtxn);
In C++, to obtain a DbTxn from an XmlTransaction:
XmlTransaction xmltxn = manager.createTransaction();
// get DbTxn
DbTxn *dbtxn = xmltxn.getDbTxn();
When using DbTxn and XmlTransaction together, a couple of rules are important. Once a transaction is committed or aborted, the underlying DbTxn object will have been deleted, and must not be referenced again. Also, a DbTxn object may only be referenced by one XmlTransaction object at a time.
Since release 2.5, BDB XML uses the BDB C API instead of the BDB C++ API, so you should use DB_TXN and getDB_TXN() in place of DbTxn and getDbTxn() respectively.
In Java, to construct an XmlTransaction from a Transaction:
Transaction dbtxn = env.beginTransaction(null,null);
XmlTransaction xmltxn = manager.createTransaction(dbtxn);
and to obtain a Transaction from an XmlTransaction:
XmlTransaction xmltxn = manager.createTransaction();
Transaction dbtxn = xmltxn.getTransaction();
Deadlock handling is necessary in all applications that use transactions and have concurrent read/write or write/write access to BDB XML. The Berkeley DB reference guide has more to say on the subject. In Java, one strategy for dealing with deadlocks is as follows:
public interface SleepycatXmlTransactionWrapper<T> {
    public T run() throws Exception;
}
// isDeadlock(), Loggers and OutOfRetriesException are application-defined (not shown)
public class SleepycatXmlTransaction {
    public static final int DEADLOCK_RETRIES = 3;
    private static ThreadLocal<XmlTransaction> tx = new ThreadLocal<XmlTransaction>();
    // static methods use ThreadLocal XmlTransaction
    private static void createTransaction() {/* etc. */}
    private static void commitTransaction() {/* etc. */}
    private static void abortTransaction() {/* etc. */}
    public static <T> T wrapWithDeadlockRetry(
            SleepycatXmlTransactionWrapper<T> txw) throws Exception {
        T returnValue = null;
        for (int i = 0; i < DEADLOCK_RETRIES; i++) {
            try {
                createTransaction();
                // Do the work, return whatever's needed
                returnValue = txw.run();
                commitTransaction();
                return returnValue;
            }
            catch (Exception e) {
                // Abort, but continue in the loop if our
                // transaction was deadlocked
                abortTransaction();
                if (isDeadlock(e)) {
                    Loggers.XML_RUNTIME.info("Transaction deadlock " + (i + 1));
                    // Give other threads a chance
                    Thread.yield();
                    continue;
                }
                else
                    throw e;
            }
        }
        // We're out of retries - too bad
        throw new OutOfRetriesException("Transaction failed after " +
            DEADLOCK_RETRIES + " retries");
    }
}
This is used by implementing SleepycatXmlTransactionWrapper and passing it to the static SleepycatXmlTransaction.wrapWithDeadlockRetry(...) method:
public String myMethod(final String bar) throws SomeException {
    String result = null;
    try {
        result = SleepycatXmlTransaction.wrapWithDeadlockRetry(
            new SleepycatXmlTransactionWrapper<String>() {
                public String run() throws Exception {
                    //...all your code...
                }
            });
    }
    catch (Exception any) {
        // handle or log the failure as appropriate
    }
    return result;
}
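The strategy above depends on helpers that are not shown, so it cannot be compiled as is. As a rough, self-contained sketch of the same retry loop, the following replaces the BDB XML pieces with stand-ins: DeadlockRetryDemo, runWithRetry() and DeadlockException are hypothetical names, and a real application would instead begin, commit and abort an XmlTransaction and test for whatever exception its BDB XML version raises on deadlock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class DeadlockRetryDemo {
    // Stand-in for whatever exception the application treats as a deadlock.
    public static class DeadlockException extends Exception {
        public DeadlockException(String msg) { super(msg); }
    }

    public static final int DEADLOCK_RETRIES = 3;

    // Runs the work up to DEADLOCK_RETRIES times. Only a DeadlockException
    // triggers a retry; any other exception propagates to the caller.
    public static <T> T runWithRetry(Callable<T> work) throws Exception {
        for (int attempt = 1; attempt <= DEADLOCK_RETRIES; attempt++) {
            try {
                // A real implementation would begin a transaction here
                // and commit it once work.call() has returned.
                return work.call();
            } catch (DeadlockException e) {
                // A real implementation would abort the transaction here.
                Thread.yield(); // give other threads a chance
            }
        }
        throw new IllegalStateException(
            "Transaction failed after " + DEADLOCK_RETRIES + " attempts");
    }

    public static void main(String[] args) throws Exception {
        final AtomicInteger calls = new AtomicInteger();
        // Simulate a unit of work that deadlocks twice, then succeeds.
        String result = runWithRetry(new Callable<String>() {
            public String call() throws Exception {
                if (calls.incrementAndGet() < 3) {
                    throw new DeadlockException("simulated deadlock");
                }
                return "committed";
            }
        });
        // prints "committed after 3 attempts"
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Catching only the deadlock case keeps unrelated failures from being retried, which is the same distinction the isDeadlock(e) check draws in the code above.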
There are various methods of achieving this:
Pass the DBXML_ALLOW_EXTERNAL_ACCESS flag to the XmlManager when you create it. Then reference "file://" or "http://" URLs when using the XQuery function fn:doc().
Create an XmlDocument object using the XML that you wish to query. Then create an XmlValue from that XmlDocument, and specify this value as the contextItem parameter to XmlQueryExpression::execute() when you execute your query:
XmlDocument doc = manager.createDocument();
doc.setContent(myXmlContent);
XmlValue val(doc);
expression.execute(val, queryContext);
This allows you to reference your XML document as the context item (".") in your query.
Implement your own XmlResolver (in C++, Java and Python), and resolve any URI to the correct location (on disk, in memory, etc.). This is achieved by registering your custom XmlResolver with the XmlManager object.
An XQuery module import statement looks something like this:
import module namespace tm='test-module' at 'test-module.xq';
In BDB XML the default module resolution treats the "test-module.xq" as a path name in the file system. For example, the above statement would look for the file test-module.xq in the current directory. The resolution also pays attention to the base URI set in the XmlQueryContext object used for the query. For example, if the base URI is "file://tmp/" the resolution will look for the file "/tmp/test-module.xq". Yet another way to handle module import is to implement your own instance of the XmlResolver class, and register it using the method XmlManager::registerResolver(). Module imports will call the XmlResolver::resolveEntity() method. This allows you to entirely control the location of modules, and place modules in the file system, in a Berkeley DB database or in a BDB XML metadata item, or even construct them in code.
class testResolver extends XmlResolver {
    public testResolver() throws XmlException {}
    public boolean resolveDocument(XmlTransaction txn, XmlManager mgr,
                                   String uri, XmlValue val)
            throws XmlException {
        return false;
    }
    public boolean resolveCollection(XmlTransaction txn, XmlManager mgr,
                                     String uri, XmlResults res)
            throws XmlException {
        return false;
    }
    public XmlInputStream resolveSchema(XmlTransaction txn, XmlManager mgr,
                                        String location, String nameSpace)
            throws XmlException {
        return null;
    }
    public XmlInputStream resolveEntity(XmlTransaction txn, XmlManager mgr,
                                        String systemId, String publicId)
            throws XmlException {
        return null;
    }
    public boolean resolveModuleLocation(XmlTransaction txn, XmlManager mgr,
                                         String nameSpace, XmlResults result)
            throws XmlException {
        return false;
    }
    public XmlInputStream resolveModule(XmlTransaction txn, XmlManager mgr,
                                        String moduleLocation, String nameSpace)
            throws XmlException {
        return null;
    }
}
If you want to return a specific XQuery module in your resolveModule() method, you can create an XmlInputStream object from a file or even from a java.io.InputStream object using XmlManager.createInputStream().
The C++ code to do the same thing is similar.
Many people use this syntax: /foo/bar/text(). In the majority of cases, this is incorrect! The explanation follows. Consider this document:
<foo>
<bar>hello <baz>you</baz>there</bar>
</foo>
In XQuery, text() is a text node test. This means that in the example /foo/bar/text(), text() is short for child::text(), which tells you a little more about what it does. This expression returns all children of the current context that are text nodes. So in this example, the aforementioned expression will return two text nodes, one with the value "hello ", and the second with the value "there". What's important to note here is that not only are you getting text nodes returned rather than a string, but the text nodes' combined value does not equal the value of the bar element! The XQuery specification defines the string value of this element as "hello youthere", in other words the concatenation of the values of all the descendant text nodes.
Another important issue is that attribute nodes don't have any text node children at all. So if you wrote /foo/@bar/text() expecting to get the attribute's value, you might be very surprised when the query engine quite rightly returned an empty sequence.
Thirdly, BDB XML's query planner is not going to optimize any use of text(). It can't, as the BDB XML indexes deal with the value of elements and attributes, not their text node children. So you will lose out on valuable optimization if you use text().
Enough of why it's wrong. How do you get the value of a node? Here are some methods, and the differences between them:
The fn:string() function. This returns the string value of the node, so no schema type information.
The fn:data() function. This returns a typed value from the node. In other words, if there is a schema for the document you will get a value of the type the schema says it should be. If there isn't a schema, you will get a value of type xdt:untypedAtomic.
A cast or constructor function, such as /foo/bar cast as xs:decimal or xs:date(/foo/bar). This can be used to get a value of a specific type.
Whatever you do, try to get out of the habit of using text() unless you know precisely what you want from it.
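The difference between text() and the string value can be checked without a BDB XML installation. The sketch below uses the XPath 1.0 engine that ships with the JDK (javax.xml.xpath), which is an assumption of convenience: it is not XQuery and not BDB XML's query engine, but for these two expressions the node semantics are the same. The class name TextNodeDemo and its methods are made up for this example, and the document is the one shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeDemo {
    // The example document from the text above.
    static final String XML =
        "<foo>\n<bar>hello <baz>you</baz>there</bar>\n</foo>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    // Number of nodes selected by /foo/bar/text(): only the direct
    // text node children of <bar>, not the text inside <baz>.
    public static int textNodeCount() throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(
            "/foo/bar/text()", parse(), XPathConstants.NODESET);
        return nodes.getLength();
    }

    // String value of <bar>: all descendant text nodes concatenated.
    public static String stringValue() throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("string(/foo/bar)", parse());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textNodeCount()); // prints 2
        System.out.println(stringValue());   // prints hello youthere
    }
}
```

The two text nodes are "hello " and "there"; the word "you" only appears once the string value of the whole element is taken.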
There are two ways to do this:
The doc() function, with the "dbxml:" URI scheme. The dbxml URI scheme uses the format "dbxml:/<container_name_or_alias>(/<document_name>)?", and would typically be used like this:
doc("dbxml:/myContainer/myDocument")
Note that the default base URI is "dbxml:/", which means that you can often leave off the scheme specifier in the URI:
doc("myContainer/myDocument")
The dbxml:metadata() function, as it is stored as document metadata, i.e.:
for $a in collection() return dbxml:metadata("dbxml:name", $a)
Document metadata is available in a query by using the dbxml:metadata() function. This function takes the name of the metadata as a string, and an optional second argument: a node from the document for which to look up the metadata. If the second argument is not specified, the context item is used instead. The dbxml:metadata() function returns the typed metadata from the document, by treating the first argument as a QName and resolving its prefix to a URI. Some examples of usage:
Return the names of all documents in the default collection:
for $a in collection() return dbxml:metadata("dbxml:name", $a)
Return documents with {http://timestamp.org}timestamp metadata less than 5:
declare namespace ts = "http://timestamp.org";
collection()[dbxml:metadata("ts:timestamp") < 5]
There are a couple of ways to create an xs:QName item in a query:
Use the xs:QName() constructor function to create a QName in your query. In this case you will need to make sure you have bound a namespace URI for the prefix that you use:
declare namespace foo="http://foo"; xs:QName("foo:bar")
Use the fn:QName() function, which takes two arguments. This allows you to specify the namespace URI for the QName, as well as the prefix and local name:
fn:QName("http://foo", "foo:bar")
There are several methods of doing this:
XmlDocument::setContent() or XmlDocument::setContentAsXmlInputStream(). Then put the changes back into the XmlContainer using XmlContainer::updateDocument().
The XmlModify class can also be used to specify updates, but XQuery Update should be used in preference to it.
No. The XUpdate specification is not complete, and does not include support for important concepts upon which BDB XML relies, such as transactions. It has also remained unchanged and in working draft since September 14, 2000. From version 2.4, DB XML has supported the W3C's XQuery Update language, which is far more suitable for updating persistent XML documents.
This error usually means that the value you have selected to perform a modification on does not come from the exact same XmlDocument as the one given to XmlModify::execute(). This is based on reference equality: the same document obtained from the database in two different ways does not count. For instance, this will give you an error:
modify.addRemoveStep(manager.prepare("doc('container/foo')//bar", qc));
XmlDocument foo = container.getDocument("foo");
modify.execute(XmlValue(foo), qc, uc);
whereas this will not:
modify.addRemoveStep(manager.prepare(".//bar", qc));
XmlDocument foo = container.getDocument("foo");
modify.execute(XmlValue(foo), qc, uc);
The simple rule of thumb is that the modify step's query should always be relative. That is, it should navigate from the context item ("."), rather than using fn:doc() or fn:collection().
You can use a program called rlwrap to do this. This will give you command line history and bash-style command line editing. Use it like this:
rlwrap dbxml [command line args here]
Consider a document set marked up using a complex DTD with deep nesting (14 or more levels deep in some cases). A typical "simple" search from a user's perspective is: find me the word FOO anywhere in the corpus. The most intuitive query for this is:
collection()//*[contains(., 'FOO')]
Is there an index type that optimizes this case?
BDB XML cannot currently optimize a query of the sort above, since it only indexes values for named nodes. One way to achieve this is to add a substring index on the document element, and write a composite query like this:
collection()/docElem[contains(., 'FOO')]//*[contains(., 'FOO')]
BDB XML provides methods to directly access indexes using the XmlIndexLookup object and its methods. Using an XmlIndexLookup object, it is possible to:
Here's a simple equality lookup example in C++:
// assume cont and mgr are open container and manager...
// perform lookup in a decimal equality index on the "age" element
// look for all entries with age equal to 27. The operation
// is left off -- it defaults to equality if a value is provided.
XmlIndexLookup il = mgr.createIndexLookup(cont, "", "age",
    "node-element-equality-decimal",
    XmlValue(XmlValue::DECIMAL, 27));
// the omnipresent query context
XmlQueryContext qc = mgr.createQueryContext();
// execute the lookup
XmlResults res = il.execute(qc);
// handle results ...
Here's a more complex example, which uses the same index as above (decimal index on "age"), but finds all entries greater than 12 and less than or equal to 35, and returns them in reverse sort order (highest first):
XmlQueryContext qc = mgr.createQueryContext();
// create the basic XmlIndexLookup object, which sets
// the lower-bound operation and value (gt, 12)
XmlIndexLookup il = mgr.createIndexLookup(cont, "", "age",
    "node-element-equality-decimal",
    XmlValue(XmlValue::DECIMAL, 12),
    XmlIndexLookup::GT);
// now, set the upper bound and operation (35, LTE)
il.setHighBound(XmlValue(XmlValue::DECIMAL, 35),
    XmlIndexLookup::LTE);
// perform the operation, setting the reverse order flag
XmlResults res = il.execute(qc, DBXML_REVERSE_ORDER);
// handle results ...
This interface is useful for sophisticated applications that can benefit from direct index lookup, with or without also using the XQuery interface.
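To make the bound semantics of the range lookup concrete, here is an analogy using only java.util.TreeSet. It is not the BDB XML API: RangeLookupAnalogy and lookup() are invented names, and a sorted set of integers stands in for the decimal index on "age". The point is only to show which values fall inside an exclusive lower bound (greater than 12), an inclusive upper bound (less than or equal to 35), and reverse order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class RangeLookupAnalogy {
    // Returns every value v in the sorted set with low < v <= high,
    // highest first.
    public static List<Integer> lookup(TreeSet<Integer> index, int low, int high) {
        return new ArrayList<Integer>(
            index.subSet(low, false, high, true).descendingSet());
    }

    public static void main(String[] args) {
        // Stand-in for the values held in the "age" index.
        TreeSet<Integer> ages =
            new TreeSet<Integer>(Arrays.asList(12, 13, 27, 35, 36));
        // 12 is excluded (exclusive bound), 35 is included (inclusive bound)
        System.out.println(lookup(ages, 12, 35)); // prints [35, 27, 13]
    }
}
```

Unlike this set of bare values, a real index lookup returns the matching nodes or documents, and the same value may appear for more than one entry.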
In Berkeley DB XML it is best to specify indexes on a container before documents are inserted. When indexes in a populated container are changed, all documents in that container must be reindexed, and this can take a long time. If indexes must be changed after the fact, it is best to add/remove several at once, to amortize the cost of the reindex. This means using code that looks like this:
// get the index specification
XmlIndexSpecification ispec = container.getIndexSpecification();
// modify the index specification, as desired ...
// reset the index specification
container.setIndexSpecification(ispec, ...);
It is very simple to use the dbxml shell program to add or remove a single index, but because the shell can only manipulate one index at a time, using the shell to manage indexes on a populated container may be less efficient than using code.
After you install the Java API and create a Java project in Eclipse you must add db.jar and dbxml.jar to the Build Path of your project, which is done as follows:
db.jar and dbxml.jar, which should be located in the folder (dbxml-root)/jar.
To do this requires a source installation of BDB XML, Windows, and Visual Studio. The description below specifically uses Visual Studio 2005 Express Edition. There is a write-up and video of this process at this blog, but you have to do some extra steps to get the process to work with BDB XML.
The first step in getting Visual Studio and Eclipse to run together is to build the C++ libraries with the correct arguments. Set the build mode to Debug for the entire Solution. Then right-click the dbxml_java project and select Properties from the menu. In the Properties window select Linker->Debugging, then set Generate Map File to Yes(/MAP) and Map Exports to Yes(/MAPINFO:EXPORTS). Repeat this process for the projects XercesLib, xqilla, dbxml, and db_java. Then rebuild all of these libraries.
Next you need to remove all release .dll files for the BDB XML libraries from your computer's PATH (for example, you would want to remove libdbxml24.dll, but keep libdbxml24d.dll). For example, I keep my release .dll files in dbxml/bin, so I moved all those files into a folder named HideDll. Remember, every time you build the dbxml project it copies the release .dll files back into the dbxml/bin directory. If release .dll files are left in the PATH then you will get illegal memory access exceptions, and exceptions for deleting illegal memory in the heap. This is because both release and debug libraries have been loaded, and they treat memory in different ways that cannot be combined. To be safe, you might want to delete the dbxml/build_windows/Release directory, which keeps copies of all the release libraries.
Now set Visual Studio up to run your Java programs as a server. First, right-click the dbxml_java project and select Properties from the menu. In the Properties window select Configuration Properties -> Debugging. Set Command to the path of your java executable; for example, I set it to C:\jdk1.5.0\bin\java.exe.
Then put the following in the Command Arguments section:
-Xmx400m -Xms400m -Xdebug -Xnoagent \
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \
-classpath <paths-to-jar-files> <java-program-to-run>
For example, to run the test program AutoOpenTest.java on my computer I put:
-Xmx400m -Xms400m -Xdebug -Xnoagent \
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \
-classpath C:\Sleepycat\jar\junit-4.4.jar;C:\Sleepycat\jar\dbxmltest.jar;\
C:\Sleepycat\jar\dbxml.jar;C:\Sleepycat\jar\db.jar \
dbxmltest.XmlTestRunner
Then set Working Directory to the directory you want as a working directory. Hit OK to save these settings. Then right-click dbxml_java and select Debug->Start New Instance. This will start your Java program running as a server, but the program will wait for Eclipse to attach to it before it begins executing.
Next go to Eclipse and set whatever breakpoints you want in your Java program. Then go to Run->Open Debug Dialog. Select Remote Java Application and choose to create a new one. Set Connection Type to Standard (Socket Attach), Host to localhost, and Port to 8000. Next hit Debug to attach the debugger to the waiting Java server. Now the program should stop in Eclipse at any breakpoints you set in the Java code, and in Visual Studio at any breakpoints you set in the C++ code. Unfortunately you cannot step directly from Java to C++, or from C++ to Java, but if you want to enter XmlContainer_putDocument__SWIG_1 from Java, go to dbxml_java_wrap.cpp in Visual Studio and set a breakpoint in the function Java_com_sleepycat_dbxml_dbxml_1javaJNI_XmlContainer_1putDocument_1_1SWIG_10 (or whatever function SWIG made to answer a call to XmlContainer_putDocument__SWIG_1).
If you get an error message like Cannot Find file MSVCP80D.dll or something similar when you start Eclipse, this is not because Windows cannot find that file, but has to do with how .dll files are configured. In this case try deleting your dbxml/bin/debug directory and your dbxml/build_windows/Win32/Debug directory and rebuilding so that the .dll files can be built and configured correctly.