By Re Lai
December 2012
Against the backdrop of big data and cloud computing, NoSQL databases have attracted much attention. While SQL and RDBMS remain the mainstay of enterprise storage, NoSQL databases have become an increasingly important tool. Using different storage technologies side by side in this way is sometimes referred to as polyglot persistence.
The recent launch of Oracle NoSQL Database has further spurred interest and excitement. Oracle NoSQL Database is a horizontally scalable key-value database. Built by the acclaimed Berkeley DB team, it features excellent performance, tunable consistency, integration with Hadoop, and a simple but powerful client API.
This article explores application development using Oracle NoSQL Database. Most application developers today cut their teeth on SQL-based RDBMS, so building a well-designed enterprise application on a NoSQL database represents a new challenge.
For illustrative purposes, this article also presents Kvitter, yet another Twitter-like microblog sample application. Twitter clones have been a favorite theme for NoSQL samples. Our sample application was built with two goals in mind: first, to showcase Oracle NoSQL Database; second, to build the application using concepts familiar to most Java enterprise developers: JavaServer Faces (JSF) 2.0, Contexts and Dependency Injection (CDI), and Java enterprise design patterns.
Kvitter is a Twitter-like microblog sample application. A user is uniquely identified by his/her user name. A user signs in using his/her password. A blog is uniquely identified by the blog Id. A blog is created by a user. A user can follow other users.
The application supports the following queries of blogs:
The sample application is developed on Oracle NoSQL Database Community Edition 1.2.123. Follow the official Quick Start Guide to install and start KVLite, a single-process version for developers. If you run KVLite using a custom configuration other than the default, modify the parameters here.
NetBeans is used as the IDE. Both NetBeans Java EE 7.1 and 7.2 are verified to work with the sample. Make sure to install the GlassFish server bundled with the installation. After launching NetBeans, add a GlassFish server instance to the IDE.
Next, to refer to the Oracle NoSQL Database installation inside NetBeans, open Tools > Variable (or Ant Variables in NetBeans 7.2). Add a variable named KVHOME, pointing to the root location of the Oracle NoSQL Database installation.
Figure 1: Manage Variables.
The sample code is attached as two NetBeans projects: kvitterService and kvitterWeb. After unzipping the projects, open them from File > Open Project. Project kvitterService is a model project. You can right-click it in Project Navigator and execute Test to run JUnit tests. The tests under the sample package contain the example code used throughout this article. Project kvitterWeb is a JSF web application. You can right-click it to run the application.
Oracle NoSQL Database stores data as key-value pairs. Keys consist of a list of Java Strings, grouped into two parts: major and minor components. A key must have at least one major component. Values, on the other hand, are stored simply as opaque byte arrays. Conversion between bytes and Java objects is handled by clients.
To help visualize the data modeling, this article adopts and extends the file system path metaphor used in the official Getting Started Guide. It denotes key-value pairs as in the following example:
/MajorKey1/MajorKey2/-/MinorKey1/MinorKey2: $valueField1 $valueField2
A forward slash (/) delimits key components. A slash-dash-slash (/-/) separates the major key path from the minor one. We further use a colon (:) to divide the key components from the value. Key components and value fields can be either String literals or variables. A variable is designated by a preceding dollar sign ($); otherwise a String literal is implied.
Under this representation, Login and Follower can be expressed as:
/Login/$userName: $password
/Follower/$blogger/-/$follower
Login has only major key components: the String literal Login, serving more or less as a tag or classifier, and the variable $userName. The value part contains only one field: $password.
Follower consists of both major and minor keys. The major keys are the String literal tag Follower and the variable $blogger (blogger user name). The minor key is the variable $follower (follower user name). Follower does not have a meaningful value part, which will be discussed later.
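The path notation above can be sketched in code. The helper below is hypothetical (the class name KeyPathNotation and its format method are not part of the Oracle NoSQL Database API); it only renders lists of major and minor components in the notation used in this article, using plain Java collections.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not an Oracle NoSQL API) that renders a key in the article's path notation. */
final class KeyPathNotation {

    private KeyPathNotation() { }

    /** Renders major and minor components as /major1/major2/-/minor1/minor2. */
    static String format(List<String> majorPath, List<String> minorPath) {
        if (majorPath.isEmpty()) {
            throw new IllegalArgumentException("A key needs at least one major component");
        }
        StringBuilder sb = new StringBuilder();
        for (String component : majorPath) {
            sb.append('/').append(component);
        }
        // The /-/ separator appears only when there is a minor path.
        if (!minorPath.isEmpty()) {
            sb.append("/-");
            for (String component : minorPath) {
                sb.append('/').append(component);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Login has only a major path; Follower has a major and a minor path.
        System.out.println(format(Arrays.asList("Login", "Jasper"),
                Collections.<String>emptyList()));
        System.out.println(format(Arrays.asList("Follower", "Jason"),
                Arrays.asList("Jerry")));
    }
}
```

With the concrete values substituted for the variables, Login for user Jasper renders as /Login/Jasper, and the record stating that Jerry follows Jason renders as /Follower/Jason/-/Jerry.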
UML class diagrams can also be used to represent the data modeling. In particular, attribute prototypes are used to designate major and minor key components. For example, we can represent Login and Follower as
Figure 2: Login & Follower.
A word of caution: key-value NoSQL databases are inherently schema-less. As a result, the schema is more logical than physical. This is particularly true for the value part. How multiple fields are stored in a single value byte array depends on the serialization scheme, which is not captured here. Still, these representations are helpful for visualizing and communicating data models, and will be used extensively in this article.
Clients connect to Oracle NoSQL Database by creating an instance of KVStore. This can be done as follows:
Java Code Snippet (demo.kvitter.applicationService.DataStoreFactory)
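// Default KVLite settings: store name, host name and port
String storeName = "kvstore";
String hostName = "localhost";
String hostPort = "5000";
KVStoreConfig config = new KVStoreConfig(storeName,
    hostName + ":" + hostPort);
// The handle used as kvStore in the snippets that follow
KVStore kvStore = KVStoreFactory.getStore(config);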
The setting uses the default configuration of KVLite.
KVStore is to Oracle NoSQL Database as JDBC is to RDBMS. It is also worthwhile to note that KVStore is thread-safe. A single instance of KVStore, therefore, can serve multiple web sessions. This simplifies resource management.
Create, read, update and delete (CRUD) operations are fully supported. The following sample code shows the operations on Login:
// Login modeled as /Login/$userName: $password
final String userName = "Jasper";
final String password = "welcome";
// Create a login for Jasper
List<String> majorPath = Arrays.asList("Login", userName);
Key key = Key.createKey(majorPath);
byte[] bytes = password.getBytes();
Value value = Value.createValue(bytes);
kvStore.putIfAbsent(key, value);
// Read
ValueVersion vv2 = kvStore.get(key);
Value value2 = vv2.getValue();
byte[] bytes2 = value2.getValue();
// Decode the bytes just read, not the array written earlier
String password2 = new String(bytes2);
// Update
Value value3 = Value.createValue("welcome3".getBytes());
kvStore.put(key, value3);
// Delete
kvStore.delete(key);
For more details, refer to the JavaDoc and Getting Started Guide.
A particularly appealing feature of Oracle NoSQL Database is its composite keys. They free us from resorting to String concatenation to create compound keys. More importantly, they turn out to be a versatile modeling tool.
First, distribution of data across multiple partitions, or sharding, is based on the hash of the major key components. This offers us a simple approach to control data locality. Items of the same major key path are guaranteed to be stored in the same partition. Examine the following two ways to model Blog and Follower:
/Blog/$blogId: $blogger $content $blogTime
/Follower/$blogger/-/$follower
Every Blog has a unique major key. In this way, blogs are distributed across multiple partitions. This is reasonable since blogs constitute the bulk of data in our data store. Attempts to store all blogs in a single partition would certainly overwhelm it. On the other hand, the list of follower names of a given blogger is guaranteed to be stored in the same partition. We can therefore retrieve all followers of a blogger quickly.
Second, KVStore provides a number of ways to query data based on partial key match:
Last, a sequence of write operations can be applied as a single atomic unit if all records share the same major key path.
Composite keys serve as an important modeling facility. They are intuitively simple and powerful at the same time, and can find many good uses in data modeling.
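The data-locality point made above, that only the major key path determines the partition, can be illustrated with a small sketch. This is not how Oracle NoSQL Database computes partitions internally: the hash function (List.hashCode), the partition count of 10, and the class name MajorKeyPartitioner are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: the store's real hash function and partition count are internal to it. */
final class MajorKeyPartitioner {

    private final int partitionCount;

    MajorKeyPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Maps a key to a partition. Only the major path is hashed; minorPath is intentionally unused. */
    int partitionFor(List<String> majorPath, List<String> minorPath) {
        return Math.floorMod(majorPath.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        MajorKeyPartitioner partitioner = new MajorKeyPartitioner(10);

        // All followers of Jason share the major path /Follower/Jason, so they co-locate.
        List<String> followersOfJason = Arrays.asList("Follower", "Jason");
        int p1 = partitioner.partitionFor(followersOfJason, Arrays.asList("Jerry"));
        int p2 = partitioner.partitionFor(followersOfJason, Arrays.asList("Jasper"));
        System.out.println("Same partition for Jason's followers: " + (p1 == p2));

        // Each blog has its own major path, so blogs can land on different partitions.
        for (String blogId : Arrays.asList("1001", "1002", "1003")) {
            int p = partitioner.partitionFor(Arrays.asList("Blog", blogId),
                    Collections.<String>emptyList());
            System.out.println("/Blog/" + blogId + " -> partition " + p);
        }
    }
}
```

The design consequence is the one described in the text: putting a variable in the major path spreads records out, while putting it in the minor path keeps related records together.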
Values are stored as bytes in Oracle NoSQL Database. A large number of simple Java types support conversion to and from byte arrays. More complicated types need to be managed on the client side. The following sample code converts a blog into a Value object:
final String blogger = "Jasper";
final String blogContent = "Hello World!";
final Date blogTime = Calendar.getInstance().getTime();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
dos.writeUTF(blogger);
dos.writeUTF(blogContent);
dos.writeLong(blogTime.getTime());
byte[] bytes = baos.toByteArray();
Value value = Value.createValue(bytes);
Conversely, to convert a byte array back into a blog:
ValueVersion valueVersion = kvStore.get(key);
byte[] bytes = valueVersion.getValue().getValue();
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
DataInputStream dis = new DataInputStream(bais);
String blogger = dis.readUTF();
String blogContent = dis.readUTF();
Date blogTime = new Date(dis.readLong());
Note that we do not use Java serialization, the default mechanism to serialize objects in Java. While simple and straightforward to use, Java serialization is not optimized for compactness and performance. For example, the class name is stored along with the data, which would be duplicated each time an entry is persisted. Consequently, Java serialization is generally avoided for large-scale storage. For this simple example, we handcraft the conversion based on byte array streams. For more extensive or complicated usages, you may consider a formal serialization framework such as Apache Avro or Kryo.
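The two snippets above can be combined into a small round-trip codec. The sketch below uses only java.io streams and leaves out the Oracle NoSQL Value wrapper; the class names BlogCodec and Blog are hypothetical and are not taken from the sample projects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

/** Hypothetical codec for the Blog value layout: $blogger $content $blogTime. */
final class BlogCodec {

    /** Minimal immutable holder for the three value fields. */
    static final class Blog {
        final String blogger;
        final String content;
        final Date blogTime;

        Blog(String blogger, String content, Date blogTime) {
            this.blogger = blogger;
            this.content = content;
            this.blogTime = blogTime;
        }
    }

    /** Writes the fields in a fixed order; the reader must use the same order. */
    static byte[] toBytes(Blog blog) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeUTF(blog.blogger);
            dos.writeUTF(blog.content);
            dos.writeLong(blog.blogTime.getTime());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Reads the fields back in the order they were written. */
    static Blog fromBytes(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String blogger = dis.readUTF();
            String content = dis.readUTF();
            Date blogTime = new Date(dis.readLong());
            return new Blog(blogger, content, blogTime);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Blog original = new Blog("Jasper", "Hello World!", new Date());
        byte[] bytes = toBytes(original);
        Blog copy = fromBytes(bytes);
        System.out.println(copy.blogger + ": " + copy.content + " (" + bytes.length + " bytes)");
    }
}
```

In an application, the byte array returned by toBytes would be passed to Value.createValue, and the array obtained from a retrieved Value would be passed to fromBytes.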
Entities can generally be modeled in two ways in Oracle NoSQL Database: structured values and name-value pairs.
In the structured value approach, a key-value pair resembles a record in RDBMS, with the key representing the primary key, and the value the serialization of attributes of the record. The Blog entity we saw in the previous section is a good example. There is an implicit structure of the stored value, which has to be followed in both writing and reading operations.
Alternatively, taking advantage of the key-value store, we can simply save the data as multiple name-value pairs. For example, we can model the UserProfile entity as
/UserProfile/$userName/-/$profileName: $profileValue
A user profile is stored in multiple records as name-value pairs. As profiles of a given user have the same major key path, they can be queried or iterated through easily, and can even be updated atomically by batched operations. The following sample code updates the profile of user Jasper in a single atomic transaction:
// User Profile as a Map
Map<String, String> profile = new HashMap<String, String>();
profile.put("Gender", "Male");
profile.put("Hobbies", "Hiking");
profile.put("Profession", "Engineer");
// Create a batch of operations
List<Operation> batch = new LinkedList<Operation>();
List<String> majorPath = Arrays.asList("UserProfile", "Jasper");
for (Map.Entry<String, String> entry : profile.entrySet()) {
    Key key = Key.createKey(majorPath, entry.getKey());
    Value value = Value.createValue(entry.getValue().getBytes());
    Operation op = kvStore.getOperationFactory().createPut(key, value);
    batch.add(op);
}
// Execute the operation batch
kvStore.execute(batch);
To retrieve the user profile:
List<String> majorPath = Arrays.asList("UserProfile", "Jasper");
Key matchKey = Key.createKey(majorPath);
// Fetch all records that share the major key path
Map<Key, ValueVersion> resultMap;
resultMap = kvStore.multiGet(matchKey, null, null);
Entities in general can be modeled in either way. Structured storage is favored when the structure is static and the attributes tend to be accessed together. On the other hand, the name-value approach should be considered if the structure is dynamic, or if the attributes are generally accessed separately.
The two approaches do not need to be mutually exclusive. They can be used to complement each other. For example, we can store the core portion of user profiles using structured storage, and the ad hoc or dynamic part of profiles as name-value pairs.
Relational databases (RDBMS) rely on indexes to speed up queries. For example, to expedite the retrieval of the latest ten blogs, an RDBMS can use an index that sorts the blog IDs by time. This is sometimes referred to as a secondary index, as opposed to the primary key.
NoSQL databases, including Oracle NoSQL Database, in general do not support secondary indexes. The task is instead shifted to clients. Fortunately it is straightforward to model an index by composite keys. In our example of Blog, whenever a Blog item is inserted, we also insert a record into Timeline, a time-sorted index of blog IDs.
// Timeline modeled as /Timeline/-/$blogTime/$blogId
String majorPath = "Timeline";
String time = Long.toHexString(blogTime.getTime());
List<String> minorPath = Arrays.asList(time, blogId);
Key key = Key.createKey(majorPath, minorPath);
// Empty value
Value value = Value.createValue(new byte[0]);
kvStore.putIfAbsent(key, value);
As the value part is really of no interest to us, we put an empty array into it.
To retrieve the timeline, we use query APIs based on partial key match. Unlike normal key-value pairs, Timeline is stored entirely in keys, and can be retrieved by three flavors of KVStore API: multiGetKeys, multiGetKeysIterator or storeKeysIterator. The following shows how to get a reverse-ordered iterator:
Key matchKey = Key.createKey("Timeline");
KeyRange subRange = null;
Iterator<Key> it = kvStore.multiGetKeysIterator(Direction.REVERSE,
0, matchKey, subRange, null);
Note that once we retrieve a primary key from an index, we need to issue another query to fetch the object based on the primary key. In other words, the join is done on the client side, in contrast to an RDBMS. This follows from the NoSQL approach of shifting such tasks to clients.
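The index-then-fetch pattern described above can be sketched without a running store. In the following in-memory stand-in, a TreeSet plays the role of the key-only Timeline records and a HashMap plays the role of the Blog records; neither is the Oracle NoSQL API, and the class name TimelineSketch is hypothetical. The sketch also zero-pads the hexadecimal time component. That padding is an assumption added here, on the premise that key components are compared as strings, so that string order matches chronological order even when timestamps differ in magnitude.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** In-memory stand-in for the Timeline index and Blog records; not the Oracle NoSQL API. */
final class TimelineSketch {

    // Stand-in for /Blog/$blogId: maps a blog ID to its content.
    private final Map<String, String> blogs = new HashMap<>();
    // Stand-in for /Timeline/-/$blogTime/$blogId: sorted "time/blogId" keys with no values.
    private final TreeSet<String> timeline = new TreeSet<>();

    /** Zero-pads to 16 hex digits so string order matches numeric order (non-negative times). */
    static String timeKey(long millis) {
        return String.format("%016x", millis);
    }

    /** Inserts the blog and its index entry, as the article does for Blog and Timeline. */
    void post(String blogId, String content, long millis) {
        blogs.put(blogId, content);
        timeline.add(timeKey(millis) + "/" + blogId);
    }

    /** Returns the newest blogs first, joining index entries to blogs on the client side. */
    List<String> latest(int limit) {
        List<String> result = new ArrayList<>();
        for (String indexKey : timeline.descendingSet()) {
            if (result.size() == limit) {
                break;
            }
            // Second lookup: fetch the blog by the primary key found in the index.
            String blogId = indexKey.substring(indexKey.indexOf('/') + 1);
            result.add(blogs.get(blogId));
        }
        return result;
    }

    public static void main(String[] args) {
        TimelineSketch sketch = new TimelineSketch();
        sketch.post("b1", "first", 5L);
        sketch.post("b2", "second", 255L);
        sketch.post("b3", "third", 256L);
        System.out.println(sketch.latest(2));
    }
}
```

Without padding, the three sample times would render as 5, ff and 100, and plain string comparison would place 100 before 5. Timestamps of similar magnitude produce hex strings of equal length, so this matters mainly as a defensive measure.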
Multi-values are everywhere. For example, a blogger can have many blogs and be followed by many followers. They manifest an underlying relationship of multiplicity: one-to-many and many-to-many relationships. Most of us first learn relationships from SQL, but the concept itself is universal.
Handling multi-values in Oracle NoSQL Database is not much different from modeling entities. For simple cases, structured storage can be used. A collection object can be serialized into a byte array and stored in a single value field. In more fluid scenarios, name-value pairs are preferred. For example, followers of bloggers can be modeled as:
/Follower/$blogger/-/$follower
As Follower relationships are entirely defined on the keys, they are in essence indexes and can be easily retrieved by key multi-get APIs mentioned before. Modeling this way also allows a follower to be added, removed or queried without affecting other follower records. For example,
// Create: Jerry is a follower of Jason.
List<String> majorPath = Arrays.asList("Follower", "Jason");
String minorPath = "Jerry";
Key key = Key.createKey(majorPath, minorPath);
Value value = Value.createValue(new byte[0]);
kvStore.putIfAbsent(key, value);
// Read: is Jerry a follower of Jason?
boolean isFollower = (null != kvStore.get(key));
// Delete: unfollow
kvStore.delete(key);
Evolution of schemas in application development is a fact of life, thanks to requirement changes, enhancements, or bug fixes. NoSQL databases are intrinsically schema-less, which makes it relatively easy to support dynamic schemas.
If we store data as name-value pairs, enhancing the schema is fairly simple. To accommodate a new attribute, we simply add a new record of name-value pairs.
If we opt to use structured storage, it is still relatively easy to evolve schemas. Take the Blog entity we worked on before as an example, and suppose we want to add a new optional attribute. After deploying the application, we decide to introduce a new boolean attribute called isPrivate, to designate whether a blog is private or not. This field will be available to new blogs, but does not exist for previous entries. To make the read backward compatible, add the following code after the Blog reading snippet in the section on handling values:
// Read the optional field only if bytes remain; older entries lack it
boolean isPrivate = false;
if (dis.available() > 0) {
    isPrivate = dis.readBoolean();
}
For more complicated scenarios, we can store a schema version number as part of the value, and handle multiple versions in our code.
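One way to store a version number with the value is sketched below, under stated assumptions: the version is written as a leading byte, version 1 is the original Blog layout, and version 2 adds isPrivate. The class name VersionedBlogCodec and the version constants are hypothetical and do not come from the sample application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical codec that prefixes each value with a schema version byte. */
final class VersionedBlogCodec {

    static final int VERSION_1 = 1; // blogger, content, blogTime
    static final int VERSION_2 = 2; // adds isPrivate

    static final class Blog {
        final String blogger;
        final String content;
        final long blogTimeMillis;
        final boolean isPrivate;

        Blog(String blogger, String content, long blogTimeMillis, boolean isPrivate) {
            this.blogger = blogger;
            this.content = content;
            this.blogTimeMillis = blogTimeMillis;
            this.isPrivate = isPrivate;
        }
    }

    /** Writes the blog in the given schema version; version 1 omits isPrivate. */
    static byte[] write(Blog blog, int version) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeByte(version);
            dos.writeUTF(blog.blogger);
            dos.writeUTF(blog.content);
            dos.writeLong(blog.blogTimeMillis);
            if (version >= VERSION_2) {
                dos.writeBoolean(blog.isPrivate);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Reads any known version; fields missing from older versions get defaults. */
    static Blog read(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int version = dis.readUnsignedByte();
            if (version < VERSION_1 || version > VERSION_2) {
                throw new IllegalStateException("Unknown schema version " + version);
            }
            String blogger = dis.readUTF();
            String content = dis.readUTF();
            long blogTimeMillis = dis.readLong();
            boolean isPrivate = false; // default for version 1 entries
            if (version >= VERSION_2) {
                isPrivate = dis.readBoolean();
            }
            return new Blog(blogger, content, blogTimeMillis, isPrivate);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Blog blog = new Blog("Jasper", "Hello World!", System.currentTimeMillis(), true);
        System.out.println(read(write(blog, VERSION_1)).isPrivate); // old layout: default applies
        System.out.println(read(write(blog, VERSION_2)).isPrivate); // new layout: value preserved
    }
}
```

A leading version byte only helps if it is present from the first release. Values already written without one, such as the Blog layout shown earlier, would need either a migration or a check like the one on dis.available().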
We have also seen how to model indexes in Oracle NoSQL Database. Its flexibility allows us to add or remove indexes easily, without having to lock a table. Bret Taylor presents an interesting case study on how schema-less indexes benefited FriendFeed, albeit on schema-less MySQL.
Now it is time to put everything together. The following is the schema of the Kvitter application:
Figure 3: Kvitter Schema.
Userline, UserBlog and Timeline are in essence the secondary indexes on Blog. Follower and Following are many-to-many relationships between Logins.
For brevity, the functionality is kept to a minimum. Beloved features like retweet, reply, mention and hashtag are not included. In addition, a login (user) is identified solely by the user name, which therefore cannot be changed.
Java enterprise design patterns were mostly refined through development work on SQL-based RDBMS. A main theme of these patterns is to achieve database independence through proper encapsulation, which, interestingly, makes them a good fit for NoSQL databases.
In Kvitter, data access objects (DAOs) are used internally to abstract access to the persistent store. Two application services, UserService and BlogService, further centralize the business logic. Clients interact with the two business services and are, therefore, completely unaware of the data store.
Figure 4: DAO.
Note that while we have Follower and Following, there is only a single FollowDao class. This is because Follower and Following are just mirrors of each other, and a single class can be used to handle both. In fact, it would not be too hard to generalize FollowDao to handle generic many-to-many relationships.
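A generalized DAO of that kind might look like the following sketch. It is not the FollowDao from the sample projects: the class name RelationDao, its method names, and the assumed Following layout (/Following/$follower/-/$blogger, the mirror of Follower) are all hypothetical, and a sorted in-memory set stands in for the key-only records in the store.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical generic many-to-many DAO over an in-memory sorted key set. */
final class RelationDao {

    private final String outgoingTag; // e.g. "Following": /Following/$source/-/$target
    private final String incomingTag; // e.g. "Follower":  /Follower/$target/-/$source
    // Stand-in for key-only records; each entry is "tag/owner/member".
    // Simplification: names must not contain '/'.
    private final NavigableSet<String> keys = new TreeSet<>();

    RelationDao(String outgoingTag, String incomingTag) {
        this.outgoingTag = outgoingTag;
        this.incomingTag = incomingTag;
    }

    /** Records the relationship in both directions, one key per direction. */
    void link(String source, String target) {
        keys.add(key(outgoingTag, source, target));
        keys.add(key(incomingTag, target, source));
    }

    /** Removes both mirror records. */
    void unlink(String source, String target) {
        keys.remove(key(outgoingTag, source, target));
        keys.remove(key(incomingTag, target, source));
    }

    boolean isLinked(String source, String target) {
        return keys.contains(key(outgoingTag, source, target));
    }

    SortedSet<String> targetsOf(String source) {
        return members(outgoingTag, source);
    }

    SortedSet<String> sourcesOf(String target) {
        return members(incomingTag, target);
    }

    private static String key(String tag, String owner, String member) {
        return tag + "/" + owner + "/" + member;
    }

    /** Prefix scan, analogous to a key-only lookup under one major key path. */
    private SortedSet<String> members(String tag, String owner) {
        String prefix = tag + "/" + owner + "/";
        SortedSet<String> result = new TreeSet<>();
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break;
            }
            result.add(k.substring(prefix.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        RelationDao follow = new RelationDao("Following", "Follower");
        follow.link("Jerry", "Jason"); // Jerry follows Jason
        System.out.println("Followers of Jason: " + follow.sourcesOf("Jason"));
        System.out.println("Jerry is following: " + follow.targetsOf("Jerry"));
    }
}
```

In the real store, the two mirror records have different major key paths, so they could not be written as a single atomic batch; a DAO built this way would need to tolerate one write succeeding without the other.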
One technique that is not used in this article but is worth mentioning is object-relational mapping (ORM). In NoSQL databases, this should probably be referred to as object-persistence mapping instead. It is one area that has seen much progress. EclipseLink recently added support for JPA access to NoSQL databases, including Oracle NoSQL. DataNucleus is another popular open source data access platform, which can potentially be leveraged to implement a custom JPA mapping.
The Kvitter web application is built on JSF 2.0. Kvitter uses Facelets templating to define the page layout in /templates/main.xhtml. A number of custom tags are created under /resources/kvitter.
Kvitter also leverages the integration of JSF with CDI. CDI provides a powerful dependency injection standard. In Kvitter, ServiceProducer emits application services: KVStore, UserService and BlogService. Data queried from the database is produced by DataProducer.
The following is a sample screen shot of the user line page:
Figure 5: Kvitter Example.
This article explores how to build a Java enterprise web application on Oracle NoSQL Database. It surveys major features and data modeling approaches in Oracle NoSQL Database. Kvitter, a JSF sample Twitter-clone application, is also presented.