Overview of TopLink Caching and Locking

By Gordon Yorke and Darren Melanson

Database calls are among the most expensive operations executed by J2EE applications, and inefficient database access inevitably results in poor performance. Caching is a very powerful mechanism that can optimize performance by reducing calls to the database. However, there is a fine balance between the performance benefits of caching and the consequences of working with stale data.

TopLink provides several caching options that are configurable at a class-level to provide maximum flexibility. It also provides rich locking and refreshing mechanisms to address data integrity while leveraging the performance benefits of caching.

Configuring your cache to optimize performance and manage stale state involves addressing concurrency protection using locking, appropriate cache configuration, and selective refreshing.

Concurrency Protection Using Locking

Any time multiple clients or applications are reading and writing to the same database, stale data is an issue. Caching can increase the likelihood of stale data, but TopLink provides several locking options to manage concurrency. Locking prevents updating an object based on stale data. In an application where concurrent modification of data is possible, a locking strategy is essential.

Locking - Pessimistically

Pessimistic locking is the most restrictive form of locking but guarantees no changes are performed on the data during your transaction. The database physically locks the row upon a select (SELECT . . . FOR UPDATE [NOWAIT]) and prevents others from altering that row.

This guarantee comes at a cost: pessimistic locks cannot be held across transactions, and only one user at a time can access the underlying data. Pessimistic locking should be used carefully, as it limits concurrent access to the data and may cause deadlocks. In TopLink, the following API is used to acquire a pessimistic lock within a transaction:

unitOfWork.refreshAndLockObject(Object objectToLock, short lockMode)

The lock mode is one of the following:

ObjectLevelReadQuery.NO_LOCK, ObjectLevelReadQuery.LOCK, or ObjectLevelReadQuery.LOCK_NOWAIT

Locking - Optimistically

Optimistic locking permits all users to read and attempt to update the same data concurrently. It does not prevent others from changing the same data, but it does guarantee that the database will not be updated based on stale data.

During an update attempt, optimistic locking strategies detect whether any changes have been made since the data was read; if so, the update fails and an exception is thrown. The client application can then determine how to resolve the conflict according to its business rules.
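The principle can be sketched in plain Java. The OptimisticLockSketch class below is illustrative only, not TopLink API: an update succeeds only when the version recorded at read time still matches the stored version, mimicking an UPDATE ... WHERE VERSION = ? statement.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of version-based optimistic locking (not TopLink code).
public class OptimisticLockSketch {

    // A stored row: a value plus the version number used for conflict detection.
    static class Row {
        String value;
        long version;
        Row(String value, long version) { this.value = value; this.version = version; }
    }

    // Hypothetical in-memory "table" keyed by primary key.
    private final Map<Integer, Row> table = new HashMap<>();

    void insert(int id, String value) {
        table.put(id, new Row(value, 1));
    }

    // Returns a detached copy, like an object read into a cache.
    Row read(int id) {
        Row r = table.get(id);
        return new Row(r.value, r.version);
    }

    // Mimics UPDATE ... SET VALUE = ?, VERSION = VERSION + 1
    //        WHERE ID = ? AND VERSION = ?
    // Returns false (TopLink would throw an OptimisticLockException)
    // when the version no longer matches.
    boolean update(int id, String newValue, long expectedVersion) {
        Row current = table.get(id);
        if (current.version != expectedVersion) {
            return false; // someone else committed first; our data was stale
        }
        current.value = newValue;
        current.version++;
        return true;
    }

    public static void main(String[] args) {
        OptimisticLockSketch db = new OptimisticLockSketch();
        db.insert(1, "initial");

        Row clientA = db.read(1); // both clients read version 1
        Row clientB = db.read(1);

        System.out.println(db.update(1, "A's change", clientA.version)); // true
        System.out.println(db.update(1, "B's change", clientB.version)); // false: stale
    }
}
```

Client B's update fails because client A's commit incremented the version; B must re-read the data and decide, per its business rules, whether to retry.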

Optimistic locking is enforced at the database level, and DBAs have many different designs and strategies for it. TopLink provides complete flexibility for application developers by supporting multiple optimistic locking options, all of which are easily configured in the TopLink Workbench or JDeveloper.

Locking Policy | When to Use | How It Works

Version | A dedicated numeric field is available on the table. | Version values are compared, and the version is incremented on successful modification.

Timestamp | A dedicated timestamp field is available on the table. | Timestamps are compared, and the field is set to the current time (of the JVM or database) on successful modification.

All Fields | A version field is not an option and the application changes a wide variety of fields. | Compares whether any fields have changed.

Changed Fields | A version field is not an option and the application typically changes the same fields. | Compares whether any of the modified fields have been changed.

Selected Fields | A version field is not an option and a specific set of fields optimizes the locking comparison. | Compares whether any of the specified fields have changed.

Version/Timestamp Locking

The version or timestamp locking policies are used when a dedicated version field exists in the database table. When an object is updated, the policy writes the new version (e.g., incrementing 5 to 6, or a fresh timestamp) to the database while comparing the old version to the one in the table. A mismatch of version values is a locking conflict, and TopLink will throw an OptimisticLockException. This exception should be caught when committing a UnitOfWork.

These policies allow the version to be stored in the cache or mapped into the object. It is recommended that the version be stored in the object if the object is to be serialized to another tier or presented in a disconnected client, such as a browser. Keeping the version number with the object's data allows a stateless application to function across multiple server instances while obeying the database's version locking policy.

Even if pessimistic locking is being used it is strongly recommended that an optimistic locking policy be used as well. This allows for situations where not all application use cases require pessimistic locking. It also addresses the case where a disconnected client operation spans the life cycle of the pessimistic lock.

Through API on the UnitOfWork, these locking policies can also verify the version of associated objects:

unitOfWork.forceUpdateToVersionField(Object myObj, boolean updateVersion)

Non-Version Locking

In some schemas, no dedicated version field exists, nor can one be added. In these situations the AllFieldsLockingPolicy, SelectedFieldsLockingPolicy, or ChangedFieldsLockingPolicy is used.

The AllFieldsLockingPolicy sends all of the mapped fields to the database for verification on every update or delete, which is potentially an enormous amount of data.

The ChangedFieldsLockingPolicy sends only the fields for the attributes that were changed. It does not protect against other clients that may have changed other fields.

The SelectedFieldsLockingPolicy sends only the explicitly specified fields for verification.

None of these policies provide support for verifying version when relationships change.
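Conceptually, the difference between these policies is which columns end up in the WHERE clause of the generated UPDATE or DELETE statement. The following is a minimal plain-Java sketch, illustrative only; it is not TopLink's SQL generation code, and the EMPLOYEE columns are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch of the WHERE clauses that field-comparison
// locking policies produce (not TopLink's actual SQL generator).
public class FieldLockingSketch {

    // Builds "WHERE ID = ? AND COL1 = ? AND COL2 = ?" from the chosen columns.
    static String whereClause(Set<String> compareColumns) {
        return "WHERE ID = ?" + compareColumns.stream()
                .map(c -> " AND " + c + " = ?")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Original (as-read) values of a hypothetical EMPLOYEE row;
        // LinkedHashMap preserves column order for a stable demo.
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("NAME", "Smith");
        original.put("SALARY", 50000);
        original.put("ADDRESS", "1 Main St");

        // The application modified only SALARY.
        Set<String> changed = Set.of("SALARY");

        // AllFieldsLockingPolicy: every mapped field is compared.
        System.out.println(whereClause(original.keySet()));
        // -> WHERE ID = ? AND NAME = ? AND SALARY = ? AND ADDRESS = ?

        // ChangedFieldsLockingPolicy: only the modified fields are compared.
        System.out.println(whereClause(changed));
        // -> WHERE ID = ? AND SALARY = ?

        // SelectedFieldsLockingPolicy: a developer-chosen subset.
        System.out.println(whereClause(Set.of("SALARY")));
        // -> WHERE ID = ? AND SALARY = ?
    }
}
```

The sketch makes the trade-off visible: comparing all fields maximizes protection but sends the most data, while comparing changed or selected fields sends less but misses concurrent changes to the columns left out.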

Cache Configuration

TopLink's cache options leverage Java's built-in garbage collection and object reference types, with each cache type utilizing a particular Java reference type. Caching is configured at the class level in TopLink, which gives developers granular control based on the type of data encapsulated by each class rather than on the application as a whole. When choosing cache types and sizes, developers need to consider how the data is used. Factors to consider are:

  • Volatility
  • Volume
  • Amount of sharing between clients
  • Application lifecycle
  • Relationships between objects

A potential drawback of caching is overloading the middle-tier. It is important to choose the right type of cache and where applicable, to set the appropriate target size.

Cache Type | Usage | Size and Growth

Soft-Cache-Weak, Hard-Cache-Weak (default) | Read-mostly, shared data. | Holds the most recently used objects, up to the configured size, in soft or hard references; all other objects are moved to weak references and become eligible for garbage collection once the application no longer references them. The hard-reference option is provided for JVMs where soft references are collected too aggressively.

Weak | Write-mostly data with minimal sharing. | Holds all objects currently in use by the application, relying on JVM garbage collection to remove weakly referenced cached objects once they are no longer in use.

Full | Read-only or read-mostly, shared data. | Contains all objects read. The configured size determines the initial size of the identity map and thus its hashing efficiency.

None | Read-only data. | Caches no objects and does not maintain object identity. Should be used only for unrelated, highly volatile objects.

Using no identity map does not eliminate caching issues; it only eliminates the ability to manage object identity and resolve relationships. If modifiable objects change frequently outside the application's control, it is best to use the weak identity map, possibly combined with object refreshing.
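The reference types underlying these cache options can be seen directly with java.lang.ref. The standalone sketch below is plain Java, not TopLink code, and exact collection timing is JVM-dependent:

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Sketch of the java.lang.ref reference types that back TopLink's
// cache options; collection timing varies by JVM.
public class ReferenceSketch {
    public static void main(String[] args) {
        Object weakTarget = new Object();
        Object softTarget = new Object();

        // Weakly held objects are collectible as soon as no strong
        // reference remains (the basis of the Weak cache type).
        WeakReference<Object> weak = new WeakReference<>(weakTarget);

        // Softly held objects are collected only when memory runs low,
        // so they tend to stay cached much longer (Soft-Cache-Weak).
        SoftReference<Object> soft = new SoftReference<>(softTarget);

        // While the application holds strong references, both resolve.
        System.out.println(weak.get() != null); // true
        System.out.println(soft.get() != null); // true

        weakTarget = null; // application drops its strong references
        softTarget = null;
        System.gc(); // a hint only; on most JVMs weak references clear here

        // Typically prints "null" for the weak reference, while the soft
        // reference still resolves because memory is not under pressure.
        System.out.println(weak.get());
        System.out.println(soft.get() != null);
    }
}
```

This is why a Weak cache shrinks to just the objects the application is actively using, while a soft cache retains recently used objects until the JVM itself needs the memory back.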

Example - Auction House

It is important to understand when and why each of the cache types should be used. To illustrate this, let's consider an application such as an auction house, containing the following objects:

  • Item - Thousands of items in the database, viewed by many clients.
  • ItemCategory - A fixed number of categories used to facilitate finding items.
  • User - Lives for the life of the client's web session.
  • Bid - Hundreds of thousands, regularly created, deleted, and updated.
  • ShippingAgent - Fixed and static (e.g., UPS, FedEx, US Post Office).

Item - Hard/SoftCacheWeakIdentityMap

Items may be accessed by numerous clients, so to increase the chance of a cache hit, a SoftCacheWeakIdentityMap is recommended. This keeps a minimum number of objects in the cache, based on the configured cache size and the amount of free memory available.

Some VMs have overly aggressive garbage collectors, and database hits may occur for objects that seemingly should be cached. In these cases, use a HardCacheWeakIdentityMap, which utilizes stronger references to ensure objects stay in memory.

ItemCategory - FullIdentityMap

The use of a FullIdentityMap allows for all read categories to be cached. A full cache is used since the quantity of categories is fixed and these are primarily read-only.

It is important to make sure that types like this do not hold relationships to objects cached in weaker cache types. If objects in a FullIdentityMap hold references to objects in a weak cache, the weakly cached objects are effectively held indefinitely.

User - WeakIdentityMap

This cache type uses weak references, which are garbage collected (i.e., removed from the cache) once the application no longer references the objects. It should be used for classes that are long-lived in the application and not shared among numerous clients, such as User.

Bid - WeakIdentityMap

Weak caches are also appropriate for data that is regularly updated, has a short lifecycle, and is not accessed repeatedly by numerous clients. Bids are unique to a client, and new ones are created regularly.

ShippingAgent - FullIdentityMap

There are a fixed number of ShippingAgents and they rarely change. A FullIdentityMap is suitable for this kind of static data: it is limited in volume and can be loaded completely into the cache, avoiding database access entirely.

Refreshing

To minimize the chance of using stale cached data, a refresh from the database can be forced. If a client already has a reference to an object and wants to refresh it immediately with the latest version from the database, it can call:

session.refreshObject(myObj);

A TopLink query can be defined and configured to refresh the data. For example:

ReadObjectQuery roq = new ReadObjectQuery(Flight.class);
ExpressionBuilder builder = roq.getExpressionBuilder();
roq.setSelectionCriteria(builder.get("flightNumber").equal("UA 755"));

// Forces TopLink to refresh with data from the database
roq.refreshIdentityMapResult();

The advantage of defining a query in this manner is that it can be reused to refresh the data at regular intervals.

Leveraging Optimistic Locking

If you are using version or timestamp optimistic locking, it is possible to optimize refreshing. By enabling descriptor.onlyRefreshIfNewerVersion(), the refresh operation compares the version retrieved from the database with the cached version and updates the cached object only if the database version is more recent.

Cascading Refresh

Refreshing can also be cascaded to associated objects. There are various cascade options. This more advanced topic is not covered here but is included in the TopLink documentation.

Summary

Configuring an object-relational cache to work optimally with your application requires good knowledge of your application's domain model and usage patterns. As that understanding improves, the cache configuration should be adjusted accordingly. The locking, cache configuration, and refreshing queries discussed here are the essential elements needed to build efficient J2EE applications.

Beyond these capabilities, TopLink also offers cache coordination, in which changes made in one node can be synchronized, replicated, or invalidated across multiple nodes of the same application forming a cluster or grid. This capability is intended to allow developers to extend caching benefits beyond a single node, and it is the subject of another paper.

Author Bios

Gordon Yorke is a Principal Software Developer on the TopLink team.

Darren Melanson is a Technical Solutions Architect at Oracle Corporation.