Java DB Table Functions

Rick Hillegas
November 2012

Abstract

This white paper explores table functions, a feature introduced in Java DB release 10.4.1.3. The original version of this paper appeared in 2008. This revised version includes new material on restricted table functions, a follow-on feature introduced in Java DB release 10.6.1.0.

Rich and expressive, SQL is the world's most popular database query language. Table functions let you take SQL outside its traditional home in the RDBMS, setting SQL loose on data in the wild.

Introduction
Simple Example
Fancier SQL
More Fancy SQL
Writing Our Own Table Function
Quick Review
Wrapping
Declaring
Invoking
Views
Restricted Table Functions
Summary

Introduction

Table functions are just what their name implies, viz., functions which return tabular data sets. These tabular data sets can then be queried just like ordinary tables, via the full, expressive power of SQL. For this reason, table functions are sometimes called virtual tables. The data returned by these special functions can come from anywhere:

files - files and web resources
collections - in-memory collections
foreign data - other databases, including non-relational sources
streams - transient information streams, including data feeds and device outputs

It's easy to write table functions. All you have to do is:

wrap – code a JDBC ResultSet to wrap your data
declare – register your ResultSet in Java DB
invoke – from then on treat your foreign data like a table in SQL queries

This white paper presents some sample table functions and shows you how to write your own.

The author would like to thank Michelle Caisse, Francois Orsini, and Dag Wanvik for their help in reviewing early drafts of this tutorial.

Simple Example

Suppose we wanted to import data directly from another database. We would start out by wrapping the foreign data in a public static method which returns a ResultSet. All of the hard work happens in the other database:

package com.acme.hrSchema;
import java.sql.*;
public class EmployeeTable
{
   public  static  ResultSet   read()
        throws SQLException
    {
        Connection          conn = DriverManager.getConnection
            ( "jdbc:mysql://localhost/hr?user=root&password=mysql-passwd" );
        PreparedStatement   ps = conn.prepareStatement( 
           "select * from hrSchema.EmployeeTable" );
        return ps.executeQuery();
    }
}

Next, we would put the compiled class on Java DB's classpath and declare a Java DB table function (here done through Java DB's ij query tool):

connect 'jdbc:derby:myDatabase;create=true';
create function employeeTable()
returns table
(
    employeeID int,
    firstName  varchar( 50 ),
    lastName   varchar( 50 ),
    birthday   date
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'com.acme.hrSchema.EmployeeTable.read';

Finally, we would use the foreign data in Java DB queries. For instance, here's how we would invoke the function to siphon the foreign data into a local Java DB table:

connect 'jdbc:derby:myDatabase;create=true';
insert into employee_table select * from table( employeeTable() ) s;

Let's recap what just happened here:

wrap – First we created a public static method which returns a JDBC ResultSet. A table function is just a public static method which returns java.sql.ResultSet.
declare – Next, we told Java DB where the table function lived.
invoke – From then on, we were able to use the foreign data just as though it were a local Java DB table. We did this by invoking our function inside a table constructor in the FROM list of our query.

Fancier SQL

Now let's try some fancier SQL. We'll use a table function supplied in the tar file that accompanies this white paper. For more information about this table function, please see the accompanying javadoc. This simple table function presents a Java properties file as a table with two columns: id and text. We're going to suppose that your Java application follows the common convention of storing its user-visible messages in properties files consisting of text lines which have the form 'messageID=messageText' and that you have separate localized versions of these files for each language that your application supports. For more information on this coding convention, see the PropertyResourceBundle javadoc. We're going to run a table function against the message files in your product and ask the question: “What recently added English text still needs to be translated into Spanish?”

Because we're using a table function which has already been written, we can skip the wrap step. That is, the accompanying jar file already supplies a public static method which returns the ResultSet we need. We just have to make sure that our classpath contains that jar file. We declare the table function:

connect 'jdbc:derby:myDatabase;create=true';
create function properties( fileName varchar( 32672 ) )
returns table
  (
     id   varchar( 50 ),
     text varchar( 1000 )
  )
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.PropertiesTableFunction.properties';

Then we invoke the table function. In order to answer our question, we look for message ids which appear in the English message file but not the Spanish message file:

connect 'jdbc:derby:myDatabase;create=true';
select id
from table( properties( '/MySourceCode/messages_en.properties' ) ) english
where id not in
(
   select id
   from  table( properties( '/MySourceCode/messages_es.properties' ) ) spanish
);

More Fancy SQL

Now let's try a grouped aggregate, one of the more powerful features of SQL. We'll ask the question: “What are the sizes of the packages in my jar file?” We're going to take advantage of the fact that a jar file is just a special kind of zip file. Again, we'll skip the wrap step because we will use a table function supplied in the accompanying tar file. This table function loops through the entries in a java.util.zip.ZipFile, treating each java.util.zip.ZipEntry as a separate row. First, we declare the table function:

connect 'jdbc:derby:myDatabase;create=true';
create function zipFile
( fileName varchar( 32672 ) )
returns table
(
    name               varchar( 100 ),
    directory          varchar( 200 ),
    comment            varchar( 1000 ),
    compressed_size    bigint,
    crc                bigint,
    size               bigint,
    modification_time  bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ZipFileTableFunction.zipFile';

Then we invoke the table function, summing up the sizes of all files per directory. This answers our question:

connect 'jdbc:derby:myDatabase;create=true';
select directory, sum( compressed_size ) package_size
  from table ( zipFile( '/MyApplication/lib/product.jar' ) ) s
  group by directory
  order by package_size desc;

Here are the first four result rows when we run this query against the derby.jar file supplied with Java DB release 10.9.1.0:

DIRECTORY	PACKAGE_SIZE
`org/apache/derby/impl/sql/compile`	583212
`org/apache/derby/impl/sql/execute`	277965
`org/apache/derby/impl/store/raw/data`	139014
`org/apache/derby/iapi/types`	128477

Writing Our Own Table Function

The last example used ZipFileTableFunction. That class extends EnumeratorTableFunction, another helper class in the accompanying tar file. EnumeratorTableFunction helps you present any Java collection as a table. Like all of the classes in the accompanying tar file, EnumeratorTableFunction is an implementation of ResultSet. You can extend EnumeratorTableFunction to make a table out of any of the following types of objects:

java.util.Enumeration
java.util.Iterator
java.lang.Iterable – including any kind of java.util.Collection.
arrays

EnumeratorTableFunction treats each Object in your collection as a row in the returned table. Your class which extends EnumeratorTableFunction just needs to supply the following:

constructor – The constructor declares the names of the columns in the returned ResultSet. The constructor is handed the Iterable that it will traverse.
entry point – This is the ResultSet-returning public static method which you register with Java DB. The entry point invokes the constructor and returns a new instance of your class.
row maker – The makeRow() method, which you implement, is handed an Object from the collection. This method turns the Object into an array of Strings, one for each exposed field in your Object. There should be one cell for each of the column names you supplied above. EnumeratorTableFunction's next() method will call your makeRow() method in order to turn an Object into the next row.

It is really quite simple. For instance, here's how easy it is to write a table function which wraps the collection of Locales supported by the VM. Note the call to setEnumeration() in the constructor: this is how we pass the collection of Locales to EnumeratorTableFunction's machinery. When the table function is invoked, EnumeratorTableFunction loops through the collection, calling our makeRow() method in order to turn each Object in the collection into a row:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;
import oracle.javadb.vti.core.EnumeratorTableFunction;
public class LocaleTableFunction extends EnumeratorTableFunction
{
    public  LocaleTableFunction() throws SQLException
    {
        super( new String[] { "country", "language", "variant", 
               "country_code", "language_code", "variant_code" } );
        setEnumeration( Locale.getAvailableLocales() );
    }
    public  static  ResultSet   locales() throws SQLException { 
       return new LocaleTableFunction(); }

    public  String[]    makeRow( Object obj ) throws SQLException
    {
        int     col = 0;
        Locale  locale = (Locale) obj;
        String[] row = new String[ getColumnCount() ];

        row[ col++ ] = locale.getDisplayCountry();
        row[ col++ ] = locale.getDisplayLanguage();
        row[ col++ ] = locale.getDisplayVariant();
        row[ col++ ] = locale.getCountry();
        row[ col++ ] = locale.getLanguage();
        row[ col++ ] = locale.getVariant();

        return row;
    }
}

We declare this table function. The column names we declare here DON'T have to match the names in LocaleTableFunction's constructor. However, the number of column names should agree. Note also that Java DB will truncate the strings returned by LocaleTableFunction if they exceed the lengths declared here:

connect 'jdbc:derby:myDatabase;create=true';
create function locales()
returns table
(
    country         varchar( 50 ),
    language        varchar( 50 ),
    variant         varchar( 50 ),
    country_code    varchar( 2 ),
    language_code   varchar( 2 ),
    variant_code    varchar( 50 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'LocaleTableFunction.locales';

Now we invoke the table function. We list the supported languages, counting how many variants each has:

connect 'jdbc:derby:myDatabase;create=true';
select language, count(*) as variant_count
  from table ( locales() ) s
  where country <> ''
  group by language
  order by count(*) desc;

Here's the partial output of this query, run on a Java 7 VM:

LANGUAGE	VARIANT_COUNT
Spanish	20
Arabic	17
English	11
Serbian	7
French	5
German	4
Chinese	4

Quick Review

Let's step back a moment and quickly review what we've learned so far:

A table function, like any other Java DB function, is just a public static method. What distinguishes table functions is their return values. Table functions return JDBC ResultSets rather than scalar values. These ResultSets wrap external data. The external data could come from anywhere—it's up to the developer who writes the ResultSet.
Also like other Java DB functions, a table function must be declared using the CREATE FUNCTION statement. The static method's class must be visible on the database classpath.
You invoke your table function in the FROM clause of SQL queries. You can SELECT from a table function as though it were a table.

The next three sections discuss wrapping, declaring, and invoking in greater detail.

Wrapping

You have very little work to do as long as you extend one of the classes supplied in the accompanying tar file. The more refined the supplied class, the less work you have to do. Here's an overview of that work, proceeding from most to least refined class.

Extending EnumeratorTableFunction

We have already seen the code for LocaleTableFunction, a class which extends EnumeratorTableFunction. Here's how you extend EnumeratorTableFunction directly:

constructor – Write a constructor which calls the superclass constructor, supplying column names. The constructor should call setEnumeration(), handing it a collection, i.e., an Enumeration, Iterator, Iterable, or array.
entry point – Write a public static method which calls the constructor and returns a new instance. This is the method which you will declare to Java DB later on.
row maker – Implement the abstract method makeRow(). This method turns an Object in the collection into an array of String fields, one for each column in the ResultSet.

ZipFileTableFunction, supplied with the accompanying tar file, is another example of a class which extends EnumeratorTableFunction.

Extending FlatFileTableFunction

This is a base class for table functions which read files of structured records. Here's how you extend FlatFileTableFunction:

constructor – Write a constructor which calls the superclass constructor, supplying column names.
entry point – Write a public static method which calls the constructor and returns an instance. This is the method which you will declare to Java DB later on.
row maker – Implement the abstract method parseRow(). Your parseRow() method can advance through the file a line at a time by calling FlatFileTableFunction's readLine() method. Once a whole record has been gobbled up, your parseRow() method returns the record as an array of Strings, one cell for each column in the returned row.

Extending StringColumnVTI

The previous two helper classes are fairly refined. This means that you have very little work to do if you want to extend them. In particular, those two helper classes implement the ResultSet.next() and ResultSet.getXXX() methods. Those methods are the basic machinery for advancing to the next row in the external data set and then retrieving its columns. The next helper class we will describe, StringColumnVTI, is a superclass of the previous classes. StringColumnVTI still does a lot of work for you, but extending it involves a little more work than extending the previous classes did. This class is supplied by Java DB as part of its public api.

StringColumnVTI is a superclass of table functions which can represent their rows as String arrays. Its concrete table function subclasses, however, are not limited to String types.StringColumnVTI knows how to cast Strings to other datatypes and it implements the ResultSet getXXX() methods for all of the datatypes which Java DB supports. This in turn lets you declare columns as numeric, date/time, and binary types. For instance, take a look at the CREATE FUNCTION statement in the preceding “More Fancy SQL” section. That sample table function mixes String and integer column types. For examples of how to extend StringColumnVTI, see its concrete subclass PropertiesTableFunction. Here's how you extend StringColumnVTI:

constructor – Write a constructor which calls the superclass constructor, supplying column names.
entry point – Write a public static method which calls the constructor and returns a new instance. This is the method which you will declare to Java DB later on.
positioning – Implement the next() method, advancing the table function to the next row of external data.
column reading – Implement the getRawColumn() method. This method is handed a column index and returns the String value of the corresponding column in the current row.
cleanup – Implement the close() method, releasing resources.

Extending VTITemplate

Finally, we describe the most abstract of the helper classes. This class, VTITemplate, provides a stub implementation of the ResultSet interface. Like StringColumnVTI, this class is supplied by Java DB as part of its public api. To extend VTITemplate, you must provide real implementations for the methods which Java DB will invoke as it loops through your table function. In most situations, this will only be a handful of methods. Here's how you extend VTITemplate:

entry point – Write a public static method which returns a new instance. This is the method which you will declare to Java DB later on.
positioning – Implement the next() method, advancing the table function to the next row of external data.
accessors – Override the getXXX() stubs for the datatypes returned by your table function. For instance, if your table function declares character and integer columns, then you should override the getString() and getInt() stubs.
finish – Implement the close() method, releasing resources.

Declaring

So far, we have explored the details of wrapping external data in a table function. Next, you need to make your table function visible to Java DB. Your table function class must appear on the database classpath. The simplest way to do this is to just include the class on the VM classpath. You can also store the class inside a jar file in the database. For details on how to do this, please consult the section titled “Loading classes from a database” in Java DB's Developer's Guide.

Next you must declare the table function via the CREATE FUNCTION statement. Let's revisit the example from the previous “More Fancy SQL” section. The parts in bold black are constant boilerplate included in all table function declarations. The colored parts vary depending on your table function:

create function zipFile
( fileName varchar( 32672 ) )
returns table ( name varchar( 100 ), directory varchar( 200 ), comment varchar( 1000 ),   
  compressed_size bigint, crc bigint, size bigint, modification_time  bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ZipFileTableFunction.zipFile';

Here's what the variable parts mean:

name – In this case zipfile. This is the name that you will use when you invoke the function in SQL queries.
arguments – In this case fileName varchar( 32672 ). These are the names and Java DB types of the arguments to your public static method.
returned columns – These are the names and Java DB types of the columns in your ResultSet; in this case there are seven columns.
sql access – In this case no sql. For truly external data, this is always no sql. However, if your function reads data out of the session's Java DB database, then this should be reads sql data. If your function writes to the session's Java DB database, then this should be contains sql.
method name – In this case oracle.javadb.vti.example.ZipFileTableFunction.zipFile. This is the package, class, and method name of your public static method which returns a ResultSet.

Invoking

In the previous two sections, we explored how you wrap external data in a table function and then declare that function to Java DB. Java DB treats a declared function like other permanent objects such as tables and views. Now you can use your function in a query just like a table. You simply need to invoke your table function in the FROM clause of your query via a table constructor clause. Here again is the query from the previous “More Fancy SQL” section. The part in bold black part is the constant table constructor boilerplate. It is simply the keyword table plus a set of parentheses which enclose the function call. The other parts vary depending on your table function:

select directory, sum( compressed_size ) package_size
from table ( zipFile( '/MyApplication/lib/product.jar' ) ) s
group by directory
order by package_size desc

Here's what the variable parts mean:

columns – In this case directory and compressed_size. These are the names of the returned columns which you defined when you declared the table function originally.
name – In this case zipFile. This is the name which you gave your table function when you declared it.
arguments – In this case ( '/MyApplication/lib/product.jar' ). These are the argument values which are passed to your public static method.
correlation – In this case s. This is a short name for the table returned by your function. You use this, like other SQL correlation names, to clarify your meaning when the same column name appears in more than one table.

Views

Sometimes you find that you are invoking the same table function with the same arguments repeatedly. In this situation, you can simplify your queries by defining helper views. For instance, the following helper views would have been useful in the previous “Fancier SQL” section:

connect 'jdbc:derby:myDatabase;create=true';
create view english as
  select * from table( properties( '/MySourceCode/messages_en.properties' ) ) s;
create view spanish as
  select * from table( properties( '/MySourceCode/messages_es.properties' ) ) s;

Such views would let us express the query in that section compactly:

select id from english where id not in (select id from spanish);

Restricted Table Functions

All of the simple table functions above fully read entire foreign data sources. This is not a problem if the foreign data sources are small. However, it can lead to significant performance issues for table functions which wrap big files, collections, remote tables, or data streams. To handle big foreign data sources, Java DB supports restricted table functions. A restricted table function is a table function which can be told to produce a small, rectangular subset of its possible data. That is, a restricted table function can produce just a few columns and rows. The accompanying tar file contains a restricted table function called ForeignTable. This is a more advanced example of the first use-case discussed by this whitepaper: a table function which siphons rows out of a table in a remote database. Here's a script showing this table function in action:

-- create a table in another database
connect 'jdbc:derby:memory:db1;user=db1_dbo;create=true';

create table employee
(
    emp_id int primary key,
    first_name varchar( 50 ),
    last_name varchar( 50 ),
    birthday date,
    salary decimal( 9, 2 )
);

insert into employee values
( 1, 'Fred', 'Argle', date( '1954-02-23' ), 120000.34 ),
( 2, 'Mary', 'Bargle', date( '1950-02-23' ), 140000.50 );

-- now create a table function in a Java DB database.
-- the table function wraps the table in the other database.
connect 'jdbc:derby:memory:db2;user=db2_dbo;create=true';

create function db1_employees
(
    foreignSchemaName varchar( 50 ),
    foreignTableName varchar( 50 ),
    foreignDBConnectionURL varchar( 1000 )
)
returns table
(
    emp_id int,
    first_name varchar( 50 ),
    last_name varchar( 50 ),
    birthday date,
    salary decimal( 9, 2 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ForeignTable.readForeignTable';

-- select a rectangular subset of the foreign table
select first_name, last_name
from table( db1_employees( 'DB1_DBO', 'EMPLOYEE', 'jdbc:derby:memory:db1' ) ) s
where emp_id = 2;

Don't be fooled by the simplicity of this example. What's going on here can be very powerful:

foreignDBConnectionURL – Our toy script invokes db1_employees() with a connection URL to another Java DB database in the same JVM. However, in a real life example, this table function could be invoked just as easily with a connection URL to an Oracle or MySQL database located across the network.
rectangular subset – Our toy table has few columns and rows. However, in a real-life example, the foreign table could have arbitrarily many columns and rows. Java DB tells the ForeignTable to limit itself to a small rectangular subset and ForeignTable passes those instructions on to the remote Oracle or MySQL database. The query runs fast because the foreign database selects and sends a small block of data.

So what's going on here? It's very simple. The declared return type of ForeignTable.readForeignTable() is ForeignTable, which implements RestrictedVTI. That means that ForeignTable supplies an initScan() method which Java DB exploits. Java DB calls this method before invoking any ResultSet methods. Via initScan(), Java DB tells the table function which columns and rows it should bother fetching. The initScan() method takes the following arguments:

column names – This is an array of Strings, one cell for each column name declared in the TABLE clause of the table function's CREATE FUNCTION statement. Java DB fills a cell with its column name only if the column is referenced by the invoking query. This includes references in the SELECT list, the WHERE clause, or any other query clause. The query above references EMP_ID, FIRST_NAME, and LAST_NAME. So when Java DB calls initScan(), Java DB passes a String array with 5 cells and Java DB only fills in the first 3 cells, corresponding to the columns mentioned in the query. The trailing 2 cells are filled with NULL.
fragments from the WHERE clause – Java DB examines the WHERE clause of the query, looking for fragments which compare table function columns to constant values. The query above has one such fragment: "emp_id = 2". Java DB wraps these fragments in a tree-shaped structure called a Restriction and passes the Restriction to initScan(). (Note that Java DB considers ? parameters to be constant values since the ?s have to be filled by the time that Java DB runs the query.)

For more information on restricted table functions, please consult the following resources:

Developer's Guide – See the section titled "Writing restricted table functions".
Java DB javadoc – See the javadoc for RestrictedVTI and Restriction.
ForeignTable – In particular, see the implementation of the initScan() method in this class which is bundled in the accompanying tar file.

A more sophisticated version of ForeignTable can be found in the ForeignTableVTI class attached to DERBY-4962. ForeignTableVTI is useful for data migration as well as ongoing integration of data from a foreign database. That class supplies the following functionality:

Foreign schema introspection – ForeignTableVTI supplies a database procedure which examines all of the tables in a schema in a foreign database. The procedure then declares a restricted table function for each foreign table and a view which hides the parameters needed to invoke the table function.
Restricted table function – ForeignTableVTI also supplies a slightly more sophisticated version of ForeignTable.

Summary

In this whitepaper we explored table functions, a feature introduced in Java DB release 10.4.1.3. We also explored restricted table functions, a feature introduced in Java DB release 10.6.1.0. Table functions let you run expressive SQL queries against arbitrary data. It's easy to write table functions. All you have to do is:

wrap – Write a public static method which returns a JDBC ResultSet. This is particularly simple if the enclosing class extends one of the helper classes supplied in the accompanying jar file. The returned ResultSet loops through your external data, making rows out of records or objects.
declare – Use the CREATE FUNCTION statement to tell Java DB where your public static method lives.
invoke – From then on, query your external data just like a table or view in your Java DB database.

For more information on table functions, please see the section titled “Programming Derby-style table functions” in Java DB's Developer's Guide.