Rick Hillegas
November 2012
This white paper explores table functions, a feature introduced in Java DB release 10.4.1.3. The original version of this paper appeared in 2008. This revised version includes new material on restricted table functions, a follow-on feature introduced in Java DB release 10.6.1.0.
Rich and expressive, SQL is the world's most popular database query language. Table functions let you take SQL outside its traditional home in the RDBMS, setting SQL loose on data in the wild.
Table functions are just what their name implies, viz., functions which return tabular data sets. These tabular data sets can then be queried just like ordinary tables, via the full, expressive power of SQL. For this reason, table functions are sometimes called virtual tables. The data returned by these special functions can come from anywhere:
It's easy to write table functions. All you have to do is:
This white paper presents some sample table functions and shows you how to write your own.
The author would like to thank Michelle Caisse, Francois Orsini, and Dag Wanvik for their help in reviewing early drafts of this tutorial.
Suppose we wanted to import data directly from another database. We would start out by wrapping the foreign data in a public static method which returns a ResultSet. All of the hard work happens in the other database:
package com.acme.hrSchema;
import java.sql.*;
public class EmployeeTable
{
public static ResultSet read()
throws SQLException
{
Connection conn = DriverManager.getConnection
( "jdbc:mysql://localhost/hr?user=root&password=mysql-passwd" );
PreparedStatement ps = conn.prepareStatement(
"select * from hrSchema.EmployeeTable" );
return ps.executeQuery();
}
}
Next, we would put the compiled class on Java DB's classpath and declare a Java DB table function (here done through Java DB's ij query tool):
connect 'jdbc:derby:myDatabase;create=true';
create function employeeTable()
returns table
(
employeeID int,
firstName varchar( 50 ),
lastName varchar( 50 ),
birthday date
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'com.acme.hrSchema.EmployeeTable.read';
Finally, we would use the foreign data in Java DB queries. For instance, here's how we would invoke the function to siphon the foreign data into a local Java DB table:
connect 'jdbc:derby:myDatabase;create=true';
insert into employee_table select * from table( employeeTable() ) s;
Let's recap what just happened here:
Now let's try some fancier SQL. We'll use a table function supplied in the tar file that accompanies this white paper. For more information about this table function, please see the accompanying javadoc. This simple table function presents a Java properties file as a table with two columns: id and text. We're going to suppose that your Java application follows the common convention of storing its user-visible messages in properties files consisting of text lines which have the form 'messageID=messageText' and that you have separate localized versions of these files for each language that your application supports. For more information on this coding convention, see the PropertyResourceBundle javadoc. We're going to run a table function against the message files in your product and ask the question: “What recently added English text still needs to be translated into Spanish?”
Because we're using a table function which has already been written, we can skip the wrap step. That is, the accompanying jar file already supplies a public static method which returns the ResultSet we need. We just have to make sure that our classpath contains that jar file. We declare the table function:
connect 'jdbc:derby:myDatabase;create=true';
create function properties( fileName varchar( 32672 ) )
returns table
(
id varchar( 50 ),
text varchar( 1000 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.PropertiesTableFunction.properties';
Then we invoke the table function. In order to answer our question, we look for message ids which appear in the English message file but not the Spanish message file:
connect 'jdbc:derby:myDatabase;create=true';
select id
from table( properties( '/MySourceCode/messages_en.properties' ) ) english
where id not in
(
select id
from table( properties( '/MySourceCode/messages_es.properties' ) ) spanish
);
Now let's try a grouped aggregate, one of the more powerful features of SQL. We'll ask the question: “What are the sizes of the packages in my jar file?” We're going to take advantage of the fact that a jar file is just a special kind of zip file. Again, we'll skip the wrap step because we will use a table function supplied in the accompanying tar file. This table function loops through the entries in a java.util.zip.ZipFile, treating each java.util.zip.ZipEntry as a separate row. First, we declare the table function:
connect 'jdbc:derby:myDatabase;create=true';
create function zipFile
( fileName varchar( 32672 ) )
returns table
(
name varchar( 100 ),
directory varchar( 200 ),
comment varchar( 1000 ),
compressed_size bigint,
crc bigint,
size bigint,
modification_time bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ZipFileTableFunction.zipFile';
Then we invoke the table function, summing up the sizes of all files per directory. This answers our question:
connect 'jdbc:derby:myDatabase;create=true';
select directory, sum( compressed_size ) package_size
from table ( zipFile( '/MyApplication/lib/product.jar' ) ) s
group by directory
order by package_size desc;
Here are the first four result rows when we run this query against the derby.jar file supplied with Java DB release 10.9.1.0:
DIRECTORY | PACKAGE_SIZE |
---|---|
org/apache/derby/impl/sql/compile |
583212 |
org/apache/derby/impl/sql/execute |
277965 |
org/apache/derby/impl/store/raw/data |
139014 |
org/apache/derby/iapi/types |
128477 |
The last example used ZipFileTableFunction. That class extends EnumeratorTableFunction, another helper class in the accompanying tar file. EnumeratorTableFunction helps you present any Java collection as a table. Like all of the classes in the accompanying tar file, EnumeratorTableFunction is an implementation of ResultSet. You can extend EnumeratorTableFunction to make a table out of any of the following types of objects:
EnumeratorTableFunction treats each Object in your collection as a row in the returned table. Your class which extends EnumeratorTableFunction just needs to supply the following:
It is really quite simple. For instance, here's how easy it is to write a table function which wraps the collection of Locales supported by the VM. Note the call to setEnumeration() in the constructor: this is how we pass the collection of Locales to EnumeratorTableFunction's machinery. When the table function is invoked, EnumeratorTableFunction loops through the collection, calling our makeRow() method in order to turn each Object in the collection into a row:
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;
import oracle.javadb.vti.core.EnumeratorTableFunction;
public class LocaleTableFunction extends EnumeratorTableFunction
{
public LocaleTableFunction() throws SQLException
{
super( new String[] { "country", "language", "variant",
"country_code", "language_code", "variant_code" } );
setEnumeration( Locale.getAvailableLocales() );
}
public static ResultSet locales() throws SQLException {
return new LocaleTableFunction(); }
public String[] makeRow( Object obj ) throws SQLException
{
int col = 0;
Locale locale = (Locale) obj;
String[] row = new String[ getColumnCount() ];
row[ col++ ] = locale.getDisplayCountry();
row[ col++ ] = locale.getDisplayLanguage();
row[ col++ ] = locale.getDisplayVariant();
row[ col++ ] = locale.getCountry();
row[ col++ ] = locale.getLanguage();
row[ col++ ] = locale.getVariant();
return row;
}
}
We declare this table function. The column names we declare here DON'T have to match the names in LocaleTableFunction's constructor. However, the number of column names should agree. Note also that Java DB will truncate the strings returned by LocaleTableFunction if they exceed the lengths declared here:
connect 'jdbc:derby:myDatabase;create=true';
create function locales()
returns table
(
country varchar( 50 ),
language varchar( 50 ),
variant varchar( 50 ),
country_code varchar( 2 ),
language_code varchar( 2 ),
variant_code varchar( 50 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'LocaleTableFunction.locales';
Now we invoke the table function. We list the supported languages, counting how many variants each has:
connect 'jdbc:derby:myDatabase;create=true';
select language, count(*) as variant_count
from table ( locales() ) s
where country <> ''
group by language
order by count(*) desc;
Here's the partial output of this query, run on a Java 7 VM:
LANGUAGE | VARIANT_COUNT |
---|---|
Spanish | 20 |
Arabic | 17 |
English | 11 |
Serbian | 7 |
French | 5 |
German | 4 |
Chinese | 4 |
Let's step back a moment and quickly review what we've learned so far:
The next three sections discuss wrapping, declaring, and invoking in greater detail.
You have very little work to do as long as you extend one of the classes supplied in the accompanying tar file. The more refined the supplied class, the less work you have to do. Here's an overview of that work, proceeding from most to least refined class.
Extending EnumeratorTableFunction
We have already seen the code for LocaleTableFunction, a class which extends EnumeratorTableFunction. Here's how you extend EnumeratorTableFunction directly:
ZipFileTableFunction, supplied with the accompanying tar file, is another example of a class which extends EnumeratorTableFunction.
Extending FlatFileTableFunction
This is a base class for table functions which read files of structured records. Here's how you extend FlatFileTableFunction:
The previous two helper classes are fairly refined. This means that you have very little work to do if you want to extend them. In particular, those two helper classes implement the ResultSet.next() and ResultSet.getXXX() methods. Those methods are the basic machinery for advancing to the next row in the external data set and then retrieving its columns. The next helper class we will describe, StringColumnVTI, is a superclass of the previous classes. StringColumnVTI still does a lot of work for you, but extending it involves a little more work than extending the previous classes did. This class is supplied by Java DB as part of its public api.
StringColumnVTI is a superclass of table functions which can represent their rows as String arrays. Its concrete table function subclasses, however, are not limited to String types.StringColumnVTI knows how to cast Strings to other datatypes and it implements the ResultSet getXXX() methods for all of the datatypes which Java DB supports. This in turn lets you declare columns as numeric, date/time, and binary types. For instance, take a look at the CREATE FUNCTION statement in the preceding “More Fancy SQL” section. That sample table function mixes String and integer column types. For examples of how to extend StringColumnVTI, see its concrete subclass PropertiesTableFunction. Here's how you extend StringColumnVTI:
Finally, we describe the most abstract of the helper classes. This class, VTITemplate, provides a stub implementation of the ResultSet interface. Like StringColumnVTI, this class is supplied by Java DB as part of its public api. To extend VTITemplate, you must provide real implementations for the methods which Java DB will invoke as it loops through your table function. In most situations, this will only be a handful of methods. Here's how you extend VTITemplate:
So far, we have explored the details of wrapping external data in a table function. Next, you need to make your table function visible to Java DB. Your table function class must appear on the database classpath. The simplest way to do this is to just include the class on the VM classpath. You can also store the class inside a jar file in the database. For details on how to do this, please consult the section titled “Loading classes from a database” in Java DB's Developer's Guide.
Next you must declare the table function via the CREATE FUNCTION statement. Let's revisit the example from the previous “More Fancy SQL” section. The parts in bold black
are constant boilerplate included in all table function declarations. The colored parts vary depending on your table function:
create function zipFile
( fileName varchar( 32672 ) )
returns table ( name varchar( 100 ), directory varchar( 200 ), comment varchar( 1000 ),
compressed_size bigint, crc bigint, size bigint, modification_time bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ZipFileTableFunction.zipFile';
Here's what the variable parts mean:
name
– In this case zipfile
. This is the name that you will use when you invoke the function in SQL queries.arguments
– In this case fileName varchar( 32672 )
. These are the names and Java DB types of the arguments to your public static method.returned columns
– These are the names and Java DB types of the columns in your ResultSet; in this case there are seven columns.sql access
– In this case no sql
. For truly external data, this is always no sql
. However, if your function reads data out of the session's Java DB database, then this should be reads sql data
. If your function writes to the session's Java DB database, then this should be contains sql
.method name
– In this case oracle.javadb.vti.example.ZipFileTableFunction.zipFile
. This is the package, class, and method name of your public static method which returns a ResultSet.In the previous two sections, we explored how you wrap external data in a table function and then declare that function to Java DB. Java DB treats a declared function like other permanent objects such as tables and views. Now you can use your function in a query just like a table. You simply need to invoke your table function in the FROM clause of your query via a table constructor clause. Here again is the query from the previous “More Fancy SQL” section. The part in bold black
part is the constant table constructor boilerplate. It is simply the keyword table
plus a set of parentheses which enclose the function call. The other parts vary depending on your table function:
select directory, sum( compressed_size ) package_size
from table ( zipFile( '/MyApplication/lib/product.jar' ) ) s
group by directory
order by package_size desc
Here's what the variable parts mean:
columns
– In this case directory
and compressed_size
. These are the names of the returned columns which you defined when you declared the table function originally.name
– In this case zipFile
. This is the name which you gave your table function when you declared it.arguments
– In this case ( '/MyApplication/lib/product.jar' )
. These are the argument values which are passed to your public static method.correlation
– In this case s
. This is a short name for the table returned by your function. You use this, like other SQL correlation names, to clarify your meaning when the same column name appears in more than one table.Sometimes you find that you are invoking the same table function with the same arguments repeatedly. In this situation, you can simplify your queries by defining helper views. For instance, the following helper views would have been useful in the previous “Fancier SQL” section:
connect 'jdbc:derby:myDatabase;create=true';
create view english as
select * from table( properties( '/MySourceCode/messages_en.properties' ) ) s;
create view spanish as
select * from table( properties( '/MySourceCode/messages_es.properties' ) ) s;
Such views would let us express the query in that section compactly:
select id from english where id not in (select id from spanish);
All of the simple table functions above fully read entire foreign data sources. This is not a problem if the foreign data sources are small. However, it can lead to significant performance issues for table functions which wrap big files, collections, remote tables, or data streams. To handle big foreign data sources, Java DB supports restricted table functions. A restricted table function is a table function which can be told to produce a small, rectangular subset of its possible data. That is, a restricted table function can produce just a few columns and rows. The accompanying tar file contains a restricted table function called ForeignTable. This is a more advanced example of the first use-case discussed by this whitepaper: a table function which siphons rows out of a table in a remote database. Here's a script showing this table function in action:
-- create a table in another database
connect 'jdbc:derby:memory:db1;user=db1_dbo;create=true';
create table employee
(
emp_id int primary key,
first_name varchar( 50 ),
last_name varchar( 50 ),
birthday date,
salary decimal( 9, 2 )
);
insert into employee values
( 1, 'Fred', 'Argle', date( '1954-02-23' ), 120000.34 ),
( 2, 'Mary', 'Bargle', date( '1950-02-23' ), 140000.50 );
-- now create a table function in a Java DB database.
-- the table function wraps the table in the other database.
connect 'jdbc:derby:memory:db2;user=db2_dbo;create=true';
create function db1_employees
(
foreignSchemaName varchar( 50 ),
foreignTableName varchar( 50 ),
foreignDBConnectionURL varchar( 1000 )
)
returns table
(
emp_id int,
first_name varchar( 50 ),
last_name varchar( 50 ),
birthday date,
salary decimal( 9, 2 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'oracle.javadb.vti.example.ForeignTable.readForeignTable';
-- select a rectangular subset of the foreign table
select first_name, last_name
from table( db1_employees( 'DB1_DBO', 'EMPLOYEE', 'jdbc:derby:memory:db1' ) ) s
where emp_id = 2;
Don't be fooled by the simplicity of this example. What's going on here can be very powerful:
So what's going on here? It's very simple. The declared return type of ForeignTable.readForeignTable() is ForeignTable, which implements RestrictedVTI. That means that ForeignTable supplies an initScan() method which Java DB exploits. Java DB calls this method before invoking any ResultSet methods. Via initScan(), Java DB tells the table function which columns and rows it should bother fetching. The initScan() method takes the following arguments:
For more information on restricted table functions, please consult the following resources:
A more sophisticated version of ForeignTable can be found in the ForeignTableVTI class attached to DERBY-4962. ForeignTableVTI is useful for data migration as well as ongoing integration of data from a foreign database. That class supplies the following functionality:
In this whitepaper we explored table functions, a feature introduced in Java DB release 10.4.1.3. We also explored restricted table functions, a feature introduced in Java DB release 10.6.1.0. Table functions let you run expressive SQL queries against arbitrary data. It's easy to write table functions. All you have to do is:
For more information on table functions, please see the section titled “Programming Derby-style table functions” in Java DB's Developer's Guide.