Using SQL for Pattern Matching

Overview

Purpose

This tutorial covers SQL for Pattern Matching. Row pattern matching in native SQL improves application and development productivity and query efficiency for row-sequence analysis.

Time to Complete

Approximately one to two hours

Introduction

Recognizing patterns in a sequence of rows has been a capability that was widely desired, but not possible with SQL until now. There were many workarounds, but these were difficult to write, hard to understand, and inefficient to execute. With Oracle Database 12c Release 1 (12.1), you can use the MATCH_RECOGNIZE clause to perform pattern matching in SQL to do the following:

Logically partition and order the data that is used in the MATCH_RECOGNIZE clause with its PARTITION BY and ORDER BY clauses.
Define patterns of rows to seek using the PATTERN clause of the MATCH_RECOGNIZE clause. These patterns use regular expressions syntax, a powerful and expressive feature, applied to the pattern variables you define.
Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause.
Define measures, which are expressions usable in the MEASURES clause of the SQL query.

Scenario (Optional)

The syntax for pattern expressions incorporates regular expressions and full conditional logic, enabling precise and flexible pattern definition. Whatever the domain is (for example, financial market prices, internet clicks, or security sensor output), applications analyzing row sequences can benefit from MATCH_RECOGNIZE.

Hardware and Software Requirements (Optional)

The following is a list of hardware and software requirements:

Oracle Database 12c Release 1 (12.1) Enterprise Edition for linux
Oracle SQL Developer version 3.0 or higher (optional)
Mozilla Firefox Web browser
A linux text editor such as gedit

Prerequisites

Before starting this tutorial, you should:

Install Oracle Database 12c Release 1 (12.1) Enterprise Edition for linux
Have access to or have installed the sample schemas
Install Oracle SQL Developer
Download the files.zip file which contains the script files used in this tutorial to your local drive. In this tutorial, we copied the downloaded required files.zip file to the /home/oracle/files folder on a linux machine
The code examples in this tutorial use the OE schema included in the Oracle Database 12c Release 1 (12.1) . The Oracle Database Sample Schemas 12c Release 1 (12.1) is part of the Oracle Database 12c Release 1 (12.1) documentation set

Using Pattern Matching

In this section, you learn the benefits of using pattern matching and some of the tasks and keywords you can use.

Why Use Pattern Matching?

The ability to recognize patterns found across multiple rows is important for many types of applications. Examples include all kinds of business processes driven by sequences of events. Examples are:

Financial applications seeking patterns of pricing, trading volume, and other behavior
Security applications where unusual behavior must be detected
Fraud detection applications
Sensor data analysis. One term describing this general area is complex event processing, and pattern matching is a powerful aid to this activity.

Tasks and Keywords in Pattern Matching

Let us go over some of the tasks and keywords used in pattern matching.

Task/Keyword	Description
PARTITION BY	Logically divide the rows into groups at a high level
ORDER BY	Logically order the rows in partition
ONE ROW PER MATCH	Returns one summary row of output for each match
ALL ROWS PER MATCH	Returns one detail row of output for each row of each match
MEASURES	Defines row pattern measure columns
PATTERN	Defines which pattern variables must be matched, the sequence in which they must be matched, and the quantity of rows which must be matched
DEFINE	Specifies the conditions that defines a pattern variable
AFTER MATCH	Restart the matching process after a match is found
MATCH_NUMBER	Finds which pattern variable applies to which rows
CLASSIFIER	Identifies which component of a pattern applies to a specific row

PARTITION BY: Typically, you want to divide your input data into logical groups for analysis. In the stock examples, you want to divide the pattern matching so that it applies to just one stock at a time. You do that with PARTITION BY which specifies that the rows of the input table are to be partitioned by one or more columns. Matches are found within partitions and do not cross partition boundaries. If there is no PARTITION BY, then all rows of the row pattern input table constitute a single row pattern partition.

ORDER BY: Used to specify the order of rows within a row pattern partition. If the order of two rows in a row pattern partition is not determined by the ORDER BY, then the result of MATCH_RECOGNIZE is non-deterministic: it may not give consistent results each time the query is run.

[ONE ROW | ALL ROWS] PER MATCH]: Sometimes you may need summary data about the matches while other times, you may need details. You can do that as follows: ONE ROW PER MATCH gives you one row of output for each match. ALL ROWS PER MATCH gives you one row of output for each row of each match. This is the default.

MEASURES: The MEASURES clause defines a list of columns for the pattern output table. Each pattern measure column is defined with a column name whose value is specified by a corresponding pattern measure expression.

PATTERN: Specifies the pattern to be recognized in the ordered sequence of rows in a partition. Each variable name in a pattern corresponds to a Boolean condition, which is specified later using the DEFINE component of the syntax. The PATTERN clause is used to specify a regular expression. The regular expression is enclosed in parentheses.

DEFINE: Because the PATTERN clause depends on pattern variables, you must have a clause to define these variables. They are specified in the DEFINE clause (required), and are used to specify the conditions that defines a pattern variable.

AFTER MATCH: Once the query finds a match, it is vital that it begins looking for the next match at exactly the right point. Do you want to find matches where the end of the earlier match overlaps the start of the next match? Or do you want some other variation? Pattern matching provides great flexibility in specifying the restart point.

MATCH_NUMBER: You might have a large number of matches for your pattern inside a given row partition. How do you tell all these matches apart? This is done with the MATCH_NUMBER function. Matches within a row pattern partition are numbered sequentially starting with 1 in the order they are found. Note that match numbering starts over again at 1 in each row pattern partition, because there is no inherent ordering between row pattern partitions.

CLASSIFIER: Along with knowing which MATCH_NUMBER you are seeing, you may want to know which component of a pattern applies to a specific row. This is done using the CLASSIFIER function. The classifier of a row is the pattern variable that the row is mapped to by a row pattern match. The classifier of a row that is not mapped by a row pattern match is null. The CLASSIFIER function returns a character string whose value is the classifier of a row.

Pattern Match Example: Stock Chart

Pattern matching enables you to identify price patterns, such as V-shapes and W-shapes illustrated in the following chart along with performing many types of calculations. For example, your calculations might include the count of observations or the average value on a downward or upward slope. The following chart uses the stock price which you can load into your database with the CREATE and INSERT statements. You can then issue a query to find all cases where stock prices dipped to a bottom price and then rose. This is generally called a V-shape.

The following steps use the stock price graph. You can load the stock chart example into your database using the SQL CREATE and INSERT statements that follow. The scripts are also found in the files.zip file.

Create the ticker table as follows:

CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

Next, populate the newly created ticker table with some data using the SQL INSERT statements as follows:

INSERT INTO Ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO Ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO Ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO Ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO Ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO Ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO Ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO Ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO Ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO Ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO Ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO Ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO Ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO Ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO Ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO Ticker VALUES('ACME', '16-Apr-11', 12);
INSERT INTO Ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO Ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO Ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO Ticker VALUES('ACME', '20-Apr-11', 22);

Enter the following query which demonstrates Simple V Shape With 1 Row Output per Match:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
LAST(DOWN.tstamp) AS bottom_tstamp,
LAST(UP.tstamp) AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price) ) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

The output of the preceding code is displayed:

Let us look at two charts to better understand the results returned the previous query. The following chart helps in identifying which dates are mapped to which pattern variables.

The chart shows the dates mapped to specific pattern variables, as specified in the PATTERN clause. Once the mappings of pattern variables to dates are available, that information is used by the MEASURES clause to calculate the measure values. The chart labels every date mapped to a pattern variable. The mapping is based on the pattern specified in the PATTERN clause and the logical conditions specified in then DEFINE clause. The thin vertical lines show the borders of the three matches that were found for the pattern. In each match, the first date has the STRT pattern variable mapped to it (labeled as "Start"), followed by one or more dates mapped to the DOWN pattern variable, and finally, one or more dates mapped to the UP pattern variable. Since the AFTER MATCH SKIP TO LAST UP in was specified in the query, two adjacent matches can share a row. That means a single date can have two variables mapped to it. For example, 10-April has both the pattern variables UP and STRT mapped to it: April 10 is the end of Match 1 and the start of Match 2.

Let us look at the second chart which helps identifying *to which dates the measures correspond*:

In the chart, the measures defined in the MEASURES clause of the query are labeled solely: START (start_tstamp in the query), BOTTOM (bottom_tstamp in the query), and END (end_tstamp in the query). As in the previous chart, the thin vertical lines show the borders of the three matches found for the pattern. Every match has a Start date, a Bottom date, and an End date. As with the previous chart, the date 10-April is found in two matches:

It is the END measure for Match 1 and the START measure for Match 2. The labeled dates in the chart show which dates correspond to the measure definitions, which are in turn based on the pattern variable mappings shown in the first chart.

Note that the dates labeled in the chart correspond exactly to the nine dates shown earlier in the output of the example. The first row of the output has the dates shown in Match 1, the second row of the output has the dates shown in Match 2, and the third row of the output has the dates shown in Match 3.

How is Data Processed in Pattern Matching?

Let us identify how data is processed in pattern matching using the MATCH_RECOGNIZE clause.

The row pattern input table is partitioned according to the PARTITION BY clause. Each row pattern partition consists of the set of rows of the row pattern input table that are equal on the row pattern partitioning columns.

Each row pattern partition is ordered according to the ORDER BY clause.

Each ordered row pattern partition is searched for matches to the PATTERN.

Pattern matching operates by seeking the match at the earliest row, considering the rows in a row pattern partition in the order specified by the ORDER BY. Pattern matching in a sequence of rows is an incremental process, with one row after another examined to see if it fits the pattern. With this incremental processing model, at any step until the complete pattern has been recognized, we only have a partial match and we do not know what rows might be added in the future, nor what variables those future rows might be mapped to. If no match is found at the earliest row, the search moves to the next row in the partition, checking to see if a match can be found starting with that row.

After a match is found, row pattern matching calculates the row pattern measure columns, which are expressions defined by the MEASURES clause.

Using ONE ROW PER MATCH, as shown in the first example, pattern matching generates one row for each match that is found. If you use ALL ROWS PER MATCH, every row which is matched is included in the pattern match output.

The AFTER MATCH SKIP clause determines where row pattern matching resumes within a row pattern partition after a non-empty match has been found. In the example above, row pattern matching resumes at the next row after the rows mapped by a match (AFTER MATCH SKIP TO LAST UP).

Additional Pattern Matching Examples and Features

In this section, we will look at some additional pattern matching examples and some (not all) additional features of pattern matching.

Pattern Match for a Simple V-Shape With All Rows Output Per Match

Enter the following query which demonstrates pattern match for a Simple V-Shape With All Rows Output Per Match.

-- The first line in the example is used to improve formatting if you are using SQL*Plus.
COLUMN var_match FORMAT A4
SELECT * FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
FINAL LAST(DOWN.tstamp) AS bottom_tstamp,
FINAL LAST(UP.tstamp) AS end_tstamp,
MATCH_NUMBER() AS match_num,
CLASSIFIER() AS var_match
ALL ROWS PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price) ) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.match_num, MR.tstamp;

What does this query do? It's very similar to the earlier query except for items in the MEASURES clause, the change to , and a change to the ORDER BY at the end of the query. In the MEASURES clause, there are the following additions:

MATCH_NUMBER() AS match_num - Because this example returns multiple rows per match, we want to know which rows are members of which match. MATCH_NUMBER assigns the same number to each row of a specific match. For instance, all the rows in the first match found in a row pattern partition are assigned the MATCH_NUMBER value of 1. Note that match numbering starts over again at 1 in each row pattern partition.
CLASSIFIER() AS var_match - We would like to know which rows map to which variable, and the CLASSIFIER function lets us do this. In this example, some will map to the STRT variable, some rows the DOWN variable, and others, the UP variable.
FINAL LAST() syntax was added to the bottom_tstamp and end_tstamp measures. By specifying FINAL and using the LAST() function for bottom_tstamp, every row inside each match shows the same date for the bottom of its V-shape. Similarly, applying FINAL LAST() to the end_tstamp measure makes every row in each match show the same date for the end of its V-shape. Without this syntax, the dates shown would be the running value for each row.

Changes were made in two other lines:

ALL ROWS PER MATCH - While the earlier example gave us a summary with just 1 row about each match using the line ONE ROW PER MATCH, this example asks to show every row of each match.
ORDER BY on the last line - This was changed to take advantage of the MATCH_NUM. We want to all rows in the same match together and in chronological order.

Note that the row for April 10 appears twice because it is in two pattern matches:

It is the last day of the first match, and
the first day of the second match

Pattern Match for a W-Shape

Enter the following query which demonstrates pattern match for a W -Shape:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
UP.tstamp AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

This query is identical to the query in first query except for the line defining the PATTERN. The regular expression has been modified to seek the pattern DOWN followed by UP two consecutive times: PATTERN (STRT DOWN+ UP+ DOWN+ UP+).

This pattern specification means it can only match a W-shape where the two V-shapes have no separation between them. For instance, if there is a flat interval with the price not changing, and that interval occurs between two V-shapes, the pattern will not match that data.

Advanced Topics

Pattern Matching

Oracle Database 12c Release 1 (12.1) Data Warehousing Guide

Nesting FIRST and LAST Within PREV and NEXT

FIRST

LAST

FIRST

LAST

PREV (LAST (A.Price + A.Tax, 1), 3)

Handling Empty Matches or Unmatched Rows

ALL ROWS PER MATCH has three sub options:

ALL ROWS PER MATCH SHOW EMPTY MATCHES: With this option, an empty match generates a single row in the row pattern output table.
ALL ROWS PER MATCH OMIT EMPTY MATCHES: With this option, an empty match is omitted from the row pattern output table.
ALL ROWS PER MATCH WITH UNMATCHED ROWS: This option shows both empty matches and unmatched rows.

Excluding Portions of the Pattern from the Output

When using ALL ROWS PER MATCH with either the OMIT EMPTY MATCHES or SHOW EMPTY MATCHES options, rows matching a portion of the PATTERN may be excluded from the row pattern output table. The excluded portion is bracketed between {- and -} in the PATTERN clause.

Handling Patterns Where Multiple Orderings are Satisfactory Matches

You can handle patterns where multiple orderings are satisfactory matches using the PERMUTE syntax. For example PATTERN (PERMUTE (A, B, C)) is equivalent to an alternation of all permutations of three pattern variables A, B, and C, as follows:

PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)

Rules and Restrictions in Pattern Matching

Input Table Requirements: The row pattern input table is the input argument to MATCH_RECOGNIZE. You can use a table or view, or a named query (defined using the WITH clause). The row pattern input table can also be a derived table (also known as in-line view).
Prohibited Nesting in the MATCH_RECOGNIZE Clause: The following kinds of nesting are prohibited in pattern matching:

Nesting one pattern matching clause within another.
Outer references in MEASURES or DEFINE. This means that a pattern matching may not reference any table in an outer query block except the row pattern input table.
Correlated subqueries may not be used in MEASURES or DEFINE. In addition, subqueries in MEASURES or DEFINE cannot reference pattern variables.
Pattern matching may not be used in recursive queries.
SELECT FOR UPDATE cannot use MATCH_RECOGNIZE.

Concatenated Pattern Matching: It is not prohibited to feed the output of one pattern matching into the input of another.
Aggregate Restrictions: Only the following aggregate functions can be used in the MEASURES and DEFINE clauses: COUNT, SUM, AVG, MAX, and MIN.

Summary

In this tutorial, you have learned how to:

Use the MATCH_RECOGNIZE clause to perform pattern matching in SQL
Use the tasks and keywords that are used in pattern matching
Use pattern matching to identify price patterns such as V-shapes and W-shapes
Use pattern matching for a Simple V-Shape With All Rows Output Per Match
Use pattern matching for a W-Shape

Resources

For detailed information on pattern matching, see the Pattern Matching chapter in the Oracle Database 12c Release 1 (12.1) Data Warehousing Guide reference guide. Among other things, this chapter contains the following detailed examples:

Stock Market Examples: The examples are based on common tasks involving share prices and patterns.
Security Log Analysis Examples: The examples deal with a computer system that issues error messages and authentication checks, and stores the events in a system file.
Sessionization Examples: Sessionization is the process of defining distinct sessions of user activity, typically involving multiple events in a single session. Pattern matching makes it easy to express queries for sessionization.

Credits

Lead Curriculum Developer: Lauran K. Serhal
Other Contributors: Paul Lane, John Haydu

To help navigate this Oracle by Example, note the following:

Hiding Header Buttons:: Click the Title to hide the buttons in the header. To show the buttons again, simply click the Title again.
Topic List Button:: A list of all the topics. Click one of the topics to navigate to that section.
Expand/Collapse All Topics:: To show/hide all the detail for all the sections. By default, all topics are collapsed
Show/Hide All Images:: To show/hide all the screenshots. By default, all images are displayed.
Print:: To print the content. The content currently displayed or hidden will be printed.

To navigate to a particular section in this tutorial, select the topic from the list.