Using Regular Expressions
In this tutorial, you learn how to use regular expression support. This new feature is introduced in Oracle Database 10g.
Approximately 30 minutes
This tutorial covers the following topics:
Overview | |
Prerequisites | |
Using Regular Expressions in Oracle Database 10g | |
Using Basic Searches | |
Using Multilingual Capabilities | |
Regular Expressions and Check Constraints | |
Summary |
Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
In Oracle Database 10g, you can use both SQL and PL/SQL to implement regular expression support. Regular expressions are a method of describing both simple and complex patterns for searching and manipulating. String manipulation and searching contribute to a large percentage of the logic in a Web-based application. Usage ranges from the simple (for example, find the word San Francisco in a specified text) to the complex (for example, extract all URLs from the text) to the more complex (for instance, find all words in which every second character is a vowel).
Oracle Database 10g introduces support for regular expressions. The implementation complies with the Portable Operating System for UNIX (POSIX) standard, issued by the Institute of Electrical and Electronics Engineers (IEEE), for ASCII data-matching semantics and syntax. Oracle's multilingual capabilities extend the matching capabilities of the operators beyond the POSIX standard.
When coupled with native SQL, the use of regular expressions makes it possible to perform powerful search and manipulation operations on any data stored in an Oracle database. You can use this feature to easily solve problems that would otherwise be very complex to program.
Before you perform this tutorial, you should:
1. |
Perform the Installing Oracle Database 10g on Windows tutorial. |
|
2. |
Download and unzip the regexp.zip file into your working directory(c:\wkdir). |
If you have a string aabcd and you specify a search for a(b|c)d, the search looks for a followed by either b or c, which is then followed by d.
Regular Expression: 'a(b|c)d' |
a | a | b | c | d | Description | Result |
* | Look for a and succeed |
Match | ||||
* | Look for b and fail | No match | ||||
* | Look for c and fail, reset, and advance | No match | ||||
* | Look for a and succeed |
Match | ||||
* | Look for b and succeed; remember c as alternative | Match | ||||
* | Look for d and fail | No match | ||||
* | Look for c as last remembered alternative and fail, reset, and advance | No match | ||||
* | Look for a and fail, reset, and advance | No match | ||||
* | Look for a and fail, reset, and advance |
No match | ||||
* | Look for a and fail, reset, and advance |
No match |
Given the string aabcd, a(b|c)d does not match it.
To implement regular expression support in either SQL or PL/SQL, you use a new set of functions. These functions are:
Function Name | Description |
REGEXP_LIKE | Similar to the LIKE operator, but performs regular expression matching instead of simple pattern matching |
REGEXP_INSTR | Searches for a given string for a regular expression pattern and returns the position were the match is found |
REGEXP_REPLACE | Searches for a regular expression pattern and replaces it with a replacement string |
REGEXP_SUBSTR | Searches for a regular expression pattern within a given string and returns the matched substring |
Metacharacters are special characters that have a special meaning, such as a wild card character, a repeating character, a nonmatching character, or a range of characters.
You can use several predefined metacharacter symbols in the pattern matching with the functions.
Symbol | Description |
* |
Matches zero or more occurrences |
| |
Alternation operator for specifying alternative matches |
^/$ |
Matches the start of line and the end of line |
[] |
Bracket expression for a matching list matching any one of the expressions represented in the list |
[^exp] |
If the caret is inside the bracket, it negates the expression. |
{m} |
Matches exactly m times |
{m,n} |
Matches at least m times but no more than n times |
[: :] |
Specifies a character class and matches any character in that class |
\ |
Can have four different meanings: (1) stand for itself; (2) quote the next character; (3) introduce an operator; (4) do nothing |
+ |
Matches one or more occurrences |
? |
Matches zero or one occurrence |
. |
Matches any character in the supported character set (except NULL) |
() |
Grouping expression (treated as a single subexpression) |
\n |
Backreference expression |
[==] |
Specifies equivalence classes |
[..] |
Specifies one collation element (such as a multicharacter element) |
Perl Regular Expression Extensions
In addition to the POSIX standard, Oracle supports the common Perl-influenced meta characters. If you are a life sciences developer who relies on Perl to do pattern analysis on bioinformatics data stored in huge databases of DNAs and proteins, you can use SQL Regular Expression support directly on the data rather than from the middle tier. This provides you with a more efficient solution. The metacharacters added for Perl compatability are:
Operator | Description |
\d |
Match a digit character |
\D |
Match a non-digit character |
\w |
Match a word character |
\W |
Match a non-word character |
\s |
Match a whitespace character |
\S |
Match a non-whitespace character |
\A |
Match only at beginning of string |
\Z |
Match only at end of string, or before newline at the end |
\z |
Match only at end of string |
*? |
Match 0 or more times (non-greedy) |
+? |
Match 1 or more times (non-greedy) |
?? |
Match 0 or 1 time ( non-greedy) |
{n}? |
Match exactly n times (non-greedy) |
{n,}? |
Match at least n times (non-greedy) |
{n,m}? |
Match at least n but not more than m times (non-greedy) |
For more information on Perl-Influenced Extensions in Oracle Regular Expressions, refer to the Oracle Database Application Developer's Guide - Fundamentals 10g Release 2 (10.2), chapter 4 "Using Regular Expressions in Oracle Database".
You can use the regular expression functions to perform basic searches.
1. |
From a terminal window, change to the wkdir directory. cd\wkdir Set your NLS_LANG environment setting as follows: set NLS_LANG=.AL32UTF8 Start SQL*Plus, connect to Oracle with user ID and password oe/oe. sqlplus oe/oe
|
||||||||||||
2. |
Examine the syntax of the REGEXP_LIKE function:
In this function: srcstr: is the search value
To locate all products with names containing SSP/S, SSP/V, SSS/V, or SSS/S in the PRODUCT_NAME column from the PRODUCT_INFORMATION table, execute the following script: @relike.sql SELECT product_name
FROM oe.product_information
WHERE regexp_like (product_name, 'SS[PS]/[VS]');
|
||||||||||||
3. |
The REGEXP_INSTR function returns the position of a given pattern within a string. Examine the syntax:
In this function: position: is the search starting position
To search the product names to find the location of the first nonalphabetic character (regardless of whether it is uppercase or lowercase), execute the following script: @reinstr.sql COLUMN product_name FORMAT a45 COLUMN non_alpha FORMAT 9999999999 SELECT product_name, REGEXP_INSTR(product_name, '[^[:alpha:]]') non_alpha FROM oe.product_information ; Note that [^[:<class>:]] implies a character class and matches any character from within that class; [:alpha:] matches any alphabetic character. In this case, you are negating this expression by using ^.
|
||||||||||||
4. |
The REGEXP_SUBSTR function returns a given string based on a pattern of occurrence. Examine the syntax:
In this function: position: is the search starting position
You want to extract the e-mail names from the CUSTOMERS table, but extract only those e-mail names for the customers located in Switzerland. To do this, you need to return the contents in the CUST_EMAIL column that precedes the @ symbol for customers with an NLS_TERRITORY of Switzerland. Execute the following script: @resubstr.sql SELECT REGEXP_SUBSTR(cust_email, '[^@]+') FROM oe.customers WHERE nls_territory = 'SWITZERLAND' ; Note that in this example, the result returns the first substring that does not have the @ symbol.
|
||||||||||||
5. |
The REGEXP_REPLACE function returns a given string with every occurrence of the pattern replaced with a replacement string. Examine the syntax:
In this function: position: is the search starting position
To return information from the CATALOG_URL column in the PRODUCT_INFORMATION table, you can perform a full scan on the column. However, this would result in hundreds of rows being returned, because it lists a specific HTML page location within a number of catalog domains. In this example, you want to find only the individual domain names themselves and not the lower-level pages they contain. To find the domain names without all the extraneous information, use the REGEXP_REPLACE function. Execute the following script: @rereplace.sql SELECT UNIQUE REGEXP_REPLACE (catalog_url, 'http://([^/]+).*', '\1') FROM oe.product_information ; Here is an explanation of how the string was processed:
|
1. |
Execute the following script to find the product description in the Portuguese language: @multiport.sql SELECT regexp_substr(translated_name, '^[a-z]+') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ; Note that the data is not displayed. The ^ is outside the bracket, which means that you are searching for any strings or substrings that start with any character from a to z.
|
2. | Perform the same query, but this time use the case-insensitive 'i' switched on. Execute the following script: @multiport2.sql SELECT regexp_substr(translated_name, '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ;
|
3. |
The results are still incomplete because the returned strings are trimmed as soon as a non-English character is encountered. This is because the range [a-z] is sensitive to NLS_LANGUAGE. Range expressions are sensitive to NLS_SORT, which in this example, is derived from NLS_LANGUAGE. You need to set the NLS_LANGUAGE parameter appropriately to return the complete results. Execute the following query: @multiport3.sql ALTER SESSION SET NLS_LANGUAGE=PORTUGUESE; SELECT regexp_substr(translated_name, '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions WHERE language_id = 'PT' AND translated_name like 'G%' ;
|
4. |
The final step is to view the results in both English and Portuguese to ensure that the translation has taken place. Execute the following script: @multiport4.sql SELECT REGEXP_SUBSTR(i.product_name, '^[a-z]+', 1, 1, 'i') || ' = ' || regexp_substr(d.translated_name, '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions d, oe.product_information i WHERE d.language_id = 'PT' AND d.translated_name like 'G%' AND i.product_id = d.product_id ; ALTER SESSION SET NLS_LANGUAGE=AMERICAN;
|
1. |
Adding a check constraint on the CUST_EMAIL column of the CUSTOMERS table ensures that only strings containing an @ symbol are accepted. Execute the following script: @chkemail.sql INSERT INTO customers VALUES (9999,'Christian','Patel', cust_address_typ ('1003 Canyon Road','87501', 'Santa Fe','NM','US'), phone_list_typ ('+1 505 243 4144'),'us','AMERICA','100', 'ChrisP+creme.com', 149, null, null, null, null, null) ; Because there was no validation being performed, an e-mail address not containing an @ symbol was accepted. After the script runs, issue the ROLLBACK statement at the SQL> prompt before starting the next step. ROLLBACK;
|
2. | Implement the constraint by executing the following script:
@chkemail2.sql ALTER TABLE customers ADD CONSTRAINT cust_email_addr CHECK(REGEXP_LIKE(cust_email,'@'))NOVALIDATE ;
|
3. | Test the constraint by executing chkemail.sql
again.
@chkemail.sql The check constraint is violated because the e-mail address does not contain the required symbol. The NOVALIDATE clause ensures that existing data is not checked.
|
4. |
Remove the constraint by executing the following script: @chkemail3.sql ALTER TABLE customers DROP CONSTRAINT cust_email_addr ;
|
In this tutorial, you learned how to:
Perform a Variety of Searches to Access Data by Using Regular Expressions | ||
Access Data in Other Languages by Using Regular Expressions | ||
Place the cursor over this icon to hide all screenshots.