Creating a CONTEXT Index and
Querying with CONTAINS operator
Creating a CTXCAT Index and
Querying with CATSEARCH
operator
Creating a CTXRULE Index and
Querying with MATCHES operator
Time to Complete
Approximately 1 hour
Introduction
An Oracle Text index is an Oracle Database domain index. To build
your query application, you can create an index of type CONTEXT
with a mixture of text and structured data columns, and query it
with the CONTAINS operator.
You create an index from a populated text table. In a query
application, the table must contain the text or pointers to where
the text is stored. Text is usually a collection of documents, but
can also be small text fragments.
You can use the CTXCAT index
type when your application relies heavily on mixed queries to
search small documents or descriptive text fragments based on
related criteria such as dates or prices. Query this index with
the CATSEARCH operator.
You can create an index of type CTXRULE
to build a document classification application using simple or
rule-based classification.
The benefits of a creating an Oracle Text index include fast
response time for text queries with the CONTAINS,
CATSEARCH, and MATCHES
Oracle Text operators. With Oracle Text, you can create indexes of
several types using CREATE INDEX.
Index Type
Description
Query Operator
Notes
CONTEXT
Use this index to build a text retrieval application when
your text consists of large coherent documents.
You can index documents of different formats such as MS
Word, HTML or plain text.
You can customize the index in a variety of ways. This index
type requires CTX_DDL.SYNC_INDEX
after DML on base table.
CONTAINS Grammar is
called the CONTEXT grammar,
which supports a rich set of operations.
The CTXCAT grammar can be
used with query templating.
Supports all documents services and query services.
Supports indexing of partitioned text tables. Supports FILTER BY and ORDER
BY clauses of CREATE
INDEX to also index structured column values for
more efficient processing of mixed queries.
CTXCAT
Use this index type for better mixed query performance.
Typically, with this index type, you index small documents
or text fragments.
Other columns in the base table, such as item names, prices,
and descriptions can be included in the index to improve
mixed query performance. This index type is transactional,
automatically updating itself after DML to base table. No CTX_DDL.SYNC_INDEX is
necessary.
CATSEARCH Grammar is
called CTXCAT, which
supports logical operations, phrase queries, and
wildcarding. The CONTEXT
grammar can be used with query templating. Theme querying is
supported.
This index is larger and takes longer to build than a CONTEXT index. The size of a CTXCAT index is related to the
total amount of text to be indexed, the number of indexes in
the index set, and the number of columns indexed. Carefully
consider your queries and your resources before adding
indexes to the index set. The CTXCAT
index does not support index partitioning, documents
services (highlighting, markup, themes, and gists) or query
services (explain, query feedback, and browse words.)
CTXRULE
Use CTXRULE index to
build a document classification or routing application. This
index is created on a table of queries, where the queries
define the classification or routing criteria.
MATCHES
Single documents (plain text, HTML, or XML) can be
classified using the MATCHES
operator, which turns a document into a set of queries and
finds the matching rows in the CTXRULE
index.
Creating a CONTEXT Index and
Querying with CONTAINS operator
To create a CONTEXT index,
perform the following steps:
Open a SQL Worksheet using myuser_conn and enter the
following code to create a table called QUICK
having a single column called TEXT
which is VARCHAR2(80).
Click theRun
Statementicon to run the query.
Note: Here, you can see the words from the
"the cat sat on the mat" document. But "the"
and "on" did not appear. They did not appear because
they are called the stop words - words that are
considered not useful for searching, so they are not
indexed. You can customize the list of stopwords for
an index. The $I
table is sometimes known as the "word list" table.
It contains a list of words, each associated with
some binary data (in the token_info column) which
represents the locations of each word within the
document set.
Synchronizing a Text Index
Synchronizing the index involves processing all pending
updates, inserts, and deletes to the base table. You can do
this in PL/SQL with the CTX_DDL.SYNC_INDEX
procedure.
Add a new row to the QUICK
table using the following INSERT
statement.
INSERT INTO quick VALUES ('the cat sat on the
dog');
Note: The word "dog" appeared in our index
because the index has been synced. But there's
something else to notice here - do you see that
there are multiple rows for "CAT" and "SAT"? This is
index fragmentation - due to updates, our index is
no longer in an optimal state. You can fix this by
running another PL/SQL procedure optimize_index.
Optimizing an Index
Frequent index synchronization can fragment your CONTEXT
index. Index fragmentation can adversely affect query
response time. You can optimize your CONTEXT
index to reduce fragmentation and index size and so
improve query performance.
This comes in various modes, which are beyond the scope of
this tutorial, but for now we will just run it in FULL
mode
Execute the following PL/SQL procedure to optimize an
index.
Note: The index is now in an optimal state -
best for space usage and for query performance. The
more frequently you sync and index, the more
fragmented it will become and the longer it will
take to optimize. So any large scale Oracle Text
implementation will have to plan how often to sync
indexes, and how frequently to run optimize.A common
scenario is to sync indexes every five minutes, and
optimize them daily.
Querying with CONTAINS
operator
When you create an index of type CONTEXT,
you must use the CONTAINS
operator to enter your query.
Add two more rows to your QUICK
table using the following
INSERT statements:
INSERT INTO quick VALUES ('the cat sat on the
cat');
INSERT INTO quick VALUES ('the dog sat on the cat');
Note: cat sat' is a phrase search. Both
words must appear together in the right order. Here,
you did not find 'the dog sat on the cat'. If you
want to find the words wherever they are we must use
the AND operator (which can also be expressed as
"&"). Here, the text 'the dog sat on the cat'
did not appear.
Run the following SELECT
statement, specify the query in the WHERE
clause with the CONTAINS
operator and the AND operator:
SELECT text FROM quick WHERE CONTAINS (text, 'cat
AND sat') > 0;
CONTAINS returns a
relevance score for every row selected. You obtain this
score with the SCORE
operator. The CONTAINS
operator must always be followed by the > 0 syntax, which
specifies that the score value returned by the CONTAINS
operator must be greater than zero for the row to be
returned.
When the SCORE operator is
called in the SELECT
statement, the CONTAINS
operator must reference the score label value in the third
parameter.
Run the following query specifying the SCORE()
operator.
SELECT text, score(99) FROM quick WHERE CONTAINS
(text, 'cat', 99) > 0;
Note: Here, the score for the second row is
higher - there are two occurrences of "cat" in that
one, so it is considered more 'relevant' to the
search.
You can order the results from the highest scoring
documents to the lowest scoring documents using the ORDER BY clause as follows:
SELECT text, score(99) FROM quick WHERE CONTAINS
(text, 'cat', 99) > 0 ORDER BY SCORE(99) DESC;;
A CTXCAT index (like a CONTEXT index) is a "domain"
index. Therefore it supports the "PARAMETERS"
clause. A number of possible parameter settings are shared
with CONTEXT indexes. These
are: LEXER, MEMORY,
STOPLIST, STORAGE
and WORDLIST (no other CONTEXT parameters are
supported). Howevwer, the most important parameter is a new
one: INDEX SET.
INDEX SET defines the structured columns that are to
be included in the CTXCAT
index.
To create the sub-indexes, create an index set to contain
them using the following code:
begin
ctx_ddl.create_index_set('auction_set');
ctx_ddl.add_index('auction_set', 'price');
ctx_ddl.add_index('auction_set', 'bid_close');
end;
The CATSEARCH query operator
takes a mandatory text clause and optional structured clause.
Note that the title column is NOT
part of the INDEX SET.
In this tutorial, the title column is included in all the
queries. Hence, you need to create an index on the title
column. Create the combined catalog index using the following
CREATE INDEX statement as
follows:
CREATE INDEX auction_index ON auction (title)
INDEXTYPE IS CTXCAT
PARAMETERS ('INDEX SET auction_set');
Creating a CTXRULE Index and
Querying with MATCHES Operator
To create a CTXRULE
index, perform the following steps:
Open a SQL Worksheet using myuser_conn and enter the
following code to create a table called myqueries
to hold the category name and query text. Click theRun
Statementicon to run the
query.