oracle 全文检索
来源:互联网 发布:八叉树算法 编辑:程序博客网 时间:2024/04/30 10:25
PL/SQL procedure successfully completed
SQL> EXEC ctx_ddl.create_preference('my_test_mds', 'MULTI_COLUMN_DATASTORE');
PL/SQL procedure successfully completed
SQL> exec ctx_ddl.set_attribute('my_test_mds', 'columns', 'base_keyword, base_title, base_content');
PL/SQL procedure successfully completed
SQL>
CREATE INDEX mc_idx ON tb_knowledge_base_bk(base_title) INDEXTYPE IS ctxsys.CONTEXT PARAMETERS('datastore my_test_mds LEXER my_test_lexer')
select * from tb_knowledge_base_bk t where contains(t.base_title,'转速',1) >0;
select * from DR$MC_IDX$I;
原文:
http://gjbxx110.blog.51cto.com/794841/264880
数据库版本:10g
数据库操作系统: linux as4
表test1,
CREATE TABLE TEST1
(
CHINAME VARCHAR2(4000 CHAR),
ABNAME VARCHAR2(50 CHAR),
ADDRESS VARCHAR2(200 CHAR),
CITY VARCHAR2(30 CHAR),
X NUMBER(10,5),
Y NUMBER(10,5),
POIID NUMBER,
ID NUMBER
)。
指定分词
参考资料:
http://docs.oracle.com/cd/B19306_01/text.102/b14218/cdatadic.htm
Datastore Types
Use the datastore types to specify how your text is stored. To create a datastore preference, you must use one of the following datastore types:
Table 2-1 Datastore Types
10g Release 2 (10.2)
Part Number B14218-01
Home
Book List
Contents
Index
Master Index
Contact Us
Previous
NextPDF ·Mobi · ePub
2 Oracle Text Indexing Elements
This chapter describes the various elements you can use to create your Oracle Text index.
The following topics are discussed in this chapter:
Overview
Datastore Types
Filter Types
Lexer Types
Wordlist Type
Storage Types
Section Group Types
Classifier Types
Cluster Types
Stoplists
System-Defined Preferences
System Parameters
Overview
When you useCREATE INDEX to create an index orALTER INDEX to manage an index, you can optionally specify indexing preferences, stoplists, and section groups in the parameter string. Specifying a preference, stoplist, or section group answers one of the following questions about the way Oracle Text indexes text:
This chapter describes how to set each preference. You enable an option by creating a preference with one of the types described in this chapter.
For example, to specify that your documents are stored in external files, you can create a datastore preference calledmydatastore
using the FILE_DATASTORE type. You specify mydatastore
as the datastore preference in the parameter clause ofCREATE
INDEX
.
Creating Preferences
To create a datastore, lexer, filter, classifier, wordlist, or storage preference, you use the CTX_DDL.CREATE_PREFERENCE procedure and specify one of the types described in this chapter. For some types, you can also set attributes with the CTX_DDL.SET_ATTRIBUTE procedure.
An indexingtype names a class of indexing objects that you can use to create an indexpreference. A type, therefore, is an abstract ID, while a preference is an entity that corresponds to a type. Many system-defined preferences have the same name as types (for example,BASIC_LEXER
), but exact correspondence is not guaranteed (for example, theDEFAULT_DATASTORE
preference uses the DIRECT_DATASTORE
type, and there is no system preference corresponding to theCHARSET_FILTER
type). Be careful in assuming the existence or nature of either indexing types or system preferences.
You specify indexing preferences with CREATE INDEX
and ALTER INDEX
; indexing preferences determine how your index is created. For example, lexer preferences indicate the language of the text to be indexed. You can create and specify your own (user-defined) preferences or you can utilize system-defined preferences.
To create a stoplist, use CTX_DDL.CREATE_STOPLIST. You can add stopwords to a stoplist withCTX_DDL.ADD_STOPWORD
.
To create section groups, use CTX_DDL.CREATE_SECTION_GROUP and specify a section group type. You can add sections to section groups withCTX_DDL. ADD_ZONE_SECTION
or CTX_DDL.ADD_FIELD_SECTION
.
Datastore Types
Use the datastore types to specify how your text is stored. To create a datastore preference, you must use one of the following datastore types:
Table 2-1 Datastore Types
DIRECT_DATASTORE
Use the DIRECT_DATASTORE
type for text stored directly in the text column, one document for each row.DIRECT_DATASTORE
has no attributes.
The following columns types are supported: CHAR
, VARCHAR
,VARCHAR2
, BLOB
, CLOB
, BFILE
, orXMLType
.
Note:
If your column is aBFILE
, the index owner must have read permission on all directories used by theBFILEs
.DIRECT_DATASTORE CLOB Example
The following example creates a table with a CLOB column to store text data. It then populates two rows with text data and indexes the table using the system-defined preferenceCTXSYS.DEFAULT_DATASTORE
.
create table mytable(id number primary key, docs clob); insert into mytable values(111555,'this text will be indexed');insert into mytable values(111556,'this is a direct_datastore example');commit;create index myindex on mytable(docs) indextype is ctxsys.context parameters ('DATASTORE CTXSYS.DEFAULT_DATASTORE');
MULTI_COLUMN_DATASTORE
Use this datastore when your text is stored in more than one column. During indexing, the system concatenates the text columns, tagging the column text, and indexes the text as a single document. The XML-like tagging is optional. You can also set the system to filter and concatenate binary columns.
MULTI_COLUMN_DATASTORE
has the following attributes:
Table 2-2 MULTI_COLUMN_DATASTORE Attributes
NUMBER
and DATE
column types are supported. They are converted to text before indexing using the default format mask. TheTO_CHAR
function can be used in the column list for formatting.
RAW
and BLOB
columns are directly concatenated as binary data.
LONG
, LONG
RAW
, NCHAR
, andNCLOB
, nested table columns and collections are not supported.
The column list is limited to 500 bytes.
filterSpecify a comma-delimited list of Y/N flags. Each flag corresponds to a column in the COLUMNS list and denotes whether to filter the column using theAUTO_FILTER
.Specify one of the following allowable values:
Y: Column is to be filtered with AUTO_FILTER
N or no value: Column is not be filtered (Default)
delimiterSpecify the delimiter that separates column text. Use one of the following:COLUMN_NAME_TAG: Column text is set off by XML-like open and close tags (default behavior).
NEWLINE: Column text is separated with a newline.
Indexing and DML
To index, you must create a dummy column to specify in the CREATE
INDEX
statement. This column's contents are not made part of the virtual document, unless its name is specified in the columns attribute.
The index is synchronized only when the dummy column is updated. You can create triggers to propagate changes if needed.
MULTI_COLUMN_DATASTORE Example
The following example creates a multi-column datastore preference called my_multi
with three text columns:
begin
ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');ctx_ddl.set_attribute('my_multi', 'columns', 'column1, column2, column3');
end;
MULTI_COLUMN_DATASTORE Filter Example
The following example creates a multi-column datastore preference and denotes that thebar
column is to be filtered with the AUTO_FILTER.
ctx_ddl.create_preference('MY_MULTI','MULTI_COLUMN_DATASTORE');ctx_ddl.set_attribute('MY_MULTI', 'COLUMNS','foo,bar');ctx_ddl.set_attribute('MY_MULTI','FILTER','N,Y');
The multi-column datastore fetches the content of the foo
and bar
columns, filters bar
, then composes the compound document as:
<FOO>foo contents</FOO><BAR>bar filtered contents (probably originally HTML)</BAR>
The N's need not be specified, and there need not be a flag for every column. Only the Y's need to be specified, with commas to denote which column they apply to. For instance:
ctx_ddl.create_preference('MY_MULTI','MULTI_COLUMN_DATASTORE');ctx_ddl.set_attribute('MY_MULTI', 'COLUMNS','foo,bar,zoo,jar');ctx_ddl.set_attribute('MY_MULTI','FILTER',',,Y');
This filters only the column zoo
.
Tagging Behavior
During indexing, the system creates a virtual document for each row. The virtual document is composed of the contents of the columns concatenated in the listing order with column name tags automatically added. For example:
create table mc(id number primary key, name varchar2(10), address varchar2(80));insert into mc values(1, 'John Smith', '123 Main Street');exec ctx_ddl.create_preference('mymds', 'MULTI_COLUMN_DATASTORE');exec ctx_ddl.set_attibute('mymds', 'columns', 'name, address');
This produces the following virtual text for indexing:
<NAME>John Smith</NAME><ADDRESS>123 Main Street</ADDRESS>
The system indexes the text between the tags, ignoring the tags themselves.
Indexing Columns as Sections
To index these tags as sections, you can optionally create field sections with theBASIC_SECTION_GROUP
.
Note:
No section group is created when you use theMULTI_COLUMN_DATASTORE
. To create sections for these tags, you must create a section group.When you use expressions or functions, the tag is composed of the first 30 characters of the expression unless a column alias is used.
For example, if your expression is as follows:
exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17');
then it produces the following virtual text:
<4 + 17>21</4 + 17>
If your expression is as follows:
exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17 col1');
then it produces the following virtual text:
<col1>21<col1>
The tags are in uppercase unless the column name or column alias is in lowercase and surrounded by double quotes. For example:
exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo');
produces the following virtual text:
<FOO>content of foo</FOO>
For lowercase tags, use the following:
exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo "foo"');
This expression produces:
<foo>content of foo</foo>
DETAIL_DATASTORE
Use the DETAIL_DATASTORE
type for text stored directly in the database in detail tables, with the indexed text column located in the master table.
DETAIL_DATASTORE
has the following attributes:
Table 2-3 DETAIL_DATASTORE Attributes
Specify FALSE for Oracle Text to add a newline character (\n) after each detail row automatically.
detail_tableSpecify the name of the detail table (OWNER.TABLE if necessary)detail_keySpecify the name of the detail table foreign key column(s)detail_linenoSpecify the name of the detail table sequence column.detail_textSpecify the name of the detail table text column.Synchronizing Master/Detail Indexes
Changes to the detail table do not trigger re-indexing when you synchronize the index. Only changes to the indexed column in the master table triggers a re-index when you synchronize the index.
You can create triggers on the detail table to propagate changes to the indexed column in the master table row.
Example Master/Detail Tables
This example illustrates how master and detail tables are related to each other.
Master Table Example
Master tables define the documents in a master/detail relationship. You assign an identifying number to each document. The following table is an example master table, calledmy_master
:
Note:
Your master table must include a primary key column when you use theDETAIL_DATASTORE
type.Detail Table Example
Detail tables contain the text for a document, whose content is usually stored across a number of rows. The following detail tablemy_detail
is related to the master table my_master
with thearticle_id
column. This column identifies the master document to which each detail row (sub-document) belongs.
Detail Table Example Attributes
In this example, the DETAIL_DATASTORE
attributes have the following values:
You use CTX_DDL.CREATE_PREFERENCE to create a preference withDETAIL_DATASTORE
. You use CTX_DDL.SET_ATTRIBUTE to set the attributes for this preference as described earlier. The following example shows how this is done:
begin
ctx_ddl.create_preference('my_detail_pref', 'DETAIL_DATASTORE');ctx_ddl.set_attribute('my_detail_pref', 'binary', 'true');ctx_ddl.set_attribute('my_detail_pref', 'detail_table', 'my_detail');ctx_ddl.set_attribute('my_detail_pref', 'detail_key', 'article_id');ctx_ddl.set_attribute('my_detail_pref', 'detail_lineno', 'seq');ctx_ddl.set_attribute('my_detail_pref', 'detail_text', 'text');
end;
Master/Detail Index Example
To index the document defined in this master/detail relationship, you specify a column in the master table withCREATE
INDEX
. The column you specify must be one of the allowable types.
This example uses the body
column, whose function is to enable the creation of the master/detail index and to improve readability of the code. Themy_detail_pref
preference is set to DETAIL_DATASTORE
with the required attributes:
CREATE INDEX myindex on my_master(body) indextype is ctxsys.contextparameters('datastore my_detail_pref');
In this example, you can also specify the title
or author
column to create the index. However, if you do so, changes to these columns will trigger a re-index operation.
FILE_DATASTORE
The FILE_DATASTORE
type is used for text stored in files accessed through the local file system.
Note:
FILE_DATASTORE
may not work with certain types of remote mounted file systems.FILE_DATASTORE
has the following attribute(s):
Table 2-4 FILE_DATASTORE Attributes
- path
Specify the full directory path name of the files stored externally in a file system. When you specify the full directory path as such, you need only include file names in your text column.
You can specify multiple paths for path, with each path separated by a colon (:) on UNIX and semicolon(;) on Windows. File names are stored in the text column in the text table.
If you do not specify a path for external files with this attribute, Oracle Text requires that the path be included in the file names stored in the text column.
PATH Attribute Limitations
The PATH attribute has the following limitations:
If you specify a PATH attribute, you can only use a simple filename in the indexed column. You cannot combine the PATH attribute with a path as part of the filename. If the files exist in multiple folders or directories, you must leave the PATH attribute unset, and include the full file name, with PATH, in the indexed column.
On Windows systems, the files must be located on a local drive. They cannot be on a remote drive, whether the remote drive is mapped to a local drive letter.
FILE_DATASTORE Example
This example creates a file datastore preference called COMMON_DIR
that has a path of/mydocs
:
begin ctx_ddl.create_preference('COMMON_DIR','FILE_DATASTORE'); ctx_ddl.set_attribute('COMMON_DIR','PATH','/mydocs');end;
When you populate the table mytable
, you need only insert filenames. Thepath attribute tells the system where to look during the indexing operation.
create table mytable(id number primary key, docs varchar2(2000)); insert into mytable values(111555,'first.txt');insert into mytable values(111556,'second.txt');commit;
Create the index as follows:
create index myindex on mytable(docs) indextype is ctxsys.context parameters ('datastore COMMON_DIR');
URL_DATASTORE
Use the URL_DATASTORE
type for text stored:
In files on the World Wide Web (accessed through HTTP or FTP)
In files in the local file system (accessed through the file protocol)
You store each URL in a single text field.
URL Syntax
The syntax of a URL you store in a text field is as follows (with brackets indicating optional parameters):
[URL:]<access_scheme>://<host_name>[:<port_number>]/[<url_path>]
The access_scheme string you specify can be eitherftp, http, or file. For example:
http://mymachine.us.oracle.com/home.html
As this syntax is partially compliant with the RFC 1738 specification, the following restriction holds for the URL syntax:
The URL must contain only printable ASCII characters. Non printable ASCII characters and multibyte characters must be escaped with the %xx notation, wherexx is the hexadecimal representation of the special character.
Note:
Thelogin:password@
syntax within the URL is supported only for the ftp access scheme.
URL_DATASTORE Attributes
URL_DATASTORE
has the following attributes:
Table 2-5 URL_DATASTORE Attributes
- timeout
Specify the length of time, in seconds, that a network operation such as a connect or read waits before timing out and returning a timeout error to the application. The valid range fortimeout is 15 to 3600 and the default is 30.
Note:
Since timeout is at the network operation level, the total timeout may be longer than the time specified fortimeout.- maxthreads
Specify the maximum number of threads that can be running at the same time. The valid range formaxthreads is 1 to 1024 and the default is 8.
- urlsize
Specify the maximum length, in bytes, that the URL data store supports for URLs stored in the database. If a URL is over the maximum length, an error is returned. The valid range forurlsize is 32 to 65535 and the default is 256.
Note:
The product values specified for maxurls and urlsize cannot exceed 5,000,000.In other words, the maximum size of the memory buffer (maxurls *urlsize) for the URL is approximately 5 megabytes.
- maxurls
Specify the maximum number of rows that the internal buffer can hold for HTML documents (rows) retrieved from the text table. The valid range formaxurls is 32 to 65535 and the default is 256.
Note:
The product values specified for maxurls and urlsize cannot exceed 5,000,000.In other words, the maximum size of the memory buffer (maxurls *urlsize) for the URL is approximately 5 megabytes.
- http_proxy
Specify the fully qualified name of the host machine that serves as the HTTP proxy (gateway) for the machine on which Oracle Text is installed. You can optionally specify port number with a colon in the formhostname:port.
You must set this attribute if the machine is in an intranet that requires authentication through a proxy server to access Web files located outside the firewall.
- ftp_proxy
Specify the fully-qualified name of the host machine that serves as the FTP proxy (gateway) for the machine on which Oracle Text is installed. You can optionally specify a port number with a colon in the formhostname:port.
This attribute must be set if the machine is in an intranet that requires authentication through a proxy server to access Web files located outside the firewall.
- no_proxy
Specify a string of domains (up to sixteen, separate by commas) which are found in most, if not all, of the machines in your intranet. When one of the domains is encountered in a host name, no request is sent to the machine(s) specified forftp_proxy and http_proxy. Instead, the request is processed directly by the host machine identified in the URL.
For example, if the string us.oracle.com, uk.oracle.com is entered for no_proxy, any URL requests to machines that contain either of these domains in their host names are not processed by your proxy server(s).
URL_DATASTORE Example
This example creates a URL_DATASTORE
preference called URL_PREF
for which the http_proxy, no_proxy, and timeout attributes are set. The defaults are used for the attributes that are not set.
begin ctx_ddl.create_preference('URL_PREF','URL_DATASTORE'); ctx_ddl.set_attribute('URL_PREF','HTTP_PROXY','www-proxy.us.oracle.com'); ctx_ddl.set_attribute('URL_PREF','NO_PROXY','us.oracle.com'); ctx_ddl.set_attribute('URL_PREF','Timeout','300');end;
Create the table and insert values into it:
create table urls(id number primary key, docs varchar2(2000));insert into urls values(111555,'http://context.us.oracle.com');insert into urls values(111556,'http://www.sun.com');commit;
To create the index, specify URL_PREF as the datastore:
create index datastores_text on urls ( docs ) indextype is ctxsys.context parameters ( 'Datastore URL_PREF' );
USER_DATASTORE
Use the USER_DATASTORE
type to define stored procedures that synthesize documents during indexing. For example, a user procedure might synthesize author, date, and text columns into one document to have the author and date information be part of the indexed text.
USER_DATASTORE
has the following attributes:
Table 2-6 USER_DATASTORE Attributes
This procedure can be owned by any user and must be executable by the index owner.
output_typeSpecify the data type of the second argument toprocedure. Valid values areCLOB
, BLOB
, CLOB_LOC
, BLOB_LOC
, or VARCHAR2
. The default is CLOB.When you specify CLOB_LOC
, BLOB_LOC
, you indicate that no temporaryCLOB
or BLOB
is needed, since your procedure copies a locator to theIN
/OUT
second parameter.
- procedure
Specify the name of the procedure that synthesizes the document to be indexed. This specification must be in the form PROCEDURENAME or PACKAGENAME.PROCEDURENAME. You can also specify the schema owner name.
The procedure you specify must have two arguments defined as follows:
procedure (r IN ROWID, c IN OUT NOCOPY <output_type>)
The first argument r must be of type
ROWID
. The second argumentc must be of type output_type.NOCOPY
is a compiler hint that instructs Oracle Text to pass parameter c by reference if possible.Note::
The procedure name and its arguments can be named anything. The arguments r and c are used in this example for simplicity.The stored procedure is called once for each row indexed. Given the rowid of the current row,procedure must write the text of the document into its second argument, whose type you specify withoutput_type.
Constraints
The following constraints apply to procedure:
procedure can be owned by any user, but the user must have database permissions to executeprocedure correctly
procedure must be executable by the index owner
procedure must not issue DDL or transaction control statements like
COMMIT
Editing Procedure after Indexing
If you change or edit the stored procedure, indexes based upon it will not be notified, so you must manually re-create such indexes. So if the stored procedure makes use of other columns, and those column values change, the row will not be re-indexed. The row is re-indexed only when the indexed column changes.
- output_type
Specify the datatype of the second argument to procedure. You can use either
CLOB
,BLOB
,CLOB_LOC
,BLOB_LOC
, orVARCHAR2
.
USER_DATASTORE with CLOB Example
Consider a table in which the author, title, and text fields are separate, as in thearticles
table defined as follows:
create table articles( id number, author varchar2(80), title varchar2(120), text clob );
The author and title fields are to be part of the indexed document text. Assume userappowner
writes a stored procedure with the user datastore interface that synthesizes a document from the text, author, and title fields:
create procedure myproc(rid in rowid, tlob in out clob nocopy) is begin for c1 in (select author, title, text from articles where rowid = rid) loop
dbms_lob.writeappend(tlob, length(c1.title), c1.title); dbms_lob.writeappend(tlob, length(c1.author), c1.author); dbms_lob.writeappend(tlob, length(c1.text), c1.text);
end loop; end;
This procedure takes in a rowid and a temporary CLOB
locator, and concatenates all the article's columns into the temporaryCLOB
. The for loop executes only once.
The user appowner
creates the preference as follows:
begin
ctx_ddl.create_preference('myud', 'user_datastore'); ctx_ddl.set_attribute('myud', 'procedure', 'myproc'); ctx_ddl.set_attribute('myud', 'output_type', 'CLOB');
end;
When appowner
creates the index on articles(text)
using this preference, the indexing operation sees author and title in the document text.
USER_DATASTORE with BLOB_LOC Example
The following procedure might be used with OUTPUT_TYPE
BLOB_LOC
:
procedure myds(rid in rowid, dataout in out nocopy blob)is l_dtype varchar2(10); l_pk number;begin select dtype, pk into l_dtype, l_pk from mytable where rowid = rid; if (l_dtype = 'MOVIE') then select movie_data into dataout from movietab where fk = l_pk; elsif (l_dtype = 'SOUND') then select sound_data into dataout from soundtab where fk = l_pk; end if;end;
The user appowner
creates the preference as follows:
begin
ctx_ddl.create_preference('myud', 'user_datastore'); ctx_ddl.set_attribute('myud', 'procedure', 'myproc'); ctx_ddl.set_attribute('myud', 'output_type', 'blob_loc');
end;
NESTED_DATASTORE
Use the nested datastore type to index documents stored as rows in a nested table.
Table 2-7 NESTED_DATASTORE Attributes
When using the nested table datastore, you must index a dummy column, because the extensible indexing framework disallows indexing the nested table column. See the example.
DML on the nested table is not automatically propagated to the dummy column used for indexing. For DML on the nested table to be propagated to the dummy column, your application code or trigger must explicitly update the dummy column.
Filter defaults for the index are based on the type of the nested_text
column.
During validation, Oracle Text checks that the type exists and that the attributes you specify for nested_lineno and nested_text exist in the nested table type. Oracle Text does not check that the named nested table column exists in the indexed table.
NESTED_DATASTORE Example
This section shows an example of using the NESTED_DATASTORE
type to index documents stored as rows in a nested table.
Create the Nested Table
The following code creates a nested table and a storage table mytab for the nested table:
create type nt_rec as object ( lno number, -- line number ltxt varchar2(80) -- text of line);create type nt_tab as table of nt_rec;create table mytab ( id number primary key, -- primary key dummy char(1), -- dummy column for indexing doc nt_tab -- nested table)nested table doc store as myntab;
Insert Values into Nested Table
The following code inserts values into the nested table for the parent row with id equal to 1.
insert into mytab values (1, null, nt_tab());insert into table(select doc from mytab where id=1) values (1, 'the dog');insert into table(select doc from mytab where id=1) values (2, 'sat on mat ');commit;
Create Nested Table Preferences
The following code sets the preferences and attributes for the NESTED_DATASTORE
according to the definitions of the nested table typent_tab
and the parent table mytab
:
begin-- create nested datastore prefctx_ddl.create_preference('ntds','nested_datastore'); -- nest tab column in main tablectx_ddl.set_attribute('ntds','nested_column', 'doc'); -- nested table typectx_ddl.set_attribute('ntds','nested_type', 'scott.nt_tab');-- lineno column in nested tablectx_ddl.set_attribute('ntds','nested_lineno','lno');--text column in nested tablectx_ddl.set_attribute('ntds','nested_text', 'ltxt');end;
Create Index on Nested Table
The following code creates the index using the nested table datastore:
create index myidx on mytab(dummy) -- index dummy column, not nest tableindextype is ctxsys.context parameters ('datastore ntds');
Query Nested Datastore
The following select statement queries the index built from a nested table:
select * from mytab where contains(dummy, 'dog and mat')>0;-- returns document 1, since it has dog in line 1 and mat in line 2.
Filter Types
Use the filter types to create preferences that determine how text is filtered for indexing. Filters allow word processor and formatted documents as well as plain text, HTML, and XML documents to be indexed.
For formatted documents, Oracle Text stores documents in their native format and uses filters to build temporary plain text or HTML versions of the documents. Oracle Text indexes the words derived from the plain text or HTML version of the formatted document.
To create a filter preference, you must use one of the following types:
Table 2-8 Filter Types
MAIL_FILTER
to transform RFC-822, RFC-2045 messages in to indexable text.USER_FILTERUser-defined external filter to be used for custom filteringPROCEDURE_FILTERUser-defined stored procedure filter to be used for custom filtering.CHARSET_FILTER
Use the CHARSET_FILTER
to convert documents from a non-database character set to the character set used by the database.
CHARSET_FILTER
has the following attribute:
Table 2-9 CHARSET_FILTER Attributes
If you specify UTF16AUTO, this filter automatically detects the if the character set is UTF16 big- or little-endian.
Specify JAAUTO for Japanese character set auto-detection. This filter automatically detects the custom character specification in JA16EUC or JA16SJIS and converts to the database character set. This filter is useful in Japanese when your data files have mixed character sets.
See Also:
Oracle Database Globalization Support Guide for more information about the supported Globalization Support character sets.UTF-16 Big- and Little-Endian Detection
If your character set is UTF-16, you can specify UTF16AUTO to automatically detect big- or little-endian data. Oracle Text does so by examining the first two bytes of the document row.
If the first two bytes are 0xFE, 0xFF, the document is recognized as little-endian and the remainder of the document minus those two bytes is passed on for indexing.
If the first two bytes are 0xFF, 0xFE, the document is recognized as big-endian and the remainder of the document minus those two bytes is passed on for indexing.
If the first two bytes are anything else, the document is assumed to be big-endian and the whole document including the first two bytes is passed on for indexing.
Indexing Mixed-Character Set Columns
A mixed character set column is one that stores documents of different character sets. For example, a text table might store some documents in WE8ISO8859P1 and others in UTF8.
To index a table of documents in different character sets, you must create your base table with a character set column. In this column, you specify the document character set on a per-row basis. To index the documents, Oracle Text converts the documents into the database character set.
Character set conversion works with the CHARSET_FILTER
. When the charset column isNULL
or not recognized, Oracle Text assumes the source character set is the one specified in thecharset attribute.
Note:
Character set conversion also works with theAUTO_FILTER
when the document format column is set toTEXT
.Indexing Mixed-Character Set Example
For example, create the table with a charset column:
create table hdocs ( id number primary key, fmt varchar2(10), cset varchar2(20), text varchar2(80));
Create a preference for this filter:
begincxt_ddl.create.preference('cs_filter', 'CHARSET_FILTER');ctx_ddl.set_attribute('cs_filter', 'charset', 'UTF8');end
Insert plain-text documents and name the character set:
insert into hdocs values(1, 'text', 'WE8ISO8859P1', '/docs/iso.txt');insert into hdocs values (2, 'text', 'UTF8', '/docs/utf8.txt');commit;
Create the index and name the charset column:
create index hdocsx on hdocs(text) indextype is ctxsys.context parameters ('datastore ctxsys.file_datastore filter cs_filter format column fmt charset column cset');
AUTO_FILTER
The AUTO_FILTER
is a universal filter that filters most document formats, including PDF and Microsoft Word™ documents. Use it for indexing both single-format and mixed-format columns. This filter automatically bypasses plain-text, HTML, XHTML, SGML, and XML documents.
See Also:
For a list of the formats supported byAUTO_FILTER
and to learn more about how to set up your environment to use this filter, seeAppendix B, "Oracle Text Supported Document Formats".Note:
TheAUTO_FILTER
replaces theINSO_FILTER
, which has been deprecated. While every effort has been made to ensure maximal backward compatibility between the two filters, so that applications usingINSO_FILTER
will continue to work without modification, some differences may arise. Users should therefore useAUTO_FILTER
in their new programs and, when possible, replace instances ofINSO_FILTER
, and any system preferences or constants that make use of it, in older applications.The AUTO_FILTER
preference has the following attributes:
Table 2-10 AUTO_FILTER Attributes
AUTO_FILTER
timeout in seconds. Use a number between 0 and 42,949,672. Default is 120. Setting this value 0 disables the feature.How this wait period is used depends on how you set timeout_type.
This feature is disabled for rows for which the corresponding charset and format column cause theAUTO_FILTER
to bypass the row, such as when format is marked TEXT
.
Use this feature to prevent the Oracle Text indexing operation from waiting indefinitely on a hanging filter operation.
timeout_typeSpecify either HEURISTIC or FIXED. Default is HEURISTIC.Specify HEURISTIC for Oracle Text to check every TIMEOUT seconds if output from Outside In HTML Export has increased. The operation terminates for the document if output has not increased. An error is recorded in theCTX_USER_INDEX_ERRORS
view and Oracle Text moves to the next document row to be indexed.
Specify FIXED to terminate the Outside In HTML Export processing after TIMEOUT seconds regardless of whether filtering was progressing normally or just hanging. This value is useful when indexing throughput is more important than taking the time to successfully filter large documents.
output_formattingSetting this attribute has no effect on filter performance or filter output. It is maintained for backward compatibility.Indexing Formatted Documents
To index a text column containing formatted documents such as Microsoft Word, use theAUTO_FILTER
. This filter automatically detects the document format. You can use theCTXSYS
.AUTO_FILTER
system-defined preference in the parameter clause as follows:
create index hdocsx on hdocs(text) indextype is ctxsys.context parameters ('datastore ctxsys.file_datastore filter ctxsys.auto_filter');
Note:
TheCTXSYS.AUTO_FILTER
replaces CTXSYS.INSO_FILTER
, which has been deprecated. Programs making use ofCTXSYS.INSO_FILTER
should still work. New programs should use CTXSYS.AUTO_FILTER
.Explicitly Bypassing Plain Text or HTML in Mixed Format Columns
A mixed-format column is a text column containing more than one document format, such as a column that contains Microsoft Word, PDF, plain text, and HTML documents.
The AUTO_FILTER
can index mixed-format columns, automatically bypassing plain text, HTML, and XML documents. However, if you prefer not to depend on the built-in bypass mechanism, you can explicitly tag your rows as text and cause theAUTO_FILTER
to ignore the row and not process the document in any way.
The format column in the base table enables you to specify the type of document contained in the text column. You can specify the following document types:TEXT
, BINARY
, and IGNORE
. During indexing, theAUTO_FILTER
ignores any document typed TEXT
, assuming the charset column is not specified. (The difference between a document with aTEXT
format column type and one with an IGNORE
type is that theTEXT
document is indexed, but ignored by the filter, while the IGNORE
document is not indexed at all. Use IGNORE
to overlook documents such as image files, or documents in a language that you do not want to index.IGNORE
can be used with any filter type.)
To set up the AUTO_FILTER
bypass mechanism, you must create a format column in your base table.
For example:
create table hdocs ( id number primary key, fmt varchar2(10), text varchar2(80));
Assuming you are indexing mostly Word documents, you specify BINARY
in the format column to filter the Word documents. Alternatively, to have theAUTO_FILTER
ignore an HTML document, specify TEXT
in the format column.
For example, the following statements add two documents to the text table, assigning one format asBINARY
and the other TEXT
:
insert into hdocs values(1, 'binary', '/docs/myword.doc');insert in hdocs values (2, 'text', '/docs/index.html');commit;
To create the index, use CREATE
INDEX
and specify the format column name in the parameter string:
create index hdocsx on hdocs(text) indextype is ctxsys.context parameters ('datastore ctxsys.file_datastore filter ctxsys.auto_filter format column fmt');
If you do not specify TEXT
or BINARY
for the format column,BINARY
is used.
Note:
You need not specify the format column inCREATE
INDEX
when using theAUTO_FILTER
.Character Set Conversion With AUTO_FILTER
The AUTO_FILTER
converts documents to the database character set when the document format column is set toTEXT
. In this case, the AUTO_FILTER
looks at the charset column to determine the document character set.
If the charset column value is not an Oracle Text character set name, the document is passed through without any character set conversion.
Note:
You need not specify the charset column when using theAUTO_FILTER
.If you do specify the charset column and do not specify the format column, theAUTO_FILTER
works like the CHARSET_FILTER, except that in this case there is no Japanese character set auto-detection.
See Also:
"CHARSET_FILTER".NULL_FILTER
Use the NULL_FILTER
type when plain text or HTML is to be indexed and no filtering needs to be performed.NULL_FILTER
has no attributes.
Indexing HTML Documents
If your document set is entirely HTML, Oracle recommends that you use the NULL_FILTER
in your filter preference.
For example, to index an HTML document set, you can specify the system-defined preferences forNULL_FILTER
and HTML_SECTION_GROUP
as follows:
create index myindex on docs(htmlfile) indextype is ctxsys.context parameters('filter ctxsys.null_filter section group ctxsys.html_section_group');
See Also:
For more information on section groups and indexing HTML documents, see "Section Group Types".MAIL_FILTER
Use the MAIL_FILTER
to transform RFC-822, RFC-2045 messages in to indexable text. The following limitations hold for the input:
Document must be US-ASCII
Lines must not be longer than 1024 bytes
Document must be syntactically valid with regard to RFC-822.
Behavior for invalid input is not defined. Some deviations may be robustly handled by the filter without error. Others may result in a fetch-time or filter-time error.
The MAIL_FILTER
has the following attributes:
Table 2-11 MAIL_FILTER Attributes
INDEX_FIELDS
is set to "FROM":From: Scott Tiger
becomes:
<FROM>Scott Tiger</FROM>
Only top-level fields are transformed in this way.
AUTO_FILTER_TIMEOUTSpecify a timeout value for theAUTO_FILTER
filtering invoked by the mail filter. Default is 60. (Replaces theINSO_TIMEOUT
attribute and is backward compatible with INSO_TIMEOUT
.)AUTO_FILTER_OUTPUT_FORMATTINGSpecify eitherTRUE
or FALSE
. Default is TRUE
.This attribute replaces the previous INSO_OUTPUT_FORMATTING
attribute. However, it has no effect in the current release.
INDEX_FIELDS
attribute should be transformed. The fields of the top-level message part identified byINDEX_FIELDS
are always transformed to tag markup (see the previous description ofINDEX_FIELDS
); PART_FIELD_STYLE
controls the transformation of subsequent parts; for example, attached emails.Possible values include IGNORE
(the default), in which the part fields are not included for indexing;TAG
, in which the part field names are transformed to tags, as occurs with top-level part fields;FIELD
, in which the part field names are preserved as fields, not as tags; andTEXT
, in which the part field names are eliminated and only the field content is preserved for indexing. See"Mail_Filter Example" for an example of how PART_FIELD_STYLE
works.
Filter Behavior
This filter does the following for each document:
Read and remove header fields
Decode message body if needed, depending on Content-transfer-encoding field
Take action depending on the Content-Type field value and the user-specified behavior specified in a mail filter configuration file. (See"About the Mail Filter Configuration File".) The possible actions are:
produce the body in the output text (
INCLUDE
). If no character set is encountered in the INLCUDE parts in the Content-Type header field, Oracle defaults to the value you specify in the character set column in the base table. You name your populated character set column in the parameter string of the CREATE INDEX command.AUTO_FILTER
the body contents (AUTO_FILTER
directive).remove the body contents from the output text (
IGNORE
)
If no behavior is specified for the type in the configuration file, the defaults are as follows:
text/*: produce body in the output text
application/*:
AUTO_FILTER
the body contentsimage/*, audio/*, video/*, model/*: ignore
Multipart messages are parsed, and the mail filter applied recursively to each part. Each part is appended to the output.
All text produced will be charset-converted to the database character set, if needed.
About the Mail Filter Configuration File
The MAIL_FILTER
filter makes use of a mail filter configuration file, which contains directives specifying how a mail document should be filtered. The mail filter configuration file is a editable text file. Here you can override default behavior for each Content-Type. The configuration file also contains IANA-to-Oracle Globalization Support character set name mappings.
The location of the file must be in ORACLE_HOME
/ctx/config. The name of the file to use is stored in the new system parameterMAIL_FILTER_CONFIG_FILE
. On install, this is set to drmailfl.txt, which has useful default contents.
Oracle recommends that you create your own mail filter configuration files to avoid overwrite by the installation of a new version or patch set. The mail filter configuration file should be in the database character set.
Mail File Configuration File Structure
The file has two sections, BEHAVIOR and CHARSETS. You indicate the start of the behavior section as follows:
[behavior]
Each line following starts with a mime type, then whitespace, then behavior specification. TheMIME
type can be a full TYPE
/SUBTYPE
or justTYPE
, which will apply to all subtypes of that type. TYPE
/SUBTYPE
specification overridesTYPE
specification, which overrides default behavior. Behavior can beINCLUDE
, AUTO_FILTER
, or IGNORE
(see "Filter Behavior" for definitions). For instance:
application/zip IGNOREapplication/msword AUTO_FILTERmodel IGNORE
You cannot specify behavior for "multipart" or "message" types. If you do, such lines are ignored. Duplicate specification for a type replaces earlier specifications.
Comments can be included in the mail configuration file by starting lines with the # symbol.
The charset mapping section begins with
[charsets]
Lines consist of an IANA name, then whitespace, then a Oracle Globalization Support charset name, like:
US-ASCII US7ASCIISO-8859-1 WE8ISO8859P1
This file is the only way the mail filter gets the mappings. There are no defaults.
When you change the configuration file, the changes affect only the documents indexed after that point. You must flush the shared pool after changing the file.
Mail_Filter Example
Suppose we have an email with the following form, in which other emails with different subject lines are attached to our email:
To: somebody@someplaceSubject: mainheaderContent-Type: multipart/mixed. . .Content-Type: text/plainX-Ref: some_valueSubject: subheader 1. . .Content-Type: text/plainX-Control: blah blah blah Subject: subheader 2. . .
We set INDEX_FIELDS
to be "Subject" and, initially, PART_FIELD_STYLE
toIGNORE
.
CTX_DDL.CREATE_PREFERENCE('my_mail_filt', 'mail_filter');CTX_DDL_SET_ATTRIBUTE(my_mail_filt', 'INDEX_FILES', 'subject');CTX_DDL.SET ATTRIBUTE ('my_mail_filt', 'PART_FIELD_STYLE', 'ignore');
Now when the index is created, the file will be indexed as follows:
<SUBJECT>mainheader</SUBJECT>
If PART_FIELD_STYLE
is instead set to TAG
, this becomes:
<SUBJECT>mainheader</SUBJECT><SUBJECT>subheader1</SUBJECT><SUBJECT>subheader2</SUBJECT>
If PART_FIELD_STYLE
is set to FIELD
instead, this is the result:
<SUBJECT>mainheader<SUBJECT>SUBJECT:subheader1SUBJECT:subheader2
Finally, if PART_FIELD_STYLE
is instead set to TEXT
, then the result is:
<SUBJECT>mainheader</SUBJECT>subheader1subheader2
USER_FILTER
Use the USER_FILTER
type to specify an external filter for filtering documents in a column.USER_FILTER
has the following attribute:
Table 2-12 USER_FILTER Attributes
- command
Specify the executable for the single external filter used to filter all text stored in a column. If more than one document format is stored in the column, the external filter specified forcommand must recognize and handle all such formats.
On UNIX, the executable you specify must exist in the
$ORACLE_HOME/ctx/bin
directory. On Windows, the executable you specify must exist in the%ORACLE_HOME%/bin
directory.You must create your user-filter executable with two parameters: the first is the name of the input file to be read, and the second is the name of the output file to be written to.
If all the document formats are supported by
AUTO_FILTER
, useAUTO_FILTER
instead ofUSER_FILTER
unless additional tasks besides filtering are required for the documents.
User Filter Example
The following example Perl script to be used as the user filter. This script converts the input text file specified in the first argument to uppercase and writes the output to the location specified in the second argument:
#!/usr/local/bin/perlopen(IN, $ARGV[0]);open(OUT, ">".$ARGV[1]);while (<IN>){ tr/a-z/A-Z/; print OUT;}close (IN);close (OUT);
Assuming that this file is named upcase.pl
, create the filter preference as follows:
begin ctx_ddl.create_preference ( preference_name => 'USER_FILTER_PREF', object_name => 'USER_FILTER' ); ctx_ddl.set_attribute ('USER_FILTER_PREF','COMMAND','upcase.pl');end;
Create the index in SQL*Plus as follows:
create index user_filter_idx on user_filter ( docs ) indextype is ctxsys.context parameters ('FILTER USER_FILTER_PREF');
PROCEDURE_FILTER
Use the PROCEDURE_FILTER
type to filter your documents with a stored procedure. The stored procedure is called each time a document needs to be filtered.
This type has the following attributes:
Table 2-13 PROCEDURE_FILTER Attributes
VARCHAR2, BLOB, CLOB, FILE
output_typeType of output argument for stored procedure.VARCHAR2, CLOB, FILE
rowid_parameterInclude rowid parameter?TRUE/FALSE
format_parameterInclude format parameter?TRUE/FALSE
charset_parameterInclude charset parameter?TRUE/FALSE
- procedure
Specify the name of the stored procedure to use for filtering. The procedure can be a PL/SQL stored procedure. The procedure can be a safe callout or call a safe callout.
With the
rowid_parameter
,format_parameter
, andcharset_parameter
set to FALSE, the procedure can have one of the following signatures:PROCEDURE(IN BLOB, IN OUT NOCOPY CLOB)PROCEDURE(IN CLOB, IN OUT NOCOPY CLOB)PROCEDURE(IN VARCHAR, IN OUT NOCOPY CLOB)PROCEDURE(IN BLOB, IN OUT NOCOPY VARCHAR2)PROCEDURE(IN CLOB, IN OUT NOCOPY VARCHAR2)PROCEDURE(IN VARCHAR2, IN OUT NOCOPY VARCHAR2)PROCEDURE(IN BLOB, IN VARCHAR2)PROCEDURE(IN CLOB, IN VARCHAR2)PROCEDURE(IN VARCHAR2, IN VARCHAR2)
The first argument is the content of the unfiltered row as passed out by the datastore. The second argument is for the procedure to pass back the filtered document text.
The procedure attribute is mandatory and has no default.
- input_type
Specify the type of the input argument of the filter procedure. You can specify one of the following:
Type Description procedureName of the filter stored procedure.input_typeType of input argument for stored procedure.output_typeType of output argument for stored procedure.rowid_parameterInclude rowid parameter?The input_type attribute is not mandatory. If not specified, BLOB is the default.
- output_type
Specify the type of output argument of the filter procedure. You can specify one of the following types:
Type Description CLOB
The output argument is IN OUT NOCOPY CLOB. Your procedure must write the filtered content to the CLOB passed in.VARCHAR2
The output argument is IN OUT NOCOPY VARCHAR2. Your procedure must write the filtered content to the VARCHAR2 variable passed in.FILE
The output argument must beIN
VARCHAR2
. On entering the filter procedure, the output argument is the name of a temporary file. The filter procedure must write the filtered contents to this named file.Using a FILE output type is useful only when the procedure is a safe callout, which can write to the file.
The output_type attribute is not mandatory. If not specified,
CLOB
is the default.- rowid_ parameter
When you specify TRUE, the rowid of the document to be filtered is passed as the first parameter, before the input and output parameters.
For example, with
INPUT_TYPE
BLOB
,OUTPUT_TYPE
CLOB
, andROWID_PARAMETER
TRUE
, the filter procedure must have the signature as follows:procedure(in rowid, in blob, in out nocopy clob)
This attribute is useful for when your procedure requires data from other columns or tables. This attribute is not mandatory. The default is
FALSE
.- format_parameter
When you specify
TRUE
, the value of the format column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid parameter, if enabled.You specify the name of the format column at index time in the parameters string, using the keyword
'format column <columnname>'
. The parameter type must beIN
VARCHAR2
.The format column value can be read by means of the rowid parameter, but this attribute enables a single filter to work on multiple table structures, because the format attribute is abstracted and does not require the knowledge of the name of the table or format column.
FORMAT_PARAMETER
is not mandatory. The default isFALSE
.- charset_parameter
When you specify
TRUE
, the value of the charset column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid and format parameter, if enabled.You specify the name of the charset column at index time in the parameters string, using the keyword
'charset column <columnname>'
. The parameter type must beIN
VARCHAR2
.CHARSET_PARAMETER
attribute is not mandatory. The default is FALSE.
Parameter Order
ROWID_PARAMETER
, FORMAT_PARAMETER
, and CHARSET_PARAMETER
are all independent. The order is rowid, the format, then charset, but the filter procedure is passed only the minimum parameters required.
For example, assume that INPUT_TYPE
is BLOB
and OUTPUT_TYPE
is CLOB
. If your filter procedure requires all parameters, the procedure signature must be:
(id IN ROWID, format IN VARCHAR2, charset IN VARCHAR2, input IN BLOB, output INOUT NOCOPY CLOB)
If your procedure requires only the ROWID
, then the procedure signature must be:
(id IN ROWID,input IN BLOB, ouput IN OUT NOCOPY CLOB)
Procedure Filter Execute Requirements
In order to create an index using a PROCEDURE_FILTER
preference, the index owner must have execute permission on the procedure.
Error Handling
The filter procedure can raise any errors needed through the normal PL/SQL raise_application_error facility. These errors are propagated to theCTX_USER_INDEX_ERRORS view or reported to the user, depending on how the filter is invoked.
Procedure Filter Preference Example
Consider a filter procedure CTXSYS.NORMALIZE
that you define with the following signature:
PROCEDURE NORMALIZE(id IN ROWID, charset IN VARCHAR2, input IN CLOB, output IN OUT NOCOPY VARCHAR2);
To use this procedure as your filter, set up your filter preference as follows:
beginctx_ddl.create_preference('myfilt', 'procedure_filter');ctx_ddl.set_attribute('myfilt', 'procedure', 'normalize');ctx_ddl.set_attribute('myfilt', 'input_type', 'clob');ctx_ddl.set_attribute('myfilt', 'output_type', 'varchar2');ctx_ddl.set_attribute('myfilt', 'rowid_parameter', 'TRUE');ctx_ddl.set_attribute('myfilt', 'charset_parameter', 'TRUE');end;
Lexer Types
Use the lexer preference to specify the language of the text to be indexed. To create a lexer preference, you must use one of the following lexer types:
Table 2-14 Lexer Types
BASIC_LEXER
Use the BASIC_LEXER
type to identify tokens for creating Text indexes for English and all other supported whitespace-delimited languages.
The BASIC_LEXER
also enables base-letter conversion, composite word indexing, case-sensitive indexing and alternate spelling for whitespace-delimited languages that have extended character sets.
In English and French, you can use the BASIC_LEXER
to enable theme indexing.
Note:
Any processing the lexer does to tokens before indexing (for example, removal of characters, and base-letter conversion) are also performed on query terms at query time. This ensures that the query terms match the form of the tokens in the Text index.BASIC_LEXER
supports any database character set.
BASIC_LEXER
has the following attributes:
Table 2-15 BASIC_LEXER Attributes
CARRIAGE_RETURN (\r)
base_letterNO (disabled) YES (enabled)base_letter_typeGENERIC (default) SPECIFICoverride_base_letterTRUEFALSE (default)
mixed_caseNO (disabled) YES (enabled)compositeDEFAULT (no composite word indexing, default) GERMAN (German composite word indexing) DUTCH (Dutch composite word indexing)index_stems0 NONE1 ENGLISH
2 DERIVATIONAL
3 DUTCH
4 FRENCH
5 GERMAN
6 ITALIAN
7 SPANISH
index_themesYES (enabled) NO (disabled, default) NO (disabled, default)index_textYES (enabled, default NO (disabled)prove_themesYES (enabled, default) NO (disabled)theme_languageAUTO (default) (any Globalization Support language)alternate_spellingGERMAN (German alternate spelling) DANISH (Danish alternate spelling) SWEDISH (Swedish alternate spelling) NONE (No alternate spelling, default)new_german_spellingYESNO (default)
- continuation
Specify the characters that indicate a word continues on the next line and should be indexed as a single token. The most common continuation characters are hyphen '-' and backslash '\'.
- numgroup
Specify a single character that, when it appears in a string of digits, indicates that the digits are groupings within a larger single unit.
For example, comma ',' might be defined as a numgroup character because it often indicates a grouping of thousands when it appears in a string of digits.
- numjoin
Specify the characters that, when they appear in a string of digits, cause Oracle Text to index the string of digits as a single unit or word.
For example, period '.' can be defined as numjoin characters because it often serves as decimal points when it appears in a string of digits.
Note:
The default values for numjoin and numgroup are determined by the Globalization Support initialization parameters that are specified for the database.In general, a value need not be specified for either numjoin or numgroup when creating a lexer preference for
BASIC_LEXER
.- printjoins
Specify the non alphanumeric characters that, when they appear anywhere in a word (beginning, middle, or end), are processed as alphanumeric and included with the token in the Text index. This includesprintjoins that occur consecutively.
For example, if the hyphen '-' and underscore '_' characters are defined as printjoins, terms such as pseudo-intellectual and_file_ are stored in the Text index as pseudo-intellectual and _file_.
Note:
If a printjoins character is also defined as a punctuations character, the character is only processed as an alphanumeric character if the character immediately following it is a standard alphanumeric character or has been defined as aprintjoins or skipjoins character.- punctuations
Specify the non-alphanumeric characters that, when they appear at the end of a word, indicate the end of a sentence. The defaults are period '.', question mark '?', and exclamation point '!'.
Characters that are defined as punctuations are removed from a token before text indexing. However, if apunctuations character is also defined as a printjoins character, the character is removed only when it is the last character in the token.
For example, if the period (.) is defined as both a printjoins and a punctuations character, the following transformations take place during indexing and querying as well:
Token Indexed Token .doc.docdog.docdog.docdog..docdog..docdog.dogdog...dog..In addition,
BASIC_LEXER
uses punctuations characters in conjunction withnewline and whitespace characters to determine sentence and paragraph delimiters for sentence/paragraph searching.- skipjoins
Specify the non-alphanumeric characters that, when they appear within a word, identify the word as a single token; however, the characters are not stored with the token in the Text index.
For example, if the hyphen character '-' is defined as a skipjoins, the word pseudo-intellectual is stored in the Text index aspseudointellectual.
Note:
printjoins and skipjoins are mutually exclusive. The same characters cannot be specified for both attributes.- startjoins/endjoins
For startjoins, specify the characters that when encountered as the first character in a token explicitly identify the start of the token. The character, as well as any otherstartjoins characters that immediately follow it, is included in the Text index entry for the token. In addition, the firststartjoins character in a string of startjoins characters implicitly ends the previous token.
For endjoins, specify the characters that when encountered as the last character in a token explicitly identify the end of the token. The character, as well as any otherstartjoins characters that immediately follow it, is included in the Text index entry for the token.
The following rules apply to both startjoins andendjoins:
The characters specified for startjoins/endjoins cannot occur in any of the other attributes for
BASIC_LEXER
.startjoins/endjoins characters can occur only at the beginning or end of tokens
Printjoins differ from endjoins and startjoins in that position does not matter. For example,$35 will be indexed as one token if $ is a startjoin or a printjoin, but as two tokens if it is defined as an endjoin.
- whitespace
Specify the characters that are treated as blank spaces between tokens.
BASIC_LEXER
uses whitespace characters in conjunction withpunctuations and newline characters to identify character strings that serve as sentence delimiters for sentence and paragraph searching.The predefined default values for whitespace are 'space' and 'tab'. These values cannot be changed. Specifying characters aswhitespace characters adds to these defaults.
- newline
Specify the characters that indicate the end of a line of text.
BASIC_LEXER
uses newline characters in conjunction with punctuations and whitespace characters to identify character strings that serve as paragraph delimiters for sentence and paragraph searching.The only valid values for newline are
NEWLINE
andCARRIAGE_RETURN
(for carriage returns). The default isNEWLINE
.- base_letter
Specify whether characters that have diacritical marks (umlauts, cedillas, acute accents, and so on) are converted to their base form before being stored in the Text index. The default is NO (base-letter conversion disabled). For more information on base-letter conversions and
base_letter_type
, see Base-Letter Conversion.- base_letter_type
Specify GENERIC or SPECIFIC.
The GENERIC value is the default and means that base letter transformation uses one transformation table that applies to all languages. For more information on base-letter conversions and
base_letter_type
, see Base-Letter Conversion.- override_base_letter
When
base_letter
is enabled at the same time asalternate_spelling
, it is sometimes necessary to overridebase_letter
to prevent unexpected results from serial transformations. SeeOverriding Base-Letter Transformations with Alternate Spelling. Default isFALSE
.- mixed_case
Specify whether the lexer leaves the tokens exactly as they appear in the text or converts the tokens to all uppercase. The default is NO (tokens are converted to all uppercase).
Note:
Oracle Text ensures that word queries match the case sensitivity of the index being queried. As a result, if you enable case sensitivity for your Text index, queries against the index are always case sensitive.- composite
Specify whether composite word indexing is disabled or enabled for either
GERMAN
orDUTCH
text. The default isDEFAULT
(composite word indexing disabled).Words that are usually one entry in a German dictionary are not split into composite stems, while words that aren't dictionary entries are split into composite stems.
In order to retrieve the indexed composite stems, you must issue a stem query, such as$bahnhof. The language of the wordlist stemmer must match the language of the composite stems.
Stemming User-Dictionaries
Oracle Text ships with a system stemming dictionary ($ORACLE_HOME/ctx/data/enlx/dren.dct
), which is used for bothENGLISH
and DERIVATIONAL
stemming. You can create a user-dictionary for your own language to customize how words are decomposed. These dictionaries are shown inTable 2-16.
Table 2-16 Stemming User-Dictionaries
$ORACLE_HOME/ctx/data/frlx/drfr.dct
French$ORACLE_HOME/ctx/data/delx/drde.dct
German$ORACLE_HOME/ctx/data/nllx/drnl.dct
Dutch$ORACLE_HOME/ctx/data/itlx/drit.dct
Italian$ORACLE_HOME/ctx/data/eslx/dres.dct
SpanishStemming user-dictionaries are not supported for languages other than those listed inTable 2-16.
The format for the user dictionary is as follows:
input term <tab> output term
The individual parts of the decomposed word must be separated by the # character. The following example entries are for the German wordHauptbahnhof:
Hauptbahnhof<tab>Haupt#BahnhofHauptbahnhofes<tab>Haupt#BahnhofHauptbahnhof<tab>Haupt#BahnhofHauptbahnhoefe<tab>Haupt#Bahnhof
- index_themes
Specify YES to index theme information in English or French. This makes
ABOUT
queries more precise. The index_themes andindex_text attributes cannot both be NO.If you use the
BASIC_LEXER
and specify no value for index_themes, this attribute defaults toNO
.You can set this parameter to TRUE for any indextype including CTXCAT. To issue an ABOUT query with CATSEARCH, use the query template with CONTEXT grammar.
Note:
index_themes
requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see theOracle Text Application Developer's Guide.- prove_themes
Specify
YES
to prove themes. Theme proving attempts to find related themes in a document. When no related themes are found, parent themes are eliminated from the document.While theme proving is acceptable for large documents, short text descriptions with a few words rarely prove parent themes, resulting in poor recall performance with
ABOUT
queries.Theme proving results in higher precision and less recall (less rows returned) for
ABOUT
queries. For higher recall inABOUT
queries and possibly less precision, you can disable theme proving. Default isYES
.The
prove_themes
attribute is supported forCONTEXT
andCTXRULE
indexes.- theme_language
Specify which knowledge base to use for theme generation when index_themes is set to
YES
. When index_themes isNO
, setting this parameter has no effect on anything.You can specify any Globalization Support language or
AUTO
. You must have a knowledge base for the language you specify. This release provides a knowledge base in only English and French. In other languages, you can create your own knowledge base.See Also:
"Adding a Language-Specific Knowledge Base" inChapter 14, "Oracle Text Executables".The default is
AUTO
, which instructs the system to set this parameter according to the language of the environment.- index_stems
Specify the stemmer to use for stem indexing. You can choose one of
NONE
ENGLISH
DERIVATIONAL
DUTCH
FRENCH
GERMAN
ITALIAN
SPANISH
Tokens are stemmed to a single base form at index time in addition to the normal forms. Indexing stems enables better query performance for stem ($) queries, such as$computed.
- index_text
Specify
YES
to index word information. The index_themes and index_text attributes cannot both beNO
.The default is
NO
.- alternate_spelling
Specify either
GERMAN
,DANISH
, orSWEDISH
to enable the alternate spelling in one of these languages. Enabling alternate spelling enables you to query a word in any of its alternate forms.Alternate spelling is off by default; however, in the language-specific scripts that Oracle provides in
admin/defaults
(drdefd.sql
for German,drdefdk.sql
for Danish, anddrdefs.sql
for Swedish), alternate spelling is turned on. If your installation uses these scripts, then alternate spelling is on. However, You can specifyNONE
for no alternate spelling. For more information about the alternate spelling conventions Oracle Text uses, seeAlternate Spelling.- new_german_spelling
Specify whether the queries using the
BASIC_LEXER
return both traditional and reformed (new) spellings of German words. Ifnew_german_spelling
is set to YES, then both traditional and new forms of words are indexed. If it is set to NO, then the word will be indexed only as it as provided in the query. The default is NO.See Also:
"New German Spelling"
BASIC_LEXER Example
The following example sets printjoin characters and disables theme indexing with theBASIC_LEXER
:
beginctx_ddl.create_preference('mylex', 'BASIC_LEXER');ctx_ddl.set_attribute('mylex', 'printjoins', '_-');ctx_ddl.set_attribute ( 'mylex', 'index_themes', 'NO');ctx_ddl.set_attribute ( 'mylex', 'index_text', 'YES'); end;
To create the index with no theme indexing and with printjoins characters set as described, issue the following statement:
create index myindex on mytable ( docs ) indextype is ctxsys.context parameters ( 'LEXER mylex' );
MULTI_LEXER
Use MULTI_LEXER
to index text columns that contain documents of different languages. For example, you can use this lexer to index a text column that stores English, German, and Japanese documents.
This lexer has no attributes.
You must have a language column in your base table. To index multi-language tables, you specify the language column when you create the index.
You create a multi-lexer preference with the CTX_DDL.CREATE_PREFERENCE
. You add language-specific lexers to the multi-lexer preference with theCTX_DDL.ADD_SUB_LEXER
procedure.
During indexing, the MULTI_LEXER
examines each row's language column value and switches in the language-specific lexer to process the document.
The WORLD_LEXER
lexer also performs multi-language indexing, but without the need for separate language columns (that is, it has automatic language detection). For more onWORLD_LEXER
, see "WORLD_LEXER".
Multi-language Stoplists
When you use the MULTI_LEXER
, you can also use a multi-language stoplist for indexing.
See Also:
"Multi-Language Stoplists".MULTI_LEXER Example
Create the multi-language table with a primary key, a text column, and a language column as follows:
create table globaldoc ( doc_id number primary key, lang varchar2(3), text clob);
Assume that the table holds mostly English documents, with the occasional German or Japanese document. To handle the three languages, you must create three sub-lexers, one for English, one for German, and one for Japanese:
ctx_ddl.create_preference('english_lexer','basic_lexer');ctx_ddl.set_attribute('english_lexer','index_themes','yes');ctx_ddl.set_attribute('english_lexer','theme_language','english');ctx_ddl.create_preference('german_lexer','basic_lexer');ctx_ddl.set_attribute('german_lexer','composite','german');ctx_ddl.set_attribute('german_lexer','mixed_case','yes');ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');ctx_ddl.create_preference('japanese_lexer','japanese_vgram_lexer');
Create the multi-lexer preference:
ctx_ddl.create_preference('global_lexer', 'multi_lexer');
Since the stored documents are mostly English, make the English lexer the default using CTX_DDL.ADD_SUB_LEXER:
ctx_ddl.add_sub_lexer('global_lexer','default','english_lexer');
Now add the German and Japanese lexers in their respective languages with CTX_DDL.ADD_SUB_LEXER procedure. Also assume that the language column is expressed in the standard ISO 639-2 language codes, so add those as alternate values.
ctx_ddl.add_sub_lexer('global_lexer','german','german_lexer','ger');ctx_ddl.add_sub_lexer('global_lexer','japanese','japanese_lexer','jpn');
Now create the index globalx
, specifying the multi-lexer preference and the language column in the parameter clause as follows:
create index globalx on globaldoc(text) indextype is ctxsys.contextparameters ('lexer global_lexer language column lang');
Querying Multi-Language Tables
At query time, the multi-lexer examines the language setting and uses the sub-lexer preference for that language to parse the query. If the language is not set, then the default lexer is used.
Otherwise, the query is parsed and run as usual. The index contains tokens from multiple languages, so such a query can return documents in several languages. To limit your query to a given language, use a structured clause on the language column.
CHINESE_VGRAM_LEXER
The CHINESE_VGRAM_LEXER
type identifies tokens in Chinese text for creating Text indexes.
CHINESE_VGRAM_LEXER Attribute
The CHINESE_VGRAM_LEXER
has the following attribute:
Table 2-17 CHINESE_VGRAM_LEXER Attributes
YES
and NO
(default).Character Sets
You can use this lexer if your database character set is one of the following:
AL32UTF8
ZHS16CGB231280
ZHS16GBK
ZHS32GB18030
ZHT32EUC
ZHT16BIG5
ZHT32TRIS
ZHT16MSWIN950
ZHT16HKSCS
UTF8
CHINESE_LEXER
The CHINESE_LEXER
type identifies tokens in traditional and simplified Chinese text for creating Oracle Text indexes.
This lexer offers the following benefits over the CHINESE_VGRAM_LEXER
:
generates a smaller index
better query response time
generates real word tokens resulting in better query precision
supports stop words
Because the CHINESE_LEXER
uses a different algorithm to generate tokens, indexing time is longer than withCHINESE_VGRAM_LEXER
.
You can use this lexer if your database character is one of the Chinese or Unicode character sets supported by Oracle.
CHINESE_LEXER Attribute
The CHINESE_LEXER
has the following attribute:
Table 2-18 CHINESE_LEXER Attributes
YES
and NO
(default).Customizing the Chinese Lexicon
You can modify the existing lexicon (dictionary) used by the Chinese lexer, or create your own Chinese lexicon, with thectxlc
command.
See Also:
Lexical Compiler (ctxlc) inOracle Text ExecutablesJAPANESE_VGRAM_LEXER
The JAPANESE_VGRAM_LEXER
type identifies tokens in Japanese for creating Text indexes. It has no attributes. This lexer supports the stem ($) operator.
JAPANESE_VGRAM_LEXER Attributes
This lexer has the following attributes:
Table 2-19 JAPANESE_VGRAM_LEXER Attributes
NONE
or ALL
to ignore certain Japanese blank characters, such as a full-width forward slash or a full-width middle dot. Default isNONE
.mixed_case_ASCII7Enable mixed-case (upper- and lower-case) searches of ASCII7 text (for example,cat and Cat). Allowable values areYES
and NO
(default).JAPANESE_VGRAM_LEXER Character Sets
You can use this lexer if your database character set is one of the following:
JA16SJIS
JA16EUC
UTF8
AL32UTF8
JA16EUCTILDE
JA16EUCYEN
JA16SJISTILDE
JA16SJISYEN
JAPANESE_LEXER
The JAPANESE_LEXER
type identifies tokens in Japanese for creating Text indexes. This lexer supports the stem ($) operator.
This lexer offers the following benefits over the JAPANESE_VGRAM_LEXER
:
generates a smaller index
better query response time
generates real word tokens resulting in better query precision
Because the JAPANESE_LEXER
uses a new algorithm to generate tokens, indexing time is longer than withJAPANESE_VGRAM_LEXER
.
Customizing the Japanese Lexicon
You can modify the existing lexicon (dictionary) used by the Japanese lexer, or create your own Japanese lexicon, with thectxlc
command.
See Also:
Lexical Compiler (ctxlc) inOracle Text ExecutablesJAPANESE_LEXER Attributes
This lexer has the following attributes:
Table 2-20 JAPANESE_LEXER Attributes
NONE
or ALL
to ignore certain Japanese blank characters, such as a full-width forward slash or a full-width middle dot. Default isNONE
.mixed_case_ASCII7Enable mixed-case (upper- and lower-case) searches of ASCII7 text (for example,cat and Cat). Allowable values areYES
and NO
(default).JAPANESE LEXER Character Sets
The JAPANESE_LEXER
supports the following character sets:
JA16SJIS
JA16EUC
UTF8
AL32UTF8
JA16EUCTILDE
JA16EUCYEN
JA16SJISTILDE
JA16SJISYEN
Japanese Lexer Example
When you specify JAPANESE_LEXER
for creating text index, the JAPANESE_LEXER
resolves a sentence into words.
For example, the following compound word (natural language institute)
Description of the illustration nihongo1.gif
is indexed as three tokens:
Description of the illustration nihongo2.gif
In order to resolve a sentence into words, the internal dictionary is referenced. When a word cannot be found in the internal dictionary, Oracle Text uses theJAPANESE_VGRAM_LEXER
to resolve it.
KOREAN_MORPH_LEXER
The KOREAN_MORPH_LEXER
type identifies tokens in Korean text for creating Oracle Text indexes.
Supplied Dictionaries
The KOREAN_MORPH_LEXER
uses four dictionaries:
Table 2-21 KOREAN_MORPH_LEXER Dictionaries
$ORACLE_HOME/ctx/data/kolx/drk2sdic.dat
Grammar$ORACLE_HOME/ctx/data/kolx/drk2gram.dat
Stopword$ORACLE_HOME/ctx/data/kolx/drk2xdic.dat
User-defined$ORACLE_HOME/ctx/data/kolx/drk2udic.dat
The grammar, user-defined, and stopword dictionaries should be written using the KSC 5601 or MSWIN949 character sets. You can modify these dictionaries using the defined rules. The system dictionary must not be modified.
You can add unregistered words to the user-defined dictionary file. The rules for specifying new words are in the file.
Supported Character Sets
You can use KOREAN_MORPH_LEXER
if your database character set is one of the following:
KO16KSC5601
KO16MSWIN949
UTF8
AL32UTF8
The KOREAN_MORPH_LEXER
enables mixed-case searches.
Unicode Support
The KOREAN_MORPH_LEXER
supports:
words in non-KSC5601 Korean characters defined in Unicode
supplementary characters
See Also:
For information on supplementary characters, see the Oracle Database Globalization Support GuideSome Korean documents may have non-KSC5601 characters in them. As the KOREAN_MORPH_LEXER
can recognize all possible 11,172 Korean (Hangul) characters, such documents can also be interpreted by using the UTF8 or AL32UTF8 character sets.
Use the AL32UTF8 character set for your database to extract surrogate characters. By default, theKOREAN_MORPH_LEXER
extracts all series of surrogate characters in a document as one token for each series.
Limitations on Korean Unicode Support
For conversion Hanja to Hangul (Korean), the KOREAN_MORPH_LEXER
supports only the 4888 Hanja characters defined in KSC5601.
KOREAN_MORPH_LEXER Attributes
When you use the KOREAN_MORPH_LEXER
, you can specify the following attributes:
Table 2-22 KOREAN_MORPH_LEXER Attributes
TRUE
or FALSE
to index verbs, adjectives, and adverbs. Default isFALSE
.one_char_wordSpecify TRUE
or FALSE
to index one syllable. Default isFALSE
.numberSpecify TRUE
or FALSE
to index number. Default isFALSE
.user_dicSpecify TRUE
or FALSE
to index user dictionary. Default isTRUE
.stop_dicSpecify TRUE
of FALSE
to use stop-word dictionary. Default isTRUE
. The stop-word dictionary belongs to KOREAN_MORPH_LEXER
.compositeSpecify indexing style of composite noun.Specify COMPOSITE_ONLY
to index only composite nouns.
Specify NGRAM
to index all noun components of a composite noun.
Specify COMPONENT_WORD
to index single noun components of composite nouns as well as the composite noun itself. Default isCOMPONENT_WORD
.
The following example describes the difference between NGRAM
and COMPONENT_WORD
.
TRUE
or FALSE
for morphological analysis. If set toFALSE
, tokens are created from the words that are divided by delimiters such as white space in the document. Default isTRUE
.to_upperSpecify TRUE
or FALSE
to convert English to uppercase. Default isTRUE
.hanjaSpecify TRUE
to index hanja characters. If set toFALSE
, hanja characters are converted to hangul characters. Default isFALSE
.long_wordSpecify TRUE
to index long words that have more than 16 syllables in Korean. Default isFALSE
.japaneseSpecify TRUE
to index Japanese characters in Unicode (only in the 2-byte area). Default isFALSE
.englishSpecify TRUE
to index alphanumeric strings. Default isTRUE
.Limitations
Sentence and paragraph sections are not supported with the KOREAN_MORPH_LEXER
.
KOREAN_MORPH_LEXER Example: Setting Composite Attribute
You can use the composite attribute to control how composite nouns are indexed.
NGRAM Example
When you specify NGRAM
for the composite attribute, composite nouns are indexed with all possible component tokens. For example, the following composite noun (information processing institute)
Description of the illustration 1.jpg
is indexed as six tokens:
Description of the illustration 2.jpg
Description of the illustration 3.jpg
You can specify NGRAM
indexing as follows:
beginctx_ddl.create_preference('my_lexer','KOREAN_MORPH_LEXER');ctx_ddl.set_attribute('my_lexer','COMPOSITE','NGRAM');end
To create the index:
create index koreanx on korean(text) indextype is ctxsys.contextparameters ('lexer my_lexer');
COMPONENT_WORD Example
When you specify COMPONENT_WORD
for the composite attribute, composite nouns and their components are indexed. For example, the following composite noun (information processing institute)
Description of the illustration 1.jpg
is indexed as four tokens:
Description of the illustration 1.jpg
Description of the illustration comp.jpg
You can specify COMPONENT_WORD
indexing as follows:
beginctx_ddl.create_preference('my_lexer','KOREAN_MORPH_LEXER');ctx_ddl.set_attribute('my_lexer','COMPOSITE','COMPONENT_WORD');end
To create the index:
create index koreanx on korean(text) indextype is ctxsys.contextparameters ('lexer my_lexer');
USER_LEXER
Use USER_LEXER to plug in your own language-specific lexing solution. This enables you to define lexers for languages that are not supported by Oracle Text. It also enables you to define a new lexer for a language that is supported but whose lexer is inappropriate for your application.
The user-defined lexer you register with Oracle Text is composed of two routines that you must supply:
Table 2-23 User-Defined Routines for USER_LEXER
Limitations
The following features are not supported with the USER_LEXER
:
CTX_DOC.GIST
andCTX_DOC.THEMES
CTX_QUERY.HFEEDBACK
ABOUT
query operatorCTXRULE
indextypeVGRAM
indexing algorithm
USER_LEXER Attributes
USER_LEXER has the following attributes:
Table 2-24 USER_LEXER Attributes
INDEX_PROCEDURE
This callback stored procedure is called by Oracle Text as needed to tokenize a document or a stop word found in the stoplist object.
Requirements
This procedure can be a PL/SQL stored procedure.
The index owner must have EXECUTE privilege on this stored procedure.
This stored procedure must not be replaced or dropped after the index is created. You can replace or drop this stored procedure after the index is dropped.
Parameters
Two different interfaces are supported for the user-defined lexer indexing procedure:
VARCHAR2 Interface
CLOB Interface
Restrictions
This procedure must not perform any of the following operations:
rollback
explicitly or implicitly commit the current transaction
issue any other transaction control statement
alter the session language or territory
The child elements of the root element tokens of the XML document returned must be in the same order as the tokens occur in the document or stop word being tokenized.
The behavior of this stored procedure must be deterministic with respect to all parameters.
INPUT_TYPE
Two different interfaces are supported for the User-defined lexer indexing procedure. One interface enables the document or stop word and the corresponding tokens encoded as XML to be passed as VARCHAR2 datatype whereas the other interface uses the CLOB datatype. This attribute indicates the interface implemented by the stored procedure specified by the INDEX_PROCEDURE attribute.
VARCHAR2 Interface
BASIC_WORDLIST AttributesTable 2-25 describes the interface that enables the document or stop word from stoplist object to be tokenized to be passed as VARCHAR2 from Oracle Text to the stored procedure and for the tokens to be passed as VARCHAR2 as well from the stored procedure back to Oracle Text.
Your user-defined lexer indexing procedure should use this interface when all documents in the column to be indexed are smaller than or equal to 32512 bytes and the tokens can be represented by less than or equal to 32512 bytes. In this case the CLOB interface given in Table 2-26 can also be used, although the VARCHAR2 interface will generally perform faster than the CLOB interface.
This procedure must be defined with the following parameters:
Table 2-25 VARCHAR2 Interface for INDEX_PROCEDURES
If the document is larger than 32512 bytes then Oracle Text will report a document level indexing error.
2IN OUTVARCHAR2Tokens encoded as XML.If the document contains no tokens, then either NULL must be returned or the tokens element in the XML document returned must contain no child elements.
Byte length of the data must be less than or equal to 32512.
To improve performance, use the NOCOPY hint when declaring this parameter. This passes the data by reference, rather than passing data by value.
The XML document returned by this procedure should not include unnecessary whitespace characters (typically used to improve readability). This reduces the size of the XML document which in turn minimizes the transfer time.
To improve performance, index_procedure should not validate the XML document with the corresponding XML schema at run-time.
Note that this parameter is IN OUT for performance purposes. The stored procedure has no need to use the IN value.
3INBOOLEANOracle Text sets this parameter to TRUE when Oracle Text needs the character offset and character length of the tokens as found in the document being tokenized.Oracle Text sets this parameter to FALSE when Text is not interested in the character offset and character length of the tokens as found in the document being tokenized. This implies that the XML attributes off and len must not be used.
CLOB Interface
Table 2-26 describes the CLOB interface that enables the document or stop word from stoplist object to be tokenized to be passed as CLOB from Oracle Text to the stored procedure and for the tokens to be passed as CLOB as well from the stored procedure back to Oracle Text.
The user-defined lexer indexing procedure should use this interface when at least one of the documents in the column to be indexed is larger than 32512 bytes or the corresponding tokens are represented by more than 32512 bytes.
Table 2-26 CLOB Interface for INDEX_PROCEDURE
To improve performance, use the NOCOPY hint when declaring this parameter. This passes the data by reference, rather than passing data by value.
The XML document returned by this procedure should not include unnecessary whitespace characters (typically used to improve readability). This reduces the size of the XML document which in turn minimizes the transfer time.
To improve performance, index_procedure should not validate the XML document with the corresponding XML schema at run-time.
Note that this parameter is IN OUT for performance purposes. The stored procedure has no need to use the IN value. The IN value will always be a truncated CLOB.
The first and second parameters are temporary CLOBS. Avoid assigning these CLOB locators to other locator variables. Assigning the formal parameter CLOB locator to another locator variable causes a new copy of the temporary CLOB to be created resulting in a performance hit.
QUERY_PROCEDURE
This callback stored procedure is called by Oracle Text as needed to tokenize words in the query. A space-delimited group of characters (excluding the query operators) in the query will be identified by Oracle Text as aword.
Requirements
This procedure can be a PL/SQL stored procedure.
The index owner must have EXECUTE privilege on this stored procedure.
This stored procedure must not be replaced or be dropped after the index is created. You can replace or drop this stored procedure after the index is dropped.
Restrictions
This procedure must not perform any of the following operations:
rollback
explicitly or implicitly commit the current transaction
issue any other transaction control statement
alter the session language or territory
The child elements of the root element tokens of the XML document returned must be in the same order as the tokens occur in the queryword being tokenized.
The behavior of this stored procedure must be deterministic with respect to all parameters.
Parameters
Table 2-27 describes the interface for the user-defined lexer query procedure:
Table 2-27 User-defined Lexer Query Procedure XML Schema Attributes
The wildcard characters in the query word must be preserved in the tokens returned in order for the wildcard query feature to work properly.
The character offset is 0 (zero) based. Offset information follows USC-2 codepoint semantics.
3IN OUTVARCHAR2Tokens encoded as XML.If the query word contains no tokens then either NULL must be returned or the tokens element in the XML document returned must contain no child elements.
The length of the data must be less-than or equal to 32512 bytes.
Encoding Tokens as XML
The sequence of tokens returned by your stored procedure must be represented as an XML 1.0 document. The XML document must be valid with respect to the XML Schemas given in the following sections.
XML Schema for No-Location, User-defined Indexing Procedure
XML Schema for User-defined Indexing Procedure with Location
XML Schema for User-defined Lexer Query Procedure
Limitations
To boost performance of this feature, the XML parser in Oracle Text will not perform validation and will not be a full-featured XML compliant parser. This implies that only minimal XML features will be supported. The following XML features are not supported:
Document Type Declaration (for example,
<!DOCTYPE [...]>
) and therefore entity declarations. Only the following built-in entities can be referenced: lt, gt, amp, quot, and apos.CDATA sections.
Comments.
Processing Instructions.
XML declaration (for example,
<?xml version="1.0" ...?>
).Namespaces.
Use of elements and attributes other than those defined by the corresponding XML Schema.
Character references (for example ট).
xml:space attribute.
xml:lang attribute
XML Schema for No-Location, User-defined Indexing Procedure
This section describes additional constraints imposed on the XML document returned by the user-defined lexer indexing procedure when the third parameter is FALSE. The XML document returned must be valid with respect to the following XML Schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="tokens"> <xsd:complexType> <xsd:sequence> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element name="eos" type="EmptyTokenType"/> <xsd:element name="eop" type="EmptyTokenType"/> <xsd:element name="num" type="xsd:token"/> <xsd:group ref="IndexCompositeGroup"/> </xsd:choice> </xsd:sequence> </xsd:complexType> </xsd:element> <!-- Enforce constraint that compMem element must be preceeded by word element or compMem element for indexing --> <xsd:group name="IndexCompositeGroup"> <xsd:sequence> <xsd:element name="word" type="xsd:token"/> <xsd:element name="compMem" type="xsd:token" minOccurs="0"maxOccurs="unbounded"/> </xsd:sequence> </xsd:group> <!-- EmptyTokenType defines an empty element without attributes --> <xsd:complexType name="EmptyTokenType"/></xsd:schema>
Here are some of the constraints imposed by this XML Schema:
The root element is tokens. This is mandatory. It has no attributes.
The root element can have zero or more child elements. The child elements can be one of the following: eos, eop, num, word, and compMem. Each of these represent a specific type of token.
The compMem element must be preceded by a word element or a compMem element.
The eos and eop elements have no attributes and must be empty elements.
The num, word, and compMem elements have no attributes. Oracle Text will normalize the content of these elements as follows: convert whitespace characters to space characters, collapse adjacent space characters to a single space character, remove leading and trailing spaces, perform entity reference replacement, and truncate to 64 bytes.
Table 2-28 describes the element names defined in the preceding XML Schema.
Table 2-28 User-defined Lexer Indexing Procedure XML Schema Element Names
Supporting this token type is optional. Without support for this token type, adding the NUMERBS stopclass will have no effect.
eosThis element represents end-of-sentence token. Oracle Text uses this information so that it can support WITHIN SENTENCE queries.Supporting this token type is optional. Without support for this token type, queries against the SENTENCE section will not work as expected.
eopThis element represents end-of-paragraph token. Oracle Text uses this information so that it can support WITHIN PARAGRAPH queries.Supporting this token type is optional. Without support for this token type, queries against the PARAGRAPH section will not work as expected.
compMemSame as the word element, except that the implicit word offset is the same as the previous word token.Support for this token type is optional.
Example
Document: Vom Nordhauptbahnhof und aus der Innenstadt zum Messegelände.
Tokens:
<tokens> <word> VOM </word> <word> NORDHAUPTBAHNHOF </word> <compMem>NORD</compMem> <compMem>HAUPT </compMem> <compMem>BAHNHOF </compMem> <compMem>HAUPTBAHNHOF </compMem> <word> UND </word> <word> AUS </word> <word> DER </word> <word> INNENSTADT </word> <word> ZUM </word> <word> MESSEGELÄNDE </word> <eos/></tokens>
Example
Document: Oracle Database 10g Release 1
Tokens:
<tokens> <word> ORACLE10G</word> <word> RELEASE </word> <num> 1 </num></tokens>
Example
Document: WHERE salary<25000.00 AND job = 'F&B Manager'
Tokens:
<tokens> <word> WHERE </word> <word> salary<2500.00 </word> <word> AND </word> <word> job </word> <word> F&B </word> <word> Manager </word></tokens>
XML Schema for User-defined Indexing Procedure with Location
This section describes additional constraints imposed on the XML document returned by the user-defined lexer indexing procedure when the third parameter is TRUE. The XML document returned must be valid according to the following XML schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="tokens"> <xsd:complexType> <xsd:sequence> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element name="eos" type="EmptyTokenType"/> <xsd:element name="eop" type="EmptyTokenType"/> <xsd:element name="num" type="DocServiceTokenType"/> <xsd:group ref="DocServiceCompositeGroup"/> </xsd:choice> </xsd:sequence> </xsd:complexType> </xsd:element> <!-- Enforce constraint that compMem element must be preceeded by word element or compMem element for document service --> <xsd:group name="DocServiceCompositeGroup"> <xsd:sequence> <xsd:element name="word" type="DocServiceTokenType"/> <xsd:element name="compMem" type="DocServiceTokenType" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:group> <!-- EmptyTokenType defines an empty element without attributes --> <xsd:complexType name="EmptyTokenType"/> <!-- DocServiceTokenType defines an element with content and mandatory attributes --> <xsd:complexType name="DocServiceTokenType"> <xsd:simpleContent> <xsd:extension base="xsd:token"> <xsd:attribute name="off" type="OffsetType" use="required"/> <xsd:attribute name="len" type="xsd:unsignedShort" use="required"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> <xsd:simpleType name="OffsetType"> <xsd:restriction base="xsd:unsignedInt"> <xsd:maxInclusive value="2147483647"/> </xsd:restriction> </xsd:simpleType></xsd:schema>
Some of the constraints imposed by this XML Schema are as follows:
The root element is tokens. This is mandatory. It has no attributes.
The root element can have zero or more child elements. The child elements can be one of the following: eos, eop, num, word, and compMem. Each of these represent a specific type of token.
The compMem element must be preceded by a word element or a compMem element.
The eos and eop elements have no attributes and must be empty elements.
The num, word, and compMem elements have two mandatory attributes:
off
and len. Oracle Text will normalize the content of these elements as follows: convert whitespace characters to space characters, collapse adjacent space characters to a single space character, remove leading and trailing spaces, perform entity reference replacement, and truncate to 64 bytes.The
off
attribute value must be an integer between 0 and 2147483647 inclusive.The
len
attribute value must be an integer between 0 and 65535 inclusive.
Table 2-28 describes the element types defined in the preceding XML Schema.
Table 2-29 describes the attributes defined in the preceding XML Schema.
Table 2-29 User-defined Lexer Indexing Procedure XML Schema Attributes
The offset is with respect to the character document passed to the user-defined lexer indexing procedure, not the document fetched by the datastore. The document fetched by the datastore may be pre-processed by the filter object or the section group object, or both, before being passed to the user-defined lexer indexing procedure.
The offset of the first character in the document being tokenized is 0 (zero). Offset information follows USC-2 codepoint semantics.
lenThis attribute represents the character length (same semantics as SQL function LENGTH) of the token as it appears in the document being tokenized.The length is with respect to the character document passed to the user-defined lexer indexing procedure, not the document fetched by the datastore. The document fetched by the datastore may be pre-processed by the filter object or the section group object before being passed to the user-defined lexer indexing procedure.
Length information follows USC-2 codepoint semantics.
Sum of off
attribute value and len
attribute value must be less than or equal to the total number of characters in the document being tokenized. This is to ensure that the document offset and characters being referenced are within the document boundary.
Example
Document: User-defined Lexer.
Tokens:
<tokens> <word off="0" len="4"> USE </word> <word off="5" len="7"> DEF </word> <word off="13" len="5"> LEX </word> <eos/></tokens>
XML Schema for User-defined Lexer Query Procedure
This section describes additional constraints imposed on the XML document returned by the user-defined lexer query procedure. The XML document returned must be valid with respect to the following XML Schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="tokens"> <xsd:complexType> <xsd:sequence> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element name="num" type="QueryTokenType"/> <xsd:group ref="QueryCompositeGroup"/> </xsd:choice> </xsd:sequence> </xsd:complexType> </xsd:element><!--Enforce constraint that compMem element must be preceeded by word elementor compMem element for query--> <xsd:group name="QueryCompositeGroup"> <xsd:sequence> <xsd:element name="word" type="QueryTokenType"/> <xsd:element name="compMem" type="QueryTokenType" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:group> <!-- QueryTokenType defines an element with content and with an optional attribute --> <xsd:complexType name="QueryTokenType"> <xsd:simpleContent> <xsd:extension base="xsd:token"> <xsd:attribute name="wildcard" type="WildcardType" use="optional"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> <xsd:simpleType name="WildcardType"> <xsd:restriction base="WildcardBaseType"> <xsd:minLength value="1"/> <xsd:maxLength value="64"/> </xsd:restriction> </xsd:simpleType> <xsd:simpleType name="WildcardBaseType"> <xsd:list> <xsd:simpleType> <xsd:restriction base="xsd:unsignedShort"> <xsd:maxInclusive value="378"/> </xsd:restriction> </xsd:simpleType> </xsd:list> </xsd:simpleType></xsd:schema>
Here are some of the constraints imposed by this XML Schema:
The
root
element is tokens. This is mandatory. It has no attributes.The
root
element can have zero or more child elements. The child elements can be one of the following:num
andword
. Each of these represent a specific type of token.The
compMem
element must be preceded by aword
element or acompMem
element.The purpose of
compMem
is to enableUSER_LEXER
queries to return multiple forms for a single query. For example, if a user-defined lexer indexes the word bank asBANK(FINANCIAL)
andBANK(RIVER)
, the query procedure can return the first term as aword
and the second as acompMem
element:<tokens> <word>BANK(RIVER)</word> <compMem>BANK(FINANCIAL)</compMem></tokens>
See Table 2-30, "User-defined Lexer Query Procedure XML Schema Attributes" for more on the
compMem
element.The num and word elements have a single optional attribute: wildcard. Oracle Text will normalize the content of these elements as follows: convert whitespace characters to space characters, collapse adjacent space characters to a single space character, remove leading and trailing spaces, perform entity reference replacement, and truncate to 64 bytes.
The wildcard attribute value is a white-space separated list of integers. The minimum number of integers is 1 and the maximum number of integers is 64. The value of the integers must be between 0 and 378 inclusive. The intriguers in the list can be in any order.
Table 2-28 describes the element types defined in the preceding XML Schema.
Table 2-30 describes the attribute defined in the preceding XML Schema.
Table 2-30 User-defined Lexer Query Procedure XML Schema Attributes
word
element, but its implicit word offset is the same as the previousword
token. Oracle Text will equate this token with the previous word
token and with subsequent compMem
tokens using the queryEQUIV
operator.wildcardAny% or _ characters in the query which are not escaped by the user are considered wildcard characters because they are replaced by other characters. These wildcard characters in the query must be preserved during tokenization in order for the wildcard query feature to work properly. This attribute represents the character offsets (same semantics as SQL function LENGTH) of wildcard characters in the content of the element. Oracle Text will adjust these offsets for any normalization performed on the content of the element. The characters pointed to by the offsets must either be% or _ characters.The offset of the first character in the content of the element is 0. Offset information follows USC-2 codepoint semantics.
If the token does not contain any wildcard characters then this attribute must not be specified.
Example
Query word: pseudo-%morph%
Tokens:
<tokens> <word> PSEUDO </word> <word wildcard="1 7"> %MORPH% </word></tokens>
Example
Query word: <%>Tokens:<tokens> <word wildcard="5"> <%> </word></tokens>
WORLD_LEXER
Use the WORLD_LEXER
to index text columns that contain documents of different languages. For example, you can use this lexer to index a text column that stores English, Japanese, and German documents.
WORLD_LEXER
differs from MULTI_LEXER
in that WORLD_LEXER
automatically detects the language(s) of a document. UnlikeMULTI_LEXER
, WORLD_LEXER
does not require you to have a language column in your base table or to specify the language column when you create the index. Moreover, it is not necessary to use sub-lexers, as withMULTI_LEXER
. (See MULTI_LEXER.)
This lexer has no attributes.
WORLD_LEXER
works with languages whose character sets are defined by the Unicode 4.0 standard. For a list of languages thatWORLD_LEXER
can work with, see "World Lexer Features".
WORLD_LEXER Attribute
The WORLD_VGRAM_LEXER
has the following attribute:
Table 2-31 WORLD_LEXER Attributes
YES
and NO
(default).WORLD_LEXER Example
Here is an example of creating an index using WORLD_LEXER
.
exec ctx_ddl.create_preference('MYLEXER', 'world_lexer');create index doc_idx on doc(data) indextype is CONTEXT parameters ('lexer MYLEXER stoplist CTXSYS.EMPTY_STOPLIST');
Wordlist Type
Use the wordlist preference to enable the query options such as stemming, fuzzy matching for your language. You can also use the wordlist preference to enable substring and prefix indexing, which improves performance for wildcard queries withCONTAINS
and CATSEARCH
.
To create a wordlist preference, you must use BASIC_WORDLIST
, which is the only type available.
BASIC_WORDLIST
Use BASIC_WORDLIST
type to enable stemming and fuzzy matching or to create prefix indexes with Text indexes.
See Also:
For more information about the stem and fuzzy operators, see Chapter 3, "Oracle Text CONTAINS Query Operators".BASIC_WORDLIST
has the following attributes:
Table 2-32 BASIC_WORDLIST Attributes
NULL (no stemming)
ENGLISH (English inflectional)
DERIVATIONAL (English derivational)
DUTCH
FRENCH
GERMAN
ITALIAN
SPANISH
AUTO (Automatic language-detection for stemming for the languages above. Does not auto-detect Japanese.)
JAPANESE
fuzzy_matchSpecify which fuzzy matching cluster to use. You can specify one of the following:AUTO (automatic language detection for stemming)
CHINESE_VGRAM
DUTCH
ENGLISH
FRENCH
GENERIC
GERMAN
ITALIAN
JAPANESE_VGRAM
KOREAN
OCR
SPANISH
fuzzy_scoreSpecify a default lower limit of fuzzy score. Specify a number between 0 and 80. Text with scores below this number is not returned. Default is 60.fuzzy_numresultsSpecify the maximum number of fuzzy expansions. Use a number between 0 and 5,000. Default is 100.substring_indexSpecify TRUE for Oracle Text to create a substring index. A substring index improves left-truncated and double-truncated wildcard queries such as%ing or %benz%. Default is FALSE.In order to create a Text index with a wordlist that has substring_index
set to TRUE, the user creating the index needs theCREATE TRIGGER
system privilege.
- stemmer
Specify the stemmer used for word stemming in Text queries. When you do not specify a value for stemmer, the default is
ENGLISH
.Specify
AUTO
for the system to automatically set the stemming language according to the language setting of the session. When there is no stemmer for a language, the default isNULL
. With theNULL
stemmer, the stem operator is ignored in queries.You can create your own stemming user-dictionary. See "Stemming User-Dictionaries" for more information.
- fuzzy_match
Specify which fuzzy matching routines are used for the column. Fuzzy matching is currently supported for English, Japanese, and, to a lesser extent, the Western European languages.
Note:
The fuzzy_match attributes value for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text.The default for fuzzy_match is
GENERIC
.Specify
AUTO
for the system to automatically set the fuzzy matching language according to language setting of the session.
- fuzzy_score
Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Text with scores below this number are not returned. The default is 60.
Fuzzy score is a measure of how close the expanded word is to the query word. The higher the score the better the match. Use this parameter to limit fuzzy expansions to the best matches.
- fuzzy_numresults
Specify the maximum number of fuzzy expansions. Use a number between 0 and 5000. The default is 100.
Setting a fuzzy expansion limits the expansion to a specified number of the best matching words.
- substring_index
Specify
TRUE
for Oracle Text to create a substring index. A substring index improves performance for left-truncated or double-truncated wildcard queries such as%ing or %benz%. The default is false.Substring indexing has the following impact on indexing and disk resources:
Index creation and DML processing is up to 4 times slower
The size of the substring index created is approximately the size of the $X index on the word table.
Index creation with
substring_index
enabled requires more rollback segments during index flushes than with substring index off. Oracle recommends that you do either of the following when creating a substring index:make available double the usual rollback or
decrease the index memory to reduce the size of the index flushes to disk
- prefix_index
Specify
yes
to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such asTO%. Defaults toNO
.Note:
Enabling prefix indexing increases index size.Prefix indexing chops up tokens into multiple prefixes to store in the $I table.For example, words
TOKEN
andTOY
are normally indexed like this in the $I table:Token Type Information TOKEN0DOCID 1 POS 1TOY0DOCID 1 POS 3With prefix indexing, Oracle Text indexes the prefix substrings of these tokens as follows with a new token type of 6:
Token Type Information TOKEN0DOCID 1 POS 1TOY0DOCID 1 POS 3T6DOCID 1 POS 1 POS 3TO6DOCID 1 POS 1 POS 3TOK6DOCID 1 POS 1TOKE6DOCID 1 POS 1TOKEN6DOCID 1 POS 1TOY6DOCID 1 POS 3Wildcard searches such as
TO%
are now faster because Oracle Text does no expansion of terms and merging of result sets. To obtain the result, Oracle Text need only examine the (TO,6) row.- prefix_length_min
Specify the minimum length of indexed prefixes. Defaults to 1.
For example, setting
prefix_length_min
to 3 andprefix_length_max
to 5 indexes all prefixes between 3 and 5 characters long.Note:
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.- prefix_length_max
Specify the maximum length of indexed prefixes. Defaults to 64.
For example, setting
prefix_length_min
to 3 andprefix_length_max
to 5 indexes all prefixes between 3 and 5 characters long.Note:
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.- wildcard_maxterms
Specify the maximum number of terms in a wildcard (%) expansion. Use this parameter to keep wildcard query performance within an acceptable limit. Oracle Text returns an error when the wildcard query expansion exceeds this number.
BASIC_WORDLIST Example
The following example shows the use of the BASIC_WORDLIST
type.
Enabling Fuzzy Matching and Stemming
The following example enables stemming and fuzzy matching for English. The preferenceSTEM_FUZZY_PREF
sets the number of expansions to the maximum allowed. This preference also instructs the system to create a substring index to improve the performance of double-truncated searches.
begin ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH'); ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0'); ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000'); ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE'); ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');end;
To create the index in SQL, issue the following statement:
create index fuzzy_stem_subst_idx on mytable ( docs ) indextype is ctxsys.context parameters ('Wordlist STEM_FUZZY_PREF');
Enabling Sub-string and Prefix Indexing
The following example sets the wordlist preference for prefix and sub-string indexing. For prefix indexing, it specifies that Oracle Text create token prefixes between 3 and 4 characters long:
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',3);ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 4);ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end
Setting Wildcard Expansion Limit
Use the wildcard_maxterms attribute to set the maximum allowed terms in a wildcard expansion.
--- create a sample tabledrop table quick ;create table quick ( quick_id number primary key, text varchar(80) ); --- insert a row with 10 expansions for 'tire%'insert into quick ( quick_id, text ) values ( 1, 'tire tirea tireb tirec tired tiree tiref tireg tireh tirei tirej');commit;--- create an index using wildcard_maxterms=100begin Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;end; /create index wildcard_idx on quick(text) indextype is ctxsys.context parameters ('Wordlist wildcard_pref') ;--- query on 'tire%' - should work fineselect quick_id from quick where contains ( text, 'tire%' ) > 0;--- now re-create the index with wildcard_maxterms=5drop index wildcard_idx ;begin Ctx_Ddl.Drop_Preference('wildcard_pref'); Ctx_Ddl.Create_Preference('wildcard_pref', 'BASIC_WORDLIST'); ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 5) ;end; /create index wildcard_idx on quick(text) indextype is ctxsys.context parameters ('Wordlist wildcard_pref') ;--- query on 'tire%' gives "wildcard query expansion resulted in too many terms"select quick_id from quick where contains ( text, 'tire%' ) > 0;
Storage Types
Use the storage preference to specify tablespace and creation parameters for tables associated with a Text index. The system provides a single storage type calledBASIC_STORAGE
:
Table 2-33 Storage Types
BASIC_STORAGE
The BASIC_STORAGE
type specifies the tablespace and creation parameters for the database tables and indexes that constitute a Text index.
The clause you specify is added to the internal CREATE
TABLE
(CREATE
INDEX
for the i_index _clause) statement at index creation. You can specify most allowable clauses, such as storage, LOB storage, or partitioning. However, you cannot specify an index organized table clause.
See Also:
For more information about how to specifyCREATE
TABLE
andCREATE
INDEX
statements, see Oracle Database SQL Reference.BASIC_STORAGE
has the following attributes:
Table 2-34 BASIC_STORAGE Attributes
CREATE
TABLE
statement.The I table is the index data table.
k_table_clauseParameter clause for dr$indexname$K table creation. Specify storage and tablespace clauses to add to the end of the internalCREATE
TABLE
statement.The K table is the keymap table.
r_table_clauseParameter clause for dr$indexname$R table creation. Specify storage and tablespace clauses to add to the end of the internalCREATE
TABLE
statement.The R table is the rowid table.
The default clause is: 'LOB(DATA) STORE AS (CACHE)'.
If you modify this attribute, always include this clause for good performance.
n_table_clauseParameter clause for dr$indexname$N table creation. Specify storage and tablespace clauses to add to the end of the internalCREATE
TABLE
statement.The N table is the negative list table.
i_index_clauseParameter clause for dr$indexname$X index creation. Specify storage and tablespace clauses to add to the end of the internalCREATE
INDEX
statement. The default clause is: 'COMPRESS 2'
which instructs Oracle Text to compress this index table.If you choose to override the default, Oracle recommends including COMPRESS 2
in your parameter clause to compress this table, since such compression saves disk space and helps query performance.
SUBSTRING_INDEX
in the BASIC_WORDLIST
.Specify storage and tablespace clauses to add to the end of the internal CREATE
INDEX
statement. The P table is an index-organized table so the storage clause you specify must be appropriate to this type of table.
Storage Default Behavior
By default, BASIC_STORAGE
attributes are not set. In such cases, the Text index tables are created in the index owner's default tablespace. Consider the following statement, issued by userIUSER
, with no BASIC_STORAGE
attributes set:
create index IOWNER.idx on TOWNER.tab(b) indextype is ctxsys.context;
In this example, the text index is created in IOWNER's
default tablespace.
Storage Example
The following examples specify that the index tables are to be created in the foo
tablespace with an initial extent of 1K:
beginctx_ddl.create_preference('mystore', 'BASIC_STORAGE');ctx_ddl.set_attribute('mystore', 'I_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)'); ctx_ddl.set_attribute('mystore', 'K_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)'); ctx_ddl.set_attribute('mystore', 'R_TABLE_CLAUSE', 'tablespace users storage (initial 1K) lob (data) store as (disable storage in row cache)');ctx_ddl.set_attribute('mystore', 'N_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)'); ctx_ddl.set_attribute('mystore', 'I_INDEX_CLAUSE', 'tablespace foo storage (initial 1K) compress 2');ctx_ddl.set_attribute('mystore', 'P_TABLE_CLAUSE', 'tablespace foo storage (initial 1K)'); end;
Section Group Types
In order to issue WITHIN
queries on document sections, you must create a section group before you define your sections. You specify your section group in the parameter clause ofCREATE INDEX.
To create a section group, you can specify one of the following group types with the CTX_DDL.CREATE_SECTION_GROUP procedure:
Table 2-35 Section Group Types
NULL_SECTION_GROUP
Use this group type when you define no sections or when you defineonly SENTENCE or PARAGRAPH sections. This is the default.BASIC_SECTION_GROUP
Use this group type for defining sections where the start and end tags are of the form<A>
and </A>
.Note: This group type does not support input such as unbalanced parentheses, comments tags, and attributes. UseHTML_SECTION_GROUP
for this type of input.
HTML_SECTION_GROUP
Use this group type for indexing HTML documents and for defining sections in HTML documents.XML_SECTION_GROUP
Use this group type for indexing XML documents and for defining sections in XML documents. All sections to be indexed must be manually defined for this group.AUTO_SECTION_GROUP
Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case sensitive as in XML.Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form tag@attribute.
Stop sections, empty tags, processing instructions, and comments are not indexed.
The following limitations apply to automatic section groups:
You cannot add zone, field, or special sections to an automatic section group.
You can define a stop section that applies only to one particular type; that is, if you have two different XML DTDs, both of which use a tag called
FOO
, you can define(TYPE1)FOO
to be stopped, but(TYPE2)FOO
to not be stopped.The length of the indexed tags, including prefix and namespace, cannot exceed 64 bytes. Tags longer than this are not indexed.
PATH_SECTION_GROUP
Use this group type to index XML documents. Behaves like theAUTO_SECTION_GROUP
.The difference is that with this section group you can do path searching with theINPATH
and HASPATH
operators. Queries are also case-sensitive for tag and attribute names. Stop sections are not allowed.
NEWS_SECTION_GROUP
Use this group for defining sections in newsgroup formatted documents according to RFC 1036.Section Group Examples
This example shows the use of section groups in both HTML and XML documents.
Creating Section Groups in HTML Documents
The following statement creates a section group called htmgroup
with the HTML group type.
beginctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');end;
You can optionally add sections to this group using the procedures in the CTX_DDL
package, such as CTX_DDL.ADD_SPECIAL_SECTION
or CTX_DDL.ADD_ZONE_SECTION
. To index your documents, you can issue a statement such as:
create index myindex on docs(htmlfile) indextype is ctxsys.context parameters('filter ctxsys.null_filter section group htmgroup');
See Also:
For more information on section groups, see Chapter 7, "CTX_DDL Package"Creating Sections Groups in XML Documents
The following statement creates a section group called xmlgroup
with theXML_SECTION_GROUP
group type.
beginctx_ddl.create_section_group('xmlgroup', 'XML_SECTION_GROUP');end;
You can optionally add sections to this group using the procedures in the CTX_DDL
package, such as CTX_DDL.ADD_ATTR_SECTION
or CTX_DDL.ADD_STOP_SECTION
. To index your documents, you can issue a statement such as:
create index myindex on docs(htmlfile) indextype is ctxsys.context parameters('filter ctxsys.null_filter section group xmlgroup');
See Also:
For more information on section groups, see Chapter 7, "CTX_DDL Package"Automatic Sectioning in XML Documents
The following statement creates a section group called auto
with theAUTO_SECTION_GROUP
group type. This section group automatically creates sections from tags in XML documents.
begin
ctx_ddl.create_section_group('auto', 'AUTO_SECTION_GROUP');
end;CREATE INDEX myindex on docs(htmlfile) INDEXTYPE IS ctxsys.context PARAMETERS('filter ctxsys.null_filter section group auto');
Classifier Types
This section describes the classifier types used to create a preference for CTX_CLS.TRAIN and CTXRULE index creation. The following two classifier types are supported:
RULE_CLASSIFIER
SVM_CLASSIFIER
RULE_CLASSIFIER
Use the RULE_CLASSIFIER type for creating preferences for the query rule generating procedure, CTX_CLS.TRAIN and for CTXRULE creation. The rules generated with this type are essentially query strings and can be easily examined. The queries generated by this classifier can use the AND, NOT, or ABOUT operators. The WITHIN operator is supported for queries on field sections only.
This type has the following attributes:
Table 2-36 RULE_CLASSIFIER Attributes
THRESHOLD
I50199Specify threshold (in percentage) for rule generation. One rule is output only when its confidence level is larger than threshold.MAX_TERMS
I100202000For each class, a list of relevant terms is selected to form rules. Specify the maximum number of terms that can be selected for each class.MEMORY_SIZE
I500104000Specify memory usage for training in MB. Larger values improve performance.NT_THRESHOLD
F0.00100.90Specify a threshold for term selection. There are two thresholds guiding two steps in selecting relevant terms. This threshold controls the behavior of the first step. At this step, terms are selected as candidate terms for the further consideration in the second step. The term is chosen when the ratio of the occurrence frequency over the number of documents in the training set is larger than this threshold.TERM_THRESHOLD
I100100Specify a threshold as a percentage for term selection. This threshold controls the second step term selection. Each candidate term has a numerical quantity calculated to imply its correlation with a given class. The candidate term will be selected for this class only when the ratio of its quantity value over the maximum value for all candidate terms in the class is larger than this threshold.PRUNE_LEVEL
I750100Specify how much to prune a built decision tree for better coverage. Higher values mean more aggressive pruning and the generated rules will have larger coverage but less accuracy.SVM_CLASSIFIER
Use the SVM_CLASSIFIER type for creating preferences for the rule generating procedure, CTX_CLS.TRAIN, and for CTXRULE creation. This classifier type represents the Support Vector Machine method of classification and generates rules in binary format. Use this classifier type when you need high classification accuracy.
This type has the following attributes:
Table 2-37 SVM_CLASSIFIER Attributes
MAX_DOCTERMS
I50108192Specify the maximum number of terms representing one document.MAX_FEATURES
I3,0001100,000Specify the maximum number of distinct features.THEME_ON
BFALSENULLNULLSpecify TRUE to use themes as features.Classification with themes requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see theOracle Text Application Developer's Guide.
TOKEN_ON
BTRUENULLNULLSpecify TRUE to use regular tokens as features.STEM_ON
BFALSENULLNULLSpecify TRUE to use stemmed tokens as features. This only works when turning INDEX_STEM on for the lexer.MEMORY_SIZE
I500104000Specify approximate memory size in MB.SECTION_WEIGHT
120100Specify the occurrence multiplier for adding a term in a field section as a normal term. For example, by default, the termcat in "<A>cat</A>" is a field section term and is treated as a normal term with occurrence equal to 2, but you can specify that it be treated as a normal term with a weight up to 100.SECTION_WEIGHT
is only meaningful when the index policy specifies a field section.Cluster Types
This section describes the cluster types used for creating preferences for theCTX_CLS.CLUSTERING
procedure.
See Also:
For more information about clustering, see "CLUSTERING" in Chapter 6, "CTX_CLS Package" as well as the Oracle Text Application Developer's GuideKMEAN_CLUSTERING
This clustering type has the following attributes:
Table 2-38 KMEAN_CLUSTERING Attributes
MAX_DOCTERMS
I50108192Specify the maximum number of distinct terms representing one document.MAX_FEATURES
I3,0001500,000Specify the maximum number of distinct features.THEME_ON
BFALSENULLNULLSpecify TRUE to use themes as features.Clustering with themes requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see theOracle Text Application Developer's Guide.
TOKEN_ON
BTRUENULLNULLSpecify TRUE to use regular tokens as features.STEM_ON
BFALSENULLNULLSpecify TRUE to use stemmed tokens as features. This only works when turning INDEX_STEM on for the lexer.MEMORY_SIZE
I500104000Specify approximate memory size in MB.SECTION_WEIGHT
120100Specify the occurrence multiplier for adding a term in a field section as a normal term. For example, by default, the termcat in "<A>cat</A>" is a field section term and is treated as a normal term with occurrence equal to 2, but you can specify that it be treated as a normal term with a weight up to 100.SECTION_WEIGHT
is only meaningful when the index policy specifies a field section.CLUSTER_NUM
I200220000Specify the total number of leaf clusters to be generated.Stoplists
Stoplists identify the words in your language that are not to be indexed. In English, you can also identify stopthemes that are not to be indexed. By default, the system indexes text using the system-supplied stoplist that corresponds to your database language.
Oracle Text provides default stoplists for most common languages including English, French, German, Spanish, Chinese, Dutch, and Danish. These default stoplists contain only stopwords.
See Also:
For more information about the supplied default stoplists, see Appendix E, "Oracle Text Supplied Stoplists".Multi-Language Stoplists
You can create multi-language stoplists to hold language-specific stopwords. A multi-language stoplist is useful when you use theMULTI_LEXER
to index a table that contains documents in different languages, such as English, German, and Japanese.
To create a multi-language stoplist, use the CTX_DLL.CREATE_STOPLIST procedure and specify a stoplist type ofMULTI_STOPLIST
. You add language specific stopwords with CTX_DDL.ADD_STOPWORD.
At indexing time, the language column of each document is examined, and only the stopwords for that language are eliminated. At query time, the session language setting determines the active stopwords, like it determines the active lexer when using the multi-lexer.
Creating Stoplists
You can create your own stoplists using the CTX_DLL.CREATE_STOPLIST procedure. With this procedure you can create aBASIC_STOPLIST
for single language stoplist, or you can create a MULTI_STOPLIST
for a multi-language stoplist.
When you create your own stoplist, you must specify it in the parameter clause ofCREATE
INDEX
.
Modifying the Default Stoplist
The default stoplist is always named CTXSYS.DEFAULT_STOPLIST. You can use the following procedures to modify this stoplist:
CTX_DDL.ADD_STOPWORD
CTX_DDL.REMOVE_STOPWORD
CTX_DDL.ADD_STOPTHEME
CTX_DDL.ADD_STOPCLASS
When you modify CTXSYS.DEFAULT_STOPLIST
with the CTX_DDL
package, you must re-create your index for the changes to take effect.
Dynamic Addition of Stopwords
You can add stopwords dynamically to a default or custom stoplist withALTER INDEX. When you add a stopword dynamically, you need not re-index, because the word immediately becomes a stopword and is removed from the index.
Note:
Even though you can dynamically add stopwords to an index, you cannot dynamically remove stopwords.To remove a stopword, you must use CTX_DDL.REMOVE_STOPWORD, drop your index and re-create it.See Also:
ALTER INDEX inChapter 1, "Oracle Text SQL Statements and Operators".System-Defined Preferences
When you install Oracle Text, some indexing preferences are created. You can use these preferences in the parameter clause ofCREATE INDEX or define your own.
The default index parameters are mapped to some of the system-defined preferences described in this section.
See Also:
For more information about default index parameters, see "Default Index Parameters" .System-defined preferences are divided into the following categories:
Data Storage
Filter
Lexer
Section Group
Stoplist
Storage
Wordlist
Data Storage
This section discusses the types associated with data storage preferences.
CTXSYS.DEFAULT_DATASTORE
This preference uses the DIRECT_DATASTORE type. You can use this preference to create indexes for text columns in which the text is stored directly in the column.
CTXSYS.FILE_DATASTORE
This preference uses the FILE_DATASTORE type.
CTXSYS.URL_DATASTORE
This preference uses the URL_DATASTORE type.
Filter
This section discusses the types associated with filtering preferences.
CTXSYS.NULL_FILTER
This preference uses the NULL_FILTER type.
CTXSYS.AUTO_FILTER
This preference uses the AUTO_FILTER type.
Lexer
This section discusses the types associated with lexer preferences.
CTXSYS.DEFAULT_LEXER
The default lexer depends on the language used at install time. The following sections describe the default settings forCTXSYS.DEFAULT_LEXER
for each language.
American and English Language Settings
If your language is English, this preference uses the BASIC_LEXER with the index_themes
attribute disabled.
Danish Language Settings
If your language is Danish, this preference uses the BASIC_LEXER with the following option enabled:
alternate spelling (alternate_spelling attribute set to
DANISH
)
Dutch Language Settings
If your language is Dutch, this preference uses the BASIC_LEXER with the following options enabled:
composite indexing (
composite
attribute set toDUTCH
)
German and German DIN Language Settings
If your language is German, this preference uses the BASIC_LEXER with the following options enabled:
case-sensitive indexing (
mixed_case
attribute enabled)composite indexing (
composite
attribute set toGERMAN
)alternate spelling (
alternate_spelling
attribute set toGERMAN
)
Finnish, Norwegian, and Swedish Language Settings
If your language is Finnish, Norwegian, or Swedish, this preference uses the BASIC_LEXER with the following option enabled:
alternate spelling (alternate_spelling attribute set to
SWEDISH
)
Japanese Language Settings
If you language is Japanese, this preference uses the JAPANESE_VGRAM_LEXER.
Korean Language Settings
If your language is Korean, this preference uses the KOREAN_MORPH_LEXER. All attributes for the KOREAN_MORPH_LEXER
are enabled.
Chinese Language Settings
If your language is Simplified or Traditional Chinese, this preference uses theCHINESE_VGRAM_LEXER.
Other Languages
For all other languages not listed in this section, this preference uses the BASIC_LEXER with no attributes set.
See Also:
To learn more about these options, see BASIC_LEXER .CTXSYS.BASIC_LEXER
This preference uses the BASIC_LEXER
.
Section Group
This section discusses the types associated with section group preferences.
CTXSYS.NULL_SECTION_GROUP
This preference uses the NULL_SECTION_GROUP
type.
CTXSYS.HTML_SECTION_GROUP
This preference uses the HTML_SECTION_GROUP
type.
CTXSYS.AUTO_SECTION_GROUP
This preference uses the AUTO_SECTION_GROUP
type.
CTXSYS.PATH_SECTION_GROUP
This preference uses the PATH_SECTION_GROUP
type.
Stoplist
This section discusses the types associated with stoplist preferences.
CTXSYS.DEFAULT_STOPLIST
This stoplist preference defaults to the stoplist of your database language.
See Also:
For a complete list of the stop words in the supplied stoplists, see Appendix E, "Oracle Text Supplied Stoplists".CTXSYS.EMPTY_STOPLIST
This stoplist has no words.
Storage
This section discusses the types associated with storage preferences.
CTXSYS.DEFAULT_STORAGE
This storage preference uses the BASIC_STORAGE type.
Wordlist
This section discusses the types associated with wordlist preferences.
CTXSYS.DEFAULT_WORDLIST
This preference uses the language stemmer for your database language. If your language is not listed inTable 2-32, this preference defaults to the NULL stemmer and the GENERIC fuzzy matching attribute.
System Parameters
This section describes the Oracle Text system parameters. They fall into the following categories:
General System Parameters
Default Index Parameters
General System Parameters
When you install Oracle Text, in addition to the system-defined preferences, the following system parameters are set:
Table 2-39 General System Parameters
MAX_INDEX_MEMORY
This is the maximum indexing memory that can be specified in the parameter clause ofCREATE
INDEX
and ALTER
INDEX
.DEFAULT_INDEX_MEMORY
This is the default indexing memory used withCREATE
INDEX
and ALTER
INDEX
.LOG_DIRECTORY
This is the directory for CTX_OUTPUT
log files.CTX_DOC_KEY_TYPE
This is the default input key type, eitherROWID
or PRIMARY_KEY
, for the CTX_DOC
procedures. Set toROWID
at install time.See also: CTX_DOC. SET_KEY_TYPE.
You can view system defaults by querying the CTX_PARAMETERS view. You can change defaults using the CTX_ADM.SET_PARAMETER procedure.
Default Index Parameters
This section describes the index parameters you can use when you create context and ctxcat indexes.
CONTEXT Index Parameters
The following default parameters are used when you do not specify preferences in the parameter clause ofCREATE INDEX when you create a context index. Each default parameter names a system-defined preference to use for data storage, filtering, lexing, and so on.
Table 2-40 Default CONTEXT Index Parameters
DEFAULT_DATASTORE
No datastore preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_DATASTOREDEFAULT_FILTER_FILE
No filter preference specified in parameter clause ofCREATE
INDEX
, and either of the following conditions is true:Your files are stored in external files (BFILES) or
You specify a datastore preference that uses
FILE_DATASTORE
DEFAULT_FILTER_BINARY
No filter preference specified in parameter clause ofCREATE
INDEX
, and Oracle Text detects that the text column datatype isRAW
, LONG
RAW
, or BLOB
.CTXSYS.AUTO_FILTERDEFAULT_FILTER_TEXT
No filter preference specified in parameter clause ofCREATE
INDEX
, and Oracle Text detects that the text column datatype is eitherLONG
, VARCHAR2
, VARCHAR
, CHAR
, orCLOB
.CTXSYS.NULL_FILTERDEFAULT_SECTION_HTML
No section group specified in parameter clause ofCREATE
INDEX
, and when either of the following conditions is true:Your datastore preference uses
URL_DATASTORE
orYour filter preference uses
AUTO_FILTER
.
DEFAULT_SECTION_TEXT
No section group specified in parameter clause ofCREATE
INDEX
, and when you do not use eitherURL_DATASTORE
or AUTO_FILTER
.CTXSYS.NULL_SECTION_GROUPDEFAULT_STORAGE
No storage preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STORAGEDEFAULT_LEXER
No lexer preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_LEXERDEFAULT_STOPLIST
No stoplist specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STOPLISTDEFAULT_WORDLIST
No wordlist preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_WORDLISTCTXCAT Index Parameters
The following default parameters are used when you create a CTXCAT index with CREATE INDEX and do not specify any parameters in the parameter string. The CTXCAT index supports only the index set, lexer, storage, stoplist, and wordlist parameters. Each default parameter names a system-defined preference.
Table 2-41 Default CTXCAT Index Parameters
DEFAULT_CTXCAT_INDEX_SET
No index set specified in parameter clause ofCREATE
INDEX
. DEFAULT_CTXCAT_STORAGE
No storage preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STORAGEDEFAULT_CTXCAT_LEXER
No lexer preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_LEXERDEFAULT_CTXCAT_STOPLIST
No stoplist specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STOPLISTDEFAULT_CTXCAT_WORDLIST
No wordlist preference specified in parameter clause ofCREATE
INDEX
.Note that while you can specify a wordlist preference for CTXCAT
indexes, most of the attributes do not apply, since the catsearch query language does not support wildcarding, fuzzy, and stemming. The only attribute that is useful isPREFIX_INDEX
for Japanese data.
CTXRULE Index Parameters
The following default parameters are used when you create a CTXRULE
index withCREATE
INDEX
and do not specify any parameters in the parameter string. TheCTXRULE
index supports only the lexer, storage, stoplist, and wordlist parameters. Each default parameter names a system-defined preference.
Table 2-42 Default CTXRULE Index Parameters
CREATE
INDEX
.CTXSYS.DEFAULT_LEXERDEFAULT_CTXRULE_STORAGENo storage preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STORAGEDEFAULT_CTXRULE_STOPLISTNo stoplist specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_STOPLISTDEFAULT_CTXRULE_WORDLISTNo wordlist preference specified in parameter clause ofCREATE
INDEX
.CTXSYS.DEFAULT_WORDLISTDEFAULT_CLASSIFIERNo classifier preference is specified in parameter clause.RULE_CLASSIFIERViewing Default Values
You can view system defaults by querying the CTX_PARAMETERS view. For example, to see all parameters and values, you can issue:
SQL> SELECT par_name, par_value from ctx_parameters;
Changing Default Values
You can change a default value using the CTX_ADM.SET_PARAMETER procedure to name another custom or system-defined preference to use as default.
- Oracle 全文检索 中文检索
- Oracle Text 全文检索
- oracle 全文检索
- Oracle 全文检索功能
- oracle 全文检索
- Oracle 全文检索设置
- oracle全文检索
- oracle 全文检索实践
- oracle 全文检索
- oracle 全文检索
- oracle 全文检索实践
- oracle 全文检索实践
- oracle 全文检索实践 .
- oracle 全文检索实践
- oracle全文检索
- oracle全文检索
- Oracle全文检索步骤
- oracle 全文检索实践
- 第六章 06_String_6
- 【转】JQuery实现页面随滚动条滚动而动态加载内容的效果
- Unity网络通信(三)同步物体
- ASP.NET程序中常用的三十三种代码(没看)
- Eclipse开发Android的快捷键:
- oracle 全文检索
- C++ 何时使用引用参数(一些指导原则)
- 利用GBK双字节编码突破PHP单引号转义限制进行SQL注入:set names gbk导致的sql注入
- 企业数据防泄密
- MVC中上传文件
- 驰骋工作流引擎ccflow常见问题
- Pentaho源代码阅读报告
- std::vector学习
- [编程之美]饮料供货