Oracle and RTF data
来源:互联网 发布:c 多线程编程实战 编辑:程序博客网 时间:2024/05/16 14:43
Yes, Oracle Text can do this.
http://download-west.oracle.com/docs...cpkg.htm#12729
You need a context index with the INSO filter on the RTF document column in
order to use it.
e.g.
create table RTFDOCUMENT
(
ID NUMBER not null,
DOC CLOB not null,
PLAIN_DOC CLOB
);
[insert an RTF document with id=1 using dbms_lob.loadclobfromfile or whatever]
SQL> create index rtfdocument_tx1 on rtfdocument (doc) indextype is
ctxsys.context parameters ('filter ctxsys.INSO_FILTER'); Index created
SQL> declare
2 v_doc clob;
3 v_plain clob;
4 r number;
5 begin
6 select id, doc
7 into r, v_doc
8 from rtfdocument
9 where id = 1 for update;
10
11 dbms_lob.createtemporary(v_plain, true);
12 dbms_lob.append(v_plain, v_doc); 13
14 ctx_doc.filter('RTFDOCUMENT_TX1', to_char(r), v_plain, plaintext =>
true);
15
16 update rtfdocument set plain_doc = v_plain where id = 1;
17 end;
18 /
PL/SQL procedure successfully completed
SQL> commit;
Commit complete
[plain_doc now contains a plain text version of the RTF document].
It seems a bit over-enthusiastic with newlines, but it certainly works.
from ask tom
Tim -- Thanks for the question regarding "Remove RTF Tags from a Text Field Containing RTF", version 9i
You Asked
Hi Tom,In our database we have some varchar2(4000) fields (could later be a clob). Where we store RTF data. I would like to know if there is anyway I can remove the RTF tags returning only the text again. For Example a Function RTFtoTEXT that takes the RTF:{/rtf1/ansi/deff0{/fonttbl{/f0/fnil/fcharset0 Courier New;}}{/colortbl ;/red0/green0/blue255;}{/*/generator Msftedit 5.41.15.1507;}/viewkind4/uc1/pard/lang1033/ul/b/i/f0/fs20 This is a test./par/cf1/ulnone/i0 This is a test./cf0/b0/par/par/par/par/par} And Returns:This is a test.This is a test.Can you do this with Oracle Text or Something?TIATim
and we said...
Yes, we can do this with text -- you cana) filter to plaintext if you have an indexb) filter to html with or without an indexc) call ctxhx directly from the command line to filter the text and load itI'll demo a) and b). you can play with ctxhx from the command line from $ORACLE_HOME/ctx/bin if you want (run it, it'll tell you the inputs it takes)ops$tkyte@ORA9IR2> create table demo 2 ( id int primary key, 3 theblob blob, 4 theclob clob 5 ) 6 / Table created. ops$tkyte@ORA9IR2> create table filter ( query_id number, document clob ); Table created. ops$tkyte@ORA9IR2>ops$tkyte@ORA9IR2> create index demo_idx on demo(theblob) indextype is ctxsys.context; Index created. ops$tkyte@ORA9IR2>ops$tkyte@ORA9IR2> create sequence s; Sequence created. ops$tkyte@ORA9IR2>ops$tkyte@ORA9IR2>ops$tkyte@ORA9IR2> create or replace directory my_files as '/home/tkyte/Desktop/' 2 / Directory created. ops$tkyte@ORA9IR2> @traceops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12'; Session altered. ops$tkyte@ORA9IR2> declare 2 l_blob blob; 3 l_clob clob; 4 l_id number; 5 l_bfile bfile; 6 begin 7 insert into demo values ( s.nextval, empty_blob(), empty_clob() ) 8 returning id, theblob, theclob into l_id, l_blob, l_clob; 9 10 l_bfile := bfilename( 'MY_FILES', 'asktom.rtf' ); 11 dbms_lob.fileopen( l_bfile ); 12 13 dbms_lob.loadfromfile( l_blob, l_bfile, 14 dbms_lob.getlength( l_bfile ) ); 15 16 dbms_lob.fileclose( l_bfile ); 17 18 ctx_doc.ifilter( l_blob, l_clob ); 19 commit; 20 ctx_doc.filter( 'DEMO_IDX', l_id, 'FILTER', l_id, TRUE ); 21 end; 22 / PL/SQL procedure successfully completed. ops$tkyte@ORA9IR2>ops$tkyte@ORA9IR2> set long 500ops$tkyte@ORA9IR2> select utl_raw.cast_to_varchar2(dbms_lob.substr(theblob,500,1)) from demo; UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(THEBLOB,500,1))-------------------------------------------------------------------------------{/rtf1/ansi/ansicpg1252/uc1 /deff0/deflang1033/deflangfe1033{/fonttbl{/f0/froman/fcharset0/fprq2{/*/panose 02020603050405020304}Times New Roman;}{/f1/fswiss/fcharset0/fprq2{/*/panose 020b0604020202020204}Arial;}{/f2/fmodern/fcharset0/fprq1{/*/panose 02070309020205020404}Courier New;}{/f23/froman/fcharset128/fprq1{/*/panose 00000000000000000000}MS Mincho{/*/falt MS ??};}{/f28/froman/fcharset128/fprq1{/*/panose 00000000000000000000}@MS Mincho;}{/f29/froman/fcharset238/fprq2 Times New Roman CE; ops$tkyte@ORA9IR2> select theclob from demo; THECLOB-------------------------------------------------------------------------------<HTML><BODY><h1><font size="5" face="Arial"><b>Primary key index in Partitioning</b></font></h1><p><font size="3" face="Times New Roman"><i>I</i> <i>have a table accounts which has 80 million records (OLTP system). I would like to partition the table byacct_by_date column. I will be going with range partition and global indexes. My concern is regd the primary keyacct_id. The index that will be created for primary key should it be local or global and which should I opt for?</i></font></ ops$tkyte@ORA9IR2> select document from filter; DOCUMENT------------------------------------------------------------------------------- Primary key index in Partitioning I have a table accounts which has 80 million records (OLTP system). I would like to partition the table by acct_by_date column.�I will be going with range partition and global indexes. My concern is regd the primary key acct_id. The index that will be created for primary key should it be local or global and which should I opt for? Well, this is an easy one.�The primary key index can be local IF and ONLY IF, the primary key is in fact the (or part of thifilter works without a ctxsys.context index, but only lets you get HTML, filter works only with an index -- but lets you get plain text OR html
Hi Tom,Thanks for that it put me on the right track.Just a quick followup question or three :)1) I am assuming the trace commands are not needed for anything special or is there a reason for this?2) When I do the filter on a clob nothing happens. Why? I'm guessing that this is already considered plain text or what? create table workflow_temprtf( id int primary key, theblob blob, theclob clob); create table workflow_rtffilter (query_id number,document clob); create index workflow_rtfidx on workflow_temprtf(theblob) indextype is ctxsys.context;create index workflow_rtfidx2 on workflow_temprtf(theclob) indextype is ctxsys.context;create sequence workflow_rtfseq;declare l_blob blob; l_clob clob; l_id number; l_bfile bfile; lrtf varchar2(400); lresult clob;begin lrtf:='{/rtf1/ansi/ansicpg1252/deff0{/fonttbl{/f0/fnil/fcharset0 Arial;}}'||CHR(13)||CHR(10); lrtf:=lrtf||'/viewkind4/uc1/pard/lang1033/fs17 this is a test'||CHR(13)||CHR(10); lrtf:=lrtf||'/par }'; select workflow_rtfseq.nextval into l_id from dual; insert into workflow_temprtf values (l_id, utl_raw.cast_to_raw(lrtf) , lrtf ); ctx_doc.filter('WORKFLOW_RTFIDX2', l_id, 'WORKFLOW_RTFFILTER', l_id, TRUE ); select document into lresult from workflow_rtffilter where query_id=l_id; dbms_output.put_line(lresult);end;Returns RTF still ? why ?3) I did succeed in getting it to work with:create table workflow_temprtf( id int primary key, theblob blob, theclob clob); create table workflow_rtffilter (query_id number,document clob); create index workflow_rtfidx on workflow_temprtf(theblob) indextype is ctxsys.context;create index workflow_rtfidx2 on workflow_temprtf(theclob) indextype is ctxsys.context;create sequence workflow_rtfseq;declare l_blob blob; l_clob clob; l_id number; l_bfile bfile; lrtf varchar2(400); lresult clob;begin lrtf:='{/rtf1/ansi/ansicpg1252/deff0{/fonttbl{/f0/fnil/fcharset0 Arial;}}'||CHR(13)||CHR(10); lrtf:=lrtf||'/viewkind4/uc1/pard/lang1033/fs17 this is a test'||CHR(13)||CHR(10); lrtf:=lrtf||'/par }'; select workflow_rtfseq.nextval into l_id from dual; insert into workflow_temprtf values (l_id, utl_raw.cast_to_raw(lrtf) , lrtf ); ctx_doc.filter('WORKFLOW_RTFIDX', l_id, 'WORKFLOW_RTFFILTER', l_id, TRUE ); select document into lresult from workflow_rtffilter where query_id=l_id; dbms_output.put_line(lresult);end;However, and as also in your example there is a lot of extra lines in the result. Whats with those any idea?Thanks for the help.
Followup October 19, 2004 - 10am Central time zone:
1) doh, left that in by accident...2) yes, the clob is just text -- blobs can have pdfs, xls, ppt, docs, whatever. the inso filter can recognize these. clobs cannot have those objects as they are binary in nature.3) they are just "there", you could turn them into spaces easily -- but the filters just produce TEXT -- that is what gets fed into the indexing engine -- it just needed words. No formatting is saved with the text filter, just the text. If formatting is relevant, the HTML output works.
Excellent! But can we do the same without Text?Oracle CTX produce strange results with russian encodings.
Followup March 1, 2005 - 3pm Central time zone:
it should not produce strange results, but this entire answer is predicated on using text specifically.
Since russian text in rtf looks like: "/'ea/'e0/'ea /'ec/'e8/'ed/'e8/'ec/'f3/'ec" I have to use INSO filter for indexes, but INSO filter not implemented on my production 64 bit linux box. Don't you know why? I cannot find anything about this on OTN or metalink.
Followup May 5, 2005 - 9am Central time zone:
the latest releases use a new filtering technology -- suggest you open an itar (we are dropping the inso filters) to get information on this.
Sorry to resurrect an old thread but the ctx_doc.ifilter function helped me get past a hurdle in migrating old data to a new system but there is one part missing and I may need to deal with is outside of oracle. I cant find what I am looking for in the documentation. (Probably just not looking in the right place)I am importing several notes from a legacy system that are stored in varchar2(4000) fields in RTF format. If the RTF document is longer than 4000 characters then another record was created with an incremented sequence. I appended the notes into a blob field in a new table and used ifilter to convert the RTF tags to HTML and stored the results in a clob. This is working great although I had to use a slow-by-slow cursor loop to accomplish all this.What I am now faced with are those notes that have embedded images. Is it possible to extract those images and store them somewhere? If so can you point me to some documentation that will help me figure this out and maybe provide an example?Thanks
Followup November 2, 2008 - 4pm Central time zone:
but this is a question for "ask-about-the-rtf-file-format.com" site.I have no information about the internal format of a RTF file, sorry.
- Oracle and RTF data
- Update Data Template(xml file) and RTF(rtf file) Template for BI Publisher Report
- Oracle export and import data
- Oracle Data Warehousing and Business Intelligence Solutions
- Oracle Data Warehousing and Business Intelligence Solutions
- Oracle data import and export impl/exp
- 27 FAQs on Oracle loading data and exporting data
- Oracle data types and Microsoft SQL Server data types
- Delphi中对Oracle存取RTF文档
- Advanced Data Access with ADO.NET and Oracle
- oracle学习笔记2-Restricting and sorting data
- Oracle DBA Guide to Data Warehousing and Star Schemas
- Exporting and Importing Objects Using Oracle Data Pump(1.expdp)
- Exporting and Importing Objects Using Oracle Data Pump(2.impdp)
- Oracle Physical Inventory Adjustment Process and Data flow.
- Data analysis and Data mining
- Delphi中对Oracle存取RTF文档(作者:苏涌)
- Oracle Report 6i 報表轉爲rtf格式問題
- xml.au3
- oracle安装后怎么不能进OEM?
- 配置Tomncat后运行错误及解决方法
- 关于XML的研究
- 一个老站长的坎坷创业路
- Oracle and RTF data
- PHP面向对象编程
- ORACLE Nologging相关知识
- 分布式数据库拆表拆库的常用策略
- Myspace的六次重构
- 关于Window操作系统中对Oracle的性能监控
- 3123131231
- 10.13小结(之前没贴上去)
- 推荐学英语的经典好书——新概念英语