Need help on a fuzzy match problem
来源:互联网 发布:网络机柜配线架的安装 编辑:程序博客网 时间:2024/06/05 23:45
A fuzzy match is request to perform a provider match while loading extract file from clients, the code is easy but we met an efficiency problem.
On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production.
The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched with the provider '00012345' in our system:
'001234500', '12345', '000012345', '1234500000', etc.
The current code is like this:
with
providers as (
select 1 id, '0012345' pr_account_no from dual union all
select 2, '0023450' from dual
),
clients as (
select '001234500' client_account_no from dual union all -- will match to provider 1
select '000012345' from dual union all -- will match to provider 1
select '1234500000' from dual union all-- will match to provider 1
select '00234500' from dual union all -- will match to provider 2
select '23450' from dual union all -- will match to provider 2
select '000023450' from dual union all -- will match to provider 2
select '00002345' from dual union all -- will NOT match to provider 2
select '12345' from dual union all -- will match to provider 1
select '2345' from dual -- will NOT match to provider 2
)
select
providers.id, providers.pr_account_no , clients.client_account_no
from providers, clients
where regexp_like
( clients.client_account_no,
'0*'|| trim( leading '0' from providers.pr_account_no )||'0*'
);
As of the efficiency issue, we have 574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row.
About 5-7 minutes only on this matching procedure in 11g2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.
On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production.
The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched with the provider '00012345' in our system:
'001234500', '12345', '000012345', '1234500000', etc.
The current code is like this:
with
providers as (
select 1 id, '0012345' pr_account_no from dual union all
select 2, '0023450' from dual
),
clients as (
select '001234500' client_account_no from dual union all -- will match to provider 1
select '000012345' from dual union all -- will match to provider 1
select '1234500000' from dual union all-- will match to provider 1
select '00234500' from dual union all -- will match to provider 2
select '23450' from dual union all -- will match to provider 2
select '000023450' from dual union all -- will match to provider 2
select '00002345' from dual union all -- will NOT match to provider 2
select '12345' from dual union all -- will match to provider 1
select '2345' from dual -- will NOT match to provider 2
)
select
providers.id, providers.pr_account_no , clients.client_account_no
from providers, clients
where regexp_like
( clients.client_account_no,
'0*'|| trim( leading '0' from providers.pr_account_no )||'0*'
);
As of the efficiency issue, we have 574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row.
About 5-7 minutes only on this matching procedure in 11g2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.
0 0
- Need help on a fuzzy match problem
- Problem J 小鑫need help
- UVa12210 - A Match Making Problem(排序)
- UVa 12210 - A Match Making Problem
- P1035I need help
- 小鑫need help
- A. Help Vasilisa the Wise 2 codeforces-problem-143A
- Fuzzy can not go on
- ZOJ Problem Set - 1049 I Think I Need a Houseboat
- ZOJ Problem Set - 1049 I Think I Need a Houseboat
- ZOJ Problem Set - 1049||I Think I Need a Houseboat
- HDU Problem 1203 I NEED A OFFER! 【01背包】
- Problem A.Ant on a Chessboard
- Problem A.Ant on a Chessboard
- Problem A.Ant on a Chessboard
- SDUTOJ3114-小鑫need help
- UVA_12210_A Match Making Problem
- module mod_muc_odbc for Mysql. Need your help.
- android 事件处理机制之requestDisallowInterceptTouchEvent
- apache httpcomponent 4.3 使用https示例
- ADF 按钮直接传参 使用 setPropertyActionListener
- android MaskFilter ColorFilter Xfermode PorterDuffXfermode PorterDuff.Mode
- java和C操作数组的一个小区别
- Need help on a fuzzy match problem
- 关于openoffice/libreoffice+swftools+jodconverter+flexpaper预览文件的一些问题
- Jquery 常用方法经典总结
- 返回结果未定义怎么解决
- jQuery -> 基于当前元素的遍历
- 第十三章 Delphi开发数据库应用程序概述(二)
- 常用芯片整理-74HC595D
- 离开学校如何自学成为网页设计师
- volatile 变量