Need help on a fuzzy match problem

来源:互联网 发布:网络机柜配线架的安装 编辑:程序博客网 时间:2024/06/05 23:45
A fuzzy match is request to perform a provider match while loading extract file from clients, the code is easy but we met an efficiency problem.

On the extract file, each the provider has a not null account_no as their identity; also, we have a column pr_account_no ( with index ) in our production. 

The problem is, both of them are varchar2 data type and have both leading '0's, plus, the extract file may have trailing '0's too. so all the following account_no in extract file should be matched  with the provider '00012345' in our system:

'001234500', '12345', '000012345', '1234500000', etc.

The current code is like this:

with 
  providers as (
   select 1 id'0012345' pr_account_no from dual union all
   select 2,      '0023450' from dual
    ),
clients as (
  select '001234500' client_account_no from dual union all  -- will match to provider 1
  select '000012345'   from dual union all -- will match to provider 1
  select '1234500000' from dual union all-- will match to provider 1
  select '00234500'    from dual union all -- will match to provider 2
  select '23450'       from dual union all  -- will match to provider 2
  select '000023450'   from dual union all -- will match to provider 2
  select '00002345'    from dual union all -- will NOT match to provider 2
  select '12345'       from dual union all -- will match to provider 1
  select '2345'       from dual           --  will NOT match to provider 2
)
select  
   providers.id, providers.pr_account_no , clients.client_account_no
from  providers, clients 
where  regexp_like
     ( clients.client_account_no,
          '0*'|| trimleading '0' from providers.pr_account_no )||'0*'
     );

As of the efficiency issue,  we have  574 K rows in provider table, and 74K rows in extract file on a daily basis which is processed row by row. 

About 5-7 minutes only on this matching procedure in 11g2, which is not acceptable according to the business request. Any one would have some hints to improve it? Appreciated in advance.
0 0