Ab Initio Data Profiler is a complete software package. While we could build much of what the profiler does ourselves from the GDE, the profiler comes with a separate UI, the Ab Initio EME website, which is fully integrated with the profiling tool.
Whenever we profile a given dataset (a serial file, multifile, or database table), the profiler registers the data source as a dataset (a wizard is provided for this). Datasets can be organized into projects and directories, and for each profiling run a profile name and path (the Unix directory where the profiler job will run) must be set up.
Depending on the file size, a dataset can be profiled in two ways:
1) Using the 'run the dataset' wizard in the tool.
2) Deploying the dataset profile in the background: a *.ksh script is deployed for every job and can be run from any Unix shell. This is generally preferred for datasets larger than about 100,000 records.
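The background-deployment pattern above is just "run a generated *.ksh script detached from the terminal". A minimal sketch of that pattern, with the caveat that the script name and its contents here are stand-ins I made up, not actual Ab Initio deployment output:

```shell
# Simulate a deployed profiler job script so the pattern is runnable;
# profile_customers.ksh is a hypothetical name, not Ab Initio's.
cat > profile_customers.ksh <<'EOF'
#!/bin/sh
echo "profiling started"
echo "profiling finished"
EOF
chmod +x profile_customers.ksh

# Run the deployed job in the background, detached from the terminal,
# with output captured to a log file for later inspection.
nohup ./profile_customers.ksh > profile_customers.log 2>&1 &
wait $!                                   # optional: block until it finishes
grep "profiling finished" profile_customers.log
```

Because the job is a plain shell script, it can also be scheduled with cron or a job scheduler instead of being launched interactively.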
Whenever a profiling job completes, the results are published directly to the EME website. Clients who request profiling of their data sources are given access details for this website, so they can check the profiling results there directly.
I have worked with this tool for a year. It produces useful, productive results that architects can use for analysis, but I have to say it is very slow: performance degrades as the dataset grows, and only one dataset can be profiled at a time, so there is no scope for running multiple jobs in parallel.
The main advantage of the tool is its field-based profiling: it analyses each and every field of the data file/table, rather than profiling row by row. We can even define user-defined constraints or conditions on a given field and check its behaviour in the profiling report.
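To make the field-based idea concrete: a user-defined constraint is a per-field rule whose violation count shows up in the report. This is only a conceptual illustration in awk, not Ab Initio syntax; the file, field names, and the "age between 0 and 120" rule are all assumptions of mine:

```shell
# Hypothetical sample data: a CSV with an id and an age field.
printf 'id,age\n1,34\n2,150\n3,7\n' > customers.csv

# Field-based check: count how many records violate the constraint
# "age must be between 0 and 120" on the age field alone.
awk -F, 'NR > 1 {
    total++
    if ($2 < 0 || $2 > 120) bad++    # constraint violation on this field
}
END { printf "age: %d of %d records violate 0..120\n", bad, total }' customers.csv
# → age: 1 of 3 records violate 0..120
```

A row-based profile would only tell you a record is bad; the per-field view pinpoints which field broke which rule.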
Just a brief description.