Ab Initio Data Profiler is a complete software package. While we could build much of what the profiler does ourselves from the GDE, the profiler comes with a separate UI, the Ab Initio EME website, which is fully integrated with the profiling tool.
Whenever we profile a given dataset (a serial file, multifile, or database table), the profiler registers the data source as a dataset (a wizard is provided for this). Datasets can be organized into projects and directories, and for each profiling run a profile name and path (the Unix directory where the profiler job will run) must be set up.
Depending on the file size, a dataset can be profiled in two ways:
1) Using the 'run the dataset' wizard in the tool.
2) Deploying the dataset profile in the background: a *.ksh script is deployed for every job and can be run from any Unix shell. This is generally preferred for datasets larger than about 100,000 records.
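The background-deployment pattern above is just "run a generated *.ksh script detached from the terminal". A minimal sketch of that pattern, with the caveat that the script name and its contents here are stand-ins I made up, not actual Ab Initio deployment output:

```shell
# Simulate a deployed profiler job script so the pattern is runnable;
# profile_customers.ksh is a hypothetical name, not Ab Initio's.
cat > profile_customers.ksh <<'EOF'
#!/bin/sh
echo "profiling started"
echo "profiling finished"
EOF
chmod +x profile_customers.ksh

# Run the deployed job in the background, detached from the terminal,
# with output captured to a log file for later inspection.
nohup ./profile_customers.ksh > profile_customers.log 2>&1 &
wait $!                                   # optional: block until it finishes
grep "profiling finished" profile_customers.log
```

Because the job is a plain shell script, it can also be scheduled with cron or a job scheduler instead of being launched interactively.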
Whenever a profiling job completes, the results are published directly to the EME website. Clients who request profiling of their data sources are given access details for this website, so they can check the profiling results there directly.
I have worked with this tool for a year. It produces useful, productive results that architects can use for analysis, but I have to say it is very slow: performance degrades as the dataset grows, and only one dataset can be profiled at a time, so there is no scope for running multiple jobs in parallel.
The main advantage of the tool is its field-based profiling: it analyses each and every field of the data file/table, rather than profiling row by row. We can even define user-defined constraints or conditions on a given field and check its behaviour in the profiling report.
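To make the field-based idea concrete: a user-defined constraint is a per-field rule whose violation count shows up in the report. This is only a conceptual illustration in awk, not Ab Initio syntax; the file, field names, and the "age between 0 and 120" rule are all assumptions of mine:

```shell
# Hypothetical sample data: a CSV with an id and an age field.
printf 'id,age\n1,34\n2,150\n3,7\n' > customers.csv

# Field-based check: count how many records violate the constraint
# "age must be between 0 and 120" on the age field alone.
awk -F, 'NR > 1 {
    total++
    if ($2 < 0 || $2 > 120) bad++    # constraint violation on this field
}
END { printf "age: %d of %d records violate 0..120\n", bad, total }' customers.csv
# → age: 1 of 3 records violate 0..120
```

A row-based profile would only tell you a record is bad; the per-field view pinpoints which field broke which rule.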
Just a brief description.