Define Native (JDBC) Database Connections

来源:互联网 发布:搜了网络股票代码 编辑:程序博客网 时间:2024/06/08 08:48
Fine-Tune Pentaho Systems >Administer DI Server > Specify Data Connections for the DI Server


http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_admin_guide%2Ftask_create_new_connection_definition.html


Define Native (JDBC) Database Connections

Once you have chosen to use the Native (JDBC) access protocol, here are configuration and maintenance tasks you can perform.
  • Add Drivers
  • Create Connections
  • Add Database-Specific Options
  • Configure Database Connections
  • Define Connection Pooling
  • Connect to Clusters
  • Modify Connections
When you are done, please go on to the next stop on the Guide Post graphic.

Add Drivers

The DI Server and workstations need the appropriate driver to connect to the database that stores your data. Your database administrator, Chief Intelligence Officer, or IT manager should be able to provide the appropriate driver. If not, you can download drivers from your database vendor's website. See the Supported Technologies to ensure that your database and its driver are supported by Pentaho.

Note: If you are using a Microsoft SQL Server (MSSQL), you might need to use an alternative, non-vendor-supported driver called JTDS. ContactPentaho support to ensure that you are adding the correct driver.

Installing Drivers

Once you have the correct driver, copy it to these directories.

DI Server: /pentaho/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib/.

Spoon: data-integration/lib

You must restart Spoon for the driver to take effect.

There should be only one driver for your database in this directory. Ensure that there are no other versions of the same vendor's driver in this directory. If there are, back up the old driver files and remove them to avoid version conflicts. This is a concern when you are adding a driver for the same database type as your Pentaho solution repository. If you have any concerns about how to proceed, contactPentaho support.

When the driver files are in place restart the server.

Connecting to a Microsoft SQL Server Using Integrated or Windows Authentication

If you are using a Microsoft SQL Server (MSSQL), you might need to use an alternative, non-vendor-supported driver called JTDS. ContactPentaho support to ensure that you are adding the correct driver

For Microsoft Windows, most JDBC drivers support Type 2 integrated authentication through theintegratedSecurity connection string property. To use integrated authentication, copy thesqljdbc_auth.dll file to all machines and directories to which you copied the JDBC driver. You can find this file in this location.

<installation directory>\sqljdbc_<version>\<language>\auth\
If running:Use the sqljdbc_auth.dll file here:32-bit Java Virtual Machine (JVM) even if the operating system is version x64x86 folder64-bit JVM on a x64 processorx64 folder64-bit JVM on an Itanium processorIA64 folder

Specify Native (JDBC) Connection Information

Before you can create the connection, you must have installed the appropriate JDBC driver for your particular data.

Pentaho Data Integration (PDI) allows you to define connections to multiple databases provided by multiple database vendors (MySQL, Oracle, PostgreSQL, and many more). PDI ships with the most suitable JDBC drivers for PostgreSQL, our default database.

Note:

Pentaho recommends that you avoid using ODBC connections. The ODBC to JDBC bridge driver does not always provide an exact match and adds another level of complexity that may affect performance. The only time you may have to use ODBC is if there is no available JDBC driver. For details, this article explains "Why you should avoid ODBC."http://wiki.pentaho.com/pages/viewpage.action?pageId=14850644.

When you define a database connection, the connection information (for example, the user name, password, and port number) is stored in the DI Repository and is available to other users when they connect to the repository. If you are not using the DI Repository, the database connection information is stored in the XML file associated with the transformation or job.

Connections that are available for use with a transformation or job are listed under theDatabase connections node in the View pane in Spoon.

You must have information about your database, such as your database type, port number, user name and password, before you define a JDBC connection. You can also set connection properties using variables. Variables provide you with the ability to access data from multiple database types using the same transformations and jobs.
Note: Make sure to use clean ANSI SQL that works on all used database types.

  1. From within Spoon, navigate to theView tab of the Explorer pane. Double-click on theDatabase connections folder. The Database Connection dialog box appears.
    Section NameWhat to DoConnection NameType name that uniquely identifies your new connectionConnection TypeSelect the type of database to which you are connectingAccessSelect your method of access. Available access types depend on the connecting database type.Host NameType the name of the server that hosts the database to which you are connecting. Alternatively, you can specify the host by IP address.Database NameEnter the name of the database to which you are connecting. If you are using a ODBC connection, enter the Data Source Name (DSN) in this field.Port NumberEnter the TCP/IP port number if it is different from the default.User nameOptionally, type the user name used to connect to the database.PasswordOptionally, type the password used to connect to the database.
  2. Click Test.A confirmation message displays if Spoon is able to establish a connection with the target database.
  3. Click OK to save your entries and exit the Database Connection dialog box.
  4. From within the View tab, right-click on the connection and selectShare from the list that appears. This shares the connection with your users. They will be able to select the shared connection. From within theView tab, click Explore to open theDatabase Explorer for an existing connection. This shows you the schemas and tables inside the connection.

Add Database-Specific Options

Add database-specific options by adding parameters to the generated URL.

  1. From within the Database Connection dialog box, select Options.
  2. Select the first available row in the parameter table.
  3. Choose the database type and enter a valid parameter name and its corresponding value.
    Note: For more database-specific configuration help, clickHelp. A new browser opens and displays additional information about configuring the JDBC connection for the currently selected database type.
  4. Click OK to save your entries.

Advanced Configuration of Database Connections

The Advanced option in the Database Connection dialog box allows you to configure properties that are, for most part, associated with how SQL is generated. These options allow you to set a standard across all of your SQL tools, ETL tools and design tools. All database table names and column names are always upper case or lower case no matter what users do in the tools.

FeatureDescriptionSupports boolean data typesInstructs PDI to use native boolean data types if supported by the database.Quote all in databaseEnables the databases to use a case-sensitive tablename (for example MySQL is case-sensitive on Linux but not case sensitive on Windows). If you quote the identifiers, the databases will use a case sensitive tablename.Force all to lower caseEnables all identifiers to lower case.Force all to upper caseEnables all identifiers to upper case.Preferred schema name...Enter the preferred schema name (for example, MYSCHEMA).Enter SQL name...Enter the SQL statement used to initialize a connection.

Pentaho has implemented a database-specific quoting system that allows you to use any name or character acceptable to the supported databases' naming conventions.

Pentaho Data Integration contains a list of reserved words for most of the supported databases. To ensure that quoting behaves correctly, Pentaho has implemented a strict separation between the schema (user/owner) of a table and the table name itself. Doing otherwise makes it impossible to quote tables or fields with one or more periods in them correctly. Placing periods in table and field names is common practice in some ERP systems (for example, fields such as "V.A.T.")

To avoid quoting-related errors, a rule stops the Pentaho Data Integration from performing quoting activity when there is a start or end quote in the table name or schema. This allows you to specify the quoting mechanism yourself.

Define Connection Pooling

Instead of having a connection open for each individual step in a transformation, you can set up a connection pool and define options like the initial pool size, maximum pool size, and connection pool parameters. For example, you might start with a pool of ten or fifteen connections, and as you run jobs or transformations, the unused connections drop off. Pooling helps control database access, especially if you have transformations that contain many steps and that require a large number of connections. Pooling can also be implemented when your database licensing restricts the number of active concurrent connections.

This table shows descriptions of the pooling options.

FeatureDescriptionEnable connection poolingEnables connection poolingPool SizeSets the initial size of the connection pool; sets the maximum number of connections in the connection poolParametersAllows you to define additional custom pool parameters; click Restore Defaults when appropriateDescriptionAllows you to add a description for your parameters
  1. Select Enable Connection Pooling.
  2. Type the initial pool size in the Initial: area and the maximum pool size in the Maximum: area.
  3. Select the parameters you need from within theParameters: area. A Description of the parameter appears in theDescription: area when you select a check box.
  4. Click OK to save your selections and close theDatabase Connection dialog box.

Connect to Clusters

This option allows you to enable clustering for the database connection and create connections to the data partitions. To create a new data partition, enter aPartition ID and the Host Name,Port, Database, User Name, and Password for connecting to the partition.

Modify Connections

This table contains information about other database-related connection tasks you can perform.

TaskDescriptionEdit a ConnectionRight-click on the connection name and select Edit.Duplicate a ConnectionRight-click on the connection name and select Duplicate.Copy to a ClipboardAllows you to copy the XML defining the step to the clipboard. You can then paste this step into another transformation. Double-click on the connection name in the tree or right-click on the connection name and selectCopy to Clipboard.Delete a ConnectionDouble-click on the connection name in the tree or right-click on the connection name and selectDelete.SQL EditorTo execute SQL command against an existing connection, right-click on the connection name and selectSQL Editor.Clear the Database CacheTo speed up connections Pentaho Data Integration uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the tree and selectClear DB Cache.... This command is commonly used when databases tables have been changed, created or deleted.Share a ConnectionRather than redefining a connection each time you create a job or transformation on yourlocal device, right-click and select Share to share the connection information among jobs and transformations.Exploring the DatabaseDouble-click on the connection name in the tree or right-click on the connection name and selectExplore.Show dependenciesRight-click a connection name and select Show dependencies to see all of the transformations and jobs that use this database connection.
0 0