Transparent Application Failover (TAF)

来源:互联网 发布:阿里云 接收短信 编辑:程序博客网 时间:2024/05/21 06:50

一、什么是TAF

Transparent Application Failover ,透明应用程序故障转移,当客户端和oracle数据库建立连接后,正在执行某个应用,此时客户端连接的某个实例突然宕机,那么客户端的连接就会被转移到其他健康实例上去, 对于客户端的taf,如果执行的是update,insert等事物语句,那么就会回退,如果是select语句,那么会转移到新的实例继续执行,这个过程不需要用户的的介入,是透明的。


TAF 是和下列类型的故障转移

  • Oracle Real Application Clusters

  • Replicated systems

  • Standby databases

  • Single instance Oracle database


二:net service参数:


ParameterDescription

FAILOVER

在连接监听的时候,如果FAILOVER=on,在连接地址列表中的第一个地址,监听连接失败,指示net去连接地址列表中的其他监听,当FAILOVER=on=off 时,只会连接地址列表中的第一地址的监听,

Connect-time failover is turned on by default for multiple address lists (ADDRESS_LIST), connect descriptors (DESCRIPTION), and multiple connect descriptors (DESCRIPTION_LIST).

When using a connect descriptor with a SERVICE_NAME, ensure that the value is not a GLOBAL_DBNAME in any SID_DESCentry, or a SID_NAME in any SID_DESC entry without a GLOBAL_DBNAME set.

LOAD_BALANCE            

当LOAD_BALANCE   =on 那么指示net progress  以随机的方式连接地址列表中地址,在不同监听之间平衡负载压力。当LOAD_BALANCE   =off

是让net progress 安顺序访问地址列表的监听,直到连接成功为止.

Client load balancing is turned on by default for multiple connect descriptors (DESCRIPTION_LIST).

SOURCE_ROUTE

When set to on, instructs Oracle Net to use each address in the order presented until the destination is reached. This parameter is required for reaching the destination using a specific route, that is, by specific computers. This parameter is used to enable connections to Oracle Connection Manager.


Creating a List of Listener Protocol Addresses

sales.us.example.com= (DESCRIPTION=  (ADDRESS_LIST=    (ADDRESS=(PROTOCOL=tcp)(HOST=sales1-server)(PORT=1521))    (ADDRESS=(PROTOCOL=tcp)(HOST=sales2-server)(PORT=1521)))  (CONNECT_DATA=    (SERVICE_NAME=sales.us.example.com)))


OptionParameter Setting

Try each address, in order, until one succeeds.

FAILOVER=on

Try each address, randomly, until one succeeds.

LOAD_BALANCE=on

FAILOVER=on

Try one address, selected at random.

LOAD_BALANCE=on

Use each address in order until destination reached.

SOURCE_ROUTE=on

Use only the first address.

LOAD_BALANCE=off

FAILOVER=off

SOURCE_ROUTE=off

sales.us.example.com= (DESCRIPTION=  (ADDRESS_LIST=   (LOAD_BALANCE=off)   (FAILOVER=ON)   (ADDRESS=(PROTOCOL=tcp)(HOST=sales1-server)(PORT=1521))    (ADDRESS=(PROTOCOL=tcp)(HOST=sales2-server)(PORT=1521)))  (CONNECT_DATA=(SERVICE_NAME=sales.us.example.com)))


三、TAF 参数

BACKUP  

A different net service name for backup connections. A backup should be specified when using preconnect to pre-establish connections.

DELAY

每次重试的时间间隔. 如果 RETRIES 已经指定,那么DELAY默认是1秒,If a callback function is registered, then this parameter is ignored.

METHOD

设置快速转移从  primary node 到 backup node:

  • basic:  在故障转移的时间间隔建立连接。此选项是故障转移前是不去对backup node做连接的,只有在故障发生时,才开始去连接backup node。

  • preconnect: 建立预连接。此选项比basic方式提供了更快的故障转移,。用户建立connect,对primary node和backup node 节点都会建立连接

    BASIC方式在Failover时会有时间延迟,PRECONNECT方式虽然没有时间延迟,但是建立多个冗余连接会消耗更多资源

RETRIES

故障转移重试的次数,如果指定了DELAY,那么默认重试5次,If a callback function is registered, then this parameter is ignored.

TYPE

故障转移的类型. Three types of Oracle Net failover functionality are available by default to Oracle Call Interface (OCI) applications:

  • session: Set to failover the session. If a user's connection is lost, then a new session is automatically created for the user on the backup. This type of failover does not attempt to recover select operations.

  • select: Set to enable users with open cursors to continue fetching on them after failure. However, this mode involves overhead on the client side in normal select operations.

  • none: This is the default. No failover functionality is used. This can also be explicitly specified to prevent failover from happening.

    对于select和session方式对于未提交的事务都会自动回滚,区别在于对select 语句的处理,对于select,用户正在执行的select语句会被转移到新的实例上,在新的节点上继续返回后续结果集,而已经返回的记录集则抛弃。

    假设用户正在节点1上执行查询,整个结果集共有100条记录,现在已从节点1上返回10条记录,这时节点1宕机,用户连接被转移到节点2上,如果是session模式,则需要重新执行查询语句;如果是select方式,会从节点2上继续返回剩下的90天记录,而已经从节点1返回的10条记录不会重复返回给用户,对于用户而言,感受不到这种切换,显然为了实现select 方式,Oracle 必须为每个session保存更多的内容,包括游标,用户上下文等,需要更多的资源也是用资源换时间的方案。



四、举例

TAF with Connect-Time Failover and Client Load Balancing

在客户端的tnsname.ora里添加如下


sales.us.example.com= (DESCRIPTION=  (LOAD_BALANCE=on)    ----随机访问一个监听,直到成功  (FAILOVER=on)   (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)  ---ip地址或者主机名       (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (FAILOVER_MODE=       (TYPE=select)               --select方式的故障转移       (METHOD=basic))))

此方式建立的的故障转移,只能故障转移select语句,例如  updateinsert 语句在failover后操作会回退。

update t1 set sal=sal+1
此时数据库出现故障,那么就会被故障转移到另外一个节点,这是提交就会保存哦
SQL> commit;
commit
*
第 1 行出现错误:
ORA-25402: 事务处理必须重新运行


以此方式建立的taf,是建立在客户端的tnsname.ora里,这种方式建立的taf有一定的弊端,就是当客户端连接的时候,如果某个实例是在mount状态,那么客户端也是可能连接上去,由于是mount状态,所以会报错,而且现在应用程序都是用中间层,在客户端配置容易出错误,所以 有了服务端的taf,能解决此问题。

TAF Retrying a Connection

sales.us.example.com= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (FAILOVER_MODE=       (TYPE=select)        (METHOD=basic)       (RETRIES=20)       (DELAY=15))))   ----TAF重新连接的时间,每次重试连接间隔15秒,总共重试20次,故障转移类型为select方式

TAF Pre-establishing a Connection

sales1.us.example.com= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_NAME=sales1)      (FAILOVER_MODE=       (BACKUP=sales2.us.example.com)  -----此backup指向下面红色字体       (TYPE=select)        (METHOD=preconnect))))sales2.us.example.com= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_NAME=sales2)     (FAILOVER_MODE=       (BACKUP=sales1.us.example.com)        (TYPE=select)        (METHOD=preconnect))))

通过测试此预连接方式,只能故障转移已经连接到数据库上的连接select。如果已经发生故障,再去连接sales1.us.example.com是连接不上的. 发生故障前连接到发生故障的数据库的update ,在发生故障时没有提交的会回退。

Specifying the Instance Role for Primary and Secondary Instance Configurations


instance_role  在下列情况使用:

           1、你想明确指定连接到第一个实例还是第二个实例上去,默认是连接到第一个实例,主实例上。

           2、你想使用taf预连接到第二实例上去。

The INSTANCE_ROLE 值有:

  • primary: 指定连接到 primary instance.

  • secondary: 指定连接到secondary instance.

  • any: 可以连接到连接地址里的任何一个实例。

Connection to Instance Role Type


sales_primary= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=primary)))sales_secondary= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=secondary)))


Connection To a Specific Instance

sales1= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=any)     (INSTANCE_NAME=sales1)     (SERVER=dedicated)))sales2= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=any)     (INSTANCE_NAME=sales2)     (SERVER=dedicated)))

TAF Pre-Establishing a Connection


sales1.example.com= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales1-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=primary)      (FAILOVER_MODE=       (BACKUP=sales2.example.com)        (TYPE=select)       (METHOD=preconnect))))sales2.example.com= (DESCRIPTION=  (ADDRESS=       (PROTOCOL=tcp)         (HOST=sales2-server)         (PORT=1521))   (CONNECT_DATA=     (SERVICE_NAME=sales.us.example.com)      (INSTANCE_ROLE=secondary)))




通过语句可以查看是否已经发生了故障转移,以及故障转移的方式

SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)    FROM V$SESSION

    GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;
MACHINE FAILOVER_TYPE FAILOVER_M FAICOUNT(*)
---------------------------------------------------------------- ------------- ---------- --- ----------
rac1 NONE       NONE  NO     46

rac1 SESSION       BASIC YES      1


http://docs.oracle.com/cd/E11882_01/network.112/e10836/advcfg.htm#NETAG332