Changing the database characterset from US7ASCII to WE8MSWIN1252


This note looks at the procedure for changing the characterset of a 10g database from US7ASCII to WE8MSWIN1252. Keep in mind that in Oracle 11g, the DBCA no longer shows US7ASCII as one of the ‘recommended’ charactersets.
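
Before starting, it is worth confirming the current characterset of the database (for this database the query returns US7ASCII):

SQL> select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';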

 

All characters included in the US7ASCII character set are defined in WE8MSWIN1252 with the same code point, which means that WE8MSWIN1252 is a binary or “strict” superset of US7ASCII.

 

Remember – we cannot go the opposite way and change the characterset from WE8MSWIN1252 to US7ASCII.

 

There are a number of Metalink notes which discuss what I have covered below in a lot of detail. I had quite a tough time following them because they are long, with many sections and sub-sections, so I thought I would try to present the procedure in a clearer and more concise way.

Note: 555823.1 – Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252
Note: 444701.1 – Csscan output explained
Note: 745809.1 – Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Note: 225912.1 – Changing the Database Character Set (NLS_CHARACTERSET)

 

In my opinion, changing the database characterset is quite a tricky task and it is best to raise an SR for this task and involve Oracle Support.

Basically there are three methods by which you can change the characterset of a database, and the method varies a bit depending on the Oracle database version.

a) Conventional Export and Import
b) For 8i or 9i databases, via the ALTER DATABASE CHARACTER SET command (see the example just after this list)
c) For 10g and upwards, via the csalter.plb script
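
For reference, the 8i/9i command mentioned in (b) took this form (shown here only for comparison; it is not the method used for this 10g database, and the full list of prerequisites is in Note 225912.1):

SQL> ALTER DATABASE CHARACTER SET WE8MSWIN1252;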

 

Kindly note that whether we can use the Export/Import method to change the characterset will depend on the output obtained after running csscan (the Database Character Set Scanner). So even if we are planning to use the Export/Import method, we should install and run the csscan utility regardless.

 

Installing the csscan utility

So the first step is to install csscan in the database. Follow Note 745809.1 for this.

We need to run the script csminst.sql which is located at $ORACLE_HOME/rdbms/admin.

This script creates the CSMIG user account in the database and assigns SYSTEM as its default tablespace. A number of CSM$ tables are created in the CSMIG schema, and it is good practice to allocate a dedicated tablespace for all the CSMIG objects rather than using the SYSTEM tablespace.

So I have made a backup of the csminst.sql script and made the following changes:

alter user csmig default tablespace SYSTEM

changed to …

alter user csmig default tablespace CSMIG_DATA

and also removed the following lines, which are apparently not needed:

grant READ on directory log_file_dir to system
/
grant READ on directory data_file_dir to system
/
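
With those edits in place, a minimal sketch of the remaining installation steps is shown below (the tablespace name CSMIG_DATA was chosen above; the datafile path and size are only illustrative and will differ per environment):

SQL> create tablespace CSMIG_DATA datafile '/u01/oradata/PROD/csmig_data01.dbf' size 500M autoextend on;
SQL> @?/rdbms/admin/csminst.sql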

Running csscan and some parameters explained

Note: Before running csscan, make a note of all the INVALID objects in the database and also purge the DBA_RECYCLEBIN.
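
For example:

SQL> select owner, object_name, object_type from dba_objects where status = 'INVALID';
SQL> purge DBA_RECYCLEBIN;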

 

Also, csscan will do a full table scan of every table in the database, which could have a performance impact, so run csscan during an appropriate time window.

When we are converting the characterset from US7ASCII to WE8MSWIN1252, we will need to run csscan several times.

 

Let us look at the first csscan command we will run.

csscan \”sys/syspwd as sysdba\” FULL=Y FROMCHAR=US7ASCII TOCHAR=US7ASCII LOG=US7check CAPTURE=Y ARRAY=1000000 PROCESS=2

csscan will produce three files, with the .err, .txt and .out extensions, and the name of each will depend on the value used for the LOG parameter.

So in this case, because we have used LOG=US7check, the files US7check.txt, US7check.out and US7check.err will be created in the directory we are running the csscan command from.

The .out file is basically a log of all the scan activity performed.
The .txt file is the most important file as it contains the Database Scan Summary Report.
The .err file lists all the ‘exception’ rows – rows with lossy, truncated or convertible data.

The PROCESS parameter determines how many processes (database sessions) are created to scan the tables, and can be increased if we have adequate resources.

 

As an alternative to FULL=Y, we can also use the USER or TABLE parameters to limit the scan to one schema or one set of tables. But we need to use FULL=Y when we run the last csscan just before actually converting the data.

In case we need to send the .err files to Oracle Support and would like to limit their size, we can use the SUPPRESS parameter. For example, specifying SUPPRESS=1000 will log information for a maximum of 1000 rows per table. This does not affect the output of the .txt file in any way, so for the first scan done on a big database, use the SUPPRESS parameter.
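
A scan limited to a single schema and using SUPPRESS could look something like this (the schema name SCOTT and the LOG name are only examples):

csscan \"sys/syspwd as sysdba\" USER=SCOTT FROMCHAR=US7ASCII TOCHAR=US7ASCII LOG=scottcheck CAPTURE=Y SUPPRESS=1000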

 

The Conversion Summary section

The Conversion Summary section of the .txt file is the most important section of the Scan Summary Report and has four columns showing the possible statuses of the data:

Changeless: All the characters use the same code points in the new characterset as in the old characterset, so no action needs to be taken on the data; csalter or the ALTER DATABASE CHARACTER SET command will handle this for us.

Convertible: The data is valid, but the characters will map to a different code point in the new characterset. For any application data reported as convertible, we need to export the data, truncate the tables or delete the data, change the characterset using csalter or ALTER DATABASE CHARACTER SET, and then import the data once the characterset has been changed (see the sketch after this list).

Truncation: When going from a single-byte characterset to a multibyte characterset, the data will grow in size and we will need to increase the column sizes accordingly. The .err file tells us by how much the data will grow post conversion, and we need to modify the column sizes of the affected tables to match.

Lossy: If we see data in the Lossy column, it means that either the data is not a valid code point for the current characterset, or the proposed target characterset does not define that code point. If any data is listed as lossy, we CANNOT use Export/Import to change the characterset; we have to use the ALTER DATABASE CHARACTER SET command for 8i and 9i, or csalter for 10g and upwards.
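
As a rough sketch of how convertible and truncation application data is typically handled around the characterset change (the table, column, size and file names below are purely illustrative):

Before the characterset change, export and then empty the convertible table:

exp system/syspwd TABLES=app_owner.convertible_tab FILE=convertible_tab.dmp

SQL> truncate table app_owner.convertible_tab;

If a column is flagged under Truncation, widen it using the post-conversion size reported in the .err file:

SQL> alter table app_owner.some_tab modify (some_col varchar2(120));

After csalter (or ALTER DATABASE CHARACTER SET) has changed the characterset, import the exported data back:

imp system/syspwd FILE=convertible_tab.dmp FROMUSER=app_owner TOUSER=app_owner IGNORE=Y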

 

Lossy data found when using FROMCHAR=US7ASCII TOCHAR=US7ASCII

When we look at the US7check.txt file we find that there is LOSSY data.

[Application Data Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                10,538,698,007                0                0           10,394
CHAR                       922,484,552                0                0                0
LONG                            24,079                0                0                0
CLOB                         1,269,114                0                0       75,293,485
VARRAY                               0                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                   11,462,475,752                0                0       75,303,879

 

 

This means that the lossy data does not consist of valid code points for the source NLS_CHARACTERSET (US7ASCII).

The presence of this lossy data means that we CANNOT use Export/Import to do a characterset conversion and have to use the csalter method instead.

Do we still have lossy data if we use FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252?

We then run this csscan command

csscan full=y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=PROD_1252_TO_1252_beforeconvert ARRAY=1024000 CAPTURE=Y process=4

From the PROD_1252_TO_1252_beforeconvert.txt file we can see that now there is no application data which is reported as lossy.

[Application Data Conversion Summary]
 
Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                10,538,740,813                0                0                0
CHAR                       922,484,552                0                0                0
LONG                            34,823                0                0                0
CLOB                        76,562,599                0                0                0
VARRAY                               0                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                   11,537,822,787                0                0                0

 

The csscan output from 1252 to 1252 shows that the conversion is clean. It shows that all the characters in the data are completely known to the WE8MSWIN1252 characterset.

Changeless means that the current CHAR, VARCHAR2, LONG and CLOB data, even in this US7ASCII database, is all within the defined code range of WE8MSWIN1252.

These lines in the Database Scan Summary report confirm that we can now go ahead and change the characterset of the database. If we run csalter without these conditions being met, we will see messages like “Unrecognized convertible data found in scanner result” in the csalter output and csalter will abort.

[Scan Summary]
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set

Running the final csscan before the actual conversion

We will now run our final csscan.

csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=PROD_1252_TO_1252_convert ARRAY=1024000 CAPTURE=Y

Before we can run the csalter script, we need to have a ‘clean’ FULL=Y csscan result that was completed within the 7 days prior to running csalter.

A ‘clean’ scan means that there is no convertible (except Data Dictionary CLOB data which can be convertible and will be handled by Csalter), truncation or lossy data in the database.

Performing the actual characterset conversion

A full backup of the database should be taken before this operation, so that we can revert if the characterset change fails for any reason.

Also, the characterset change needs to be performed with the database in RESTRICTED SESSION mode, so it will mean downtime for the application.

Note – the time taken to change the characterset does not depend on the volume of data in the tables, but on the number of columns and on how many tables contain the ‘exception’ data found by running csscan.

Shut down the database and start it up in restricted mode.
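
For example:

SQL> shutdown immediate
SQL> startup restrict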

Run csalter.plb, which will prompt us to confirm before proceeding.

SQL> @?/rdbms/admin/csalter.plb
0 rows created.
 
Function created.
 
Function created.
 
Procedure created.
 
This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure
 
Would you like to proceed (Y/N)?Y
 
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('Y') <> 'Y') then
 
Checking data validility...
begin converting system objects
 
PL/SQL procedure successfully completed.
 
Alter the database character set...
CSALTER operation completed, please restart database
 
PL/SQL procedure successfully completed.
 
0 rows deleted.
 
Function dropped.
 
Function dropped.
 
Procedure dropped.

Shut down and restart the database and verify that the characterset has been changed to WE8MSWIN1252:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

SQL> select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

VALUE
----------------------------------------
WE8MSWIN1252
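
Finally, it is worth comparing the INVALID objects against the list recorded before the change and recompiling anything that has become invalid (utlrp.sql is the standard recompilation script under $ORACLE_HOME/rdbms/admin):

SQL> select owner, object_name, object_type from dba_objects where status = 'INVALID';
SQL> @?/rdbms/admin/utlrp.sql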

 

 

 
