字符集问题的初步探讨(五)-导出文件字符集
来源:互联网 发布:mac air需要关机吗 编辑:程序博客网 时间:2024/05/16 09:30
我们知道在导出文件中,记录着导出使用的字符集id,通过查看导出文件头的第2、3个字节,我们可以找到16进制表示的字符集ID,在Windows上,
我们可以使用UltraEdit等工具打开dmp文件,查看其导出字符集::
在Unix上我们可以通过以下命令来查看:
cat expdat.dmp | od -x | head |
Oracle提供标准函数,对字符集名称及ID进行转换:
SQL> select nls_charset_id('ZHS16GBK') from dual;NLS_CHARSET_ID('ZHS16GBK')-------------------------- 8521 row selected.SQL> select nls_charset_name(852) from dual;NLS_CHAR--------ZHS16GBK1 row selected.十进制转换十六进制:SQL> select to_char('852','xxxx') from dual;TO_CH----- 3541 row selected. |
对应上面的图中第2、3字节,我们知道该导出文件字符集为ZHS16GBk.
查询数据库中有效的字符集可以使用以下脚本:
col nls_charset_id for 9999col nls_charset_name for a30col hex_id for a20select nls_charset_id(value) nls_charset_id, value nls_charset_name,to_char(nls_charset_id(value),'xxxx') hex_idfrom v$nls_valid_valueswhere parameter = 'CHARACTERSET'order by nls_charset_id(value)/ |
输出样例如下:
NLS_CHARSET_ID NLS_CHARSET_NAME HEX_ID .................................. |
在很多时候,当我们进行导入操作的时候,已经离开了源数据库,这时如果目标数据库的字符集和导出文件不一致,很多时候就需要进行特殊处理,
以下介绍几种方法,主要以US7ASCII和ZHS16GBK为例
1. 源数据库字符集为US7ASCII,导出文件字符集为US7ASCII或ZHS16GBK,目标数据库字符集为ZHS16GBK
在Oracle92中,我们发现对于这种情况,不论怎样处理,这个导出文件都无法正确导入到Oracle9i数据库中,这可能是因为Oracle9i的编码方案发生了较大改变。
以下是我们所做的简单测试,其中导出文件命名规则为:
S-Server ,后跟Server字符集
C-client , 后跟导出操作时客户端字符集
导入时客户端字符集设置在命令行完成,限于篇幅,我们省略了部分测试过程。
对于Oracle9iR2,我们的测试结果是US7ASCII字符集,不管怎样转换,都无法正确导入ZHS16GBK字符集的数据库中。
在进行导入操作时,如果字符不能正常转换,Oracle数据库会自动用一个”?”代替,也就是编码63。
E:/nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCIIE:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=testImport: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:39 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V09.02.00 via conventional pathimport done in US7ASCII character set and AL16UTF16 NCHAR character setimport server uses ZHS16GBK character set (possible charset conversion). . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:50 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select name,dump(name) from test;NAME DUMP(NAME)-----------------------------???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,1162 rows selected.SQL> exitDisconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionE:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBKE:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=test ignore=yImport: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:28 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V09.02.00 via conventional pathimport done in ZHS16GBK character set and AL16UTF16 NCHAR character setexport client uses US7ASCII character set (possible charset conversion). . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:34 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select name,dump(name) from test;NAME DUMP(NAME)--------------------------------------------------------------------------------???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,116???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,1164 rows selected.SQL> drop table test;Table dropped.SQL> exitDisconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionE:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBKE:/nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=yImport: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:21 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V09.02.00 via conventional pathimport done in ZHS16GBK character set and AL16UTF16 NCHAR character set. . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:30 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select name,dump(name) from test;NAME DUMP(NAME)----------------------------------------------???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,1162 rows selected.SQL> exitDisconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionE:/nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCIIE:/nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=yImport: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:00 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V09.02.00 via conventional pathimport done in US7ASCII character set and AL16UTF16 NCHAR character setimport server uses ZHS16GBK character set (possible charset conversion)export client uses ZHS16GBK character set (possible charset conversion). . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:08 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select name,dump(name) from test;NAME DUMP(NAME)----------------------------------------???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,116???? Typ=1 Len=4: 63,63,63,63test Typ=1 Len=4: 116,101,115,1164 rows selected.SQL> |
对于这种情况,我们可以通过使用Oracle8i的导出工具,设置导出字符集为US7ASCII,导出后修改第二、三字符,修改 0001 为
0354,这样就可以将US7ASCII字符集的数据正确导入到ZHS16GBK的数据库中。
修改导出文件:
导入修改后的导出文件:
E:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBKE:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii-exp817.dmp fromuser=eygle touser=eygle tables=testImport: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:17 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V08.01.07 via conventional pathimport done in ZHS16GBK character set and AL16UTF16 NCHAR character setexport server uses UTF8 NCHAR character set (possible ncharset conversion). . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:23 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select name,dump(name) from test;NAME DUMP(NAME)--------------------------------------------------------------------------------测试 Typ=1 Len=4: 178,226,202,212Test Typ=1 Len=4: 116,101,115,1162 rows selected.SQL> |
2. 使用create database的方法
如果导出文件使用的字符集是US7ASCII,目标数据库的字符集是ZHS16GBK,我们可以使用create database的方法来修改,具体如下:
SQL> col parameter for a30SQL> col value for a30SQL> select * from v$nls_parameters;PARAMETER VALUE------------------------------ ------------------------------NLS_LANGUAGE AMERICANNLS_TERRITORY AMERICANLS_CURRENCY $NLS_ISO_CURRENCY AMERICANLS_NUMERIC_CHARACTERS .,NLS_CALENDAR GREGORIANNLS_DATE_FORMAT DD-MON-RRNLS_DATE_LANGUAGE AMERICANNLS_CHARACTERSET ZHS16GBKNLS_SORT BINARY……………….19 rows selected.SQL> create database character set us7ascii;create database character set us7ascii*ERROR at line 1:ORA-01031: insufficient privilegesSQL> select * from v$nls_parameters;PARAMETER VALUE------------------------------ ------------------------------NLS_LANGUAGE AMERICANNLS_TERRITORY AMERICANLS_CURRENCY $NLS_ISO_CURRENCY AMERICANLS_NUMERIC_CHARACTERS .,NLS_CALENDAR GREGORIANNLS_DATE_FORMAT DD-MON-RRNLS_DATE_LANGUAGE AMERICANNLS_CHARACTERSET US7ASCIINLS_SORT BINARY…………..19 rows selected.SQL> exitDisconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionE:/nls2>set nls_lang=AMERICAN_AMERICA.US7ASCIIE:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygleImport: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:26 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionExport file created by EXPORT:V09.02.00 via conventional pathimport done in US7ASCII character set and AL16UTF16 NCHAR character setimport server uses ZHS16GBK character set (possible charset conversion). . importing table "TEST" 2 rows importedImport terminated successfully without warnings.E:/nls2>sqlplus eygle/eygleSQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:35 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select * from test;NAME----------测试test2 rows selected. |
我们看到,当发出create database character set us7ascii;命令时,数据库v$nls_parameters中的字符集设置随之更改,该参数影响导入进程,
更改后可以正确导入数据,重起数据库后,该设置恢复。
提示:v$nls_paraemters来源于x$nls_parameters,该动态性能视图影响导入操作;而nls_database_parameters来源于props$数据表,影响数据存储。
3. Oracle提供的字符扫描工具csscan
我们说以上的方法只是应该在不得已的情况下使用,其本质是欺骗数据库,强制导入数据,可能损失元数据。
如果要确保数据的完整性,应该使用csscan扫描数据库,找出所有不兼容的字符,然后通过编写相应的脚本及代码,在转换之后进行更新,确保数据的正确性。
我们简单看一下csscan的使用。
要使用csscan之前,需要以sys用户身份创建相应数据字典对象:
E:/nls2>sqlplus "/ as sysdba"SQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 19:42:07 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Connected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionSQL> select instance_name from v$intance;select instance_name from v$intance *ERROR at line 1:ORA-00942: table or view does not existSQL> select instance_name from v$instance;INSTANCE_NAME----------------penny1 row selected.SQL> @?/rdbms/admin/csminst.sqlUser created.Grant succeeded.……….. |
这个脚本创建相应用户(csmig)及数据字典对象,扫描信息会记录在相应的数据字典表里。
我们可以在命令行调用这个工具对数据库进行扫描:
E:/nls2>csscan FULL=Y FROMCHAR=ZHS16GBK TOCHAR=US7ASCII LOG=US7check.log CAPTURE=Y ARRAY=1000000 PROCESS=2Character Set Scanner v1.1 : Release 9.2.0.1.0 - Production on Sun Nov 2 20:24:45 2003Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.Username: eygle/eygleConnected to:Oracle9i Enterprise Edition Release 9.2.0.4.0 - ProductionWith the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.4.0 - ProductionEnumerating tables to scan.... process 1 scanning SYS.SOURCE$[AAAABHAABAAAAIRAAA]. process 2 scanning SYS.ATTRIBUTE$[AAAAEoAABAAAAhZAAA]. process 2 scanning SYS.PARAMETER$[AAAAEoAABAAAAhZAAA]. process 2 scanning SYS.METHOD$[AAAAEoAABAAAAhZAAA]……... process 2 scanning SYSTEM.DEF$_AQERROR[AAAA8fAABAAACWJAAA]. process 1 scanning WMSYS.WM$ENV_VARS[AAABeWAABAAAFMZAAA]………………….. process 2 scanning SYS.UGROUP$[AAAAA5AABAAAAGpAAA]. process 2 scanning SYS.CON$[AAAAAcAABAAAACpAAA]. process 1 scanning SYS.FILE$[AAAAARAABAAAABxAAA]Creating Database Scan Summary Report...Creating Individual Exception Report...Scanner terminated successfully. |
然后我们可以检查输出的日志来查看数据库扫描情况:
Database Scan Individual Exception Report[Database Scan Parameters]Parameter Value ------------------------------ ------------------------------------------------Scan type Full database Scan CHAR data? YES Current database character set ZHS16GBK New database character set US7ASCII Scan NCHAR data? NO Array fetch buffer size 1000000 Number of processes 2 Capture convertible data? YES ------------------------------ ------------------------------------------------[Data Dictionary individual exceptions][Application data individual exceptions]User : EYGLETable : TESTColumn: NAMEType : VARCHAR2(10)Number of Exceptions : 1 Max Post Conversion Data Size: 4 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------AAABpIAADAAAAAMAAA lossy conversion 测试 ------------------ ------------------ ----- ------------------------------ |
不能转换的数据将会被记录下来,我们可以根据这些信息在转换之后,对数据进行相应的更新,确保转换无误。
- 字符集问题的初步探讨(五)-导出文件字符集
- 字符集问题的初步探讨(五)-导出文件字符集
- 字符集问题的初步探讨(五)-导出文件字符集
- 字符集问题的初步探讨(五)----如何识别导出文件的字符集
- 字符集问题的初步探讨(五)
- 字符集问题的初步探讨(五)
- 字符集问题的初步探讨
- 字符集问题的初步探讨
- 字符集问题的初步探讨
- 字符集问题的初步探讨(四)-- 导入导出及转换
- 字符集问题的初步探讨(四)-导入导出及转换
- 字符集问题的初步探讨(四)-导入导出及转换
- 字符集问题的初步探讨(四)-导入导出及转换
- 字符集问题的初步探讨(一)--字符集的基本知识
- 字符集问题的初步探讨(二)--数据库的字符集
- 字符集问题的初步探讨(三)--字符集的更改
- 字符集问题的初步探讨(一)-字符集的基本知识
- 字符集问题的初步探讨(二)-数据库的字符集
- PHP已从爱好者步进入企业级应用
- 字符集问题的初步探讨(二)-数据库的字符集
- 字符集问题的初步探讨(三)-字符集的更改
- php 压缩内容输出
- 字符集问题的初步探讨(四)-导入导出及转换
- 字符集问题的初步探讨(五)-导出文件字符集
- 字符集问题的初步探讨(六)-乱码的产生
- 字符集问题的初步探讨(七)-字符集更改的内部操作
- 使用XFire和gSoap传送多个附件
- www.eygle.com
- in和exists的区别与SQL执行效率分析
- TCP/IP协议
- 让你的PLSQL Developer方便地访问不同字符集的数据库
- i约瑟夫环