kettle spoon excel
来源:互联网 发布:西门子触摸屏编程软件 编辑:程序博客网 时间:2024/06/12 01:14
使用kettle spoon etl 工具将oracle中大表抽取到excel中
因为数据库表中有clob字段,整个表数据量500万左右,表大小有5.3G,
根据业务需求将数据准备到另外新的表中,create table * as select ***。这样kettle查询速度快点,避免一堆关联查询。
然后采用本地kettle 编辑好脚本,测试发现本地抽取到excel效率只有40条/秒
果断放到服务器上面跑kettle的脚本。喜由天降,每分钟4w条的速度还是可以接受的。160w数据轻松导出!
但是有遇到一些问题,数据导出超过65536条就报下面错误:
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: #!/bin/sh: No such file or directory
2016/07/08 12:54:54 - Kitchen - Start of run.
2016/07/08 12:54:54 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 12:54:57 - JOB_export_nielsen_info - Start of job execution
2016/07/08 12:54:57 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 12:54:58 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 12:55:38 - 获取nielsen_info数据.0 - linenr 50000
2016/07/08 12:55:38 - Excel输出.0 - linenr 50000
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Error writing field (0,65536) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : jxl.write.biff.RowsExceededException: The maximum number of rows permitted on a worksheet been exceeded
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.getRowRecord(WritableSheetImpl.java:1214)
2016/07/08 12:55:50 - Excel输出.0 - at jxl.write.biff.WritableSheetImpl.addCell(WritableSheetImpl.java:1151)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:391)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeField(ExcelOutput.java:284)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.writeRowToFile(ExcelOutput.java:206)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.steps.exceloutput.ExcelOutput.processRow(ExcelOutput.java:147)
2016/07/08 12:55:50 - Excel输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 12:55:50 - Excel输出.0 - at java.lang.Thread.run(Thread.java:745)
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 12:55:50 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:50 - 获取nielsen_info数据.0 - Finished processing (I=65642, O=0, R=0, W=65640, U=0, E=0)
2016/07/08 12:55:56 - Excel输出.0 - Finished processing (I=0, O=65535, R=65536, W=65535, U=0, E=1)
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 12:55:56 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 12:55:56 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 12:55:56 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 12:55:56 - JOB_export_nielsen_info - Job execution finished
2016/07/08 12:55:56 - Kitchen - Finished!
2016/07/08 12:55:56 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 12:55:56 - Kitchen - Start=2016/07/08 12:54:54.136, Stop=2016/07/08 12:55:56.806
2016/07/08 12:55:56 - Kitchen - Processing ended after 1 minutes and 2 seconds (62 seconds total).
经过网上查询说是2003excel版本每个sheet只能支持65536行。
之后在kettle输出控件选为microsoft excel
扩展名选为2007版本的;并且勾上Stream XSLX DATA 选项,支持流的形式写文件,不然大数量的时候写入不了excel,报内存溢出。
内存溢出GC啦!
[root@hz_gg57 dmb]# sh run_nielsen.sh
run_nielsen.sh: line 1: ?#!/bin/sh: No such file or directory
2016/07/08 13:23:16 - Kitchen - Start of run.
2016/07/08 13:23:16 - RepositoriesMeta - Reading repositories XML file: /home/data-integration/.kettle/repositories.xml
2016/07/08 13:23:29 - JOB_export_nielsen_info - Start of job execution
2016/07/08 13:23:29 - JOB_export_nielsen_info - Starting entry [export_nielsen_info]
2016/07/08 13:23:29 - Trans_export_nielsen_info - Dispatching started for transformation [Trans_export_nielsen_info]
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : UnexpectedError:
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : java.lang.OutOfMemoryError: GC overhead limit exceeded
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1695)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1187)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:959)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:476)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1859)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1489)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.getKey(SharedStringsTable.java:130)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:176)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:345)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:315)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeField(ExcelWriterStep.java:514)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.writeNextLine(ExcelWriterStep.java:300)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.steps.excelwriter.ExcelWriterStep.processRow(ExcelWriterStep.java:163)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - at java.lang.Thread.run(Thread.java:745)
child index = 2, logging object : org.pentaho.di.core.logging.LoggingObject@6ec0657a parent=3920a2a8-6de3-43f9-88b1-1214e926bfa3
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - Microsoft Excel 输出.0 - Finished processing (I=0, O=26908, R=26908, W=26907, U=0, E=1)
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished reading query, closing connection.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation detected one or more steps with errors.
2016/07/08 13:25:17 - Trans_export_nielsen_info - Transformation is killing the other steps!
2016/07/08 13:25:17 - 获取nielsen_info数据.0 - Finished processing (I=27012, O=0, R=0, W=27010, U=0, E=0)
2016/07/08 13:25:17 - Trans_export_nielsen_info - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Errors detected!
2016/07/08 13:25:17 - JOB_export_nielsen_info - Finished job entry [export_nielsen_info] (result=[false])
2016/07/08 13:25:17 - JOB_export_nielsen_info - Job execution finished
2016/07/08 13:25:17 - Kitchen - Finished!
2016/07/08 13:25:17 - Kitchen - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Finished with errors
2016/07/08 13:25:17 - Kitchen - Start=2016/07/08 13:23:16.423, Stop=2016/07/08 13:25:17.143
2016/07/08 13:25:17 - Kitchen - Processing ended after 2 minutes and 0 seconds (120 seconds total).
0 0
- kettle spoon excel
- kettle spoon
- Kettle Spoon
- kettle spoon pentaho
- Kettle Spoon入门教程
- kettle 启动spoon一闪而过
- ETL工具kettle spoon使用说明
- kettle spoon安装和部署
- java调用spoon kettle API
- Kettle Spoon初探-简单说
- kettle spoon配置资源库的数据库问题
- kettle excel
- KETTLE spoon 升级时会遇到的问题处理
- kettle安装,启动spoon之后一闪就没了问题
- kettle源码安装及Spoon启动界面修改
- Kettle安装后点击spoon.bat一闪而过处理
- Kettle Excel To Database
- Kettle Excel To JSON
- Android 豆瓣电影-网络缓存实现与分析
- github push rejected问题
- Ajax处理XML、HTML、JSON三种数据格式的方法
- java构造器的作用
- HTML 5 <div>属性理解
- kettle spoon excel
- 8.常用工具util 和 events.EventEmitter 事件
- java基础教程9:浏览java api
- RabbitMQ(七):适用于云计算集群的远程调用(RPC)
- Android 属性动画(Property Animation)一
- 网络视频播放Vitamio第三方
- Ubuntu 16.04 下安装HAProxy 1.5.11 做tcp负载均衡
- Oracle Sales Cloud简单介绍
- Catalan(卡特兰)数及定理的简要证明------附上简要代码