java 快速读取大容量数据库的方法
来源:互联网 发布:js重载div 编辑:程序博客网 时间:2024/04/30 15:25
转自:http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/
Reading row-by-row into Java from MySQL
Trying to read a large amount of data from MySQL using Java using one query is not as easy as one might think.
I want to read the results of the query a chunk at a time. If I readit all at once, the JVM understandably runs out of memory. In this caseI am stuffing all the resulting data into a Lucene index, but the same would apply if I was writing the data out to a file, another database, etc.
Naively, I assumed that this would just work by default. My initialprogram looked like this (I’ve left out certain things such as closingthe PreparedStatement):
public void processBigTable() {
PreparedStatement stat = connection.prepareStatement(
"SELECT * FROM big_table");
ResultSet results = stat.executeQuery();
while (results.next()) { ... }
}
Failed with the following error:
Exception in thread "main"
java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
...
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
...
The line it failed at was the exceuteQuery. So as we can see fromthe stack backtrace, it’s clearly trying to load all the results intomemory simultaneously.
I tried all sorts of things but it was only after I took at theMySQL JDBC driver code did I find the answer. In StatementImpl.java:
protected boolean createStreamingResultSet() {
return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
&& (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
&& (fetchSize == Integer.MIN_VALUE));
}
This boolean function determines if it’s going to use the approach“read all data first” or “read rows a few at a time” (= “streaming” intheir terminology). I clearly need the latter.
You can specify, using the generic JDBC API, the number of rows youwant to fetch at once (the “fetchSize”). Why would you have to set thatto Integer.MIN_VALUE, which is stated to be −231, in order to get streaming data? I wouldn’t have guessed that.
Basically this very important decision about which approach to use,which in my case amounts to “program works” or “program crashes”, isleft to test whether three variables are set to various values. I amnot aware if this is in the documentation (I didn’t find it), nor ifthis decision is guaranteed to be stable, i.e. won’t change in somefuture driver version.
Now my code looks like the following:
public void processBigTable() {
PreparedStatement stat = c.prepareStatement(
"SELECT * FROM big_table",
ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY);
stat.setFetchSize(Integer.MIN_VALUE);
ResultSet results = stat.executeQuery();
while (results.next()) { ... }
}
This code works, and reads chunks of rows at a time.
Well I’m not sure if it reads chunks of rows at a time, or just onerow at a time. I hope it doesn’t read one row at a time, because thatwould be very inefficient in terms of number of round trips from thesoftware to the database. I assumed this was what the fetchSizeparameter was controlling, so you could tune the size of the chunks tomeet your particular latency and memory setup. But being forced to setit to a large negative number in order to get it to work means one hasno control over the size of the chunks (as far as I can see).
(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)
- java 快速读取大容量数据库的方法
- java 快速读取大容量数据库的方法 Java heap space
- SQL SERVER 大容量数据库的备份方法
- Java中遍历大容量map的正确方法
- Java中遍历大容量map的正确方法
- Python读取大容量的csv文件
- VC++实现批量大容量快速插入SqlServer数据库
- 数据库大容量字段的插入
- 大容量数据库服务器
- PB9读写大容量文件的方法
- 【经验】数据库容量性能测试-大数据量生成方法
- 对大容量的数据库操作的一些思路
- 合理有效的分解数据库大容量字段
- Altibase新数据库管理信息系统上市 可快速处理大容量数据
- 删除sql server 大容量日志的方法
- 删除SQL Server 大容量日志的方法
- 统计大容量文本里面单词的函数(方法)
- 大容量数据包的高效并行安全加解密方法
- [转]实战体会Java多线程编程精要
- WebBrowserDepthApi 一种WebBrowser与Script交互的接口
- xml文件的读取---根据XMl文件的ID读取对应的内容
- 转: vb调用webservice 另一方法
- IFrame的强制刷新
- java 快速读取大容量数据库的方法
- 3DGIS第三章 约束条件下二次误差度量简化方法
- 3DGIS第四章 视点相关的实时选择性细化
- 权限管理的好帮手,SU53+F5
- 3DGIS第五章 基于GPU的优化处理
- 非常有用的用户维护信息系统
- 匈牙利算法求二分图的最大匹配寻找增广路的几种思路(转)
- 我的 谷雅网 进入 google 搜索引擎了
- 3DGIS第六章 北京奥林匹克公园三维场景网络发布应用