How to quickly read large amounts of data from a database in Java


Reposted from: http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/

 

Reading row-by-row into Java from MySQL

 

Trying to read a large amount of data from MySQL into Java with a single query is not as easy as one might think.

I want to read the results of the query a chunk at a time. If I read it all at once, the JVM understandably runs out of memory. In this case I am stuffing all the resulting data into a Lucene index, but the same would apply if I were writing the data out to a file, another database, etc.

Naively, I assumed that this would just work by default. My initial program looked like this (I’ve left out certain things such as closing the PreparedStatement):

public void processBigTable() throws SQLException {
    PreparedStatement stat = connection.prepareStatement(
            "SELECT * FROM big_table");
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This failed with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2823)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2763)
    ...
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1657)
    ...

The line it failed at was the executeQuery. So as we can see from the stack backtrace, it’s clearly trying to load all the results into memory simultaneously.

I tried all sorts of things, but it was only after I took a look at the MySQL JDBC driver code that I found the answer. In StatementImpl.java:

protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
            && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
            && (fetchSize == Integer.MIN_VALUE));
}

This boolean function determines whether the driver is going to use the “read all data first” approach or the “read rows a few at a time” approach (= “streaming” in their terminology). I clearly need the latter.

You can specify, using the generic JDBC API, the number of rows you want to fetch at once (the “fetchSize”). Why would you have to set that to Integer.MIN_VALUE, which is stated to be −2³¹, in order to get streaming data? I wouldn’t have guessed that.
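For comparison, this is what I would have naively expected to be the way to control chunking through the generic API (the 1000 here is just an arbitrary number for illustration). With this driver, as far as I can tell, a positive value does not trigger streaming, and the whole result set is still read into memory:

PreparedStatement stat = connection.prepareStatement(
        "SELECT * FROM big_table");
stat.setFetchSize(1000);                  // in standard JDBC this is only a hint
ResultSet results = stat.executeQuery();  // still reads everything at once here
while (results.next()) { ... }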

Basically, this very important decision about which approach to use, which in my case amounts to “program works” or “program crashes”, is left to a test of whether three variables are set to particular values. I am not aware whether this is in the documentation (I didn’t find it), nor whether this decision is guaranteed to be stable, i.e. won’t change in some future driver version.

Now my code looks like the following:

public void processBigTable() throws SQLException {
    PreparedStatement stat = connection.prepareStatement(
            "SELECT * FROM big_table",
            ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);
    ResultSet results = stat.executeQuery();
    while (results.next()) { ... }
}

This code works, and reads chunks of rows at a time.
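For completeness, here is roughly what goes inside the loop and how I clean up afterwards. The column names are made-up placeholders, and the Lucene part is only sketched as a comment. One thing worth knowing about streaming result sets, as far as I understand them: the connection can’t be used for other statements until the result set has been read to the end or closed, so closing it promptly matters.

ResultSet results = stat.executeQuery();
try {
    while (results.next()) {
        // "id" and "text" are hypothetical column names, just to show
        // row-by-row access; in my case each row becomes a Lucene document
        long id = results.getLong("id");
        String text = results.getString("text");
        // ... add to the Lucene index, write to a file, etc.
    }
} finally {
    results.close();  // release the streaming result set before reusing the connection
    stat.close();
}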

Well, I’m not sure if it reads chunks of rows at a time, or just one row at a time. I hope it doesn’t read one row at a time, because that would be very inefficient in terms of the number of round trips from the software to the database. I assumed this was what the fetchSize parameter was controlling, so you could tune the size of the chunks to meet your particular latency and memory setup. But being forced to set it to a large negative number in order to get it to work means one has no control over the size of the chunks (as far as I can see).
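One possibility I have seen mentioned, but haven’t tested myself, is the driver’s useCursorFetch connection property, which is supposed to make a positive fetchSize fetch that many rows per round trip via a server-side cursor (MySQL 5.0 and later). If it behaves as described, that would give back control over the chunk size. A sketch, with the host, database and credentials as placeholders:

// Sketch only: assumes useCursorFetch behaves as described above.
Connection c = DriverManager.getConnection(
        "jdbc:mysql://localhost/mydb?useCursorFetch=true", "user", "password");
PreparedStatement stat = c.prepareStatement("SELECT * FROM big_table");
stat.setFetchSize(1000);   // should now control how many rows are fetched at a time
ResultSet results = stat.executeQuery();
while (results.next()) { ... }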

(I am using Java 6 with MySQL 5.0 and the JDBC driver “MySQL Connector” 5.1.15.)
