Performance Tuning in Practice: Loading Full Data from the DB


A recurring requirement: at startup, a service needs to load the full data set from the database once. When that data set is large, the question becomes: how do we load all of it quickly?

The most straightforward approach: order by id with limit + offset.

public List<XX> getAllXX(int limit, int offset) {
    // order by id gives a stable order; without it limit/offset pages can overlap or skip rows
    String sql = "select * from table order by id limit ? offset ?";
    return jdbcTemplate.query(sql, new Object[]{limit, offset}, rowMapper);
}
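For completeness, a short driving loop for the method above (a sketch: the page size of 1000 is arbitrary, and XX stands in for the row type as in the original snippet):

// page through the whole table until a short page signals the end
List<XX> all = new ArrayList<>(); // java.util.ArrayList
int limit = 1000;
int offset = 0;
while (true) {
    List<XX> page = getAllXX(limit, offset);
    all.addAll(page);
    if (page.size() < limit) {
        break; // last page
    }
    offset += limit;
}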

1. A memory optimization of the first approach: stream the result set

public List<HotelLinkage> getSiteParamAndHasPriceByWrapper(String wrapper) {
    List<HotelLinkage> result = Lists.newArrayList();
    Connection conn = null;
    PreparedStatement st = null;
    ResultSet rs = null;
    try {
        conn = jdbcTemplate.getDataSource().getConnection();
        // PostgreSQL only streams rows through a cursor when autocommit is off
        // and a fetch size is set; otherwise the driver buffers the entire
        // result set in memory, which defeats the point of this method.
        conn.setAutoCommit(false);
        String sql = StringUtils.replace(SQL_SELECT_SITEPARAM_AND_HASPRICE, "?", "'" + wrapper + "'", 1);
        st = conn.prepareStatement(sql);
        st.setFetchSize(batchSize); // pull batchSize rows from the server at a time
        rs = st.executeQuery();
        while (rs.next()) {
            HotelLinkage hotelLinkage = new HotelLinkage();
            hotelLinkage.setSiteParam(rs.getString("site_param"));
            hotelLinkage.setHasPrice(rs.getInt("has_price"));
            result.add(hotelLinkage);
        }
        logger.info("query returned {} rows to merge", result.size());
    } catch (SQLException e) {
        throw new RuntimeException(e);
    } finally {
        Closer.close(rs);
        Closer.close(st);
        try {
            if (conn != null) {
                conn.setAutoCommit(true); // restore the pooled connection's default
                conn.close();
            }
        } catch (SQLException e) {
            logger.warn("close connection failed", e);
        }
    }
    return result;
}


Although this tames the memory usage, the time cost of limit/offset is still hard to bear: the deeper into the data we page, the slower each load gets, absurdly so toward the end of a large table.

Why? First, look at the execution plan of order by id limit offset:

hs=# explain select id from log_analyse_result_2 order by id limit 100 offset 100;
                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=349.22..698.45 rows=100 width=4)
   ->  Index Scan using log_analyse_result_2_id_idx on log_analyse_result_2  (cost=0.00..25857831.93 rows=7404399 width=4)
(2 rows)


The Limit node's startup cost of 349.22 is the key: the index scan must walk past and throw away all offset rows before it can return the first one, so the cost of each page grows linearly with the offset.

2. A further optimization in both time and space: segment by id + limit. Load the full data set in id segments so that every page takes roughly the same time, eliminating the linear growth in load time that limit/offset exhibits as more data is read.


First, fetch the minimum and maximum id:

String getMinAndMaxIdSql = "select coalesce(min(id), 1000000000) as minId, coalesce(max(id), -1) as maxId from hotel_rank_seq_level";

// The coalesce defaults make an empty table yield minId > maxId,
// so a range-based loading loop simply does nothing.
private int[] getMinAndMaxId(ECon con) throws SQLException {
    List<String[]> minAndMaxIdString = con.query(getMinAndMaxIdSql);
    int[] minAndMaxId = new int[2];
    minAndMaxId[0] = Integer.parseInt(minAndMaxIdString.get(0)[0]); // minId
    minAndMaxId[1] = Integer.parseInt(minAndMaxIdString.get(0)[1]); // maxId
    return minAndMaxId;
}



The execution plan for an id-segmented query shows why this works: the startup cost is 0.00, because the index scan begins directly at the index condition instead of discarding rows.

hs=# explain select id from log_analyse_result_2 where id > 1000 order by id limit 10;
                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..34.95 rows=10 width=4)
   ->  Index Scan using analyse_result_2_id_idx on log_analyse_result_2  (cost=0.00..25872792.32 rows=7403382 width=4)
         Index Cond: (id > 1000)
(3 rows)
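The original post does not show the segmented loading loop itself, so here is a minimal sketch of how it could look. The method name loadAll, the XX row type with a getId() accessor, the rowMapper, and the segment size are assumptions carried over from the snippets above, not code from the original:

// Hypothetical sketch: keyset pagination over the id range.
// Every query starts directly at the index position (Index Cond: id > ?),
// so each segment costs the same no matter how far into the table we are.
public List<XX> loadAll(ECon con, int segmentSize) throws SQLException {
    List<XX> result = new ArrayList<>(); // java.util.ArrayList
    int[] range = getMinAndMaxId(con);   // [minId, maxId] from above
    String sql = "select * from hotel_rank_seq_level where id > ? order by id limit ?";
    long lastId = (long) range[0] - 1;   // start just below the smallest id
    while (lastId < range[1]) {          // empty table: default minId > maxId, loop skipped
        List<XX> segment = jdbcTemplate.query(sql, new Object[]{lastId, segmentSize}, rowMapper);
        if (segment.isEmpty()) {
            break; // nothing left beyond lastId
        }
        result.addAll(segment);
        lastId = segment.get(segment.size() - 1).getId(); // advance past the last loaded row
    }
    return result;
}

Each query hits the index condition id > lastId directly, so the zero-startup-cost plan above applies to every segment, not just the first.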


3. A third approach: generate a dictionary file at release time and deserialize it at startup. Instead of touching the database at all, the full data set is dumped to a snapshot file when the release is built; startup then only needs to deserialize that file.
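A minimal sketch of what this could look like with plain Java serialization; the DictionaryFile class and the file path are hypothetical, and any format (JSON, protobuf, Kryo) works as long as writing happens at release time and reading at startup. HotelLinkage would need to implement Serializable for this variant:

import java.io.*;
import java.util.List;

// Hypothetical snapshot helper: written once at release time, read at startup.
public class DictionaryFile {

    // release/build time: dump the fully loaded data to a file shipped with the app
    public static void write(String path, List<HotelLinkage> data) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(data);
        }
    }

    // startup: deserialize the snapshot instead of querying the database
    @SuppressWarnings("unchecked")
    public static List<HotelLinkage> read(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (List<HotelLinkage>) in.readObject();
        }
    }
}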






