Paging in J2EE: Manage Large Result Sets Efficiently

来源:互联网 发布:腾讯微博刷粉丝软件 编辑:程序博客网 时间:2024/04/19 20:12
Pagination is the simplest and most common way to break up large amounts of data into more manageable chunks. Pagination is a key part of Web site design, both from the UI perspective (to manage limited screen real estate) and from the server perspective (to process large result sets efficiently, without causing resource spikes or servicing delays). In J2EE, efficient pagination (and good UI design) are essential for handling large query result sets in a resource-pooled environment.



Specifically, the following conditions make pagination necessary:
  • Screen real estate is limited. Your JSP/client UI may have insufficient space to display the entire results at once and your user may not be able to find what he or she wants quickly in a very large data set.
  • Your resources are pooled or constrained. Result sets are too large to be managed reasonably within your connection and memory limitations.

Paging Strategies That Work
Basic querying and paging strategies can start out simple but quickly lead to disaster if you fail to consider query evolution and data growth. Open-ended or unbounded queries that perform well over small test data sets may degrade sharply against real data. If you don't manage your precious DBMS and memory resources, your problems will only compound under load.

Efficient data retrieval and paging is a trade-off between memory usage and response-time requirements. Broadly speaking, most roll-your-own J2EE paging mechanisms fall into two basic categories:

  • Cache based: Results are cached for fast access in subsequent pages.
  • Query based: Results are fetched from the DBMS on demand as the user pages.

Cache-based approaches cache the query results over and above what the database provides, in an intermediate tier (on the HTTP session, in the server using a Stateful Session Bean or in a home-grown cache). Subsequent page requests hit the cache rather than the database. Such cache-based mechanisms are memory intensive and work well for low-volumes, recurring queries, and small to medium-sized result sets. The cache-hit rates are high and the I/O overheads are low, yielding good response times.

Query-based approaches don't use caches. They deliver pages on demand, direct from the DBMS to the client. They service additional pages by issuing more queries to the DBMS rather than pulling data from a cache. These approaches discard or skip unwanted rows from previous or future pages using low-level JDBC functions and context information from the last fetched page. Query-centric paging is memory efficient but requires additional round trips to the database, leading to slightly slower response times overall.

More sophisticated mechanisms adopt a hybrid approach. They constrain cache sizes by caching a fixed number of pages ahead, maintaining bounded cache windows, and using efficient query mechanisms and offline threads to backfill the results transparently to the client.

Caching vs. Query Trade-offs
Cache-based approaches are fast, but they have a few pitfalls. For example, unbounded or open-ended queries for large result sets can lead to uncontrolled memory growth as the results are serialised and cached. Large result sets take a long time to service and take up valuable connection and session resources. Whenever a connection is consumed for long periods of time, the number of connections available in your J2EE shared connection pool is effectively reduced. To cope with high load, your system should aim to service your requests as quickly as possible. If your requests take a long time to process, your overall throughput will drop. You don't want to occupy your resources with long-lived queries, because you won't have any connections and threads left to do anything else.

Long-lived queries can also lead to connection timeouts if your query takes more time to execute than your preset JDBC connection time limit allows. Avoid this in cache-based strategies by narrowing or constraining your search criteria so that your DBMS queries (and results) are bounded.

Caching is most beneficial when the cached data is hit repeatedly. However, if your user searches are consistently yielding distinct results, your caches can't be reused across requests. Worse, if the user rarely pages beyond the first page, your caching within the session is for naught. You should carefully manage caches so that they don't grow too large and they expire promptly when they're no longer required.

Hybrid or query-based approaches are more reliable than cache-based alternatives. They have slightly slower response times but can scale transparently for large result sets and high request volumes. An unbounded query will take a long time to execute but won't exceed your memory constraints or take too long to execute. The DBMS connection is relinquished once the required data (usually the first one or two pages) has been fetched. This means that the connection is held for the bare minimum before being returned to the connection pool. The only downside is that additional context information must be passed back and forth on your page requests.

Table 1 lists the issues you need to consider when deciding on a paging approach.

Conditions for PaginationCache-based ApproachHybrid/Query-based ApproachIs your result data bounded or unbounded?Applicable for fixed or bounded result sizes; search criteria is fixed or constrainedApplicable for unbounded results sets (such as searches with loose criteria) since the implementation inherently limits result sizeHow big are your result sets?Good for small to medium-sizedGood for medium-sized to largeWhat are your response-time requirements?When response time is criticalWhen response time is less criticalHow often is the query issued? Do you fetch the same data repeatedly, or is your result data different for each request?Applicable for when the same data is requested repeatedlyApplicable for when the search criteria and results vary from request to requestHow volatile is your (static vs. dynamic)? What are your requirements?Unchanging or static data can be serviced reliably from cachesIf data is dynamic, query every time to return the most up-to-date data

Table 1. Cache-based vs. Hybrid/Query-based Paging Strategies

 

原创粉丝点击