Don't Cache WP_Query the Usual Way with Transients

Source: Internet  Editor: 程序博客网  Time: 2024/05/19 11:47

WP_Query is one of the most complex classes in the WordPress codebase. It’s extremely powerful and flexible, but that flexibility often results in slower database queries, especially when working with metadata. To speed things up, WordPress developers tend to cache the results of such queries, but there are a few pitfalls you should be aware of.

Caching Queries

The Transients API is likely the top choice for caching WP_Query results, but what we often see is developers storing the whole WP_Query object in a transient, and we believe that’s a mistake. Translated into code, it looks something like this:

    $cache_key = 'my-expensive-query';
    if ( ! $query = get_transient( $cache_key ) ) {
        $query = new WP_Query( ... ); // some expensive query;
        set_transient( $cache_key, $query, 24 * HOUR_IN_SECONDS );
    }

    while ( $query->have_posts() ) {
        $query->the_post();
        // ...
    }

Here you can see that we’re passing the $query object directly to set_transient(), so whenever we get a cache hit, we’ll have our query object available, along with all the useful WP_Query properties and methods.

This is a bad idea, and while this works (or at least seems to work), you’ll want to know what’s happening behind the scenes when you call set_transient() in this particular case.

Serialize/Unserialize

By default, transients in WordPress translate into the Options API. If you’re familiar with how options work internally, you’ll know that the values are serialized before hitting the database, and unserialized when retrieved. This is also true for most persistent object caching drop-ins, including Memcached and Redis.

As an example, just look at what happens when we serialize a small object in PHP:

    $object = new stdClass;
    $object->foo = 1;
    $object->bar = 2;

    var_dump( maybe_serialize( $object ) );
    // string(47) "O:8:"stdClass":2:{s:3:"foo";i:1;s:3:"bar";i:2;}"

This allows us to store the object, along with all its properties, as a string, which works well in a MySQL table, in a Redis database, etc. When deserializing (or unserializing) such a string, the result is an identical copy of the object we previously had. This is great, but let’s consider a more complex object:

    class A {}

    class B {
        public $baz = 3;
    }

    class C {
        public $qux = 4;
    }

    $object = new A;
    $object->foo = new B;
    $object->bar = new C;

    var_dump( maybe_serialize( $object ) );
    // string(84) "O:1:"A":2:{s:3:"foo";O:1:"B":1:{s:3:"baz";i:3;}s:3:"bar";O:1:"C":1:{s:3:"qux";i:4;}}"

This illustrates that PHP’s serialize() function will recursively serialize any object referenced by a property of another object.

Serializing WP_Query

Let’s try and put this in a WP_Query context by running a simple query and serializing it:

    $query = new WP_Query( array(
        'post_type' => 'post',
        'post_status' => 'publish',
        'posts_per_page' => 10,
    ) );

    var_dump( maybe_serialize( $query ) );
    // string(22183) "O:8:"WP_Query":50:{s:5:"query";a:3:{s:9:"post_type";s:4:"post";s:11:"post_status";s:7:"publish";s:14:"posts_per_page";i:10;}s:10:"query_vars";a:65:{s:9:"post_type";s:4:"post";s:11:"post_status";s:7:"publish";s:14:"posts_per_page"; ... (about 22000 more characters)

The first thing you’ll notice is that the output is extremely long. Indeed, we’re serializing every property of our WP_Query object, including all query variables, parsed query variables, the loop status and current position, all conditional states, a bunch of WP_Post objects we retrieved, as well as any additional referenced objects.

Referenced objects? Let’s take a look at the WP_Query constructor:

    public function __construct( $query = '' ) {
        $this->db = $GLOBALS['wpdb'];
        // ...

Now let’s take a closer look at our gigantic serialized string:

... s:5:"*db";O:4:"wpdb":62:{s:11:"show_errors";b:1;s:15:"suppress_errors";b:0; ...

Whoops! But that’s not all. That wpdb object we’re storing as a string in our database will contain our database credentials, all other database settings, as well as the full list of SQL queries along with their timing and stack traces if SAVEQUERIES was turned on.

The same is true for other referenced objects, such as WP_Meta_Query, WP_Tax_Query, WP_Date_Query, etc. Our goal was to speed that query up, and while we did, we introduced a lot of unnecessary overhead serializing and deserializing complex objects, and we leaked potentially sensitive information along the way.
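To see the leak in isolation, here is a minimal plain-PHP sketch. ToyDb and ToyQuery are hypothetical stand-ins for wpdb and WP_Query (they are not WordPress classes), and the credentials are made up. The only point is that a protected reference gets serialized together with everything inside it.

```php
<?php
// Hypothetical stand-in for wpdb: holds connection settings as properties.
class ToyDb {
    protected $dbuser     = 'wp_user';
    protected $dbpassword = 's3cret-password'; // made-up value
}

// Hypothetical stand-in for WP_Query: keeps a reference to the database object.
class ToyQuery {
    public $posts = array( 1, 2, 3 );
    protected $db;

    public function __construct( ToyDb $db ) {
        $this->db = $db;
    }
}

$serialized = serialize( new ToyQuery( new ToyDb() ) );

// The credentials travel with the "cached" query.
var_dump( strpos( $serialized, 's3cret-password' ) !== false ); // bool(true)

// Caching only the data we need avoids that entirely.
$safe = serialize( array( 1, 2, 3 ) );
var_dump( strpos( $safe, 's3cret-password' ) !== false ); // bool(false)
```

Marking the property protected or private does not help here: visibility only changes how the property name is encoded in the string, not whether its value is written out.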

But the overhead does not stop there.

Metadata, Terms, Posts & the Object Cache

Okay, so now we have a huge serialized string containing the posts that we wanted to cache, along with a bunch of unnecessary data. What happens when we deserialize that string back to a WP_Query object? Well, nothing really…

When deserializing strings into objects, PHP does not run the constructor method (thankfully), but instead runs __wakeup() if it exists. It doesn’t exist in WP_Query, so that’s what happens: nothing, except of course populating all our properties with all those values from the serialized string, restoring nested objects, and objects nested inside those objects. It should be pretty fast, hopefully much faster than running our initial SQL query.
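That behaviour is easy to confirm outside WordPress. The sketch below uses a hypothetical ToyLoader class that counts how often its constructor and its __wakeup() method run. It defines __wakeup() only to make the hook visible; WP_Query, as noted above, has none.

```php
<?php
// Hypothetical class: counts how often its constructor and __wakeup() run.
class ToyLoader {
    public static $constructed = 0;
    public static $woken       = 0;
    public $items              = array();

    public function __construct() {
        self::$constructed++;
        $this->items = array( 'a', 'b' ); // pretend this was an expensive lookup
    }

    public function __wakeup() {
        self::$woken++;
    }
}

$copy = unserialize( serialize( new ToyLoader() ) );

var_dump( ToyLoader::$constructed ); // int(1): only the original "new" call
var_dump( ToyLoader::$woken );       // int(1): __wakeup() ran on unserialize
var_dump( $copy->items );            // the data came back without the constructor
```

Any side effect that lives in the constructor, or in a method the constructor calls, simply does not happen for the restored copy.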

And after we’re done deserializing, even though at that point the WP_Query object is a bit crippled (serialize can’t preserve live resources, such as the mysqli connection), we can still use it:

    while ( $query->have_posts() ) {
        $query->the_post();
        the_title();
    }

This doesn’t cause any additional queries against the wp_posts table, since we already have all the necessary data in the $query->posts array. Until we do something like this:

    while ( $query->have_posts() ) {
        $query->the_post();
        the_title();
        get_post_meta( get_the_ID(), 'key', true );
    }

And this is where things go south.

The Object Cache

When running a regular WP_Query, the whole process (by default) takes care of retrieving the metadata and terms data for all the posts that match our query, and storing all that in the object cache for the request. That happens in the get_posts() method of our object (_prime_post_caches()). But when re-creating the WP_Query object from a string, the method never runs, and so our term and meta caches are never primed.

For that reason, when running get_post_meta() inside our loop, we’ll see a separate SQL query to fetch the metadata for that particular post. And this happens for every post. Separately. Which means that for 10 “cached” posts, we’re looking at 10 additional queries. Sure, they’re pretty fast, but still.

Now let’s add something like the_tags() to the same loop, and voila! We have another ten SQL queries to grab the terms now.

And finally… This is the best part. Let’s add something often done by a typical plugin that alters the post content or title in any way:

    add_filter( 'the_title', function( $title ) {
        $post = get_post( get_the_ID() );
        // do something with $post->post_title and $title
        return $title;
    } );

Now we’ll see an additional ten database queries for the posts. How did that happen? Didn’t we have those posts cached?

Yes we did, but we had them in our $query->posts array, and get_post() doesn’t know or care about any queries. It simply fetches data from the WordPress object cache, and it was WP_Query’s job to prime those caches with the data, which it failed to do upon deserializing. Tough luck.
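The effect can be reproduced with a toy model. Nothing below is WordPress code: ToyCache and ToyQuery are hypothetical classes that only imitate the relationship between a query that primes a per-request cache and a get_post()-style lookup that relies on it.

```php
<?php
// Toy model, not WordPress code: every name here is hypothetical.
class ToyCache {
    public static $posts   = array(); // stands in for the per-request object cache
    public static $queries = 0;       // stands in for the database query counter

    // Rough analogue of get_post(): cache first, database on a miss.
    public static function get_post( $id ) {
        if ( ! isset( self::$posts[ $id ] ) ) {
            self::$queries++; // one round-trip per uncached post
            self::$posts[ $id ] = array( 'ID' => $id );
        }
        return self::$posts[ $id ];
    }
}

class ToyQuery {
    public $posts = array();

    public function run() {
        ToyCache::$queries++; // the expensive query itself
        foreach ( range( 1, 10 ) as $id ) {
            $this->posts[]          = array( 'ID' => $id );
            ToyCache::$posts[ $id ] = array( 'ID' => $id ); // prime the cache
        }
    }
}

// Request 1: run the query, then store the whole object.
$query = new ToyQuery();
$query->run();
$stored = serialize( $query );

// Request 2: a fresh request starts with an empty object cache.
ToyCache::$posts   = array();
ToyCache::$queries = 0;

$query = unserialize( $stored ); // run() is never called again
foreach ( $query->posts as $post ) {
    ToyCache::get_post( $post['ID'] );
}

var_dump( ToyCache::$queries ); // int(10): one lookup per "cached" post
```

The restored object still carries all ten posts, yet every lookup that goes through the cache misses, because priming was a side effect of running the query and not part of the stored data.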

So ultimately, by caching our WP_Query object in a transient, we went from four database queries (found rows, posts, metadata and terms) to only two (transient timeout and transient value), plus an additional thirty queries (posts_per_page * 3) if we want to use metadata, terms or anything that calls get_post().

To be fair, those thirty queries are likely much faster than our initial posts query because they’re lookups by primary key, but each one is still a round-trip to the (possibly remote) MySQL server. Sure, you can probably hack your way around it with _prime_post_caches(), but we don’t recommend that.

The Alternatives

Now that we have covered why you shouldn’t cache WP_Query objects, let’s look at a couple of better ways to cache those slow lookups.

The first, easiest and probably best method is to cache the complete HTML output, and PHP’s output buffering functions will help us implement that without moving too much code around:

    $cache_key = 'my-expensive-query';
    if ( ! $html = get_transient( $cache_key ) ) {
        $query = new WP_Query( ... );
        ob_start();
        while ( $query->have_posts() ) {
            $query->the_post();
            // output all the things
        }
        $html = ob_get_clean();
        set_transient( $cache_key, $html, 24 * HOUR_IN_SECONDS );
    }

    echo $html;

This way we’re only storing the actual output in our transient, no posts, no metadata, no terms, and most importantly no database passwords. Just the HTML.

If your HTML string is very (very!) long, you may also consider compressing it with gzcompress() and storing it as a base64 encoded string in your database, which is especially efficient if you’re working with memory-based storage, such as Redis or Memcached. The compute overhead to compress/uncompress is very close to zero.
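A minimal sketch of that round-trip, assuming PHP’s zlib extension is available. The repeated list item is just a stand-in for real rendered output; in WordPress, $packed is the value you would hand to set_transient().

```php
<?php
// Stand-in for a long chunk of rendered HTML.
$html = str_repeat( '<li><a href="/post/">Post title</a></li>', 500 );

// Compress, then base64-encode so the value is safe to store as text.
$packed = base64_encode( gzcompress( $html ) );

// Reverse the two steps when reading the value back.
$unpacked = gzuncompress( base64_decode( $packed ) );

var_dump( $unpacked === $html );                 // bool(true)
var_dump( strlen( $packed ) < strlen( $html ) ); // bool(true) for repetitive markup like this
```

Base64 adds roughly a third to the compressed size, so the saving depends on how well the markup compresses. It is worth measuring on your own output before adopting it.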

The second method is to cache post IDs from the expensive query, and later perform lookups by those cached IDs which will be extremely fast. Here’s a simple snippet to illustrate the point:

    $cache_key = 'my-expensive-query';
    if ( ! $ids = get_transient( $cache_key ) ) {
        $query = new WP_Query( array(
            'fields' => 'ids',
            // ...
        ) );
        $ids = $query->posts;
        set_transient( $cache_key, $ids, 24 * HOUR_IN_SECONDS );
    }

    $query = new WP_Query( array(
        'post__in' => $ids,
    ) );

    // while ( $query->have_posts() ) ...

Here we have two queries. The first query is the slow one, where we can fetch posts by meta values, etc. Note that we ask WP_Query to retrieve IDs only for that query, and later do a very fast lookup using the post__in argument. The expensive query runs only if we don’t already have an array of IDs in our transient.

This method is a bit less efficient than caching the entire HTML output, since we’re (probably) still querying the database. But the flexibility is sometimes necessary, especially when you’d like to cache the query for much longer, but have other unrelated things that may impact your output, such as a shortcode inside the post content.
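A slightly more defensive variant of the ID-caching idea is sketched below. The helper names my_get_expensive_ids() and my_get_cached_query() are hypothetical, and the extra arguments (no_found_rows, orderby, posts_per_page) are our own assumptions about what a typical setup wants, not part of the snippet above. Two details matter: an empty result should still count as cached, and an empty post__in is ignored by WP_Query, which would otherwise return ordinary recent posts.

```php
<?php
// Sketch only: my_get_expensive_ids() and my_get_cached_query() are hypothetical
// helper names; the WordPress calls need a loaded WordPress environment.
function my_get_expensive_ids() {
    $cache_key = 'my-expensive-query';
    $ids       = get_transient( $cache_key );

    // get_transient() returns false on a miss, so an empty array still counts as cached.
    if ( false === $ids ) {
        $query = new WP_Query( array(
            'fields'        => 'ids',
            'no_found_rows' => true, // skip the row count, we only need the IDs
            // ... the expensive arguments go here
        ) );
        $ids = $query->posts;
        set_transient( $cache_key, $ids, 24 * HOUR_IN_SECONDS );
    }

    return $ids;
}

function my_get_cached_query() {
    $ids = my_get_expensive_ids();

    // An empty post__in is ignored by WP_Query, so bail out instead of
    // accidentally querying the latest posts.
    if ( empty( $ids ) ) {
        return null;
    }

    return new WP_Query( array(
        'post__in'       => $ids,
        'orderby'        => 'post__in', // keep the order of the cached IDs
        'posts_per_page' => count( $ids ),
    ) );
}
```

If the expensive query targets a custom post type, the second query needs a matching post_type argument as well, since WP_Query defaults to regular posts.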

Profile

Caching is a great way to speed things up, but you have to know exactly what you’re caching, when, where and how, otherwise you risk facing unexpected consequences. If you’re uncertain whether something is working as intended, always turn to profiling: look at each query against the database, look at all PHP function calls, watch for timing and memory usage.
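As a first, very rough check before reaching for a full profiler, you can count the queries a piece of code triggers. The sketch below assumes a loaded WordPress environment and relies on its get_num_queries() function; my_count_queries() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: runs a callback and reports how many database
// queries WordPress executed while it ran. Needs a loaded WordPress.
function my_count_queries( callable $callback ) {
    $before = get_num_queries();
    $callback();
    return get_num_queries() - $before;
}
```

Wrap the loop that runs on a cache hit in my_count_queries() and log the result. A number that grows with posts_per_page is a strong hint that the caches were never primed.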
