Guide to Drupal cache

Source: the Internet | Category: WeChat online quiz system | Editor: 程序博客网 (Programmer Blog Network) | Time: 2024/05/22 00:15

Building complicated, dynamic content in Drupal is easy, but it can come at a price. A lot of the stuff that makes a Web 2.0 site so cool can spell 'performance nightmare' under heavy load, thrashing the database to perform complex queries and expensive calculations every time a user looks at a node or loads a particular page.

One solution is to turn on page caching on Drupal's performance options administration page. That speeds things up for anonymous users by caching the output of each page, greatly reducing the number of DB queries needed when they hit the site. That doesn't help with logged-in users, however: because page-level caching is an all-or-nothing affair, it only works for the standardized, always-the-same view that anonymous users see when they arrive.

Eventually there comes a time when you have to dig into your code, identify the database access hot spots, and add caching yourself. Fortunately, Drupal's built-in caching APIs and some simple guidelines can make that task easy.

The basics

The first rule of optimization and caching is this: never do something time-consuming twice if you can hold onto the results and re-use them. Let's look at a simple example of that principle in action:

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Do your expensive calculations here, and populate $my_data
    // with the correct stuff.
  }
  return $my_data;
}
?>

The important part to look at in this function is the static variable named $my_data. Static variables start out empty the first time a function is called, but they keep the data they're populated with even when the function is called again. That means that we can check if the variable is already populated, and if so return it immediately without doing any more work.

This pattern appears all over the place in Drupal, including key functions like node_load(). Calling node_load() for a particular node ID requires database hits the first time, but the resulting information is kept in a static variable for the duration of the page load. That way, displaying a node once in a list, a second time in a block, and a third time in a list of related links (for example) doesn't require three full trips to the database.
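The node_load() behaviour described above keeps one entry per ID rather than a single value. Here is a minimal sketch of that keyed variant of the static-variable pattern. my_module_load_item(), my_module_expensive_lookup() and my_module_lookup_count() are hypothetical names, and the 'expensive' work is faked with a call counter instead of a real query. It illustrates the idea only and is not node_load()'s actual code.

```php
<?php
// Hypothetical call counter so the example can show how often the slow path runs.
function my_module_lookup_count($increment = FALSE) {
  static $count = 0;
  if ($increment) {
    $count++;
  }
  return $count;
}

// Hypothetical stand-in for a slow database query.
function my_module_expensive_lookup($id) {
  my_module_lookup_count(TRUE);
  return array('id' => $id, 'title' => "Item $id");
}

// Keyed static cache: one slot per ID, kept until the request ends.
function my_module_load_item($id, $reset = FALSE) {
  static $items = array();
  if ($reset) {
    // Forget everything loaded so far in this request.
    $items = array();
  }
  if (!isset($items[$id])) {
    $items[$id] = my_module_expensive_lookup($id);
  }
  return $items[$id];
}
```

Loading item 1 twice and item 2 once runs the slow lookup only twice. Passing TRUE as the second argument empties the whole static array, so the next load goes back to the source.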

Another important feature is the use of the $reset parameter. Caching is good, but occasionally you want to be sure you're getting the absolute freshest data available. Adding a 'reset' parameter to your function, and always performing the 'expensive' version of the function when it's set to TRUE, lets you bypass caching when you really need to.
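To see what the reset flag buys you, the sketch below keeps the shape of my_module_function() but reads from a hypothetical, changeable source. my_module_source() and my_module_current_value() are illustrative names, not Drupal functions. Without the flag, the function keeps returning the value it computed first. With it, the function recomputes.

```php
<?php
// Hypothetical source of truth whose value can change during a request.
function my_module_source($new_value = NULL) {
  static $value = 'first';
  if ($new_value !== NULL) {
    $value = $new_value;
  }
  return $value;
}

// Same shape as my_module_function(): a static copy plus a $reset flag.
function my_module_current_value($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // The "expensive" step: read and transform the source value.
    $my_data = strtoupper(my_module_source());
  }
  return $my_data;
}
```

After the source changes, a plain call still returns 'FIRST'. A call with TRUE returns 'SECOND' and refreshes the static copy for later calls.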

Drupal's cache functions

You might notice that the static variable technique only stores data for the duration of a single page load. For even better performance, it's often possible to cache data in a more permanent fashion:

<?php
function my_module_function($reset = FALSE) {
  static $my_data;
  if (!isset($my_data) || $reset) {
    // Unless a reset was requested, try the database cache first.
    if (!$reset && ($cache = cache_get('my_module_data')) && !empty($cache->data)) {
      $my_data = unserialize($cache->data);
    }
    else {
      // Do your expensive calculations here, and populate $my_data
      // with the correct stuff.
      // Drupal 5 argument order: cache_set($cid, $table, $data).
      cache_set('my_module_data', 'cache', serialize($my_data));
    }
  }
  return $my_data;
}
?>

This version of the function still uses the static variable, but it adds another layer: database caching. Drupal's APIs provide three key functions you'll need to be familiar with: cache_get(), cache_set(), and cache_clear_all(). Let's look at how they're used.

After the initial check of the static variable, this function checks Drupal's cache for data stored with a particular key. If it finds it, and the $cache->data element isn't empty, it unserializes the stored data and sticks it into the $my_data variable.

If no cached version is found (or if we called the function using the $reset parameter), the function does the actual work of generating the data. Then it serializes the data and saves it to the cache so future requests will find it. The key that you pass in as the first parameter can be anything you choose, though it's important to avoid colliding with any other modules' keys. Starting the key with the name of your module is always a good idea.

The end result? A slick little function that saves time whenever it can: first checking for an in-memory copy of the data, then checking the cache, and finally calculating it from scratch if necessary. You'll see this pattern a lot if you dig into the guts of data-intensive Drupal modules.
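The three layers can be exercised outside Drupal with a small stand-in for the database cache. In the sketch below, example_cache_get() and example_cache_set() are hypothetical in-memory replacements for cache_get() and cache_set(), with the table argument dropped. The extra $source argument of my_module_layered() exists only so you can see which layer answered.

```php
<?php
// Hypothetical in-memory stand-in for Drupal's database cache table. It lives
// only for this script; a real module would call cache_get()/cache_set().
function &example_cache_bin() {
  static $bin = array();
  return $bin;
}

// Returns an object with a ->data property, or FALSE on a miss.
function example_cache_get($cid) {
  $bin = &example_cache_bin();
  return isset($bin[$cid]) ? (object) array('data' => $bin[$cid]) : FALSE;
}

function example_cache_set($cid, $data) {
  $bin = &example_cache_bin();
  $bin[$cid] = $data;
}

// Same three layers as my_module_function(); $source reports which one answered.
function my_module_layered($reset = FALSE, &$source = NULL) {
  static $my_data;
  if (isset($my_data) && !$reset) {
    $source = 'static';
    return $my_data;
  }
  if (!$reset && ($cache = example_cache_get('my_module_data')) && !empty($cache->data)) {
    $my_data = unserialize($cache->data);
    $source = 'cache';
    return $my_data;
  }
  // Stand-in for the expensive calculation.
  $my_data = array('answer' => 42);
  example_cache_set('my_module_data', serialize($my_data));
  $source = 'computed';
  return $my_data;
}
```

Seeding the stand-in before the first call plays the role of an earlier page load that already stored the data. The first call is then answered by the cache, the second by the static variable, and a call with TRUE recomputes the data and stores it again.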

Keeping up to date

What happens, though, if the data that you've cached becomes outdated and needs to be recalculated? By default, cached information stays around until some module explicitly calls the cache_clear_all() function, emptying out your record. If your data is updated sporadically, you might consider simply calling cache_clear_all('my_module_data', 'cache') each time you save changes to it. If you're caching quite a few pieces of data (perhaps versions of a particular block for each role on the site), there's a third 'wildcard' parameter:

<?php
cache_clear_all('my_module', 'cache', TRUE);
?>

This clears out all the cache values whose keys start with 'my_module'.
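The wildcard is a prefix match on cache keys. The sketch below models one cache table as a plain PHP array. example_cache_clear() is a hypothetical helper that imitates the behaviour described above rather than Drupal's implementation, and the per-role keys are made up for illustration.

```php
<?php
// Hypothetical contents of one cache table: cid => stored data.
$example_cache = array(
  'my_module_block_role_1' => 'block for role 1',
  'my_module_block_role_2' => 'block for role 2',
  'my_module_data' => 'summary',
  'other_module_data' => 'not ours',
);

// Imitates the described behaviour: with $wildcard, remove every cid that
// starts with $cid; without it, remove only an exact match.
function example_cache_clear(array $cache, $cid, $wildcard = FALSE) {
  foreach (array_keys($cache) as $key) {
    if ($key === $cid || ($wildcard && strpos($key, $cid) === 0)) {
      unset($cache[$key]);
    }
  }
  return $cache;
}

$after_wildcard = example_cache_clear($example_cache, 'my_module', TRUE);
$after_exact = example_cache_clear($example_cache, 'my_module_data');
```

Clearing 'my_module' with the wildcard removes all three my_module_* entries and leaves other_module_data alone. Because the match is on a raw prefix, 'my_module' would also catch keys belonging to a module named, say, my_module_extra, which is one more reason to choose key prefixes carefully.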

If you don't need your cached data to be perfectly up-to-the-second, but you want to keep it reasonably fresh, you can also pass an expiration date to the cache_set() function. For example:

<?php
cache_set('my_module_data', 'cache', serialize($my_data), time() + 360);
?>

The final parameter is a Unix timestamp representing the 'expiration date' of the cache data. The easiest way to calculate it is to use the time() function and add the data's desired lifetime in seconds (360 seconds, or six minutes, in the example above). Expired entries will be automatically discarded once they pass that date.
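The expiration check boils down to comparing the stored timestamp with the current time. In this sketch, example_expiring_set() and example_expiring_get() are hypothetical stand-ins that keep entries in a PHP array and treat an expired entry as a miss. The current time is passed in explicitly so the behaviour is easy to check. This models the idea only, not how Drupal itself cleans up expired rows.

```php
<?php
// Hypothetical marker for "never expires" in this stand-in.
const EXAMPLE_PERMANENT = 0;

// Store $data under $cid with an absolute Unix-timestamp expiry.
function example_expiring_set(array &$bin, $cid, $data, $expire = EXAMPLE_PERMANENT) {
  $bin[$cid] = array('data' => $data, 'expire' => $expire);
}

// Return the stored data, or FALSE if the entry is missing or has expired.
function example_expiring_get(array $bin, $cid, $now) {
  if (!isset($bin[$cid])) {
    return FALSE;
  }
  $entry = $bin[$cid];
  if ($entry['expire'] !== EXAMPLE_PERMANENT && $now >= $entry['expire']) {
    return FALSE;
  }
  return $entry['data'];
}

$bin = array();
$now = time();
// Same arithmetic as the cache_set() example: keep the data for 360 seconds.
example_expiring_set($bin, 'my_module_data', serialize(array('x' => 1)), $now + 360);
example_expiring_set($bin, 'my_module_forever', 'kept');
```

With a 360-second lifetime, a lookup 359 seconds later still hits and one at exactly 360 seconds misses. The entry stored with the permanent marker never expires.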

Advanced caching

You might have noticed that cache_set()'s second parameter is 'cache', the name of the table that stores the default cache data. If you're storing large amounts of data in the cache, you can set up your own dedicated cache table and pass its name into the function. That will help keep your cache lookups speedy no matter what other modules are storing in the shared table. The Views module uses this technique to maintain full control over when its cache data is cleared.
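Dedicated tables give your entries their own namespace. The sketch below models several cache tables as nested PHP arrays to show why that matters: when the shared 'cache' table is flushed, entries in a separate table survive. The table name cache_my_module and the example_table_* helpers are hypothetical. In a real module, the dedicated table would first have to be created, typically with the same columns as the core cache table, before cache_set() could use it.

```php
<?php
// Hypothetical in-memory model of several cache tables.
$example_tables = array('cache' => array(), 'cache_my_module' => array());

// Argument order (cid, table, data) mirrors the cache_set() calls above.
function example_table_set(array &$tables, $cid, $table, $data) {
  $tables[$table][$cid] = $data;
}

function example_table_get(array $tables, $cid, $table = 'cache') {
  return isset($tables[$table][$cid]) ? $tables[$table][$cid] : FALSE;
}

// Empties one table only; other tables are left alone.
function example_table_flush(array &$tables, $table) {
  $tables[$table] = array();
}

example_table_set($example_tables, 'my_module_data', 'cache_my_module', 'big result');
example_table_set($example_tables, 'other_module_data', 'cache', 'something else');
// Another module flushes the shared table.
example_table_flush($example_tables, 'cache');
```

After the shared 'cache' table is flushed, my_module_data is still available from cache_my_module, while the entry stored in the shared table is gone.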

If you're really hoping to squeeze the most out of your server, Drupal also supports alternative caching systems. By changing a single line in your site's settings.php file, you can point it to different implementations of the standard cache_set(), cache_get(), and cache_clear_all() functions. File-based caching, integration with the open source memcached project, and other approaches are all possible. As long as you've used the standard Drupal caching functions, your module's code won't have to be altered.

A few caveats

Like all good things, caching can be overdone. Sometimes it just doesn't make sense: if you're looking up a single record from a table, saving the result to a database cache is silly. The Devel module is a good way to spot the functions where caching will pay off. It can log the queries used on your site and highlight the ones that are slow or are repeated numerous times on each page.

Other times, the data you're using will just be a bad fit for the standard caching system. If you need to join cached data in SQL queries, for example, cache_set()'s practice of storing data as a serialized string will be a problem. In those cases, you'll need to come up with a solution that's specific to your module. VotingAPI, for example, maintains one table full of individual votes and another table full of calculated results (averages, sums, etc.) for quick joining when sorting and filtering nodes.
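The two-table approach pairs raw data with a precomputed summary. The sketch below rebuilds a summary from individual votes in plain PHP. It is not VotingAPI's code or schema, and example_tally_votes() and the field names are hypothetical. In a real module, the summary rows would live in their own SQL table so that queries can join against them directly.

```php
<?php
// Hypothetical raw data: one row per individual vote.
$votes = array(
  array('content_id' => 10, 'value' => 80),
  array('content_id' => 10, 'value' => 100),
  array('content_id' => 20, 'value' => 60),
);

// Rebuild the summary: one row per content item with count, sum and average.
function example_tally_votes(array $votes) {
  $results = array();
  foreach ($votes as $vote) {
    $id = $vote['content_id'];
    if (!isset($results[$id])) {
      $results[$id] = array('count' => 0, 'sum' => 0);
    }
    $results[$id]['count']++;
    $results[$id]['sum'] += $vote['value'];
  }
  foreach ($results as $id => $row) {
    $results[$id]['average'] = $row['sum'] / $row['count'];
  }
  return $results;
}

$results = example_tally_votes($votes);
```

Each summary row is derived entirely from the raw votes, so the summary can be thrown away and rebuilt whenever needed.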

Finally, it's important to remember that the cache is not long-term storage! Since other modules can call cache_clear_all() and wipe it out, you should never put something into it if you can't recalculate it from the original source data.

Go west, young Drupaler!

Congratulations: you now have a powerful set of tools to speed up your code! Go forth and optimize.
