Building Faster APIs with NodeJs and Redis


https://coligo.io/nodejs-api-redis-cache/

In this tutorial we will be building an API to compute the total number of stars a GitHub user has across all their public repositories. We'll be using:

The GitHub API to get information about all the user's repositories
NodeJs to handle the HTTP requests and compute the total stars
redis as a caching layer to speed things up!

For those of you who are not familiar with redis, it is an in-memory data structures store which can be used as a cache, database, or message broker.

One of the major strong points of redis is its incredible performance, one of the reasons being that it holds the entire dataset in memory. For the purposes of this tutorial, we'll be focusing on using redis as a cache in the following manner:

NodeJs API Server with Redis as a cache

To put the above diagram in words:

A user makes a call to our API endpoint
Node checks whether the user's computed stars already exist in our redis cache
- if it does, we return it from the cache immediately
- if not, we call the GitHub API, compute the total stars, put it in our cache with an expiry of 1 minute, and return the result to the user
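
The flow above is commonly known as the cache-aside pattern. As a minimal sketch, independent of redis and GitHub, it could look like the following. The names getWithCache and fetchFresh are hypothetical, and a plain Map stands in for the cache; the actual redis-backed version is built later in this tutorial.

```javascript
// A plain Map stands in for redis in this sketch (an assumption for illustration).
var cache = new Map();

// Look up `key` in the cache. On a miss, call `fetchFresh(key)` (a hypothetical
// function that returns a promise), store the result for `ttlMs` milliseconds,
// and resolve with the value plus where it came from.
function getWithCache(key, ttlMs, fetchFresh) {
  var entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    // cache hit: return immediately without calling the origin
    return Promise.resolve({ value: entry.value, source: 'cache' });
  }
  // cache miss (or expired entry): fetch, store, then return
  return fetchFresh(key).then(function (value) {
    cache.set(key, { value: value, expiresAt: Date.now() + ttlMs });
    return { value: value, source: 'origin' };
  });
}
```

Calling getWithCache twice with the same key inside the expiry window should only invoke fetchFresh once; the second call is answered from the Map.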

You can download the code for this tutorial from the GitHub repository or fire up your favorite text editor and follow along!

Setting Up Our Project

We'll start off by creating an empty directory for our project and cd-ing into it:

mkdir node-redis-cache && cd node-redis-cache

Now let's initialize a package.json and install our dependencies via NPM:

npm init -y
npm install --save express response-time redis axios

Let's go over what we'll be needing each of these dependencies for:

express will handle the routing for our API
response-time is a middleware that reports the response time for each request in an X-Response-Time header. We will be using this to inspect how much of a speed up redis actually provides
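redis is a redis client for NodeJs with a convenient API. The code in this tutorial uses the client's callback style, which matches version 3 and earlier of the package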
axios will be used to make HTTP requests to the GitHub API using promises

Now that we have all our Node dependencies installed, let's go ahead and install redis. If you're on OSX you can install redis using Homebrew:

brew update && brew install redis

If you'd rather install redis manually, I'd encourage you to have a look at the official redis guide to set up redis on your OSX or Linux machine.

Developing the API Server

Let's create the basic structure for our app.js file in the root of our project directory which will house the logic for our API:

// require the dependencies we installed
var app = require('express')();
var responseTime = require('response-time');
var axios = require('axios');
var redis = require('redis');

// create a new redis client and connect to our local redis instance
var client = redis.createClient();

// if an error occurs, print it to the console
client.on('error', function (err) {
    console.log("Error " + err);
});

app.set('port', (process.env.PORT || 5000));

// set up the response-time middleware
app.use(responseTime());

// if a user visits /api/facebook, return the total number of stars 'facebook'
// has across all its public repositories on GitHub
app.get('/api/:username', function(req, res) {});

app.listen(app.get('port'), function(){
  console.log('Server listening on port: ', app.get('port'));
});

Now that we have the basic structure for our server in place, we can implement the functions that will call the GitHub API to fetch the information about a specific user's repositories and sum up the number of stars across all the repos to get a total count:

// call the GitHub API to fetch information about the user's repositories
function getUserRepositories(user) {
  var githubEndpoint = 'https://api.github.com/users/' + user + '/repos' + '?per_page=100';
  return axios.get(githubEndpoint);
}

// add up all the stars and return the total number of stars across all repositories
function computeTotalStars(repositories) {
  return repositories.data.reduce(function(prev, curr) {
    return prev + curr.stargazers_count;
  }, 0);
}

Note: the GitHub API returns 30 items by default so we are using the ?per_page parameter to tell it to return 100 items (which is the limit). Although there could be the chance that a user has more than 100 repos, we won't worry too much about pagination for the sake of this tutorial as we can illustrate the main point with any number of repos.
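
If you did want to handle users with more than 100 repositories, one option is to keep requesting pages until a page comes back with fewer than 100 items. The sketch below assumes a hypothetical fetchPage(page) function that resolves to the array of repositories for that page (for example, by calling the same GitHub endpoint with an additional page query parameter). It is not part of this tutorial's code.

```javascript
var PER_PAGE = 100;

// Sum stargazers_count over every page of results. `fetchPage` is a
// hypothetical function that takes a 1-based page number and returns a
// promise for an array of repository objects.
function computeTotalStarsPaged(fetchPage, page, runningTotal) {
  page = page || 1;
  runningTotal = runningTotal || 0;
  return fetchPage(page).then(function (repos) {
    var total = repos.reduce(function (prev, curr) {
      return prev + curr.stargazers_count;
    }, runningTotal);
    // a short (or empty) page means there is nothing left to fetch
    if (repos.length < PER_PAGE) {
      return total;
    }
    return computeTotalStarsPaged(fetchPage, page + 1, total);
  });
}
```

Keep in mind that every extra page is another request to the GitHub API, which makes caching the final total even more worthwhile.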

Let's get to the interesting part: handling API requests. When a user visits our API endpoint and provides a username as a URL parameter (e.g. http://localhost:5000/api/coligo-io), we want to:

capture the username parameter, in this case: coligo-io
check if our redis cache contains a key equal to coligo-io
- if it does, return the value associated with that key. (The value is the total number of stars that the user coligo-io has across all its repos, e.g. "coligo-io":"150")
- if the key does not exist, compute the total stars using getUserRepositories() and computeTotalStars(), store the result in redis as "coligo-io":total_count with a 1 minute expiry, and return the result to the user

Let's put the steps above into code:

// if a user visits /api/facebook, return the total number of stars 'facebook'
// has across all its public repositories on GitHub
app.get('/api/:username', function(req, res) {
  // get the username parameter in the URL
  // i.e.: username = "coligo-io" in http://localhost:5000/api/coligo-io
  var username = req.params.username;

  // use the redis client to get the total number of stars associated to that
  // username from our redis cache
  client.get(username, function(error, result) {
      if (result) {
        // the result exists in our cache - return it to our user immediately
        res.send({ "totalStars": result, "source": "redis cache" });
      } else {
        // we couldn't find the key "coligo-io" in our cache, so get it
        // from the GitHub API
        getUserRepositories(username)
          .then(computeTotalStars)
          .then(function(totalStars) {
            // store the key-value pair (username:totalStars) in our cache
            // with an expiry of 1 minute (60s)
            client.setex(username, 60, totalStars);
            // return the result to the user
            res.send({ "totalStars": totalStars, "source": "GitHub API" });
          }).catch(function(error) {
            // axios attaches the HTTP response (if one was received) to error.response
            if (error.response && error.response.status === 404) {
              res.send('The GitHub username could not be found. Try "coligo-io" as an example!');
            } else {
              res.send(error.message);
            }
          });
      }
  });
});

There are some points worth going over from the snippet above:

Storing the data in redis

Redis differs from plain key-value stores, where you can only associate a string key with a string value, in that its values can hold more complex data structures. Rather than just a plain string, a redis value can be any of the following:

Binary-safe strings which can be up to 512MB in size
Lists which are a collection of strings
Sets (sorted and unsorted)
Hashes
Bit arrays and HyperLogLogs

I won't go over each of the above types and how to use them in redis. If you're interested in learning more about these types you can have a look at the redis data types topic.

For our purposes, we will be using a binary-safe string as our value. Assume a user requests the total number of stars for all the repositories that belong to the user coligo-io, then redis would store it as follows:

"coligo-io":"150"

where 150 is just an example of how many stars that user has across all their repositories.
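
One consequence of storing the value as a string is that the count also comes back from the cache as a string. In the route handler, that means totalStars would be "150" on a cache hit but the number 150 when freshly computed from the GitHub API. If you would like both responses to look the same, the value can be converted back to a number before it is sent. A small sketch, where buildResponse is a hypothetical helper that is not part of the tutorial's code:

```javascript
// Build the JSON body for a response. `rawValue` may be a number (freshly
// computed from the GitHub API) or a string (read back from the redis cache).
function buildResponse(rawValue, source) {
  return { totalStars: parseInt(rawValue, 10), source: source };
}

console.log(buildResponse('150', 'redis cache')); // totalStars is the number 150
console.log(buildResponse(150, 'GitHub API'));    // totalStars is the number 150
```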

Setting a key-value pair and expiry with `setex`

Let's quickly go over what the setex function is doing. If we try to access a key that does not exist, we want to fetch the total stars for that user from the GitHub API and set it in our redis cache with an expiry of 1 minute. This means that once we store the key-value pair in our redis cache, it will live for 1 minute and can be retrieved during that time. However, after the minute is over, the key-value pair will automatically be removed and if we try to access it, we will get null.

The setex function takes 3 parameters and an optional callback (setex(key, seconds, value)):

key (the unique GitHub username)
number of seconds before the key-value pair is removed from the cache
value (the total number of stars for that user)

It's worth noting that this operation is atomic: it is equivalent to running the set and expire commands together inside a MULTI/EXEC block.
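
To make the expiry behaviour concrete, here is a tiny in-memory stand-in that mimics what setex and get look like from the caller's point of view. It is only an illustration: the name TinyCache and the injectable now function are made up for this sketch, and it says nothing about how redis implements expiry internally.

```javascript
// `now` is injectable so that the passage of time can be simulated.
function TinyCache(now) {
  this.now = now || Date.now;
  this.store = {};
}

// store `value` under `key` and remember when it should expire
TinyCache.prototype.setex = function (key, seconds, value) {
  this.store[key] = {
    value: String(value), // string values are read back as strings
    expiresAt: this.now() + seconds * 1000
  };
};

// return the value, or null if the key is missing or has expired
TinyCache.prototype.get = function (key) {
  var entry = this.store[key];
  if (!entry || entry.expiresAt <= this.now()) {
    delete this.store[key];
    return null;
  }
  return entry.value;
};

// usage with a fake clock (milliseconds)
var fakeTime = 0;
var tiny = new TinyCache(function () { return fakeTime; });
tiny.setex('coligo-io', 60, 150);
console.log(tiny.get('coligo-io')); // "150"
fakeTime = 60000;                   // one minute later
console.log(tiny.get('coligo-io')); // null
```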

The choice of expiry time depends on your application's needs and the nature of your data. For our purposes, we can afford to have the total star count be off by a few stars for a minute as it's not crucial information and it's also unlikely that a repository is being starred a ton of times in the span of one minute.

In general, you would assess what data you can and can't cache as well as how critical the correctness of that data is to be able to come up with an expiry time that suits your application.

Here's what the final code for our app.js looks like:

// require the dependencies we installed
var app = require('express')();
var responseTime = require('response-time');
var axios = require('axios');
var redis = require('redis');

// create a new redis client and connect to our local redis instance
var client = redis.createClient();

// if an error occurs, print it to the console
client.on('error', function (err) {
    console.log("Error " + err);
});

app.set('port', (process.env.PORT || 5000));

// set up the response-time middleware
app.use(responseTime());

// call the GitHub API to fetch information about the user's repositories
function getUserRepositories(user) {
  var githubEndpoint = 'https://api.github.com/users/' + user + '/repos' + '?per_page=100';
  return axios.get(githubEndpoint);
}

// add up all the stars and return the total number of stars across all repositories
function computeTotalStars(repositories) {
  return repositories.data.reduce(function(prev, curr) {
    return prev + curr.stargazers_count;
  }, 0);
}

// if a user visits /api/facebook, return the total number of stars 'facebook'
// has across all its public repositories on GitHub
app.get('/api/:username', function(req, res) {
  // get the username parameter in the URL
  // i.e.: username = "coligo-io" in http://localhost:5000/api/coligo-io
  var username = req.params.username;

  // use the redis client to get the total number of stars associated to that
  // username from our redis cache
  client.get(username, function(error, result) {
      if (result) {
        // the result exists in our cache - return it to our user immediately
        res.send({ "totalStars": result, "source": "redis cache" });
      } else {
        // we couldn't find the key "coligo-io" in our cache, so get it
        // from the GitHub API
        getUserRepositories(username)
          .then(computeTotalStars)
          .then(function(totalStars) {
            // store the key-value pair (username:totalStars) in our cache
            // with an expiry of 1 minute (60s)
            client.setex(username, 60, totalStars);
            // return the result to the user
            res.send({ "totalStars": totalStars, "source": "GitHub API" });
          }).catch(function(error) {
            // axios attaches the HTTP response (if one was received) to error.response
            if (error.response && error.response.status === 404) {
              res.send('The GitHub username could not be found. Try "coligo-io" as an example!');
            } else {
              res.send(error.message);
            }
          });
      }
  });
});

app.listen(app.get('port'), function(){
  console.log('Server listening on port: ', app.get('port'));
});

You can launch a separate terminal window and start the redis server using the following command:

redis-server

Once the redis server is up and running, you can start the API server we just built:

node app.js

and test it out by hitting the API endpoint with any GitHub username, e.g. http://localhost:5000/api/coligo-io. You can also try out the demo on Heroku.

Measuring the Speedup

Open up your developer tools and under the network tab we can inspect the X-Response-Time header that is automatically set for us by the response-time middleware we put in place. This gives us the response time in milliseconds from when the request enters this middleware to when the headers are written out to the client.
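
The idea of timing a request can be sketched with Node's built-in high-resolution timer. This is only an illustration of the measurement, not the response-time middleware's actual source, and startTimer is a made-up helper name.

```javascript
// Start a timer and return a function that reports the elapsed milliseconds.
function startTimer() {
  var start = process.hrtime(); // [seconds, nanoseconds]
  return function elapsedMs() {
    var diff = process.hrtime(start); // time elapsed since `start`
    return diff[0] * 1e3 + diff[1] / 1e6;
  };
}

var elapsed = startTimer();
// ... the work being measured would happen here ...
console.log(elapsed().toFixed(3) + 'ms');
```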

Testing our app against the repositories owned by facebook using this endpoint, we get a response time of 623.472ms when the response is not in the cache and is returned from the GitHub API. (You'll notice the JSON object that the API returns has a source property which states whether the result was fetched from the redis cache or the GitHub API.)

Response time for GitHub API

and if we hit our API endpoint again (before the item expires from our cache) we get a response time of only 2.221ms. That's a 99.6% reduction in response time!

Response time for the redis cache
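
The percentage comes from comparing the two timings. As a quick check of the arithmetic, using the numbers measured above (yours will differ):

```javascript
var uncachedMs = 623.472; // response time when calling the GitHub API
var cachedMs = 2.221;     // response time when served from the redis cache

// fraction of the original response time that was saved
var reduction = (uncachedMs - cachedMs) / uncachedMs;
console.log((reduction * 100).toFixed(1) + '% less time'); // 99.6% less time

// how many times faster the cached response was
var speedup = uncachedMs / cachedMs;
console.log(Math.round(speedup) + 'x faster'); // 281x faster
```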

Of course this number will depend on a number of factors such as the number of repositories that user has and the response time of the GitHub API. However, you will still see a significant speed up whenever the entry is returned from the cache. This shouldn't be very surprising, as reading a value from a local in-memory store is much faster than making a network request to the GitHub API.

Concluding

Hopefully you can see the benefits of using redis as a caching layer for some of your API endpoints. Redis is an extremely fast and powerful data structures store when used in the right scenarios and for its intended purposes - have a look at these performance benchmarks to see for yourself. Numerous extremely high-traffic websites use redis, such as StackOverflow, Pinterest, GitHub, Flickr, and many, many more!

You can apply the same concepts we used in this tutorial to cache other highly accessed resources such as certain database queries or views being rendered to your visitors. This is not to say that redis is the solution to all your performance problems as there are some things you simply can't cache.

Be sure to leave any questions you have in the comments section below or tweet them to me @coligo_io!

0 0