Asynchronous Code Design with Node.js

Written by Marc Fasel, Friday, 26 August 2011

The asynchronous event-driven I/O of Node.js is currently being evaluated by many enterprises as a high-performance alternative to the traditional synchronous I/O of multi-threaded enterprise application servers. The asynchronous nature means that enterprise developers have to learn new programming patterns, and unlearn old ones. They have to undergo serious brain rewiring, possibly with the help of electroshocks. This article shows how to replace old synchronous programming patterns with shiny new asynchronous programming patterns.

Start Rewiring

To work with Node.js it is essential to understand how asynchronous programming works. Asynchronous code design is no simple matter, and requires some learning. Now it is time for some electroshocks: synchronous code examples will be presented alongside their asynchronous counterparts to show how synchronous code has to be changed to become asynchronous. The examples all revolve around the file-system (fs) module of Node.js, since it is the only module that contains synchronous I/O operations together with their asynchronous counterparts. With examples in both variants you can start rewiring your brain.

Dependent and Independent Code

Callback functions are the basic building block of asynchronous event-driven programming in Node.js. They are functions passed as an argument to an asynchronous I/O operation. They are called once the operation is finished. Callback functions are the implementation of events in Node.js.

The following shows an example of how to switch a synchronous I/O operation to the asynchronous counterpart, and shows the use of the callback function. The example reads the filenames of the current directory using the synchronous fs.readdirSync() call, then logs the names of the files to the console, and reads the process id for the current process.

Synchronous:

    var fs = require('fs'),
        filenames,
        i,
        processId;

    // Blocks until the directory listing is available.
    filenames = fs.readdirSync(".");
    for (i = 0; i < filenames.length; i++) {
        console.log(filenames[i]);
    }
    console.log("Ready.");

    // process.pid is the process id (process.getuid() would return the user id).
    processId = process.pid;

Asynchronous:

    var fs = require('fs'),
        processId;

    fs.readdir(".", function (err, filenames) {
        var i;
        if (err) {
            throw err;
        }
        for (i = 0; i < filenames.length; i++) {
            console.log(filenames[i]);
        }
        console.log("Ready.");
    });

    // Runs immediately, before the callback above is called.
    processId = process.pid;

In the synchronous example execution blocks at the fs.readdirSync() I/O operation until the result is available, so this is the operation that needs to be changed. The asynchronous version of that function in Node.js is fs.readdir(). It takes the same first argument as fs.readdirSync(), but accepts a callback function as its last parameter and passes the result to that callback instead of returning it.

The rule for using the callback function pattern is this: replace the synchronous function with its asynchronous counterpart, and place the code originally executed after the synchronous call inside the callback function. The code in the callback function does exactly the same as the code in the synchronous example. It logs the filenames to the console. It executes after the asynchronous I/O operation returns.

Just as the logging of filenames depends on the outcome of the fs.readdirSync() I/O operation, so does the logging of "Ready.", which must not appear before the list is complete. Storing the processId is independent of the outcome of the I/O operation. The two therefore have to be placed in different spots in the asynchronous code.

The rule is to move the dependent code into the callback function, and leave the independent code where it is. The dependent code is executed once the I/O operation has finished, while the independent code is executed immediately after the I/O operation has been called.
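The ordering described by this rule can be observed directly. The following is a minimal sketch and not part of the original example: the helper name traceOrder and the events array are hypothetical, introduced only to record when the dependent and the independent code run.

```javascript
var fs = require('fs');

// Hypothetical helper: records the order in which dependent and
// independent code run around an asynchronous fs.readdir() call.
function traceOrder(directory, done) {
    var events = [];

    fs.readdir(directory, function (err, filenames) {
        // Dependent code: needs the outcome of the I/O operation.
        events.push("dependent");
        done(err, events);
    });

    // Independent code: runs right after the operation has been started.
    events.push("independent");
}

traceOrder(".", function (err, events) {
    if (err) {
        throw err;
    }
    console.log(events.join(", then "));
});
```

The independent entry is always recorded first, because fs.readdir() never calls its callback before the calling code has finished running.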

Sequences

A standard pattern in synchronous code is a linear sequence: several lines of code that all have to be executed one after the other, because each one depends on the outcome of the previous line. In the following example the code first changes the access mode of a file (like the Unix chmod command), renames the file, and then checks whether the renamed file is a symbolic link. Clearly this code cannot run out of order, otherwise the file is renamed before the mode is changed, or the check for a symbolic link is done before the file is renamed. Both lead to an error. The order therefore must be preserved.

Synchronous:

    var fs = require('fs'),
        oldFilename,
        newFilename,
        isSymLink;

    oldFilename = "./processId.txt";
    newFilename = "./processIdOld.txt";

    // The mode is octal: decimal 777 would set different permission bits.
    fs.chmodSync(oldFilename, 0o777);
    fs.renameSync(oldFilename, newFilename);
    isSymLink = fs.lstatSync(newFilename).isSymbolicLink();

Asynchronous:

    var fs = require('fs'),
        oldFilename,
        newFilename;

    oldFilename = "./processId.txt";
    newFilename = "./processIdOld.txt";

    // Each step is started only inside the callback of the previous step.
    fs.chmod(oldFilename, 0o777, function (err) {
        if (err) { throw err; }
        fs.rename(oldFilename, newFilename, function (err) {
            if (err) { throw err; }
            fs.lstat(newFilename, function (err, stats) {
                if (err) { throw err; }
                var isSymLink = stats.isSymbolicLink();
            });
        });
    });

In asynchronous code these sequences translate into nested callbacks. This example shows an fs.lstat() callback nested inside an fs.rename() callback nested inside an fs.chmod() callback.
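To try the nested sequence without touching real files, it can be wrapped in a function and run against a scratch file. This is a minimal sketch under assumptions: the function name chmodRenameAndCheck, the temporary directory and the way errors are handed to the caller's callback are hypothetical choices, not something the original example prescribes.

```javascript
var fs = require('fs'),
    os = require('os'),
    path = require('path');

// Hypothetical wrapper around the chmod -> rename -> lstat sequence.
// Each step starts only inside the callback of the previous step.
function chmodRenameAndCheck(oldFilename, newFilename, callback) {
    fs.chmod(oldFilename, 0o777, function (err) {   // octal mode
        if (err) { return callback(err); }
        fs.rename(oldFilename, newFilename, function (err) {
            if (err) { return callback(err); }
            fs.lstat(newFilename, function (err, stats) {
                if (err) { return callback(err); }
                callback(null, stats.isSymbolicLink());
            });
        });
    });
}

// Usage against a scratch file in a fresh temporary directory.
var scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sequence-')),
    oldFilename = path.join(scratchDir, 'processId.txt'),
    newFilename = path.join(scratchDir, 'processIdOld.txt');

fs.writeFileSync(oldFilename, String(process.pid));

chmodRenameAndCheck(oldFilename, newFilename, function (err, isSymLink) {
    if (err) {
        throw err;
    }
    console.log("Symbolic link: " + isSymLink);
});
```

Passing the error to the callback and returning early keeps a failed step from starting the next one, which is the asynchronous equivalent of an exception aborting the synchronous sequence.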

Parallelisation

Asynchronous code is particularly suited for parallelisation of I/O operations: The execution of code does not block on the return of an I/O call. Multiple I/O operations can be started in parallel. In the following example the size of all files of a directory is added up in a loop to get the total number of bytes used by those files. Using synchronous code each iteration of the loop must wait until the I/O call retrieving the size of an individual file returns.

Asynchronous code allows starting all I/O calls in the loop in rapid succession without waiting for the outcome. Whenever one of the I/O operations is done the callback function is called, and the size of the file can be added to the total number of bytes.

The only thing necessary is to have a proper stop criterion which determines when we’re done with processing, and the total number of bytes for all files has been calculated.

Synchronous:

    var fs = require('fs');

    function calculateByteSize() {
        var totalBytes = 0,
            i,
            filenames,
            stats;
        filenames = fs.readdirSync(".");
        for (i = 0; i < filenames.length; i++) {
            stats = fs.statSync("./" + filenames[i]);
            totalBytes += stats.size;
        }
        console.log(totalBytes);
    }

    calculateByteSize();

Asynchronous:

    var fs = require('fs');

    function calculateByteSize() {
        fs.readdir(".", function (err, filenames) {
            var count,
                totalBytes = 0,
                i;
            if (err) { throw err; }
            count = filenames.length;
            // An empty directory starts no fs.stat() calls, so log right away.
            if (count === 0) {
                console.log(totalBytes);
            }
            for (i = 0; i < filenames.length; i++) {
                fs.stat("./" + filenames[i], function (err, stats) {
                    if (err) { throw err; }
                    totalBytes += stats.size;
                    count--;
                    // Stop criterion: every fs.stat() call has called back.
                    if (count === 0) {
                        console.log(totalBytes);
                    }
                });
            }
        });
    }

    calculateByteSize();

The synchronous example is straightforward. In the asynchronous version first fs.readdir() is called to read the filenames in the directory. In the callback function fs.stat() is called for each file to return statistics for that file. This part is as expected.

The interesting thing happens in the callback function of fs.stat(), where the total number of bytes is calculated. The stop criterion used is the file count of the directory. The variable count is initialised with the file count, and counts down the number of times the callback function executes. Once the count is at 0 all I/O operations have called back, and the total number of bytes for all files has been computed. The calculation is done and the number of bytes can be logged to the console.

The asynchronous example has another interesting feature: it uses a closure. A closure is a function within a function, where the inner function accesses the variables declared in the outer function even after the outer function has finished. The callback function of fs.stat() is a closure, because it accesses the variables count and totalBytes that are declared in the callback function of fs.readdir() after that function has long finished. A closure carries a context with it: variables placed in the enclosing function stay available to the inner function for as long as it may be called.

Without closures both variables count and totalBytes would have to be made global. This is because the callback function of fs.stat() would not have any context in which to place a variable: the calculateByteSize() function has long ended, and only the global context is still there. This is where closures come to the rescue. Variables can be placed in the closure's context so they can be accessed from within the function.
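The countdown and the closure can be combined into a function that reports the total through a callback instead of logging it. This is a minimal sketch under assumptions: the name sumFileSizes, the failed flag and the scratch directory are hypothetical and only serve to make the stop criterion easy to observe.

```javascript
var fs = require('fs'),
    os = require('os'),
    path = require('path');

// Hypothetical variant of calculateByteSize() that hands the total
// to a callback once every fs.stat() call has called back.
function sumFileSizes(directory, callback) {
    fs.readdir(directory, function (err, filenames) {
        // count, totalBytes and failed live in this callback's scope;
        // the fs.stat() callbacks below close over them.
        var count,
            totalBytes = 0,
            failed = false;

        if (err) { return callback(err); }
        count = filenames.length;
        // Nothing to wait for in an empty directory.
        if (count === 0) { return callback(null, 0); }

        filenames.forEach(function (filename) {
            fs.stat(path.join(directory, filename), function (err, stats) {
                if (failed) { return; }          // already reported an error
                if (err) {
                    failed = true;
                    return callback(err);
                }
                totalBytes += stats.size;
                count--;
                // Stop criterion: all started operations have finished.
                if (count === 0) {
                    callback(null, totalBytes);
                }
            });
        });
    });
}

// Usage: two scratch files of 3 and 5 bytes.
var scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sizes-'));
fs.writeFileSync(path.join(scratchDir, 'a.txt'), 'abc');
fs.writeFileSync(path.join(scratchDir, 'b.txt'), 'hello');

sumFileSizes(scratchDir, function (err, totalBytes) {
    if (err) {
        throw err;
    }
    console.log(totalBytes + " bytes");
});
```

The fs.stat() callbacks may arrive in any order. The total is still correct, because each callback only adds its own size and decrements the shared counter.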

Code Reuse

Code fragments can be reused in JavaScript by wrapping them in functions. These functions can then be called from different places in the program. If an I/O operation is used in the function, some refactoring is needed when moving to asynchronous code.

The following synchronous example shows a function countFiles() that returns the number of files in a given directory. countFiles() uses the I/O operation fs.readdirSync() to determine the number of files. countFiles() itself is called with two different input parameters:

Synchronous:

    var fs = require('fs');

    var path1 = "./",
        path2 = ".././";

    function countFiles(path) {
        var filenames = fs.readdirSync(path);
        return filenames.length;
    }

    console.log(countFiles(path1) + " files in " + path1);
    console.log(countFiles(path2) + " files in " + path2);

Asynchronous:

    var fs = require('fs');

    var path1 = "./",
        path2 = ".././",
        logCount;

    function countFiles(path, callback) {
        fs.readdir(path, function (err, filenames) {
            // filenames is undefined on error, so report the error first.
            if (err) {
                return callback(err, path);
            }
            callback(null, path, filenames.length);
        });
    }

    logCount = function (err, path, count) {
        if (err) {
            throw err;
        }
        console.log(count + " files in " + path);
    };

    countFiles(path1, logCount);
    countFiles(path2, logCount);

Replacing fs.readdirSync() with the asynchronous fs.readdir() forces the enclosing function countFiles() to become asynchronous with a callback as well, since the code calling countFiles() depends on the result of that function. After all, the result is only available once fs.readdir() has called back. This leads to the restructuring of countFiles() to accept a callback function. The whole control flow is suddenly turned on its head: instead of console.log() calling countFiles(), which in turn calls fs.readdirSync(), in the asynchronous example countFiles() calls fs.readdir(), which then calls console.log().
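The inversion of control flow also means that the asynchronous countFiles() has nothing useful to return. The following minimal sketch makes this visible; the helper traceCountFiles and its trace array are hypothetical and exist only to record what happens in which order.

```javascript
var fs = require('fs');

// Same shape as the asynchronous countFiles() in the example.
function countFiles(path, callback) {
    fs.readdir(path, function (err, filenames) {
        if (err) {
            return callback(err, path);
        }
        callback(null, path, filenames.length);
    });
}

// Hypothetical helper that records what happens, in order.
function traceCountFiles(path, done) {
    var trace = [],
        returned;

    returned = countFiles(path, function (err, path, count) {
        trace.push("callback called");
        done(err, trace, count);
    });

    // countFiles() has already returned here, without a result.
    trace.push("countFiles returned " + returned);
}

traceCountFiles("./", function (err, trace, count) {
    if (err) {
        throw err;
    }
    console.log(trace.join("; "));
    console.log(count + " files in ./");
});
```

The first line logged is "countFiles returned undefined; callback called": the function returns before the count exists, so any code that needs the count has to live in the callback.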

Conclusion

The article has highlighted some of the basic patterns of asynchronous programming. Switching the brain to asynchronous programming is by no means trivial, and will take some time to get used to. The payback for the added complexity is a dramatic improvement in concurrency. Together with the quick turnaround and ease of use of JavaScript, asynchronous programming in Node.js has the chance to put a dent in the market of enterprise applications, especially when it comes to the new breed of highly concurrent Web 2.0 apps.

Resources

Node.js website: http://nodejs.org
Learning Server-Side JavaScript with Node.js: http://bit.ly/dmMg9E
HowToNode: http://howtonode.org
Tim Caswell on Slideshare: http://www.slideshare.net/creationix