Nginx vs Apache

来源:互联网 发布:淘宝代运营骗局 深圳 编辑:程序博客网 时间:2024/05/21 11:34

 


what is the Nginx web and proxy server and how does it compare to Apache? Should you use one of these servers or both? Here we explore some answers to these questions.

The Apache web servers have been in use since 1995. Apache powers more websites than any other product; Microsoft IIS comes in second.

Because the open-source Apache web server has been available for so many years, and has so many users, lots of modules have been written to expand its functionality - most of these are open-source as well. One popular configuration, for instance, is to use Apache to server up static web pages and the mod_jk module to run Java and JSP code on Tomcat to make an interactive application. Another example is using mod_php to execute php scripts without having to use cgi.

But Apache slows down under heavy load, because of the need to spawn new processes, thus consuming more computer memory. It also creates new threads that must compete with others for access to memory and CPU. Apache will also refuse new connections when traffic reaches the limit of processes configured by the administrator.

Nginx is an open source web server written to address some of the performance and scalability issues associated with Apache. The product is open source and free, but Nginx offers support if you buy its Nginx Plus version.

Nginx says its web server was written to address the C10K problem, which is a reference to a paper written by Daniel Kegel about how to get one web server to handle 10,000 connections, given the limitations of the operating systems. In his paper, he cited another paper by Dean Gaudet who wrote, “Why don't you guys use a select/event based model like Zeus? It's clearly the fastest”.

Nginx is indeed event-based. They call their architecture “event-driven and asynchronous”. Apache relies on processes and threads. So, what’s the difference?

How Apache works and why it has limitations

Apache creates processes and threads to handle additional connections. The administrator can configure the server to control the maximum number of allowable processes. This configuration varies depending on the available memory on the machine. Too many processes exhaust memory and can cause the machine to swap memory to disk, severely degrading performance. Plus, when the limit of processes is reached, Apache refuses additional connections.

Apache can be configured to run in pre-forked or worker multi-process mode (MPM). Either way it creates new processes as additional users connect. The difference between the two is that pre-forked mode creates one thread per process, each of which handles one user request. Worker mode creates new processes too, but each has more than one thread, each of which handles one request per user. So one worker mode process handles more than one connection and one pre-fork mode process handles one connection only.

Worker mode uses less memory than forked-mode, because processes consume more memory than threads, which are nothing more than code running inside a process.

Moreover, worker mode is not thread safe. That means if you use non thread-safe modules like mod_php, to serve up php pages, you need to use pre-forked mode, thus consuming more memory. So, when choosing modules and configuration you have to confront the thread-versus-process optimization problem and constraint issues.

The limiting factor in tuning Apache is memory and the potential to dead-locked threads that are contending for the same CPU and memory. If a thread is stopped, the user waits for the web page to appear, until the process makes it free, so it can send back the page. If a thread is deadlocked, it does not know how to restart, thus remaining stuck.

Nginx

Nginx works differently than Apache, mainly with regard to how it handles threads.

Nginx does not create new processes for each web request, instead the administrator configures how many worker processes to create for the main Nginx process. (One rule of thumb is to have one worker process for each CPU.) Each of these processes is single-threaded. Each worker can handle thousands of concurrent connections. It does this asynchronously with one thread, rather than using multi-threaded programming.

The Nginx also spins off cache loader and cache manager processes to read data from disk and load it into the cache and expire it from the cache when directed.

Nginx is composed of modules that are included at compile time. That means the user downloads the source code and selects which modules to compile. There are modules for connection to back end application servers, load balancing, proxy server, and others. There is no module for PHP, as Nginx can compile PHP code itself.

Here is a diagram of the Nginx architecture taken from Andrew Alexeev´s deepanalysis of Nginx and how it works.

In this diagram we see that Nginx, in this instance, is using the FasCGI process to execute Python, Ruby, or other code and the Memcache memory object caching system. The worker processes load the main ht_core Ngix process to for HTTP requests. We also see that Nginx is tightly integrated with Windows and Linux kernel features to gain a boost in performance; these kernel features have improved over time, allowing Nginx to take advantage.

Nginx is said to be event-driven, asynchronous, and non-blocking. “Event” means a user connection. “Asynchronous” means that it handles user interaction for more than one user connection at a time. “Non-blocking” means it does not stop disk I/O because the CPU is busy; in that case, it works on other events until the I/O is freed up.

Nginx compared with Apache 2.4 MPM

Apache 2.4 includes the MPM event module. It handles some connection types in an asynchronous manner, as does Nginx, but not in exactly the same manner. The goal is to reduce memory requirements as load increases.

As with earlier versions, Apache 2.4 includes the worker and pre-forked modes we mentioned above but has added the mpm_event_module (Apache MPM event module) to solve the problem of threads that are kept alive waiting for that user connection to make additional requests. MPM dedicates a thread to handle sockets that are both in listening and keep-alive state. This addresses some of the memory issues associated with older versions of Apache, by reducing the number of threads and processes created. To use this, the administrator would need to download the Apache source code and include the mpm_event_module, and compile Apache instead of using a binary distribution.

The Apache MPM event model is not exactly the same as Nginx, because it still spins off new processes as new requests come in (subject to the limit set by the administrator). Nginx does not set up more processes per user connection. The improvement that comes with Apache 2.4 is that those processes that are spin offs create fewer threads than normal Apache worker mode would. This is because one thread can handle more than one connection, instead of requiring a new process for each connection.

从以下几方面说明两者之间的区别

1  Connection Handling Architecture

One big difference between Apache and Nginx is the actual way that they handle connections and traffic. This provides perhaps the most significant difference in the way that they respond to different traffic conditions.

Apache

Apache provides a variety of multi-processing modules (Apache calls these MPMs) that dictate how client requests are handled. Basically, this allows administrators to swap out its connection handling architecture easily. These are:

  • mpm_prefork: This processing module spawns processes with a single thread each to handle request. Each child can handle a single connection at a time. As long as the number of requests is fewer than the number of processes, this MPM is very fast. However, performance degrades quickly after the requests surpass the number of processes, so this is not a good choice in many scenarios. Each process has a significant impact on RAM consumption, so this MPM is difficult to scale effectively. This may still be a good choice though if used in conjunction with other components that are not built with threads in mind. For instance, PHP is not thread-safe, so this MPM is recommended as the only safe way of working with mod_php, the Apache module for processing these files.
  • mpm_worker: This module spawns processes that can each manage multiple threads. Each of these threads can handle a single connection. Threads are much more efficient than processes, which means that this MPM scales better than the prefork MPM. Since there are more threads than processes, this also means that new connections can immediately take a free thread instead of having to wait for a free process.
  • mpm_event: This module is similar to the worker module in most situations, but is optimized to handle keep-alive connections. When using the worker MPM, a connection will hold a thread regardless of whether a request is actively being made for as long as the connection is kept alive. The event MPM handles keep alive connections by setting aside dedicated threads for handling keep alive connections and passing active requests off to other threads. This keeps the module from getting bogged down by keep-alive requests, allowing for faster execution. This was marked stable with the release of Apache 2.4.

As you can see, Apache provides a flexible architecture for choosing different connection and request handling algorithms. The choices provided are mainly a function of the server's evolution and the increasing need for concurrency as the internet landscape has changed.

Nginx

Nginx came onto the scene after Apache, with more awareness of the concurrency problems that would face sites at scale. Leveraging this knowledge, Nginx was designed from the ground up to use an asynchronous, non-blocking, event-driven connection handling algorithm.

Nginx spawns worker processes, each of which can handle thousands of connections. The worker processes accomplish this by implementing a fast looping mechanism that continuously checks for and processes events. Decoupling actual work from connections allows each worker to concern itself with a connection only when a new event has been triggered.

Each of the connections handled by the worker are placed within the event loop where they exist with other connections. Within the loop, events are processed asynchronously, allowing work to be handled in a non-blocking manner. When the connection closes, it is removed from the loop.

This style of connection processing allows Nginx to scale incredibly far with limited resources. Since the server is single-threaded and processes are not spawned to handle each new connection, the memory and CPU usage tends to stay relatively consistent, even at times of heavy load.

2  Static vs Dynamic Content

In terms of real world use-cases, one of the most common comparisons between Apache and Nginx is the way in which each server handles requests for static and dynamic content.

Apache

Apache servers can handle static content using its conventional file-based methods. The performance of these operations is mainly a function of the MPM methods described above.

Apache can also process dynamic content by embedding a processor of the language in question into each of its worker instances. This allows it to execute dynamic content within the web server itself without having to rely on external components. These dynamic processors can be enabled through the use of dynamically loadable modules.

Apache's ability to handle dynamic content internally means that configuration of dynamic processing tends to be simpler. Communication does not need to be coordinated with an additional piece of software and modules can easily be swapped out if the content requirements change.

Nginx

Nginx does not have any ability to process dynamic content natively. To handle PHP and other requests for dynamic content, Nginx must pass to an external processor for execution and wait for the rendered content to be sent back. The results can then be relayed to the client.

For administrators, this means that communication must be configured between Nginx and the processor over one of the protocols Nginx knows how to speak (http, FastCGI, SCGI, uWSGI, memcache). This can complicate things slightly, especially when trying to anticipate the number of connections to allow, as an additional connection will be used for each call to the processor.

However, this method has some advantages as well. Since the dynamic interpreter is not embedded in the worker process, its overhead will only be present for dynamic content. Static content can be served in a straight-forward manner and the interpreter will only be contacted when needed. Apache can also function in this manner, but doing so removes the benefits in the previous section.

3  Distributed vs Centralized Configuration

For administrators, one of the most readily apparent differences between these two pieces of software is whether directory-level configuration is permitted within the content directories.

Apache

Apache includes an option to allow additional configuration on a per-directory basis by inspecting and interpreting directives in hidden files within the content directories themselves. These files are known as.htaccess files.

Since these files reside within the content directories themselves, when handling a request, Apache checks each component of the path to the requested file for an .htaccess file and applies the directives found within. This effectively allows decentralized configuration of the web server, which is often used for implementing URL rewrites, access restrictions, authorization and authentication, even caching policies.

While the above examples can all be configured in the main Apache configuration file, .htaccess files have some important advantages. First, since these are interpreted each time they are found along a request path, they are implemented immediately without reloading the server. Second, it makes it possible to allow non-privileged users to control certain aspects of their own web content without giving them control over the entire configuration file.

This provides an easy way for certain web software, like content management systems, to configure their environment without providing access to the central configuration file. This is also used by shared hosting providers to retain control of the main configuration while giving clients control over their specific directories.

Nginx

Nginx does not interpret .htaccess files, nor does it provide any mechanism for evaluating per-directory configuration outside of the main configuration file. This may be less flexible than the Apache model, but it does have its own advantages.

The most notable improvement over the .htaccess system of directory-level configuration is increased performance. For a typical Apache setup that may allow .htaccess in any directory, the server will check for these files in each of the parent directories leading up to the requested file, for each request. If one or more .htaccess files are found during this search, they must be read and interpreted. By not allowing directory overrides, Nginx can serve requests faster by
doing a single directory lookup and file read for each request (assuming that the file is found in the conventional directory structure).

Another advantage is security related. Distributing directory-level configuration access also distributes the responsibility of security to individual users, who may not be trusted to handle this task well. Ensuring that the administrator maintains control over the entire web server can prevent some security missteps that may occur when access is given to other parties.

Keep in mind that it is possible to turn off .htaccess interpretation in Apache if these concerns resonate with you.

4  File vs URI-Based Interpretation

How the web server interprets requests and maps them to actual resources on the system is another area where these two servers differ.

Apache

Apache provides the ability to interpret a request as a physical resource on the filesystem or as a URI location that may need a more abstract evaluation. In general, for the former Apache uses <Directory> or<Files> blocks, while it utilizes <Location> blocks for more abstract resources.

Because Apache was designed from the ground up as a web server, the default is usually to interpret requests as filesystem resources. It begins by taking the document root and appending the portion of the request following the host and port number to try to find an actual file. Basically, the filesystem hierarchy is represented on the web as the available document tree.

Apache provides a number of alternatives for when the request does not match the underlying filesystem. For instance, an Alias directive can be used to map to an alternative location. Using <Location> blocks is a method of working with the URI itself instead of the filesystem. There are also regular expression variants which can be used to apply configuration more flexibly throughout the filesystem.

While Apache has the ability to operate on both the underlying filesystem and the webspace, it leans heavily towards filesystem methods. This can be seen in some of the design decisions, including the use of .htaccess files for per-directory configuration. The Apache docs themselves warn against using URI-based blocks to restrict access when the request mirrors the underlying filesystem.

Nginx

Nginx was created to be both a web server and a proxy server. Due to the architecture required for these two roles, it works primarily with URIs, translating to the filesystem when necessary.

This can be seen in some of the ways that Nginx configuration files are constructed and interpreted.Nginx does not provide a mechanism for specifying configuration for a filesystem directory and instead parses the URI itself.

For instance, the primary configuration blocks for Nginx are server and location blocks. The serverblock interprets the host being requested, while the location blocks are responsible for matching portions of the URI that comes after the host and port. At this point, the request is being interpreted as a URI, not as a location on the filesystem.

For static files, all requests eventually have to be mapped to a location on the filesystem. First, Nginx selects the server and location blocks that will handle the request and then combines the document root with the URI, adapting anything necessary according to the configuration specified.

This may seem similar, but parsing requests primarily as URIs instead of filesystem locations allows Nginx to more easily function in both web, mail, and proxy server roles. Nginx is configured simply by laying out how to respond to different request patterns. Nginx does not check the filesystem until it is ready to serve the request, which explains why it does not implement a form of .htaccess files.

5  Modules

Both Nginx and Apache are extensible through module systems, but the way that they work differ significantly.

Apache

Apache's module system allows you to dynamically load or unload modules to satisfy your needs during the course of running the server. The Apache core is always present, while modules can be turned on or off, adding or removing additional functionality and hooking into the main server.

Apache uses this functionality for a large variety tasks. Due to the maturity of the platform, there is an extensive library of modules available. These can be used to alter some of the core functionality of the server, such as mod_php, which embeds a PHP interpreter into each running worker.

Modules are not limited to processing dynamic content, however. Among other functions, they can be used used for rewriting URLs, authenticating clients, hardening the server, logging, caching, compression, proxying, rate limiting, and encrypting. Dynamic modules can extend the core functionality considerably without much additional work.

Nginx

Nginx also implements a module system, but it is quite different from the Apache system. In Nginx, modules are not dynamically loadable, so they must be selected and compiled into the core software.

For many users, this will make Nginx much less flexible. This is especially true for users who are not comfortable maintaining their own compiled software outside of their distribution's conventional packaging system. While distributions' packages tend to include the most commonly used modules, if you require a non-standard module, you will have to build the server from source yourself.

Nginx modules are still very useful though, and they allow you to dictate what you want out of your server by only including the functionality you intend to use. Some users also may consider this more secure, as arbitrary components cannot be hooked into the server. However, if your server is ever put in a position where this is possible, it is likely compromised already.

Nginx modules allow many of the same capabilities as Apache modules. For instance, Nginx modules can provide proxying support, compression, rate limiting, logging, rewriting, geolocation, authentication, encryption, streaming, and mail functionality.

6  Support, Compatibility, Ecosystem, and Documentation

A major point to consider is what the actual process of getting up and running will be given the landscape of available help and support among other software.

Apache

Because Apache has been popular for so long, support for the server is fairly ubiquitous. There is a large library of first- and third-party documentation available for the core server and for task-based scenarios involving hooking Apache up with other software.

Along with documentation, many tools and web projects include tools to bootstrap themselves within an Apache environment. This may be included in the projects themselves, or in the packages maintained by your distribution's packaging team.

Apache, in general, will have more support from third-party projects simply because of its market share and the length of time it has been available. Administrators are also somewhat more likely to have experience working with Apache not only due to its prevalence, but also because many people start off in shared-hosting scenarios which almost exclusively rely on Apache due to the .htaccess distributed management capabilities.

Nginx

Nginx is experiencing increased support as more users adopt it for its performance profile, but it still has some catching up to do in some key areas.

In the past, it was difficult to find comprehensive English-language documentation regarding Nginx due to the fact that most of the early development and documentation were in Russian. As interest in the project grew, the documentation has been filled out and there are now plenty of administration resources on the Nginx site and through third parties.

In regards to third-party applications, support and documentation is becoming more readily available, and package maintainers are beginning, in some cases, to give choices between auto-configuring for Apache and Nginx. Even without support, configuring Nginx to work with alternative software is usually straight-forward so long as the project itself documents its requirements (permissions, headers, etc).

          7  linux底下流媒体配置采用nginx比较简单

Using Nginx and Apache both

Apache is known for its power and Nginx for speed. This means Nginx can serve up static content quicker, but Apache includes the modules needed to work with back end application servers and run scripting languages.

Both Apache and Nginx can be used as proxy servers, but using Nginx as a proxy server and Apache as the back end is a common approach to take. Nginx includes advanced load balancing and caching abilities. Apache servers can, of course, be deployed in great numbers. A load balancer is needed in order to exploit this. Apache has a load balancer module too, plus there are hardware based load balancers.

Another configuration is to deploy Nginx with the separate php-fpm application. We say that php-fpm is an application, because it is not a .dll or .so loaded at execution time, as is the case with Apache modules. Php-fpm (the FastCGI Process Manager) can be used with Nginx to deliver php scripting ability to Nginx to render non-static content.

When is Apache preferred over Nginx?

Apache comes with built in support for PHP, Python, Perl, and other languages. For example, the mod_python and mod_php Apache modules process PHP and Perl code inside the Apache process. mod_python is more efficient that using CGI or FastCGI, because it does not have to load the Python interpreter for each request. The same is true for mod_rails and mod_rack, which give Apache the ability to run Ruby on Rails. These processes run faster inside the Apache process.

So if your website is predominately Python or Ruby, Apache might be preferred for your application, as Apache does not have to use CGI. For PHP, it does not matter as Nginx supports PHP internally.

Here we have given some explanation of how Nginx is different from Apache and how you might consider using one or both of them, and which of the two might fit your needs more closely. Plus we have discussed how Apache 2.4 brings some of the Nginx improvements in thread and process management to the Apache web server. So you can choose the best solution for your needs.

0 0
原创粉丝点击