The Apache Software Foundation is a non-profit umbrella organization that develops a significant number of open-source software projects; the flagship product is the Apache HTTP Server. The Apache HTTP Server is arguably one of the most successful open source projects. It is certainly one of the most well known; many of the uninitiated frequently recognize only Linux and Apache as open source projects.
Like OpenSSH, Apache is included in almost every Linux distribution out there. However, also like OpenSSH, Apache is such an important and useful part of many Linux systems (and is such a great example of a particular style of configuration) that I include it as an example.
The Apache web server began life as a variant of the original National Center for Supercomputing Applications (NCSA) httpd web server. It was a collection of patches against the core NCSA software, and so it was "a patchy" server (hence "Apache"). Eventually, the NCSA stopped maintaining its server, but by then the Apache developers had rewritten effectively all of the original NCSA code. Apache became a separate project, and active development continues on it today.
The objectives of the Apache server are reliability, flexibility, and robustness, rather than blazing speed. As a result, Apache is an incredibly sophisticated and feature-rich product. Apache can be used in a variety of capacities; in addition to basic web serving tasks, Apache can be integrated with various web development languages, such as Perl, PHP, and Java. Apache also provides an extensive API that allows developers to augment its capabilities so that it can do almost anything.
The Apache team is not focused on raw speed, though that's not to say that Apache isn't fast. In fact, it's quite speedy. However, since the Apache developers aren't interested in eking out every last ounce of performance they can, a number of products are faster than Apache. Very few are more stable or feature-rich, though.
The Apache HTTP Server is an extremely flexible and sophisticated piece of software. This article describes the process of installing Apache and getting it running, and it discusses the basics of customizing Apache to meet the needs of a specific site. However, Apache is far too sophisticated to cover in much detail here. Instead, this article focuses on making generalizations about Apache that illustrate techniques that will be useful with other software.
Installing Apache
Most Linux distributions contain a preconfigured version of Apache. However, Apache is a very sophisticated and complex product, and so it frequently happens that a user or an administrator needs a custom installation of Apache with options and functionality not included in the distribution's default configuration. In such cases, the administrator will have to build a custom Apache from source code.
Apache really is a very complex product, so complex, in fact, that entire books have been written on it. There is no way that this article can pretend to be a comprehensive coverage of the Apache HTTP Server. Our goals:
·Demonstrate a real-world installation of a highly complicated server
·Provide the basics of configuring Apache
Apache is one of the most sophisticated software packages around, and so in completing this article you will take a major step in your understanding of software installations.
Depending on your role and reason for reading this article, this article will probably be useful to you in one of two ways. Perhaps you're just an average user who doesn't have much more than a passing interest in Apache, or maybe you're a system administrator who may actually be responsible for a production deployment of Apache in a real web site. In the first case, reading this article will be a great tutorial on how to install a complex piece of software. In the second case, this article will be a good primer on Apache installation, but it probably isn't going to be comprehensive enough to bet your job on. If you need more information, you should definitely consult Apache's documentation at httpd.apache.org or get a copy of one of the excellent Apache reference books.
Compile-Time Options
Even though Apache is a very complex package, it is a straightforward GNU autoconf installation. This part covers compile-time options for the Apache HTTP Server.
At the risk of being repetitive, remember that Apache is a very complicated product. A quick ./configure --help command shows that it has a very long list of compile-time options, and many of these are interrelated and even mutually exclusive. In other words, figuring out Apache's compile-time options can be a bear.
However, remember that the Apache developers are not out to make life needlessly complicated, and so all the options are actually necessary. Because of this, unfortunately, there's no easy way out, and there's no magic trick to configuring Apache. To install Apache, you have to understand its compile-time options, and the only way to do so is to consult the documentation for the software. Fortunately, Apache's documentation is outstanding and comprehensive; the web site at httpd.apache.org contains all there is to know about installing (and running) Apache, and a copy of the manual is actually included with the source code and installations.
That said, you can draw a few generalizations about the types of options that Apache supports. The remainder of this part discusses the following compile-time options:
·Installation directory
·Support for Dynamically Loadable Objects
·Selection of a Multi-Processing Module (MPM)
·Other important options
With these options, you can create a basic, reasonable Apache configuration. Almost any production environment will have to customize these options, but they'll be a good starting point. (Actually, the configuration as discussed will be very similar to Apache's default configuration, but it's worthwhile to see it broken down a little.)
Choosing an Installation Directory
As you've probably come to expect by now, the first step in installing a software package is to decide where to install it. The /usr/local directory is a good place to put software needed by many users that is upgraded infrequently, while /opt is a good place to put software that needs to be self-contained or that needs to be frequently or systematically updated.
So, where should Apache be installed? Well, the answer to that depends on the type of system that Apache's being installed on. If the target is an actual production machine, then chances are the system is going to be very closely monitored and upgrades are going to be done infrequently at best; in this case, you'll probably never need more than one copy installed at once, and it's probably just fine to install Apache into /usr/local. Most distributions that include Apache typically assume that the machine is going to be more or less a production server, or perhaps just someone's personal web server, and that upgrades can be handled via the package mechanism (such as RPM). Thus, most distributions install Apache into /usr; therefore, if you're installing a custom Apache from source code for such a system, you should probably use /usr/local.
There are other common use cases for Apache, though. For example, you might be a developer installing Apache on your desktop system. In this case, you might need to track the latest and greatest changes to Apache or have different copies installed, each with a different configuration. In this case, you probably want your Apache installation or installations to be more isolated, and so it's better to put them in a subdirectory of /opt; for example, /opt/apache. This is the case assumed by this article.
For the purposes of this article, then, the installation directory will be /opt/apache. To actually set this as a compile-time option via the ./configure program, you need to use the --prefix option; to wit, --prefix=/opt/apache. (If you think you'll need to upgrade frequently, another option would be to include the version number in the path, such as --prefix=/opt/apache-2.0.36 for Apache v2.0.36, and create /opt/apache as a symbolic link pointing to your version. This allows you to upgrade Apache more easily, by installing the new version separately and replacing the symbolic link after it's been tested.)
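The versioned-directory trick is easier to see in action than to describe. The following is only a sketch: it uses a scratch directory in place of /opt so that it can be run without root, and the second version number is a hypothetical stand-in for whatever release you upgrade to.

```shell
# Sketch: keep each Apache version in its own directory and point a stable
# symlink at the one in use. A scratch directory stands in for /opt, and
# the 2.0.39 version number is only an illustration.
set -e
OPT_DIR=$(mktemp -d)

# Pretend two versions were built with
#   ./configure --prefix=/opt/apache-<version>
mkdir -p "$OPT_DIR/apache-2.0.36" "$OPT_DIR/apache-2.0.39"

# Initial setup: "apache" points at the version that has been tested.
ln -s apache-2.0.36 "$OPT_DIR/apache"

# Upgrade: once the new version checks out, repoint the link.
# -n keeps ln from following the existing link into the old directory.
ln -sfn apache-2.0.39 "$OPT_DIR/apache"

ACTIVE=$(readlink "$OPT_DIR/apache")
echo "apache -> $ACTIVE"
```

Rolling back is the same ln -sfn command with the old directory name, which is most of the appeal of this layout.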
Enabling Loadable Module Support
Apache has a vast amount of functionality, and this means that if it's all enabled at compile-time, the resulting run-time memory footprint will be very large. Most installations, meanwhile, won't need all that functionality and so it's really a waste of memory. I will discuss how to enable the loadable modules that Apache supports to address this issue.
Apache uses Dynamic Shared Objects (DSOs), which are also referred to as Apache modules or just modules. Each module is a file on disk that contains some specific functionality for the Apache server. For example, there are separate DSO modules for monitoring server status and for spell checking.
Normally, if you want the functionality from a specific module, you have to compile that module into the Apache installation. (For example, you have to compile Apache to include the spell-checking module if you want that functionality.) However, if you later change your mind, you can't get rid of that spell-checking support without recompiling Apache. You can always just leave it in and not use it, of course, but in that case it's wasting memory and perhaps processing time, and it might even be a security risk if an attacker figures out a way to exploit it. For these reasons, you generally don't want Apache to include modules that aren't being used, even if they sit idle.
Apache addresses this issue by allowing each module to be loaded dynamically when the server starts up. This way, removing support for a given module is as easy as editing Apache's configuration file to remove the line that loads the module and then restarting the Apache server process. For example, disabling that spell-checking module is as easy as commenting out or deleting a line in a file, and then the spell-checking module will not only be idle, it won't even be present.
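To make that concrete, here is a small sketch that comments out one module's LoadModule line. It works on a scratch file holding a made-up two-line sample rather than a real httpd.conf, so treat the file contents as an assumption.

```shell
# Sketch: disable a DSO by commenting out its LoadModule line.
# The two-line file below is an illustrative sample, not a full httpd.conf.
set -e
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
LoadModule status_module modules/mod_status.so
LoadModule speling_module modules/mod_speling.so
EOF

# Prefix the mod_speling line with '#'; every other line is left alone.
sed 's|^LoadModule speling_module|#&|' "$CONF" > "$CONF.new"
mv "$CONF.new" "$CONF"

cat "$CONF"
# On a real server, follow the edit with a restart, for example:
#   /opt/apache/bin/apachectl restart
```

Re-enabling the module is the reverse edit: remove the '#' and restart again.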
Another major benefit of loadable modules is the inverse case, which is adding a module to include support that you didn't think of or didn't exist when you built Apache. Without DSO support, for example, if you actually did want spell-checking support but forgot to compile it in, you'd have to recompile Apache to get it. If you enabled DSO support, though, you can easily add the module later and turn it on by restarting the server. This technique is also useful for making use of third-party modules that aren't included with Apache proper: You can simply compile such third-party modules as DSOs and load them into Apache at run-time. The alternative without dynamically loadable modules would be to somehow cause Apache to compile these third-party files along with the standard Apache files. Apache actually does support this but it's somewhat more cumbersome, as you'll see later.
Actually using modules (i.e., compiling them, installing them, activating them in the configuration file, and so on) is discussed later. The beauty of DSOs, after all, is that once you've built the core of Apache, you can add and remove modules later at your leisure. For now, all you have to do is enable support for DSOs within Apache. This is accomplished with two options: the --enable-mods-shared and --enable-so options.
The --enable-so option simply activates support for DSOs within Apache; however, it does not actually cause any specific modules to be built as DSOs. The --enable-mods-shared option to ./configure is the one that allows you to add or enable specific modules. For example, if you wanted to add that spell-checking module, mod_speling, as a DSO, you would use --enable-mods-shared=speling (module names are given to ./configure without the mod_ prefix). However, there's also a convenience form you can use, namely --enable-mods-shared=all, that saves you the trouble of having to type out all the individual modules explicitly. This is the option used by this article, and its result is that all modules that can be enabled as DSOs are so enabled.
Choosing an MPM
Some Apache modules cannot be loaded as DSOs. These include the core module, which contains things such as the capability to actually load the DSO modules, and the Multi-Processing Module (MPM) to be used for the installation.
An MPM is a module that handles connections to clients and governs how Apache behaves while fulfilling those clients' requests. Each MPM can use a different technique for this "multiprocessing", and each has a different performance profile. By allowing the MPM to be customized, Apache can use a different technique for each operating system it runs on, allowing the best possible performance on each platform. MPM functionality was added in Apache's 2.0 release.
Because the MPM being used is the foundation for the rest of Apache, the MPM cannot be built as a dynamically loadable module; it has to be compiled directly into the core Apache program. In other words, once Apache has been compiled and installed, changing to a different MPM requires that Apache itself be recompiled.
Generally, the ./configure program can detect which operating system Apache is being compiled on and it selects the most appropriate MPM for that platform. Some platforms, however, support multiple MPMs from which you can choose. In these cases, the ./configure program selects a reasonable default MPM, which is usually the most stable one.
Switching to a different MPM is not a small matter, since it has major performance implications. If you want to use an alternate MPM, you should consult the Apache documentation on the subject. This section is meant only to make you aware that MPMs exist; the configuration used in this article simply accepts the default MPM that ./configure selects for the system, and no custom MPM-related options are passed to ./configure.
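If you're curious which MPM ended up in a particular build, the server can tell you. The sketch below assumes that httpd -l prints the compiled-in modules one per line by source file name (core.c, prefork.c, and so on), and it only looks for a few common Unix MPM names; both the path and that short list are assumptions to adjust for your system.

```shell
# Sketch: report which MPM an httpd binary was built with.
# Assumes `httpd -l` lists compiled-in modules one per line, e.g. "  prefork.c".
HTTPD=/opt/apache/bin/httpd

# Keep only the lines that name one of a few common Unix MPM source files.
pick_mpm() {
    grep -E '^[[:space:]]*(prefork|worker|event)\.c[[:space:]]*$'
}

if [ -x "$HTTPD" ]; then
    "$HTTPD" -l | pick_mpm
else
    echo "no httpd at $HTTPD; adjust HTTPD to match your prefix"
fi
```

Should you decide to pick an MPM yourself, ./configure takes it as --with-mpm=NAME (for example, --with-mpm=worker).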
Other Options
As I've mentioned repeatedly, Apache supports a very wide variety of compile-time options. Exhaustively covering these options is beyond the scope of this article, and indeed, exploring the features offered by Apache is a diversion of several hours just by itself. If you'd truly like to master Apache, you should consult the Apache documentation.
Completing the Installation
Now that you've selected the compile-time options, you can install Apache. This is a very straightforward example of a GNU autoconf installation, meaning that it consists of essentially three commands: ./configure with the appropriate arguments, make, and make install. The ./configure command used for this example simply combines the options chosen above: --prefix=/opt/apache, --enable-so, and --enable-mods-shared=all.
After the ./configure, make, and make install commands have completed, Apache will be installed in the /opt/apache directory.
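Put together, the whole build might look like the following. This is a sketch assembled from the options discussed in this article, not a verbatim copy of an original listing. It expects to be run from the top of an unpacked Apache source tree; anywhere else it just prints what it would have done.

```shell
# Sketch of the build, assuming the options chosen in this article.
# Outside an Apache source tree the script only prints the command line.
PREFIX=/opt/apache
CONFIGURE_ARGS="--prefix=$PREFIX --enable-so --enable-mods-shared=all"

if [ -x ./configure ]; then
    # $CONFIGURE_ARGS is left unquoted on purpose so it splits into options.
    ./configure $CONFIGURE_ARGS && make && make install
else
    echo "would run: ./configure $CONFIGURE_ARGS && make && make install"
fi
```

Note that make install needs write access to the prefix, so for /opt/apache that last step usually has to be run as root.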
You'll be able to apply many of the same techniques and modes of thought to installing any other server program, such as the OpenLDAP directory server or the Samba file- and print-sharing server. Installing these things is pretty easy: you just read the documentation to find out what options you need, then run ./configure, make, and make install, and voila! You've got your software installed. Of course, even then the work will only be half done; the software also has to be configured.
Configuring Apache
This part discusses configuring Apache after it has been installed. The previous part covered Apache's compile-time options, which really boil down to enabling or disabling specific features. The run-time options, however, govern the actual behavior of Apache as it runs.
Administration Programs
In addition to the most important server program and support programs, there are also a few convenience utilities that are included to make users' lives easier. The apxs program is the "Apache Extension" utility. This program is used to compile and install Apache DSO modules. DSOs provide a way to add functionality to Apache without having to recompile the entire server. If you find yourself needing to install a new Apache DSO module, apxs will be very useful. It's pretty simple to use, but if you have trouble, check out Apache's documentation on it.
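A typical apxs run looks something like the sketch below. The module source file, mod_example.c, is a hypothetical name standing in for whatever third-party module you've downloaded, and the apxs path assumes this article's /opt/apache prefix.

```shell
# Sketch: build and install a third-party module as a DSO with apxs.
# mod_example.c is a hypothetical source file used for illustration.
APXS=/opt/apache/bin/apxs
MODULE_SRC=mod_example.c

# -c compiles the source into a DSO, -i installs it into the modules
# directory, and -a adds a matching LoadModule line to httpd.conf.
apxs_command() {
    echo "$APXS -c -i -a $1"
}

if [ -x "$APXS" ] && [ -f "$MODULE_SRC" ]; then
    "$APXS" -c -i -a "$MODULE_SRC"
else
    echo "would run: $(apxs_command "$MODULE_SRC")"
fi
```

As with any other change to the set of loaded modules, the server has to be restarted before the new module takes effect.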
The logresolve program is used to convert numeric IP addresses in Apache log files into host names. A typical Apache server gets a lot of hits from web browsers on the Internet. Each hit contains information about the client, including its IP address. Apache logs these hits (as well as other things, such as errors), but the log entries include only the IP address. This is because the process of looking up a client's host name in the Domain Name System (DNS) can be very expensive and needlessly waste bandwidth. The logresolve program is included as an easy way to take Apache's log files (which contain only IP addresses) and "reformat" them with host names fetched from the DNS service. You can use logresolve any time you need or want to view your site's logs with more than just IP addresses.
The rotatelogs program helps manage Apache's log files. As the HTTP Server runs for a long time, it keeps logging various events (especially hits from web browsers) into its log files. These files can become very large as a result. The rotatelogs program "rotates" Apache's log files: Apache is configured to pipe its log output to rotatelogs, which writes the entries to a file and switches to a fresh file at a set interval, without the server having to be restarted. The user can then delete the old log files after reviewing them, in order to conserve disk space.
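In practice, logresolve is used as a filter: it reads a log on standard input and writes the resolved version to standard output. The sketch below assumes this article's /opt/apache prefix and a log file named access_log, and it writes its result to a temporary location so the original log is left untouched.

```shell
# Sketch: write a copy of the access log with host names in place of IPs.
# Paths assume the /opt/apache prefix and a log file named access_log.
LOGRESOLVE=/opt/apache/bin/logresolve
ACCESS_LOG=/opt/apache/logs/access_log
RESOLVED_LOG=${TMPDIR:-/tmp}/access_log.resolved

if [ -x "$LOGRESOLVE" ] && [ -r "$ACCESS_LOG" ]; then
    # logresolve reads the log on stdin and writes the result to stdout.
    "$LOGRESOLVE" < "$ACCESS_LOG" > "$RESOLVED_LOG"
    echo "wrote $RESOLVED_LOG"
else
    echo "would run: $LOGRESOLVE < $ACCESS_LOG > $RESOLVED_LOG"
fi
```

Expect this to be slow on a large log, for exactly the reason Apache doesn't do the lookups itself.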
User Database Management Programs
A common feature required by web sites is the restriction of access to certain pages or sets of pages to specific users. (These pages and sets of pages are known as realms.) Apache supports this functionality by various directives in its configuration files. However, another key component in supporting user authentication is, of course, to have a database of users.
There are any number of ways to set up a database for users. Commercial web sites, with which most of you will be familiar, use some kind of high-capacity relational database or directory to store user account information. The authentication that Apache supports out of the box, however, is more modest. It supports user databases stored on local disks, in a few different formats. (There are, of course, third-party modules that you can use to provide support for Apache for high-capacity data stores; they're just not included with the standard Apache distribution.)
The three programs dbmmanage, htdigest, and htpasswd are utilities that manage these user databases. The nuances of which data format is best for a given application, and when you can use Apache's basic mechanism or switch to a high-capacity database are advanced topics and aren't really relevant to this article. If you need more information, please consult the manual pages for these programs (included with the Apache installation) or the documentation on the web site at httpd.apache.org.
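Of the three, htpasswd is the one you're most likely to meet first. Here is a minimal sketch of creating a flat-file user database with it. The user names and passwords are made up, the file lives in a scratch directory, and the script only does anything if an htpasswd program is on your PATH.

```shell
# Sketch: create a flat-file user database with htpasswd.
# User names and passwords are illustrative. -b takes the password from
# the command line, which is handy in a demo but exposes it to other users
# on the machine; leave -b off to be prompted instead.
USERS_FILE=$(mktemp -d)/users

if command -v htpasswd >/dev/null 2>&1; then
    htpasswd -cb "$USERS_FILE" alice 'example-password'   # -c creates the file
    htpasswd -b "$USERS_FILE" bob 'another-password'      # add a second user
    cut -d: -f1 "$USERS_FILE"                             # list the user names
else
    echo "htpasswd is not on PATH; try /opt/apache/bin/htpasswd"
fi
```

Keep a file like this outside htdocs, since anything under htdocs can be downloaded.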
Configuration Files in the conf Directory
The conf directory contains all configuration files for Apache. (In reality, the httpd program, which is the actual Apache HTTP Server program, can be told which configuration file to use as a command-line parameter, so the Apache configuration directory isn't hard-coded or forced to a specific value.)
The canonical name for Apache's configuration file is httpd.conf. (Again, however, this doesn't have to be the case; the httpd program can be passed an arbitrary file name.) A typical installation of Apache will have several sample httpd.conf files, such as a configuration tuned for maximum performance and one that enables SSL support, as well as a standard "vanilla" httpd.conf. As a result, there will usually be several files with a ".conf" extension in the conf subdirectory of the Apache installation; only one of these files actually needs to be there. Additionally, for each .conf file there is a -std.conf file. The -std files are copies of the original files; their purpose is to act as a backup in case a tinkering administrator really mucks up a .conf file. However, the -std files are also not required for Apache to run.
You can actually remove all the .conf and -std.conf files except for httpd.conf if you don't like them cluttering up the directory. That will leave the traditional Apache configuration file as a starting point to begin customizing the server. However, it definitely pays to take a look at the other sample files before removing them if you choose to do so; there's a lot of very valuable information and sample configurations in those files. Of course, there's still no substitute for reading the actual documentation. The sample configuration files are just that and are not comprehensive.
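The command-line override mentioned above is also handy for checking a configuration file before you trust it. The following sketch assumes the /opt/apache prefix; it asks httpd to parse the named file and report any syntax errors without actually starting the server.

```shell
# Sketch: syntax-check a configuration file without starting the server.
# Paths assume this article's /opt/apache prefix.
HTTPD=/opt/apache/bin/httpd
CONF=/opt/apache/conf/httpd.conf

if [ -x "$HTTPD" ]; then
    # -f names the configuration file; -t checks its syntax and exits.
    "$HTTPD" -t -f "$CONF"
else
    echo "would run: $HTTPD -t -f $CONF"
fi
```

Pointing CONF at one of the sample files is a low-risk way to experiment with it before copying anything into httpd.conf.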
There are two other files in the conf subdirectory: mime.types and magic. MIME stands for Multipurpose Internet Mail Extensions, and it is an industry standard for specifying the type of a file, such as "executable" or "plain text" or "HTML." (Don't be confused by the "Mail" in MIME; the MIME standard can be used just about anywhere!) Apache's mime.types file is just a mapping that assigns MIME types to file extensions. (For example, the extension ".txt" is mapped to a MIME type of "text/plain".) Usually, you won't have to modify this file, since it contains reasonable defaults. If you do, carefully read the Apache documentation on it, since altering mime.types can have subtle consequences. The magic file works similarly to mime.types, except that it is used to assign MIME types to files where the file extension may be unreliable, based on "magic numbers" contained in the file. This is frequently used for items such as audio files.
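The extension mapping is simple enough to mimic in a few lines. The sketch below looks up an extension in a mime.types-style file, where each line holds a MIME type followed by the extensions that map to it. The three-entry sample is illustrative and far shorter than the file Apache ships, and the mime_type_for helper is a name I made up for this example.

```shell
# Sketch: look up the MIME type for a file extension in a mime.types-style
# file. The sample entries below are illustrative only.
set -e
MIME_FILE=$(mktemp)
cat > "$MIME_FILE" <<'EOF'
# MIME type        extensions
text/html          html htm
text/plain         txt asc
image/png          png
EOF

# Print the type whose extension list contains $1 (nothing if no match).
mime_type_for() {
    awk -v ext="$1" '
        /^#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == ext) { print $1; exit } }
    ' "$MIME_FILE"
}

mime_type_for txt    # prints text/plain
mime_type_for htm    # prints text/html
```

Apache does the same kind of lookup on the extension of each file it serves and sends the result to the browser in the Content-Type header.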
Log Files in the logs Directory
The logs subdirectory contains, intuitively enough, the log files for the Apache HTTP Server. Apache's logging mechanism is very flexible, and it can be configured to log a variety of events to any number of files in any format. Altering this configuration is seldom necessary, and so I don't cover it here. Fortunately, Apache has a default logging configuration that is almost always good enough.
Content Directories
The point of the Apache HTTP Server is, of course, to serve up documents such as HTML web pages. Apache also supports many different programming languages and environments that web developers can use to create and run dynamic, interactive web applications. The Apache installation uses the htdocs and cgi-bin directories to contain the files that actually make up the site's content.
The htdocs directory is the home of "static" HTML files. These are files that don't have any dynamically generated content, such as tables of contents, default home pages, standard header and footer HTML files, and so on. The htdocs directory also contains any other static files needed by the site, such as graphical image files, tarball (.tar.gz) files that are downloaded from the site, and so on. Literally, any file placed into the htdocs directory (and that the httpd process has permission to access) becomes available for download over the web site. In other words, don't put anything in htdocs that you don't want people to be able to download!
The cgi-bin directory contains scripts and programs that make use of the Common Gateway Interface (CGI) standard for dynamic web applications. CGI defines a standard, fixed way in which a web server (such as Apache) interacts with the programs used to generate content. Web developers write programs in a programming language such as Perl, Python, C, or some other language that understands the CGI standard, and this allows them to work within a web server that supports CGI, such as Apache. Apache's cgi-bin directory contains all the CGI programs used on the site. The directory is separate from the htdocs directory for security reasons and to make it easier to manage.
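CGI programs don't have to be elaborate. Here is about the smallest one I can think of, written as a shell script. It is only a sketch: it emits a header, a blank line, and a plain-text body, and it reads two of the standard CGI environment variables, falling back to defaults so that it also runs from a terminal.

```shell
#!/bin/sh
# Minimal CGI program sketch. A CGI response is a header block, a blank
# line, and then the body. REQUEST_METHOD and QUERY_STRING are standard
# CGI environment variables set by the web server; the defaults below only
# matter when the script is run by hand.
hello_cgi() {
    printf 'Content-Type: text/plain\n\n'
    printf 'Hello from CGI\n'
    printf 'Method: %s\n' "${REQUEST_METHOD:-GET}"
    printf 'Query: %s\n' "${QUERY_STRING:-}"
}

hello_cgi
```

To try it on a server, you would save it under a name such as hello.sh (hypothetical) in the cgi-bin directory and make it executable; whether it is then reachable at a /cgi-bin/ URL depends on the ScriptAlias setting in your httpd.conf.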
Note that the htdocs and cgi-bin directories are for the "standard" way of using Apache. Apache is extremely flexible and supports a wide variety of ways to serve up and dynamically generate content. For example, Apache can be used with web applications using Sun's Java Servlets technology, which defines a very different way that web application data is stored and served up. Java Servlet–based sites frequently don't use either htdocs or cgi-bin.
Documentation Directories
There are two directories with similar names in a standard Apache installation: manual and man. No, this wasn't a typo! These directories contain documentation for Apache, and for its related support utilities.
The manual directory contains a copy of the Apache user's manual, which is in HTML. You can use any web browser to browse this manual, which is a copy of the documentation on Apache's web site. These files are not required for Apache to run, however, so if you're low on disk space they're safe to delete (though they are definitely useful to have around!). Depending on the configuration, you may also need to remove the reference to the manual directory in the httpd.conf file; check that file for details. At any rate, you can access the manual from your server by accessing the URL http://localhost/manual/ with your favorite browser.
The man directory, meanwhile, contains Unix manual pages for the program binaries in the bin directory. This is where the manual pages for httpd, apachectl, and so on are located. However, if you've installed Apache into a directory such as /usr/local or /opt/apache, you may not be able to access these pages because the man program itself doesn't know to look in those directories for the pages. In order to access these manual pages, use a command similar to this one:
man -M /opt/apache/man httpd
You'll have to substitute your actual installation directory for /opt/apache, and the name of the page you want to read for httpd.
Other Directories
Several other directories are included in an Apache installation, namely build, error, icons, include, lib, and modules. In most cases, these directories are used by Apache and the related tools, and users and administrators don't have to do anything in or with these directories. If you'd like more information on them, you can check the Apache web site and other documentation.
Customizing Apache
So far, I've discussed only the installation of Apache and very basic configuration tasks required to get it to run. Apache's purpose, however, is to run a web site, and so the next step is to discuss how to customize Apache to run your web site.
It's imperative to remember that Apache is a very sophisticated product, and it is capable of an amazingly wide range of behavior. The goal of this article is not to exhaustively describe how to customize Apache for every possible circumstance. Instead, the goal is to illustrate the mechanics of customizing Apache and also to demonstrate a particular methodology for configuring a server. After you read this article, you'll not only have a basic understanding of how to configure Apache, but you'll also understand how to go about figuring out other servers' configurations.
Perhaps the most common customization you'll need to do to an Apache HTTP Server is configuring specific directories. Generally, a given web site has many directories, and sometimes each directory needs specific behaviors or features. For example, most directories might be publicly accessible, but one directory needs to be "locked down" in terms of security and to require a password. To support this functionality, Apache allows you to specify properties or enable features for individual directories, by using directives within "Directory" blocks in the httpd.conf file.
A directive is, in Apache's parlance, a statement in the httpd.conf file that instructs the server to take some action. For example, a line that enables SSL/TLS encryption support would be a directive, as would a line that turns off the ability to list the contents of a directory. Apache directives, in turn, are enclosed in various "blocks" that determine the scope of their effect.
There are many types of these enclosing blocks, but the simplest is the Directory block. A Directory block has a definite beginning (marked by a <Directory …> line) and a definite ending (marked by a </Directory> line). Any directives inside the block affect only the directory that the block names (and any children of that directory).
A typical Apache httpd.conf configuration file contains an entry for the top-level directory on the site. This entry contains some directives that are applied to the root directory. Since they also apply to any children of the root directory, this effectively sets up site-wide defaults.
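As an illustration, the sketch below writes such a top-level block into a scratch file and prints it. The directives are modeled on the stock Apache 2.0 configuration as I remember it, so treat the exact set as an assumption rather than as the contents of any particular httpd.conf.

```shell
# Sketch: a top-level Directory block, written to a scratch file.
# The directives are an assumption modeled on a stock Apache 2.0
# httpd.conf; Order and Allow are the access-control syntax of that era.
set -e
SAMPLE_CONF=$(mktemp)
cat > "$SAMPLE_CONF" <<'EOF'
<Directory "/opt/apache/htdocs">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
EOF

cat "$SAMPLE_CONF"
```

Everything between the opening and closing lines applies to /opt/apache/htdocs and to whatever sits beneath it.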
The path /opt/apache/htdocs in this top-level Directory block points to the location in the installation where HTML content is to be placed, and the directives enclosed in the block are applied to that directory. Since all of the content has to be located under that directory, this block also effectively serves as a site-wide default. (Describing what each of those directives does would take too much time and isn't really relevant; see the Apache documentation mentioned earlier for details.)
It is possible to override these defaults on a case-by-case basis, however. For example, perhaps you wish to add files named home.html to the DirectoryIndex directive, but only for the directory /opt/apache/htdocs/home. A Directory block for that path that sets only DirectoryIndex will accomplish this; the DirectoryIndex directive will be updated for that specific directory, but all other directives will remain unchanged.
Most of the customization of an Apache site revolves around creating these Directory blocks (and other blocks) as appropriate, and then using the correct directives within each block. This model is actually very powerful, since it lets you customize the behavior of each individual area of the site. Unfortunately, this is as much detail as I can really go into here, since I just couldn't do justice to the full power of Apache's configuration system in the space I have. Read the Apache manual cited earlier for full details.
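Here is a sketch of what such an override might look like, again written to a scratch file so it can be inspected safely. The block itself is my own illustration; as far as I know, a DirectoryIndex set inside a more specific block replaces the inherited list rather than extending it, which is why index.html is repeated here.

```shell
# Sketch: override DirectoryIndex for one directory only.
# index.html is listed again on the assumption that the directive
# replaces, rather than extends, the inherited list.
set -e
OVERRIDE_CONF=$(mktemp)
cat > "$OVERRIDE_CONF" <<'EOF'
<Directory "/opt/apache/htdocs/home">
    DirectoryIndex index.html home.html
</Directory>
EOF

cat "$OVERRIDE_CONF"
```

In a real httpd.conf, this block would simply be added after the block for the top-level directory, followed by a server restart.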
All this terminology about Directories and directives aside, what this really boils down to is that Apache provides you with a set of options (directives) and provides you with a mechanism (Directory blocks) to scope the options to different parts of the system. This is actually a very common model for configuration of server daemon software: Create scope blocks somehow, and then set options within each scope block.
In the Apache world, these scope blocks are created by pairs of <Directory …> and </Directory> lines; in other programs, they might be set up through pairs of braces ({ and }) or simply through positioning of the lines in the file (as with the OpenSSH server). Even though these programs differ in their syntax, it's the same general idea in each case. When you find yourself trying to configure a new, unfamiliar program, try to identify cases where you establish "scopes" and create "options" within them.
In my next article, I will explain more about Apache for advanced users.