Access Log

来源:互联网 发布:cdrx4软件下载 绿色 编辑:程序博客网 时间:2024/05/16 08:34

Access Log

Related Modules Related Directives
  • mod_log_config
  • mod_setenvif
  • CustomLog
  • LogFormat
  • SetEnvIf

The server access log records all requests processed by the server. The location and content of the access log are controlled by the CustomLog directive. The LogFormat directive can be used to simplify the selection of the contents of the logs. This section describes how to configure the server to record information in the access log.

Of course, storing the information in the access log is only the start of log management. The next step is to analyze this information to produce useful statistics. Log analysis in general is beyond the scope of this document, and not really part of the job of the web server itself. For more information about this topic, and for applications which perform log analysis, check the Open Directory or Yahoo.

Various versions of Apache httpd have used other modules and directives to control access logging, including mod_log_referer, mod_log_agent, and the TransferLog directive. The CustomLog directive now subsumes the functionality of all the older directives.

The format of the access log is highly configurable. The format is specified using a format string that looks much like a C-style printf(1) format string. Some examples are presented in the next sections. For a complete list of the possible contents of the format string, see the mod_log_config format strings.

Common Log Format

A typical configuration for the access log might look as follows.

LogFormat "%h %l %u %t /"%r/" %>s %b" common
CustomLog logs/access_log common

This defines the nickname common and associates it with a particular log format string. The format string consists of percent directives, each of which tell the server to log a particular piece of information. Literal characters may also be placed in the format string and will be copied directly into the log output. The quote character (") must be escaped by placing a back-slash before it to prevent it from being interpreted as the end of the format string. The format string may also contain the special control characters "/n" for new-line and "/t" for tab.

The CustomLog directive sets up a new log file using the defined nickname. The filename for the access log is relative to the ServerRoot unless it begins with a slash.

The above configuration will write log entries in a format known as the Common Log Format (CLF). This standard format can be produced by many different web servers and read by many log analysis programs. The log file entries produced in CLF will look something like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Each part of this log entry is described below.

127.0.0.1 (%h)
This is the IP address of the client (remote host) which made the request to the server. If HostnameLookups is set to On, then the server will try to determine the hostname and log it in place of the IP address. However, this configuration is not recommended since it can significantly slow the server. Instead, it is best to use a log post-processor such as logresolve to determine the hostnames. The IP address reported here is not necessarily the address of the machine at which the user is sitting. If a proxy server exists between the user and the server, this address will be the address of the proxy, rather than the originating machine.
- (%l)
The "hyphen" in the output indicates that the requested piece of information is not available. In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the clients machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.
frank (%u)
This is the userid of the person requesting the document as determined by HTTP authentication. The same value is typically provided to CGI scripts in the REMOTE_USER environment variable. If the status code for the request (see below) is 401, then this value should not be trusted because the user is not yet authenticated. If the document is not password protected, this entry will be "-" just like the previous one.
[10/Oct/2000:13:55:36 -0700] (%t)
The time that the request was received. The format is:

[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit

It is possible to have the time displayed in another format by specifying %{format}t in the log format string, where format is as in strftime(3) from the C standard library.
"GET /apache_pb.gif HTTP/1.0" (/"%r/")
The request line from the client is given in double quotes. The request line contains a great deal of useful information. First, the method used by the client is GET. Second, the client requested the resource /apache_pb.gif, and third, the client used the protocol HTTP/1.0. It is also possible to log one or more parts of the request line independently. For example, the format string "%m %U%q %H" will log the method, path, query-string, and protocol, resulting in exactly the same output as "%r".
200 (%>s)
This is the status code that the server sends back to the client. This information is very valuable, because it reveals whether the request resulted in a successful response (codes beginning in 2), a redirection (codes beginning in 3), an error caused by the client (codes beginning in 4), or an error in the server (codes beginning in 5). The full list of possible status codes can be found in the HTTP specification (RFC2616 section 10).
2326 (%b)
The last entry indicates the size of the object returned to the client, not including the response headers. If no content was returned to the client, this value will be "-". To log "0" for no content, use %B instead.

Combined Log Format

Another commonly used format string is called the Combined Log Format. It can be used as follows.

LogFormat "%h %l %u %t /"%r/" %>s %b /"%{Referer}i/" /"%{User-agent}i/"" combined
CustomLog log/access_log combined

This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent-directive %{header}i, where header can be any HTTP request header. The access log under this format will look like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

The additional fields are:

"http://www.example.com/start.html" (/"%{Referer}i/")
The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from. (This should be the page that links to or includes /apache_pb.gif).
"Mozilla/4.08 [en] (Win98; I ;Nav)" (/"%{User-agent}i/")
The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.

Multiple Access Logs

Multiple access logs can be created simply by specifying multiple CustomLog directives in the configuration file. For example, the following directives will create three access logs. The first contains the basic CLF information, while the second and third contain referer and browser information. The last two CustomLog lines show how to mimic the effects of the ReferLog and AgentLog directives.

LogFormat "%h %l %u %t /"%r/" %>s %b" common
CustomLog logs/access_log common
CustomLog logs/referer_log "%{Referer}i -> %U"
CustomLog logs/agent_log "%{User-agent}i"

This example also shows that it is not necessary to define a nickname with the LogFormat directive. Instead, the log format can be specified directly in the CustomLog directive.

Conditional Logs

There are times when it is convenient to exclude certain entries from the access logs based on characteristics of the client request. This is easily accomplished with the help of environment variables. First, an environment variable must be set to indicate that the request meets certain conditions. This is usually accomplished with SetEnvIf. Then the env= clause of the CustomLog directive is used to include or exclude requests where the environment variable is set. Some examples:

# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127/.0/.0/.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots/.txt$" dontlog
# Log what remains
CustomLog logs/access_log common env=!dontlog

As another example, consider logging requests from english-speakers to one log file, and non-english speakers to a different log file.

SetEnvIf Accept-Language "en" english
CustomLog logs/english_log common env=english
CustomLog logs/non_english_log common env=!english

Although we have just shown that conditional logging is very powerful and flexible, it is not the only way to control the contents of the logs. Log files are more useful when they contain a complete record of server activity. It is often easier to simply post-process the log files to remove requests that you do not want to consider.

 

日志文件

要有效地管理Web服务器,就有必要反馈服务器的访问、性能以及出现的问题。Apache HTTP服务器提供了非常全面而灵活的事件记录功能。本文阐述如何配置和如何理解事件记录。

  • 安全警告
  • 出错日志
  • 访问日志
  • 日志的回卷
  • 管道日志
  • 虚拟主机
  • 其他日志文件
top

安全警告

因为,任何人只要对Apache存放日志文件的目录具有写权限,也就当然地可以获得通常是root的uid的权限,所以,绝对 不要 随意给予任何人存放日志文件目录的写权限。细节请参见security tips。

另外,日志文件可能会包含未加转换的来自用户的信息,用户就有机会恶意插入控制符,所以处理原始日志时应该注意这个问题。

top

出错日志

相关模块 相关指令  
  • ErrorLog
  • LogLevel

出错日志是最重要的日志文件,其文件名和位置取决于ErrorLog指令。Apache httpd将在这个文件中存放诊断信息和处理请求中出现的错误,由于这里经常包含了出错细节以及如何解决,如果服务器启动或运行中有问题,首先就应该查看这个出错日志。

出错日志通常被写入一个文件 (unix系统上一般是error_log,Windows and OS/2上一般是error.log)。在unix系统中,出错日志还可能被转向syslog,或者 通过管道操作转给一个程序。

出错日志的格式相对灵活,并可以附加文字描述。某些信息会出现在绝大多数记录中,一个典型的例子是:

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

其中,第一项是事件发生的日期和时间;第二项是事件的严重性, LogLevel指令使只有高于指定严重性级别的事件才会被记录;第三项是产生事件的IP地址;此后是事件本身,在此例中,服务器拒绝了这个客户的访问。 服务器在记录被访问文件时,用的是文件系统路径,而不是Web路径。

出错日志中会包含类似上述例子的多种类型的信息。此外,CGI脚本中任何输出 stderr 的信息会作为调试信息原封不动地记录到出错文件。

用户可以增加或删除出错日志的项。但是,对某些特殊请求,在 访问日志中也会有相应的记录,比如,上述例子中,在访问日志中会有相应的记录,其状态码是403,因此,可以从访问日志中得到事件的更多资料。

在测试中,对任何问题持续监视出错文件是非常有用的。在unix系统中,可以这样做:

tail -f error_log

top

访问日志

相关模块 相关指令
  • mod_log_config
  • mod_setenvif
  • CustomLog
  • LogFormat
  • SetEnvIf

访问日志中会记录服务器所处理的所有请求,其文件名和位置取决于 CustomLog 指令,LogFormat指令可以简化日志的内容。这里阐述访问日志的服务器配置。

实施日志管理,首先当然必须产生访问日志,然后才能分析日志从而得到有用的统计信息。日志分析不是Web服务器的职责,已超出本文的范畴,更多资料和有关分析工具的信息,可以查看 Open Directory 和 Yahoo.

不同版本的Apache httpd用了不同的模块和指令来控制对访问的记录,包括mod_log_referer, mod_log_agent, 模块和 TransferLog 指令。现在,CustomLog 指令包含了旧版本中相关指令的所有功能。

访问日志的格式是高度灵活的,使用很象C的printf(1)函数的格式字符串,下面有几个例子,完整的说明可以查看 mod_log_config 和 格式字符串。

普通记录格式

这是一个典型的记录格式

LogFormat "%h %l %u %t /"%r/" %>s %b" common
CustomLog logs/access_log common

上述定义了一种特定的记录格式字符串,并给它起了个 别名common,其中的"%"指示服务器用某种信息替换,其他字符信息则不作替换。引号(")必须加转义符反斜杠,以避免被解释为字符串的结束。格式字符串还可以包含特殊控制符,如换行"/n"、制表符"/t"。

CustomLog指令建立一个新的使用指定记录格式的日志文件,除非其文件名以斜杠开头,否则其路径是一个相对于ServerRoot的相对路径。

上述配置是一种普通记录格式,被称为Common Log Format (CLF),它被许多不同的Web服务器所采用,并可以为许多日志分析程序所辩识,它产生的事件记录有如:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

记录的各部分说明如下。

127.0.0.1 (%h)
这是发送请求到服务器的客户的IP地址。如果 HostnameLookups 设为 On,则服务器会尝试解析这个IP地址的主机名,但是,并不推荐这样配置,因为会显著拖慢服务器,最好是用一个日志后续处理器还判断主机名,比如 logresolve。如果客户和服务器之间存在代理,那么记录中的这个IP地址就是那个代理,而不是客户面前的那个机器的IP地址。
- (%l)
这是由客户端 identd 判断的RFC 1413身份,输出中的符号 "-" 表示此处信息无效。除非在严格控制的内部网络中,此信息通常并不可靠,不应该被使用。只有在 IdentityCheck 设为 On时,Apache才会试图得到这项信息。
frank (%u)
这是由HTTP认证系统得到的访问该网页的客户名称,环境变量 REMOTE_USER 会被设为该值并提供给CGI脚本。如果状态码是401,表示客户没有通过认证,则此值没有意义。如果网页没有设置密码保护,则此项应该是"-"。
[10/Oct/2000:13:55:36 -0700] (%t)
这是服务器完成对请求的处理时的时间,其格式是:

[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit

可以在格式字符串中使用 %{format}t 改变时间的输出形式,format与C标准库中的 strftime(3) 用法相同。
"GET /apache_pb.gif HTTP/1.0" (/"%r/")
引号中是客户发出的包含了许多有用信息的请求内容。可以看出,该客户的动作是GET,请求的资源是/apache_pb.gif,使用的协议是HTTP/1.0。另外,还可以记录其他信息,如:格式字符串 "%m %U%q %H" 会记录动作、路径、请求串、协议,结果其输出会和"%r" 一样。
200 (%>s)
这个是服务器返回给客户端的状态码。这个信息非常有价值,因为它指示了请求的结果,或者是被成功响应了(以2开头),或者被转向了(以3开头),或者出错了(以4开头),或者产生了服务器端错误(以5开头)。完整的状态码列表参见HTTP specification (RFC2616 section 10).
2326 (%b)
最后这项是返回给客户端的不包括响应头的字节数。如果没有信息返回,则此项应该是 "-",如果希望记录为 "0" 的形式,就应该用%B

组合记录格式

另一种常用的记录格式是组合记录格式(Combined Log Format),形如:

LogFormat "%h %l %u %t /"%r/" %>s %b /"%{Referer}i/" /"%{User-agent}i/"" combined
CustomLog log/acces_log combined

这种格式大致与普通日志格式相同,但是多了两个 %{header}i 项,其中的 header 可以是任何请求头。这种格式的记录形如:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

其中,多出来的项是:

"http://www.example.com/start.html" (/"%{Referer}i/")
此项指明了该请求是从被哪个网页提交过来的,这个网页应该包含有/apache_pb.gif或者其连接。
"Mozilla/4.08 [en] (Win98; I ;Nav)" (/"%{User-agent}i/")
此项是客户浏览器提供的浏览器识别信息。

多文件访问日志

可以简单地在配置文件中用多个 CustomLog 指令来建立多文件访问日志。如下例,既记录基本的CLF信息,又有提交网页和浏览器信息,最后两行 CustomLog 示范了如何模拟 ReferLog and AgentLog 指令的效果。

LogFormat "%h %l %u %t /"%r/" %>s %b" common
CustomLog logs/access_log common
CustomLog logs/referer_log "%{Referer}i -> %U"
CustomLog logs/agent_log "%{User-agent}i"

此例也说明了,记录格式可以直接由 CustomLog 指令来指令,而并不一定要用 LogFormat 指令来起一个别名。

有条件地记录日志

许多时候,使用 环境变量 排除某些客户请求会带来便利。首先,需要用SetEnvIf指令来标识符合某种条件的请求,然后用CustomLog 指令的env=从句,来包含或者排除被记录的请求。例如:

# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127/.0/.0/.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots/.txt$" dontlog
# Log what remains
CustomLog logs/access_log common env=!dontlog

再例,记录使用英语的请求到一个日志,而记录非英语的请求到另一个日志:

SetEnvIf Accept-Language "en" english
CustomLog logs/english_log common env=english
CustomLog logs/non_english_log common env=!english

虽然上述已经展示了有条件日志记录的强大和灵活,但这不是控制日志内容的唯一手段,还可以用日志后处理程序来剔除你不关心的内容,而使日志更有用。

top

日志的回卷

即使一个不是很繁忙的服务器,日志文件的信息量也会很大,一般每10,000个请求,访问日志就会增加1MB或更多。这就有必要定期回卷日志文件。由于, Apache会保持该文件的打开,并持续写入信息,因此服务器运行时不能执行回卷操作,移动或者删除日志文件以后,必须 重新启动 服务器让它打开新的日志文件。

较温柔的 方法重新启动,可以使服务器启用新的日志文件,而不丢失原有的和尚未写入的信息。为此,有必要等待一点时间,让服务器在处理完毕正在处理的请求,并将记录写入到原来的日志文件。以下是一个典型的日志回卷和为节省存储空间压缩旧日志的例子:

mv access_log access_log.old
mv error_log error_log.old
apachectl graceful
sleep 600
gzip access_log.old error_log.old

另一种执行回卷的方法是使用下一节阐述的 管道日志。

top

管道日志

Apache httpd 可以通过管道将访问记录和出错信息转交给另一个进程,而不是写入一个文件,由于无须对主服务器编程,这个功能显著地增强了日志的灵活性。只要用管道操作符"|",后面跟可执行文件名,就可以使这个程序从标准输入设备获得事件记录。Apache在启动时,会同时启动这个管道日志进程,而且在运行中,如果这个进程崩溃了,会重新启动这个进程 (所以我们称这个技术为“可靠管道日志”)。

管道日志进程由其父进程Apache httpd 产生,并继承其用户权限,这意味着管道进程通常是作为root运行的,所以保持这个程序简单而安全极为重要。

一些使用管道日志的例子,其用法在访问日志和出错日志中都相同:

# compressed logs
CustomLog "|/usr/bin/gzip -c >> /var/log/access_log.gz" common
# almost-real-time name resolution
CustomLog "|/usr/local/apache/bin/logresolve >> /var/log/access_log" common

注意,调用管道程序的参数要用引号括起来。

管道日志的一种重要用途是,允许日志回卷而无须重新启动服务器,为此,Apache HTTP服务器提供了一个简单的程序 rotatelogs。每24小时回卷日志一次的例子如下:

CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common

在其他站点,有一个类似而更灵活的日志回卷程序叫 cronolog。

如果有较简单的离线后处理日志方案,就不应该用有条件日志和管道日志,即使它们非常强大。

top

虚拟主机

如果一个服务器配有若干 虚拟主机,那么还有几个控制日志文件的功能。首先,可以把日志指令放在 <VirtualHost> 段以外,使它们使用与主服务器相同的对日志文件和出错文件的设置,但是这样就不能统计单个虚拟主机的信息了;而把 CustomLog 或者 ErrorLog 指令放在<VirtualHost>段以内,所有对这个虚拟主机的请求和错误信息会记录在其私有的文件中,这种方法对虚拟主机较少的服务器很有用,但对虚拟主机非常多的,就会带来管理困难的问题,还经常会产生 文件描述符短缺 的问题。

对于访问日志,有一个很好的折衷方案,在同一个日志文件中记录所有的事件,而每个事件都注明虚拟主机的信息,日后再把记录拆开存入不同的文件。例如:

LogFormat "%v %l %u %t /"%r/" %>s %b" comonvhost
CustomLog logs/access_log comonvhost

%v用来附加虚拟主机的信息。有个程序叫 split-logfile 可以用来把各虚拟主机的信息拆分存入不同的文件。

top

其他日志文件

相关模块 相关指令
  • mod_cgi
  • mod_rewrite
  • PidFile
  • RewriteLog
  • RewriteLogLevel
  • ScriptLog
  • ScriptLogBuffer
  • ScriptLogLength

PID 文件

Apache httpd 启动时,会把进程ID存入文件 logs/httpd.pid,其文件名可以用PidFile 指令改变。该进程ID可以被管理员利用来重新启动或者终止服务器后台监控程序。在Windows中,可以使用命令行参数 -k。更多资料请参见 停止和重新启动。

脚本日志

为了方便调试,可以用ScriptLog 指令来记录CGI脚本的输入和输出。此功能应该仅用于测试,而不应该用于日常工作的服务器。更多资料请参见 mod_cgi的文档。

重写日志

在使用强大且灵活的 mod_rewrite 时,几乎都有必要用 RewriteLog 来帮助调试。这个日志提供了重写引擎如何转换请求的详细分解信息,其详细程度取决于RewriteLogLevel 指令。