JAVA NIO 中文

来源:互联网 编辑:程序博客网 时间:2024/05/16 15:27
 

Chapter 1. Introduction
Get the facts first. You can distort them later.
—Mark Twain

第一章 介绍

先掌握事实,然后你可以随意歪曲它们。
——马克·吐温

Let's talk about I/O. No, no, come back. It's not really all that dull. Input/output (I/O) is not
a glamorous topic, but it's a very important one. Most programmers think of I/O in the same
way they do about plumbing: undoubtedly essential, can't live without it, but it can be
unpleasant to deal with directly and may cause a big, stinky mess when not working properly.
This is not a book about plumbing, but in the pages that follow, you may learn how to make
your data flow a little more smoothly.

让我们来谈谈 I/O。不,不,回来,它其实没那么枯燥。输入/输出(I/O)不是一个光鲜的话题,但却非常重要。大多数程序员看待 I/O 就像看待水管:毫无疑问必不可少,离了它就活不下去,但直接跟它打交道可能令人不快,而且一旦工作不正常,还会弄出一大摊又脏又臭的烂摊子。这不是一本讲水管的书,但在接下来的篇幅里,您可以学到如何让数据流动得更顺畅一些。

Object-oriented program design is all about encapsulation. Encapsulation is a good thing: it
partitions responsibility, hides implementation details, and promotes object reuse. This
partitioning and encapsulation tends to apply to programmers as well as programs. You may
be a highly skilled Java programmer, creating extremely sophisticated objects and doing
extraordinary things, and yet be almost entirely ignorant of some basic concepts underpinning
I/O on the Java platform. In this chapter, we'll momentarily violate your encapsulation and
take a look at some low-level I/O implementation details in the hope that you can better
orchestrate the multiple moving parts involved in any I/O operation.

面向对象程序设计的核心就是封装。封装是个好东西:它划分职责、隐藏实现细节、促进对象复用。这种划分和封装不仅适用于程序,也适用于程序员。您也许是一位技术高超的 Java 程序员,创建着极其复杂的对象,完成着非凡的工作,却可能对支撑 Java 平台 I/O 的一些基本概念几乎一无所知。在本章中,我们将暂时打破您的"封装",看看一些底层的 I/O 实现细节,希望您能更好地协调 I/O 操作中涉及的多个活动部件。

1.1 I/O Versus CPU Time
Most programmers fancy themselves software artists, crafting clever routines to squeeze a few
bytes here, unrolling a loop there, or refactoring somewhere else to consolidate objects. While
those things are undoubtedly important, and often a lot of fun, the gains made by optimizing
code can be easily dwarfed by I/O inefficiencies. Performing I/O usually takes orders of
magnitude longer than performing in-memory processing tasks on the data. Many coders
concentrate on what their objects are doing to the data and pay little attention to the
environmental issues involved in acquiring and storing that data.

1.1 I/O 和CPU时间
大多数程序员自诩为软件艺术家,精心编写巧妙的例程:在这里挤出几个字节,在那里展开一个循环,或者在别处重构代码以合并对象。这些事情无疑很重要,而且常常乐趣十足,但代码优化带来的收益很容易被低效的 I/O 所吞没。执行 I/O 通常比对同一份数据做内存内处理要慢上几个数量级。许多程序员把注意力集中在对象如何处理数据上,而对获取和存储这些数据所涉及的环境问题很少关心。

Table 1-1 lists some hypothetical times for performing a task on units of data read from and
written to disk. The first column lists the average time it takes to process one unit of data,
the second column is the amount of time it takes to move that unit of data from and to disk,
and the third column is the number of these units of data that can be processed per second.
The fourth column is the throughput increase that will result from varying the values in
the first two columns.

Table 1-1 列出了从磁盘读写数据并对其执行某项任务所需的一些假设时间。第一列是处理一个数据单元的平均时间,第二列是该数据单元在磁盘上读写所需的时间,第三列是每秒可以处理的数据单元数,第四列是改变前两列的值所带来的吞吐量增长。


Table 1-1. Throughput rate, processing versus I/O time

Process time (ms) | I/O time (ms) | Throughput (units/sec) | Gain (%)
------------------|---------------|------------------------|------------
5                 | 100           | 9.52                   | (benchmark)
2.5               | 100           | 9.76                   | 2.44
1                 | 100           | 9.9                    | 3.96
5                 | 90            | 10.53                  | 10.53
5                 | 75            | 12.5                   | 31.25
5                 | 50            | 18.18                  | 90.91
5                 | 20            | 40                     | 320
5                 | 10            | 66.67                  | 600

The first three rows show how increasing the efficiency of the processing step affects
throughput. Cutting the per-unit processing time in half results only in a 2.2% increase in
throughput. On the other hand, reducing I/O latency by just 10% results in a 9.7% throughput
gain. Cutting I/O time in half nearly doubles throughput, which is not surprising when you see
that time spent per unit doing I/O is 20 times greater than processing time.

开始三行显示了处理步骤效率的提高如何影响吞吐量。把单位处理时间缩短一半,吞吐量仅仅提高了 2.2%。另一方面,只要把 I/O 延迟减少 10%,就能获得 9.7% 的吞吐量增长。把 I/O 时间减半,吞吐量几乎翻倍。想到每个数据单元花在 I/O 上的时间是处理时间的 20 倍,这并不让人惊奇。
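The table's numbers follow from simple arithmetic: each unit costs processing time plus I/O time, and the gain compares a row's per-unit cost against the 5 ms + 100 ms benchmark row. A quick sketch of that math (the class and method names are just for illustration):

```java
// Recreates the arithmetic behind Table 1-1: throughput is units per
// second at (process + I/O) ms per unit, and gain compares each row's
// per-unit cost against the benchmark row (5 ms process + 100 ms I/O).
public class ThroughputMath {
    static double throughput(double processMs, double ioMs) {
        return 1000.0 / (processMs + ioMs);   // units completed per second
    }
    static double gainPercent(double processMs, double ioMs) {
        double benchmarkMs = 5 + 100;         // benchmark per-unit cost
        return (benchmarkMs / (processMs + ioMs) - 1) * 100;
    }
    public static void main(String[] args) {
        System.out.printf("halve processing: %.2f u/s, %.2f%% gain%n",
                throughput(2.5, 100), gainPercent(2.5, 100));
        System.out.printf("halve I/O:        %.2f u/s, %.2f%% gain%n",
                throughput(5, 50), gainPercent(5, 50));
    }
}
```

Running it reproduces the table rows: halving processing time yields only a 2.44% gain, while halving I/O time yields 90.91%.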

These numbers are artificial and arbitrary (the real world is never so simple) but are intended
to illustrate the relative time magnitudes. As you can see, I/O is often the limiting factor in
application performance, not processing speed. Programmers love to tune their code, but I/O
performance tuning is often an afterthought, or is ignored entirely. It's a shame, because even
small investments in improving I/O performance can yield substantial dividends.

这些数字是人为编造、随意选取的(真实世界从来没有这么简单),但目的是说明时间上相对的数量级。如您所见,I/O 常常才是应用性能的限制因素,而不是处理速度。程序员喜欢调优他们的代码,但 I/O 性能调优往往被排在事后,甚至被完全忽略。这很遗憾,因为在改善 I/O 性能上的小小投资就能带来可观的回报。

1.2 No Longer CPU Bound
To some extent, Java programmers can be forgiven for their preoccupation with optimizing
processing efficiency and not paying much attention to I/O considerations. In the early days of
Java, the JVMs interpreted bytecodes with little or no runtime optimization. This meant that
Java programs tended to poke along, running significantly slower than natively compiled code
and not putting much demand on the I/O subsystems of the operating system.

1.2 不再有CPU限制

在一定程度上,Java 程序员一心优化处理效率而不太关注 I/O,是可以原谅的。在 Java 早期,JVM 解释执行字节码,几乎没有运行时优化。这意味着 Java 程序往往慢慢吞吞,比本地编译的代码慢很多,对操作系统 I/O 子系统的压力也不大。

But tremendous strides have been made in runtime optimization. Current JVMs run bytecode
at speeds approaching that of natively compiled code, sometimes doing even better because of
dynamic runtime optimizations. This means that most Java applications are no longer CPU
bound (spending most of their time executing code) and are more frequently I/O bound
(waiting for data transfers).

但是运行时优化已经取得了巨大的进展。当前的 JVM 运行字节码的速度已经接近本地编译的代码,借助动态运行时优化,有时甚至更快。这意味着大多数 Java 应用已经不再受 CPU 限制(把大部分时间花在执行代码上),而更多地受 I/O 限制(等待数据传输)。

But in most cases, Java applications have not truly been I/O bound in the sense that the
operating system couldn't shuttle data fast enough to keep them busy. Instead, the JVMs have
not been doing I/O efficiently. There's an impedance mismatch between the operating system
and the Java stream-based I/O model. The operating system wants to move data in large
chunks (buffers), often with the assistance of hardware Direct Memory Access (DMA). The
I/O classes of the JVM like to operate on small pieces — single bytes, or lines of text. This
means that the operating system delivers buffers full of data that the stream classes of
java.io spend a lot of time breaking down into little pieces, often copying each piece
between several layers of objects. The operating system wants to deliver data by the
truckload. The java.io classes want to process data by the shovelful. NIO makes it easier to
back the truck right up to where you can make direct use of the data (a ByteBuffer object).

但在大多数情况下,Java 应用并非真正意义上受 I/O 限制——并不是操作系统传送数据的速度不够快、喂不饱它们,而是 JVM 自己没有高效地做 I/O。操作系统和 Java 基于流的 I/O 模型之间存在一种阻抗失配。操作系统希望以大块(缓冲区)的方式搬运数据,而且常常借助硬件直接内存访问(DMA)来完成;JVM 的 I/O 类则喜欢操作小片数据:单个字节,或者若干行文本。这就意味着,操作系统送来整缓冲区的数据,java.io 的流类却要花费大量时间把它们拆成小块,而且常常要在若干层对象之间拷贝每一个小块。操作系统想按卡车交付数据,java.io 的类却想一铲一铲地处理数据。NIO 让您更容易把卡车直接开到能够直接使用数据的地方(一个 ByteBuffer 对象)。
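The truckload-versus-shovelful contrast can be sketched in a few lines. The class, method names, and throwaway temp file below are invented for the demo; the point is simply one call per byte versus one call per buffer:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// "Shovelful" versus "truckload": a stream pulling one byte per call
// against a channel that accepts a whole buffer at once.
public class TruckVsShovel {
    static int shovel(Path file) throws IOException {   // one call per byte
        int calls = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            while (in.read() != -1) calls++;
        }
        return calls;
    }
    static int truck(Path file, int size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(size);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.read(buf);   // typically fills the buffer in one call
        }
    }
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-demo", ".bin");
        Files.write(tmp, new byte[4096]);
        System.out.println(shovel(tmp) + " single-byte reads vs one bulk read of "
                + truck(tmp, 4096) + " bytes");
        Files.delete(tmp);
    }
}
```

In practice the stream version would be wrapped in a BufferedInputStream, but that only hides the per-byte interface; the channel hands you the buffer the data arrived in.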

This is not to say that it was impossible to move large amounts of data with the traditional I/O
model — it certainly was (and still is). The RandomAccessFile class in particular can be quite
efficient if you stick to the array-based read( ) and write( ) methods. Even those methods
entail at least one buffer copy, but are pretty close to the underlying operating-system calls.

这并不是说用传统的 I/O 模型就无法移动大量数据——当然可以(现在也依然可以)。尤其是 RandomAccessFile 类,只要您坚持使用基于数组的 read() 和 write() 方法,它可以相当高效。即使这些方法也至少涉及一次缓冲区拷贝,但已经非常接近底层的操作系统调用了。
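A minimal sketch of the array-based usage the paragraph refers to; the temp file and helper method are illustrative, not from the book:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Array-based read()/write() on RandomAccessFile move a whole buffer
// per call, staying close to the underlying system calls.
public class BulkRandomAccess {
    static byte[] writeThenRead(Path file, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.write(data);          // one bulk write, not a loop of write(int)
            raf.seek(0);              // rewind to the start of the file
            byte[] back = new byte[data.length];
            raf.readFully(back);      // one bulk read into the array
            return back;
        }
    }
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("raf-demo", ".bin");
        byte[] back = writeThenRead(tmp, "bulk transfer".getBytes());
        System.out.println(new String(back));   // prints: bulk transfer
        Files.delete(tmp);
    }
}
```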

As illustrated by Table 1-1, if your code finds itself spending most of its time waiting for I/O,
it's time to consider improving I/O performance. Otherwise, your beautifully crafted code may
be idle most of the time.

就像 Table 1-1 所揭示的那样,如果您的代码发现自己把大部分时间花在等待 I/O 上,那么就该考虑提高 I/O 性能了。否则,您精心雕琢的代码在大部分时间里将无所事事。

1.3 Getting to the Good Stuff
Most of the development effort that goes into operating systems is targeted at improving I/O
performance. Lots of very smart people toil very long hours perfecting techniques for
schlepping data back and forth. Operating-system vendors expend vast amounts of time and
money seeking a competitive advantage by beating the other guys in this or that published
benchmark.

1.3 Getting to the Good Stuff
操作系统的大部分开发工作都是为了提高 I/O 性能。许多非常聪明的人长年累月地辛勤工作,不断完善来回搬运数据的技术。操作系统厂商投入大量时间和金钱,力求在这个或那个公开的基准测试中击败对手,以获得竞争优势。

Today's operating systems are modern marvels of software engineering (OK, some are more
marvelous than others), but how can the Java programmer take advantage of all this wizardry
and still remain platform-independent? Ah, yet another example of the TANSTAAFL
principle.1

今天的操作系统是软件工程的现代奇迹(好吧,有些比另一些更神奇),但 Java 程序员如何才能既利用这些魔法,又保持平台无关?啊,这又是一个 TANSTAAFL 原则("天下没有免费的午餐")的例子。

The JVM is a double-edged sword. It provides a uniform operating environment that shelters
the Java programmer from most of the annoying differences between operating-system
environments. This makes it faster and easier to write code because platform-specific
idiosyncrasies are mostly hidden. But cloaking the specifics of the operating system means
that the jazzy, wiz-bang stuff is invisible too.

JVM 是一把双刃剑。它提供了统一的运行环境,为 Java 程序员遮蔽了操作系统环境之间大部分恼人的差异。这让编写代码更快更容易,因为平台相关的特异性大多被隐藏了。但是,把操作系统的细节罩起来,也意味着那些炫目、高性能的好东西同样看不见了。

What to do? If you're a developer, you could write some native code using the Java Native
Interface (JNI) to access the operating-system features directly. Doing so ties you to a specific
operating system (and maybe a specific version of that operating system) and exposes the
JVM to corruption or crashes if your native code is not 100% bug free. If you're an operating-system
vendor, you could write native code and ship it with your JVM implementation to
provide these features as a Java API. But doing so might violate the license you signed to
provide a conforming JVM. Sun took Microsoft to court about this over the JDirect package
which, of course, worked only on Microsoft systems. Or, as a last resort, you could turn to
another language to implement performance-critical applications.

怎么办?如果您是开发人员,可以用 Java 本地接口(JNI)编写一些本地代码,直接访问操作系统特性。但这样做会把您绑在某个特定操作系统上(甚至可能是该操作系统的某个特定版本),而且一旦本地代码不是 100% 没有 bug,就会让 JVM 面临数据损坏或崩溃的风险。如果您是操作系统厂商,可以编写本地代码并随您的 JVM 实现一起发布,以 Java API 的形式提供这些特性。但这样做可能违反您为提供合规 JVM 而签署的许可协议。Sun 就曾因 JDirect 包把微软告上法庭——那个包当然只能在微软的系统上运行。最后一招,您可以转用别的语言来实现性能关键型应用。

The java.nio package provides new abstractions to address this problem. The Channel and
Selector classes in particular provide generic APIs to I/O services that were not reachable
prior to JDK 1.4. The TANSTAAFL principle still applies: you won't be able to access every
feature of every operating system, but these new classes provide a powerful new framework
that encompasses the high-performance I/O features commonly available on commercial
operating systems today. Additionally, a new Service Provider Interface (SPI) is provided in
java.nio.channels.spi that allows you to plug in new types of channels and selectors
without violating compliance with the specifications.

java.nio 包提供了新的抽象来解决这个问题。尤其是 Channel 和 Selector 类,它们为 JDK 1.4 之前无法触及的 I/O 服务提供了通用的 API。TANSTAAFL 原则依然有效:您无法访问每个操作系统的每个特性,但这些新类提供了一个强大的新框架,涵盖了当今商业操作系统上普遍提供的高性能 I/O 特性。此外,java.nio.channels.spi 中还提供了一个新的服务提供者接口(SPI),允许您插入新类型的 channel 和 selector,而不违反规范的兼容性。
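As a first taste of the Channel and Selector classes (later chapters cover them in depth), here is a minimal readiness-selection sketch. A Pipe stands in for a network socket so the example is self-contained; the class and method names are invented for the demo:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Minimal readiness selection: register a channel with a Selector,
// make it readable, then block until select() reports it ready.
public class ReadinessDemo {
    static int readyAfterWrite() throws IOException {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);   // selectable channels must be non-blocking
        pipe.source().register(selector, SelectionKey.OP_READ);
        pipe.sink().write(ByteBuffer.wrap(new byte[]{42}));  // make the source readable
        int ready = selector.select();            // blocks until a channel is ready
        selector.close();
        pipe.sink().close();
        pipe.source().close();
        return ready;                             // number of ready channels
    }
    public static void main(String[] args) throws IOException {
        System.out.println(readyAfterWrite() + " channel(s) ready");  // prints: 1 channel(s) ready
    }
}
```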

With the addition of NIO, Java is ready for serious business, entertainment, scientific and
academic applications in which high-performance I/O is essential.

有了 NIO 的加入,Java 已为高性能 I/O 必不可少的商业、娱乐、科学和学术等重要应用做好了准备。

The JDK 1.4 release contains many other significant improvements in addition to NIO. As of
1.4, the Java platform has reached a high level of maturity, and there are few application areas
remaining that Java cannot tackle. A great guide to the full spectrum of JDK features in 1.4 is
Java In A Nutshell, Fourth Edition by David Flanagan (O'Reilly).

JDK 1.4 发行版除 NIO 之外还包含许多其他重要改进。到了 1.4,Java 平台已经达到了很高的成熟度,几乎没有 Java 不能涉足的应用领域了。要全面了解 JDK 1.4 的各项特性,David Flanagan 的《Java in a Nutshell》第四版(O'Reilly)是一本很好的指南。

1.4 I/O Concepts
The Java platform provides a rich set of I/O metaphors. Some of these metaphors are more
abstract than others. With all abstractions, the further you get from hard, cold reality,
the tougher it becomes to connect cause and effect. The NIO packages of JDK 1.4 introduce
a new set of abstractions for doing I/O. Unlike previous packages, these are focused on
shortening the distance between abstraction and reality. The NIO abstractions have very real
and direct interactions with real-world entities. Understanding these new abstractions and, just
as importantly, the I/O services they interact with, is key to making the most of I/O-intensive
Java applications.

1.4 I/O 概念
Java 平台提供了一组丰富的 I/O 比喻(metaphor)。其中一些比另一些更抽象。对于任何抽象,您离冰冷坚硬的现实越远,就越难把原因和结果联系起来。JDK 1.4 的 NIO 包引入了一组新的 I/O 抽象。与以前的包不同,这些抽象专注于缩短抽象与现实之间的距离。NIO 的抽象与真实世界的实体有着非常真实而直接的交互。理解这些新抽象,以及同样重要的、与它们打交道的 I/O 服务,是充分发挥 I/O 密集型 Java 应用性能的关键。

This book assumes that you are familiar with basic I/O concepts. This section provides
a whirlwind review of some basic ideas just to lay the groundwork for the discussion of how
the new NIO classes operate. These classes model I/O functions, so it's necessary to grasp
how things work at the operating-system level to understand the new I/O paradigms.

本书假定您熟悉基本的 I/O 概念。本节提供一个旋风式的回顾,梳理一些基本概念,为讨论新的 NIO 类如何运作打下基础。这些类为 I/O 功能建模,所以有必要领会操作系统层面的工作方式,才能理解新的 I/O 范式。

In the main body of this book, it's important to understand the following topics:
• Buffer handling
• Kernel versus user space
• Virtual memory
• Paging
• File-oriented versus stream I/O
• Multiplexed I/O (readiness selection)

在本书的主体部分,理解下列概念很要紧:
• Buffer handling,缓冲区处理
• Kernel versus user space,内核空间与用户空间
• Virtual memory,虚拟内存
• Paging,页交换
• File-oriented versus stream I/O,面向文件与面向流的 I/O
• Multiplexed I/O (readiness selection),多路复用 I/O(就绪选择)

1.4.1 Buffer Handling
Buffers, and how buffers are handled, are the basis of all I/O. The very term "input/output"
means nothing more than moving data in and out of buffers.

1.4.1 缓冲区处理
缓冲区,以及缓冲区如何被处理,是所有 I/O 的基础。"输入/输出"这个术语,说的无非就是把数据移进、移出缓冲区。

Processes perform I/O by requesting of the operating system that data be drained from
a buffer (write) or that a buffer be filled with data (read). That's really all it boils down to. All
data moves in or out of a process by this mechanism. The machinery inside the operating
system that performs these transfers can be incredibly complex, but conceptually, it's very
straightforward.

进程执行 I/O,是通过向操作系统发出请求,把数据从缓冲区排空(写),或者把缓冲区装满(读)来实现的。一切归根结底就是这么回事。所有数据都通过这一机制进出进程。操作系统内部完成这些传输的机制可能复杂得难以置信,但在概念上,它非常直截了当。
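The fill (read) and drain (write) idea maps directly onto NIO's ByteBuffer, which later chapters cover in detail. A small sketch (the class and method names are purely illustrative) fills a buffer and then drains it back out:

```java
import java.nio.ByteBuffer;

// A buffer is filled (as by a read) and then drained (as by a write).
// flip() switches the buffer from filling mode to draining mode.
public class FillDrain {
    static String roundTrip(String s) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        for (char c : s.toCharArray()) buf.put((byte) c);  // fill
        buf.flip();                      // limit = bytes written, position = 0
        StringBuilder out = new StringBuilder();
        while (buf.hasRemaining()) out.append((char) buf.get());  // drain
        return out.toString();
    }
    public static void main(String[] args) {
        System.out.println(roundTrip("NIO"));   // prints NIO
    }
}
```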

Figure 1-1 shows a simplified logical diagram of how block data moves from an external
source, such as a disk, to a memory area inside a running process. The process requests that
its buffer be filled by making the read( ) system call. This results in the kernel issuing
a command to the disk controller hardware to fetch the data from disk. The disk controller
writes the data directly into a kernel memory buffer by DMA without further assistance from
the main CPU. Once the disk controller finishes filling the buffer, the kernel copies the data
from the temporary buffer in kernel space to the buffer specified by the process when it
requested the read( ) operation.

图 1-1 是一个简化的逻辑示意图,说明块数据如何从外部源(比如磁盘)到达运行中进程的内存区域。进程通过发起 read() 系统调用,请求装满它的缓冲区。这导致内核向磁盘控制器硬件发出命令,从磁盘取数据。磁盘控制器通过 DMA 把数据直接写入内核的内存缓冲区,无需主 CPU 的进一步协助。一旦磁盘控制器装满了缓冲区,内核再把数据从内核空间的临时缓冲区拷贝到进程发起 read() 请求时指定的缓冲区中。

Figure 1-1. Simplified I/O buffer handling


This obviously glosses over a lot of details, but it shows the basic steps involved.

这显然略过了很多细节,但它给出了所涉及的基本步骤。

Note the concepts of user space and kernel space in Figure 1-1. User space is where regular
processes live. The JVM is a regular process and dwells in user space. User space is
a nonprivileged area: code executing there cannot directly access hardware devices, for
example. Kernel space is where the operating system lives. Kernel code has special privileges:
it can communicate with device controllers, manipulate the state of processes in user space,
etc. Most importantly, all I/O flows through kernel space, either directly (as described here) or
indirectly (see Section 1.4.2).

注意图 1-1 中的用户空间和内核空间的概念。用户空间是普通进程所在的地方。JVM 就是一个普通进程,驻留在用户空间。用户空间是非特权区域:例如,在那里执行的代码不能直接访问硬件设备。内核空间是操作系统所在的地方。内核代码拥有特别权限:它可以与设备控制器通信、操纵用户空间进程的状态等等。最重要的是,所有 I/O 都要经过内核空间,不管是直接的(如本节所述)还是间接的(见 1.4.2 节)。

When a process requests an I/O operation, it performs a system call, sometimes known as
a trap, which transfers control into the kernel. The low-level open( ), read( ), write( ), and
close( ) functions so familiar to C/C++ coders do nothing more than set up and perform the
appropriate system calls. When the kernel is called in this way, it takes whatever steps are
necessary to find the data the process is requesting and transfer it into the specified buffer in
user space. The kernel tries to cache and/or prefetch data, so the data being requested by the
process may already be available in kernel space. If so, the data requested by the process is
copied out. If the data isn't available, the process is suspended while the kernel goes about
bringing the data into memory.

当进程请求 I/O 操作时,它执行一个系统调用(有时称为陷阱,trap),把控制权移交给内核。C/C++ 程序员熟悉的底层函数 open()、read()、write() 和 close(),无非是建立并执行适当的系统调用。当内核以这种方式被调用时,它会采取一切必要步骤,找到进程请求的数据,并把数据传输到用户空间中指定的缓冲区。内核会尝试缓存和/或预取数据,所以进程所需的数据可能已经在内核空间中了。如果是这样,数据就被直接拷贝出来;如果不是,进程被挂起,内核则着手把数据读入内存。

Looking at Figure 1-1, it's probably occurred to you that copying from kernel space to the
final user buffer seems like extra work. Why not tell the disk controller to send it directly to
the buffer in user space? There are a couple of problems with this. First, hardware is usually
not able to access user space directly.2 Second, block-oriented hardware devices such as disk
controllers operate on fixed-size data blocks. The user process may be requesting an oddly
sized or misaligned chunk of data. The kernel plays the role of intermediary, breaking down
and reassembling data as it moves between user space and storage devices.

看看 Figure 1-1,您也许会觉得,从内核空间再拷贝到最终的用户缓冲区像是多余的工作。为什么不让磁盘控制器直接把数据送到用户空间的缓冲区?这里有两个问题。首先,硬件通常不能直接访问用户空间。其次,磁盘控制器这类面向块的硬件设备操作的是固定大小的数据块,而用户进程请求的可能是大小不规则或未对齐的数据块。内核扮演中间角色,在数据于用户空间和存储设备之间移动时,负责拆解和重组数据。

1.4.1.1 Scatter/gather
Many operating systems can make the assembly/disassembly process even more efficient. The
notion of scatter/gather allows a process to pass a list of buffer addresses to the operating
system in one system call. The kernel can then fill or drain the multiple buffers in sequence,
scattering the data to multiple user space buffers on a read, or gathering from several buffers
on a write (Figure 1-2).

1.4.1.1 Scatter/gather
很多操作系统能把这个装配/拆解过程做得更有效率。scatter/gather(分散/聚集)的概念是:允许进程在一次系统调用中,把一个缓冲区地址列表传给操作系统。内核随后可以按顺序填充或排空这些缓冲区:读的时候,把数据分散(scatter)到多个用户空间缓冲区;写的时候,则从多个缓冲区把数据聚集(gather)起来(Figure 1-2)。

Figure 1-2. A scattering read to three buffers

This saves the user process from making several system calls (which can be expensive) and
allows the kernel to optimize handling of the data because it has information about the total
transfer. If multiple CPUs are available, it may even be possible to fill or drain several buffers
simultaneously.

这让用户进程省去了多次系统调用(系统调用可能开销很大),而且由于内核掌握了整个传输的信息,它可以优化数据处理。如果有多个 CPU 可用,甚至可以同时填充或排空多个缓冲区。
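In Java, scatter/gather appears as the array-taking read()/write() overloads on channels. A hypothetical sketch (the file contents and buffer sizes are invented) of a scattering read into a 6-byte header buffer and a 4-byte body buffer, mirroring Figure 1-2:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A scattering read: one call fills a sequence of buffers in order.
public class ScatterDemo {
    static long scatterRead(Path file, ByteBuffer[] buffers) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.read(buffers);   // fills buffers[0], then buffers[1], ...
        }
    }
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("scatter", ".bin");
        Files.write(tmp, "headerbody".getBytes());
        ByteBuffer header = ByteBuffer.allocate(6);
        ByteBuffer body = ByteBuffer.allocate(4);
        long n = scatterRead(tmp, new ByteBuffer[]{header, body});
        System.out.println(n + " bytes: " + new String(header.array())
                + " + " + new String(body.array()));
        Files.delete(tmp);
    }
}
```

The corresponding gathering write, `ch.write(buffers)`, drains the same array of buffers in one call.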

1.4.2 Virtual Memory
All modern operating systems make use of virtual memory. Virtual memory means that
artificial, or virtual, addresses are used in place of physical (hardware RAM) memory
addresses. This provides many advantages, which fall into two basic categories:
1. More than one virtual address can refer to the same physical memory location.
2. A virtual memory space can be larger than the actual hardware memory available.

1.4.2 虚拟内存
所有现代操作系统都使用虚拟内存。虚拟内存是指用人为的、虚拟的地址来代替物理(硬件 RAM)内存地址。这带来很多好处,可归为两大类:
1. 多个虚拟地址可以指向同一物理内存位置。
2. 虚拟内存空间可以比实际可用的硬件内存大。

The previous section said that device controllers cannot do DMA directly into user space, but
the same effect is achievable by exploiting item 1 above. By mapping a kernel space address
to the same physical address as a virtual address in user space, the DMA hardware (which can
access only physical memory addresses) can fill a buffer that is simultaneously visible to both
the kernel and a user space process. (See Figure 1-3.)

上一节说过,设备控制器不能直接对用户空间做 DMA,但利用上面第 1 点就能达到同样的效果。把一个内核空间地址和用户空间的一个虚拟地址映射到同一个物理地址,DMA 硬件(只能访问物理内存地址)就可以填充一个对内核和用户空间进程同时可见的缓冲区。(见 Figure 1-3。)

Figure 1-3. Multiply mapped memory space

This is great because it eliminates copies between kernel and user space, but requires the
kernel and user buffers to share the same page alignment. Buffers must also be a multiple of
the block size used by the disk controller (usually 512 byte disk sectors). Operating systems
divide their memory address spaces into pages, which are fixed-size groups of bytes. These
memory pages are always multiples of the disk block size and are usually powers of 2 (which
simplifies addressing). Typical memory page sizes are 1,024, 2,048, and 4,096 bytes. The
virtual and physical memory page sizes are always the same. Figure 1-4 shows how virtual
memory pages from multiple virtual address spaces can be mapped to physical memory.

这很好,因为它消除了内核空间和用户空间之间的拷贝,但要求内核缓冲区和用户缓冲区共享相同的页对齐方式。缓冲区的大小还必须是磁盘控制器所用块大小的整数倍(磁盘扇区通常为 512 字节)。操作系统把它们的内存地址空间划分为页,页是固定大小的字节组。这些内存页的大小始终是磁盘块大小的整数倍,而且通常是 2 的幂(这简化了寻址)。典型的内存页大小为 1,024、2,048 和 4,096 字节。虚拟内存页和物理内存页的大小始终相同。Figure 1-4 显示了来自多个虚拟地址空间的虚拟内存页如何映射到物理内存。

Figure 1-4. Memory pages
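Java exposes this multiply-mapped arrangement through FileChannel.map(), which returns a MappedByteBuffer backed by the same pages the kernel uses, so reading it involves no kernel-to-user copy. A minimal sketch (the temp file and helper method are illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Mapping a file gives user code a buffer backed by the same physical
// pages the kernel uses for that file's data.
public class MapDemo {
    static byte firstByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.get(0);   // reads straight from the mapped pages
        }
    }
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map", ".bin");
        Files.write(tmp, new byte[]{7, 8, 9});
        System.out.println(firstByte(tmp));   // prints 7
        Files.delete(tmp);
    }
}
```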



1.4.3 Memory Paging
To support the second attribute of virtual memory (having an addressable space larger than
physical memory), it's necessary to do virtual memory paging (often referred to as swapping,
though true swapping is done at the process level, not the page level). This is a scheme
whereby the pages of a virtual memory space can be persisted to external disk storage to make
room in physical memory for other virtual pages. Essentially, physical memory acts as a
cache for a paging area, which is the space on disk where the content of memory pages is
stored when forced out of physical memory.

1.4.3 内存分页
为了支持虚拟内存的第二个特性(可寻址空间比物理内存大),必须进行虚拟内存分页(常被称为交换,尽管真正的交换是在进程级而不是页级完成的)。这是一种机制:虚拟内存空间中的页可以被持久化到外部磁盘存储,从而在物理内存中为其他虚拟页腾出空间。本质上,物理内存充当了分页区的缓存;分页区是磁盘上的空间,当内存页被强制换出物理内存时,其内容就保存在那里。

Figure 1-5 shows virtual pages belonging to four processes, each with its own virtual memory
space. Two of the five pages for Process A are loaded into memory; the others are stored on
disk.

Figure 1-5 中虚拟页属于四个进程,每个都有自己的虚拟内存空间。进程A有5页,其中两页载入内存,其余在磁盘中。

Figure 1-5. Physical memory as a paging-area cache

Aligning memory page sizes as multiples of the disk block size allows the kernel to issue
direct commands to the disk controller hardware to write memory pages to disk or reload
them when needed. It turns out that all disk I/O is done at the page level. This is the only way
data ever moves between disk and physical memory in modern, paged operating systems.

内存页大小与磁盘块大小成整数倍对齐,使得内核可以直接向磁盘控制器硬件发出命令,把内存页写入磁盘,或在需要时重新加载。事实证明,所有磁盘 I/O 都是在页这一级完成的。在现代的分页操作系统中,这是数据在磁盘和物理内存之间移动的唯一方式。

Modern CPUs contain a subsystem known as the Memory Management Unit (MMU). This
device logically sits between the CPU and physical memory. It contains the mapping
information needed to translate virtual addresses to physical memory addresses. When
the CPU references a memory location, the MMU determines which page the location resides
in (usually by shifting or masking the bits of the address value) and translates that virtual page
number to a physical page number (this is done in hardware and is extremely fast). If there is
no mapping currently in effect between that virtual page and a physical memory page, the
MMU raises a page fault to the CPU.

现代 CPU 包含一个称为内存管理单元(MMU)的子系统。这个设备在逻辑上位于 CPU 和物理内存之间,包含着把虚拟地址转换为物理内存地址所需的映射信息。当 CPU 引用某个内存位置时,MMU 确定该位置所在的页(通常通过对地址值做移位或掩码的位操作),并把虚拟页号翻译成物理页号(这由硬件完成,速度极快)。如果该虚拟页和物理内存页之间当前没有有效的映射,MMU 就向 CPU 发出一个页错误(page fault)。
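The shift-and-mask step the MMU performs in hardware can be mimicked in a few lines, assuming 4,096-byte pages (the class name is purely illustrative): the low 12 bits of an address are the offset within the page, and the remaining bits are the page number.

```java
// MMU-style address splitting: with 4,096-byte pages, the low 12 bits
// are the offset within the page and the rest is the page number.
public class PageMath {
    static final int PAGE_SIZE = 4096;   // must be a power of 2
    static final int OFFSET_BITS = Integer.numberOfTrailingZeros(PAGE_SIZE);  // 12

    static long pageNumber(long virtualAddress) {
        return virtualAddress >>> OFFSET_BITS;     // shift away the offset bits
    }
    static long pageOffset(long virtualAddress) {
        return virtualAddress & (PAGE_SIZE - 1);   // mask off the page number
    }
    public static void main(String[] args) {
        long addr = 0x12345L;
        System.out.println("page " + pageNumber(addr)
                + ", offset " + pageOffset(addr));  // prints: page 18, offset 837
    }
}
```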

A page fault results in a trap, similar to a system call, which vectors control into the kernel
along with information about which virtual address caused the fault. The kernel then takes
steps to validate the page. The kernel will schedule a pagein operation to read the content of
the missing page back into physical memory. This often results in another page being stolen
to make room for the incoming page. In such a case, if the stolen page is dirty (changed since
its creation or last pagein) a pageout must first be done to copy the stolen page content to
the paging area on disk

页错误会产生一个陷阱,类似于系统调用,把控制权连同引发错误的虚拟地址信息一起转给内核。内核随即采取步骤使该页有效:它会调度一次页面调入(pagein)操作,把缺失页的内容读回物理内存。这往往导致另一个页面被"偷走",为调入的页面腾出空间。在这种情况下,如果被偷的页面是脏的(自创建或上次调入以来被修改过),就必须先执行一次页面调出(pageout),把被偷页面的内容拷贝到磁盘上的分页区。
