Virtual Memory (Part 1)



Translated from the online tutorial "Virtual Memory Module" of the Department of Computer Science, George Mason University (USA)

Authors: Jill Bobbin and Priscilla McAndrews

Abridged translation by 刘建文 (http://blog.csdn.net/keminlau)

KEY: virtual memory

Introduction

Virtual memory was invented in 1959 to hide the memory hierarchy and significantly simplify programming. Now so common that no one pays much attention to it, virtual memory is one of the great engineering triumphs of the computer age.

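Virtual memory was invented in 1959 to hide the layered structure of the computer's storage system, which greatly simplified later programming work. It is now so common that few of us notice it exists or study how it works in depth. Virtual memory is one of the great engineering inventions of the computer age.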


Why Do We Need Virtual Memory?

Storage allocation has always been an important consideration in computer programming due to the high cost of main memory and the relative abundance and lower cost of secondary storage. Program code and data required for execution of a process must reside in main memory to be executed, but main memory may not be large enough to accommodate the needs of an entire process. Early computer programmers divided programs into sections that were transferred into main memory for a period of processing time. As the program proceeded, new sections moved into main memory and replaced sections that were not needed at that time. In this early era of computing, the programmer was responsible for devising this overlay system.

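Because main memory has always been relatively scarce and expensive while secondary storage is relatively abundant and cheap, making better use of limited main memory has long been an important problem. Program code and data must be in main memory to run, but the actual main memory capacity may well be too small to accommodate the storage a running program needs (KEMIN: how much storage a modern program needs at run time mostly varies dynamically, which is one major reason storage management systems are hard to design). Early storage management was designed to solve the problem of storage availability, rather than the storage-efficiency problem that we mainly think of today. The programmer had to develop a precursor of memory management, the overlay system, to solve the availability problem. Because operating systems did not yet exist, programmers implemented the overlay system through explicit coding. The programmer divided the program into sections, each of which was loaded whole into main memory and executed for a while. As the program ran, new sections were loaded to replace sections that were no longer needed.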

As higher level languages became popular for writing more complex programs and the programmer became less familiar with the machine, the efficiency of complex programs suffered from poor overlay systems. The problem of storage allocation became more complex.

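In the mid-1950s, after high-level languages were introduced, programmers were encouraged to concentrate on solving the problem at hand without worrying much about the details of memory management. But as programs grew more complex, memory management grew harder with them. By the late 1950s, poorly designed overlay systems were seriously limiting program performance and had become an urgent problem. Programmers shielded by high-level languages were hard to persuade to return to low-level languages to optimize memory management. This situation dragged on until large-capacity memory appeared.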

Two theories for solving the problem of inefficient memory management emerged -- static and dynamic allocation. Static allocation assumes that the availability of memory resources and the memory reference string of a program can be predicted. Dynamic allocation relies on memory usage increasing and decreasing with actual program needs, not on predicting memory needs.

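Two major theories were proposed for solving inefficient memory management: static allocation and dynamic allocation. Static allocation assumes that both the availability of memory resources and the program's memory access behavior (abstracted as a concept called the memory reference string) can be predicted. Dynamic allocation allocates memory according to the program's actual needs and makes no prediction about them.

What is the storage allocation problem? What is its essence?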

These two approaches differ on their assumptions about the most fundamental aspect of the storage allocation problem, prediction, both (1) of the availability of memory resources, and (2) of certain properties of a program's "reference string," i.e. its sequence of references to information.

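Solving the storage allocation problem comes down to:

First, predicting the availability of memory resources;

Second, predicting certain properties of the program's reference string (that is, its sequence of references to information).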

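KEMIN: Modern application developers may find it hard to understand what predicting storage allocation means, because high-level languages show them only logical constructs and leave all allocation details to the compiler. Another reason is that such prediction is hard to imagine as feasible: can all of a program's storage needs really be allocated before it runs? See Peter J. Denning's 1970 paper "Virtual Memory".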

reference string

The reference string is the series of virtual page references generated during the execution of a program. The reference string is represented by w = r(1) r(2) r(3) ... r(T), where r(t) is the virtual page referenced at virtual time t and T is the virtual processing time of the program [D3]. The reference string is used in

* observing program behavior
* determining lifetime curve of a program for various replacement policies.

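The term reference string is rendered in Chinese as “引用字串”. It denotes the sequence of references to virtual memory pages produced while a program runs. As a formula: w = r(1) r(2) r(3) ... r(T), where r(t) is the virtual page referenced at time t and T is the processing time. The reference string can be used to:

* observe and analyze program behavior;
* plot the lifetime curve of a program under different replacement policies.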
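
To make the two uses above concrete, here is a minimal sketch in Python that replays a reference string against a fixed number of page frames and counts page faults. The text does not name particular replacement policies, so FIFO and LRU are used here as assumed examples. The function names and the sample string w are hypothetical, and lifetime is taken to mean the average virtual time between faults (T divided by the fault count), which is one common reading of the lifetime curve.

```python
from collections import OrderedDict, deque


def count_faults_fifo(refs, frames):
    """Count page faults for a reference string under FIFO replacement."""
    resident = set()   # pages currently in main memory
    order = deque()    # load order, oldest page on the left
    faults = 0
    for page in refs:
        if page in resident:
            continue                      # hit: FIFO order is unchanged
        faults += 1
        if len(resident) == frames:       # memory full: evict the oldest page
            resident.remove(order.popleft())
        resident.add(page)
        order.append(page)
    return faults


def count_faults_lru(refs, frames):
    """Count page faults under LRU (least recently used) replacement."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)    # hit: mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = True
    return faults


def lifetime(refs, frames, count_faults):
    """Average virtual time between faults: T / number of faults."""
    return len(refs) / count_faults(refs, frames)


# Hypothetical reference string w = r(1) r(2) ... r(T) with T = 12.
w = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]

print(count_faults_fifo(w, 3), count_faults_fifo(w, 4))  # 9 10
print(count_faults_lru(w, 3), count_faults_lru(w, 4))    # 10 8
```

Evaluating lifetime(w, frames, policy) for a range of frame counts gives the points of a lifetime curve for that policy. On this particular string, FIFO faults more often with four frames than with three, which shows why the curve has to be measured per policy rather than assumed.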

Program objectives and machine advancements in the '60s made the predictions required for static allocation difficult, if not impossible. Therefore, the dynamic allocation solution was generally accepted, but opinions about implementation were still divided. One group believed the programmer should continue to be responsible for storage allocation, which would be accomplished by system calls to allocate or deallocate memory. The second group supported automatic storage allocation performed by the operating system, because of increasing complexity of storage allocation and emerging importance of multiprogramming. In 1961, two groups proposed a one-level memory store. One proposal called for a very large main memory to alleviate any need for storage allocation. This solution was not possible due to very high cost. The second proposal is known as virtual memory. [D1]

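In the 1960s, as computer applications shifted (for example from computation-intensive to data-processing-intensive work) and computer technology advanced, static allocation became, if not impossible, very difficult. The idea of dynamic allocation was generally accepted. Within the dynamic-allocation camp, however, opinions still differed on how to implement it. One view held that programmers should remain responsible for storage allocation, through system calls provided by the operating system, because programmers understand their program's behavior (its algorithms, for instance) and should hand-tune allocation to make the program run as efficiently as possible. The other view held that storage allocation was becoming ever more complex and, with multiprogramming becoming mainstream, manual allocation cost too much, so the operating system had to provide fully automatic storage allocation. The storage allocation problem stems mainly from programmable main memory being too small. In 1961 two groups put forward proposals: one called for a very large main memory, but it was not realized because the cost was too high at the time; the other is the well-known virtual memory.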

Definition

Virtual memory is a technique that allows processes that may not be entirely in the memory to execute by means of automatic storage allocation upon request. The term virtual memory refers to the abstraction of separating LOGICAL memory--memory as seen by the process--from PHYSICAL memory--memory as seen by the processor. Because of this separation, the programmer needs to be aware of only the logical memory space while the operating system maintains two or more levels of physical memory space.

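Virtual memory is a technique that lets a process run without being loaded entirely into main memory, by allocating storage automatically on demand. The term virtual memory means separating physical memory (memory as seen by the processor) from logical memory (memory as seen by the process). With this separation, the programmer needs to care only about the logical storage space, while the operating system handles the use of the actual physical storage.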

The virtual memory abstraction is implemented by using secondary storage to augment the processor's main memory. Data is transferred from secondary to main storage as and when necessary and the data replaced is written back to the secondary storage according to a predetermined replacement algorithm. If the data swapped is designated a fixed size, this swapping is called paging; if variable sizes are permitted and the data is split along logical lines such as subroutines or matrices, it is called segmentation. Some operating systems combine segmentation and paging. [D1] [S2]

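Virtual memory uses secondary storage to extend the processor's main memory. When and how data moves in and out of main memory is decided by the replacement algorithm. If the units of data are of a fixed size, the scheme is paged virtual memory; if the sizes vary (the data may be divided along logical lines, such as the size of a subroutine or of a matrix), it is segmented virtual memory. Some operating systems combine the two approaches.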
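
The difference between the two kinds of transfer unit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4096-byte page size, the program layout, and the function names are all hypothetical and do not come from the tutorial.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def paging_units(program_units, page_size=PAGE_SIZE):
    """Paging: cut the whole program into fixed-size pages,
    ignoring the boundaries between its logical parts."""
    total = sum(size for _, size in program_units)
    num_pages = -(-total // page_size)  # ceiling division
    return [page_size] * num_pages


def segmentation_units(program_units):
    """Segmentation: one variable-size segment per logical unit,
    such as a subroutine or a matrix."""
    return {name: size for name, size in program_units}


# Hypothetical program made of three logical units (sizes in bytes).
program = [("main", 5000), ("sort_subroutine", 1200), ("matrix", 9000)]

print(paging_units(program))
# [4096, 4096, 4096, 4096]
print(segmentation_units(program))
# {'main': 5000, 'sort_subroutine': 1200, 'matrix': 9000}
```

With paging every swapped unit has the same size, so any free frame can hold any page, at the price of some unused space in the last page. With segmentation the units follow the program's logical structure, so the space they need in main memory varies from segment to segment.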

The diagram illustrates that a program generated address ( 1 ) or "logical address" consisting of a logical page number plus the location within that page (x) must be interpreted or "mapped" onto an actual (physical) main memory address by the operating system using an address translation function or mapper ( 2 ). If the page is present in the main memory, the mapper substitutes the physical page frame number for the logical number ( 3 ). If the mapper detects that the page requested is not present in main memory, a fault occurs and the page must be read into a frame in main memory from secondary storage ( 4 , 5 ). [D2] pp. 161-165

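The figure above illustrates the basic principle of virtual memory:

The program generates a logical address reference; the logical address consists of a logical page number plus a location x within that page.
The operating system's address mapping mechanism (the mapper) translates the logical address:
1. If the required page is in main memory, the mapper substitutes the physical page number (the page frame) for the logical page number;
2. If the required page is not in main memory, a page fault occurs; the mapper must first handle the fault by reading the required page from secondary storage into a page frame in main memory.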
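
The mapping steps above can be sketched as a small Python class. This is a minimal model under stated assumptions rather than how any real operating system implements it: the class name Mapper, the 1024-byte page size, the dictionary standing in for secondary storage, and the FIFO choice of which page to replace are all hypothetical.

```python
PAGE_SIZE = 1024  # assumed page size in bytes


class Mapper:
    """Toy address translation function with demand paging."""

    def __init__(self, num_frames, secondary_storage):
        self.page_table = {}                       # logical page -> page frame
        self.free_frames = list(range(num_frames))
        self.frames = {}                           # page frame -> page contents
        self.secondary = secondary_storage         # logical page -> contents
        self.faults = 0

    def translate(self, logical_address):
        # Split the logical address into page number and location x.
        page, x = divmod(logical_address, PAGE_SIZE)
        if page not in self.page_table:            # page not in main memory
            self._handle_fault(page)
        frame = self.page_table[page]              # substitute frame for page
        return frame * PAGE_SIZE + x

    def _handle_fault(self, page):
        self.faults += 1
        if self.free_frames:
            frame = self.free_frames.pop(0)
        else:
            # Assumed policy: replace the page that was loaded earliest
            # (dicts keep insertion order) and write it back first.
            victim = next(iter(self.page_table))
            frame = self.page_table.pop(victim)
            self.secondary[victim] = self.frames[frame]
        self.frames[frame] = self.secondary[page]  # read page into the frame
        self.page_table[page] = frame


# Hypothetical secondary storage holding four pages, with two page frames.
storage = {p: f"contents of page {p}" for p in range(4)}
mapper = Mapper(num_frames=2, secondary_storage=storage)

print(mapper.translate(5))             # page 0 faults into frame 0 -> 5
print(mapper.translate(1024 + 7))      # page 1 faults into frame 1 -> 1031
print(mapper.translate(5))             # page 0 already present     -> 5
print(mapper.translate(2 * 1024 + 1))  # page 2 replaces page 0     -> 1
print(mapper.faults)                   # 3
```

The location x within the page passes through translation unchanged; only the page number is replaced by a frame number. Which resident page to replace on a fault is a separate policy decision, and FIFO is used here only to keep the sketch short.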

What do you think are the main considerations in implementing this virtual memory system?

Based on the basic principle above, what do you think are the main problems or tasks in implementing a virtual memory system?