
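[Translation] Windows Internals, 6th Edition, Part 2, Chapter 10: Memory Management (Part 1)
A while ago I bought the Chinese edition of Windows Internals, 6th Edition, Part 1. At 128 yuan it was not cheap, but the translation quality is fairly high. The authors are Mark Russinovich, the well-known Windows kernel pioneer, and two other Windows architecture experts; the translators are also well known: Pan (author of 《windows内核原理与实现》) and Fan.
The end of the Chinese Part 1 said that the Chinese Part 2 would be published within the year (2014 at the time). More than a year later it still has not appeared,
and the Publishing House of Electronics Industry website has no information about it (a search finds only the Chinese Part 1).
So I decided to translate the English Part 2 myself. I found a PDF of the English original online and have started on Chapter 10, Memory Management. Because I had already studied some of the material, the translation has not been too hard so far. Anyone interested is welcome to help translate Part 2 (a search engine will find download sources; the 51cto and csdn communities both have it): name the chapter you want to translate and post it in this sub-board.
This book series is widely regarded as the first choice for exploring Windows internals. Translating it teaches us a lot and improves our own skills, and we can help each other with the parts that are hard to translate.
This post is an original translation presented as parallel English and Chinese text; translator's notes are explanations I added myself. The content will be updated continuously. Because of my job, I translate about one to two pages of the original per day. This post covers only Chapter 10 of Part 2 (the chapter is long, so later parts may follow). All chapters of Part 1 are already available as a printed Chinese book and as a PDF, which you can search for. My knowledge is limited, so please point out any errors in the translation; I would be grateful. The first part of the translation covers the sections below.
Because a post can carry only a limited number of image attachments, all images are linked from the corresponding posts on my 51cto blog (http://shayi1983./), where they were actually uploaded. Some images therefore carry a watermark, and the original blog posts are updated somewhat more slowly than this thread. I hope you do not mind.
The original text and the translation follow.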
CHAPTER 10 Memory Management
In this chapter, you’ll learn how Windows implements virtual memory and how it manages the subset of virtual memory kept in physical memory. We’ll also describe the internal structure and components that make up the memory manager, including key data structures and algorithms. Before examining these mechanisms, we’ll review the basic services provided by the memory manager and key concepts such as reserved memory versus committed memory and shared memory.
第10章:内存管理
在本章中,你将学习windows如何实现虚拟内存,以及如何管理驻留在物理内存中的虚拟内存子集。我们也会描述组成“windows内存管理器”的内部结构和组件,包括关键数据结构和算法。
在考察这些机制之前,我们先回顾一下内存管理器提供的基础服务,以及诸如 reserved memory , committed memory ,shared memory 这些重要概念。
Introduction to the Memory Manager
By default, the virtual size of a process on 32-bit Windows is 2 GB. If the image is marked specifically as large address space aware, and the system is booted with a special option (described later in this chapter), a 32-bit process can grow to be 3 GB on 32-bit Windows and to 4 GB on 64-bit Windows.
The process virtual address space size on 64-bit Windows is 7,152 GB on IA64 systems and 8,192 GB on x64 systems. (This value could be increased in future releases.)
windows内存管理器简介
默认情况下,32位 windows上的一个进程的虚拟大小(地址空间)为2GB,如果该进程对应的二进制映像文件被特别标注了“large address space aware”(察觉到大地址空间),并且系统以特殊选项引导(本章稍后讨论),那么 32 位进程在 32 位 windows上的虚拟大小可达到 3GB;在64 位 windows 上的虚拟大小可达到 4GB;运行在 Intel IA-64 体系结构的 64 位 windows 上的 64 位进程的虚拟地址空间大小为 7152GB;在 x64 体系结构上则为 8192 GB;(在处理器硬件和操作系统软件的后续发布版中,这些值可能会增大)
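(Translator's note: In Visual Studio, open any solution or project and a .cpp or other source file to be compiled. From the main menu choose "Project" -> "xxx Properties", where xxx is the name of your project, to open the property pages. In the tree, select "Linker" -> "System"; there you can configure the linker to mark the generated binary as large address aware. The setting is named "Enable Large Addresses" and corresponds to the /LARGEADDRESSAWARE linker option.)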
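The large-address-aware marking described above is stored as a bit in the image's PE header, so it can be checked without Visual Studio. The following is a minimal sketch, not a full PE parser: it relies on the documented header layout (the PE signature offset at 0x3C, and `IMAGE_FILE_LARGE_ADDRESS_AWARE` = 0x0020 in the COFF `Characteristics` field). The helper `make_fake_image` is hypothetical and exists only so the example runs without a real executable.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE header has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew, the offset of the PE signature, is stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2-byte field of the 20-byte COFF file header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_fake_image(characteristics: int) -> bytes:
    """Build a tiny synthetic header (hypothetical helper, demonstration only)."""
    dos_header = bytearray(0x40)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x40)  # PE signature follows directly
    coff_header = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos_header) + b"PE\0\0" + coff_header


if __name__ == "__main__":
    print(is_large_address_aware(make_fake_image(0x0122)))  # bit 0x0020 set
    print(is_large_address_aware(make_fake_image(0x0102)))  # bit 0x0020 clear
```

To inspect a real binary, pass the bytes read from the file (for example `open(path, "rb").read()`) to `is_large_address_aware`.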
As you saw in Chapter 2, “System Architecture,” in Part 1 (specifically in Table 2-2), the maximum amount of physical memory currently supported by Windows ranges from 2 GB to 2,048 GB, depending on which version and edition of Windows you are running. Because the virtual address space
might be larger or smaller than the physical memory on the machine, the memory manager has two primary tasks:
■ Translating, or mapping, a process’s virtual address space into physical memory so that when a thread running in the context of that process reads or writes to the virtual address space, the correct physical address is referenced. (The subset of a process’s virtual address space that is
physically resident is called the working set. Working sets are described in more detail later in this chapter.)
■ Paging some of the contents of memory to disk when it becomes overcommitted—that is, when running threads or system code try to use more physical memory than is currently available— and bringing the contents back into physical memory when needed.
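As Table 2-2 in Chapter 2, "System Architecture," of Part 1 shows, the maximum physical memory that Windows currently supports ranges from 2 GB to 2,048 GB, depending on the Windows version and edition you are running. The virtual address space may be larger or smaller than the physical memory installed in the machine.
(Translator's note: Part 3 of this translation explains that, when the original was written, Intel and AMD x64 processors implemented only 48 bits of the 64-bit virtual address, so they supported at most 256 TB of virtual address space, which is 2 to the 48th power. 64-bit Windows restricted itself further to 16 TB, 8 TB each for user space and kernel space, as stated above. In that case the virtual address space that 64-bit Windows supports, 16 TB, is larger than the physical memory it supports, 2 TB. The opposite case, where the virtual address space is smaller than physical memory, is the familiar one: on 32-bit Windows, unless an image is marked "large address space aware" as described above, a process's virtual address space tops out at 2 GB no matter how much RAM you install.)
The memory manager therefore has two primary tasks: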
■ 将一个进程的虚拟地址空间翻译或映射成物理内存,从而使运行在该进程上下文中的线程读写虚拟地址空间时,能够引用正确的物理地址。
( windows 将驻留在物理内存中的进程虚拟地址空间子集称为“工作集”,它的更多细节将在本章后面描述)
■ 当物理内存过载时(例如,当运行的线程或内核代码尝试请求比当前可用内存更多的时候),将其中部分内容换出至磁盘,以及在需要时将这些内容换回(入)物理内存。
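The size comparison in the translator's note can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the text (48 implemented address bits, 16 TB of virtual address space on 64-bit Windows, a 2,048 GB physical memory ceiling, and the 2 GB default for 32-bit processes); the constant and function names are made up for illustration.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def addressable_bytes(address_bits: int) -> int:
    """Bytes addressable with the given number of address bits."""
    return 2 ** address_bits


def compare(virtual_limit: int, physical_limit: int) -> str:
    """Say which of the two limits is larger."""
    if virtual_limit > physical_limit:
        return "virtual > physical"
    if virtual_limit < physical_limit:
        return "virtual < physical"
    return "virtual == physical"


X64_HARDWARE_VA = addressable_bytes(48)   # 48 implemented virtual address bits
WIN64_VA = 16 * TB                        # 8 TB user space + 8 TB kernel space
WIN64_RAM_MAX = 2048 * GB                 # upper end of the range from Table 2-2
WIN32_DEFAULT_USER_VA = 2 * GB            # default 32-bit process address space

if __name__ == "__main__":
    print(X64_HARDWARE_VA // TB, "TB addressable with 48 bits")
    print("64-bit Windows:", compare(WIN64_VA, WIN64_RAM_MAX))
    print("32-bit process on a 4 GB machine:", compare(WIN32_DEFAULT_USER_VA, 4 * GB))
```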
In addition to providing virtual memory management, the memory manager provides a core set of services on which the various Windows environment subsystems are built. These services include memory mapped files (internally called section objects), copy-on-write memory, and support for applications using large, sparse address spaces. In addition, the memory manager provides a way for a process to allocate and use larger amounts of physical memory than can be mapped into the process virtual address space at one time (for example, on 32-bit systems with more than 3 GB of physical memory). This is explained in the section “Address Windowing Extensions” later in this chapter.
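Besides virtual memory management, the memory manager provides a core set of services on which the various Windows environment subsystems are built. These include memory-mapped files (called section objects inside Windows), copy-on-write memory, and support for applications that use large, sparse (non-contiguous) address spaces. The memory manager also gives a process a way to allocate and use more physical memory than can be mapped into its virtual address space at any one time (for example, on 32-bit systems with more than 3 GB of physical memory). This is explained in the section "Address Windowing Extensions" later in this chapter.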
There is a Control Panel applet that provides control over the size, number, and locations of the paging files, and its nomenclature suggests that “virtual memory” is the same thing as the paging file. This is not the case. The paging file is only one aspect of virtual memory. In fact, even if you run with no page file at all, Windows will still be using virtual memory. This distinction is explained in more detail later in this chapter.
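Control Panel has an applet that controls the size, number, and locations of the paging files (translator's note: System -> Advanced system settings -> Advanced tab -> Settings under Performance -> Advanced tab -> Change under Virtual memory). Its nomenclature suggests that "virtual memory" is the same thing as the paging file, but that is not the case. The paging file is only one aspect of virtual memory. Even if you run with no page file at all, Windows still uses virtual memory. The distinction is explained in more detail later in this chapter.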
Memory Manager Components
The memory manager is part of the Windows executive and therefore exists in the file Ntoskrnl.exe.
No parts of the memory manager exist in the HAL. The memory manager consists of the following components:
■ A set of executive system services for allocating, deallocating, and managing virtual memory,most of which are exposed through the Windows API or kernel-mode device driver interfaces
■ A translation-not-valid and access fault trap handler for resolving hardware-detected memory management exceptions and making virtual pages resident on behalf of a process
内存管理器的组件
内存管理器是 windows 执行体的一部分,因此它存在于文件 Ntoskrnl.exe 之中。内存管理器没有任何部分是位于 HAL(译注:硬件抽象层)中的。内存管理器由下列组件构成:
■ 一组用于分配,释放,以及管理虚拟内存的执行体系统服务,其中多数通过Windows API或者内核模式设备驱动接口对外暴露;
■ A trap handler for translation-not-valid and access faults. It resolves memory management exceptions detected by the hardware and makes virtual pages resident on behalf of a process. (Translator's note: "translation-not-valid" means that the virtual address currently has no valid translation to a physical page, which is what raises a page fault.)
■ Six key top-level routines, each running in one of six different kernel-mode threads in the System process (see the experiment “Mapping a System Thread to a Device Driver,” which shows how to identify system threads, in Chapter 2 in Part 1):
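■ Six key top-level routines, each running in one of six different kernel-mode threads in the System process (see the experiment "Mapping a System Thread to a Device Driver" in Chapter 2 of Part 1, which shows how to identify system threads). (Translator's note: "top-level" here means that each routine is the start routine of its own thread rather than something called from another routine; it does not refer to scheduling priority. Item 6 below is the exception, as the original text points out.)
(Translator's note: The System process has PID 4 and hosts a special kind of thread that runs only in kernel mode, the kernel-mode system thread. Such threads have no user-mode address space, so any temporary storage they need usually comes from the kernel-mode heaps (system memory pools); see Part 2 of this translation. Routines in the executive components, as well as device drivers, can create system threads through PsCreateSystemThread(), which is exported by the executive's process/thread manager. Creating the thread in the System process is only the default; the routine also accepts another process as the host. In Process Explorer from Sysinternals, open the properties of the System process and select the Threads tab to see all system threads currently in it; normally there are more than 100. Sorting by CSwitch Delta, the number of context switches in each sampling interval, shows which system threads are being switched in most often. Threads whose start address begins with ntkrnlpa.exe! are generally system threads created by executive components through PsCreateSystemThread(); threads whose start address begins with a module name ending in .sys were created by the corresponding device driver.)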
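(Translator's note: For some reason, Process Explorer may not show the six system threads described below, for example the balance set manager, KeBalanceSetManager(). To study the thread's internal logic and check the original text's description, you can disassemble the routine with the uf command in KD.exe or WinDbg.exe, or open the ntkrnlpa.exe image in IDA Pro and analyze the relevant location.
A more direct method is to list all processes with the kernel debugger extension command !process 0 0, find the virtual address of the System process's EPROCESS structure, and run !process again with that address as the first argument and 0xf as the second. This lists all six system threads described below, including ones Process Explorer does not show, together with call stack information that Process Explorer cannot query.)
How should the thread start address "ntkrnlpa.exe!ObfDereferenceObjectWithTag+0xa9" reported by Process Explorer be read? Disassembling ObfDereferenceObjectWithTag() shows that address 84295d7f is labeled nt!ObfDereferenceObjectWithTag+0xa7 and holds the byte 0x90, a nop instruction.
The next address, 84295d80, is the first machine instruction of KeBalanceSetManager(); expressed as an offset it is "ntkrnlpa.exe!ObfDereferenceObjectWithTag+0xa8". The start address that Process Explorer reports is therefore off by only one byte and does fall at the start of KeBalanceSetManager(), which identifies this system thread as the balance set manager thread.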
1.&&The balance set manager (KeBalanceSetManager, priority 16). It calls an inner routine, the working set manager (MmWorkingSetManager), once per second as well as when free memory falls below a certain threshold. The working set manager drives the overall memory management policies, such as working set trimming, aging, and modified page writing.
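1. The balance set manager (KeBalanceSetManager, priority 16). It calls an inner routine, the working set manager (MmWorkingSetManager), once per second and also whenever free memory falls below a certain threshold. The working set manager drives the overall memory management policies, such as working set trimming, aging, and modified page writing.
(Translator's note: The name prefixes are a hint. KeBalanceSetManager starts with Ke, so it is implemented in the lower-level kernel; MmWorkingSetManager starts with Mm, so it is implemented in the memory manager, an executive component. Aging tracks how recently pages were used so that page replacement can pick the least recently used pages as victims. A victim page that has been modified must first have its contents written back to the paging file before the page is reused for new data; that work is done by MiModifiedPageWriter(), described below.)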
2.&&The process/stack swapper (KeSwapProcessOrStack, priority 23) performs both process and kernel thread stack inswapping and outswapping. The balance set manager and the thread-scheduling code in the kernel awaken this thread when an inswap or outswap operation needs to take place.
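2. The process/stack swapper (KeSwapProcessOrStack, priority 23) performs both process and kernel thread stack inswapping and outswapping. The balance set manager and the thread-scheduling code in the kernel awaken this thread when an inswap or outswap operation needs to take place. (Translator's note: "The thread-scheduling code" is the operating system scheduler; Windows schedules at thread granularity. On UNIX variants such as 4.3BSD, a system process called swapper does the same job. When free physical page frames run short, or when some processes have not been scheduled for a long time and have become inactive, the swapper process, like the Windows process/stack swapper thread, is awakened to swap those processes out and free memory. The work of KeSwapProcessOrStack() also affects thread scheduling states. For example, if a thread is ready to run but its kernel stack, or its owning process, has been swapped out of memory, the thread enters the transition state; once the stack or process is back in memory, the thread enters the ready state. Thread scheduling states are discussed in section 5.7, "Thread Scheduling," in Chapter 5 of Part 1.)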
3.&&The modified page writer (MiModifiedPageWriter, priority 17) writes dirty pages on the modified list back to the appropriate paging files. This thread is awakened when the size of the modified list needs to be reduced.
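3. The modified page writer (MiModifiedPageWriter, priority 17) writes dirty pages on the modified list back to the appropriate paging files. This thread is awakened when the size of the modified list needs to be reduced. (Translator's note: The processor's memory management unit (MMU) normally sets the dirty bit in a page table entry (PTE) when a write hits an address in the 4 KB range that the PTE maps. If the dirty bit is set, the kernel must write the old contents to the paging file on disk before reusing the page for new data. The kernel, running in privileged mode, clears the bit again by updating the PTE.)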
4.&&The mapped page writer (MiMappedPageWriter, priority 17) writes dirty pages in mapped files to disk (or remote storage). It is awakened when the size of the modified list needs to be reduced or if pages for mapped files have been on the modified list for more than 5 minutes. This second modified page writer thread is necessary because it can generate page faults that result in requests for free pages. If there were no free pages and there was only one modified page writer thread, the system could deadlock waiting for free pages.
4. mapped page writer (MiMappedPageWriter,映射页面写入器,优先级 17)。将位于内存中的“映射文件”的“脏”页写回磁盘(或远程存储),以更新修改结果。当需要减小已修改页列表的大小,或者映射文件中的脏页面位于已修改页列表超过 5 分钟,该线程就会被唤醒。
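(Translator's note: The module mminit.c in the WRK 1.2 source contains the memory manager's initialization logic. Lines 171 to 178 define this 5-minute limit; an excerpt follows.)
// Default is a 300 second life span for modified mapped pages -
// This can be overridden in the registry.
ULONG MmModifiedPageLifeInSeconds = 300;
The source and its comment confirm that modified mapped pages on the modified list have a life span of 300 seconds, after which they are written back to disk. The comment also says that the value can be overridden in the registry but does not give the key path. Because MmModifiedPageLifeInSeconds is a global variable, the mapped page writer can read it and use it as the timeout for writing pages back to disk.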
(这第二个)已修改页面写入器线程是必须的;如果只有一个已修改页面写回器线程,并且当前没有空闲页面,该线程可能因请求空闲页面而生成页面错误,此时该线程将被阻塞在等待产生空闲页面事件,但是又没有第二个能够将已修改页面写入磁盘,从而释放出空闲页面的线程可供调度,于是整个系统会进入死锁状态来等待空闲页面。(译注:这段译文经过自行润色,原文直译不好理解,希望没有偏离作者要表达的意思)
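(Translator's note: The module modwrite.c in the WRK 1.2 source implements writing modified pages and modified mapped-file pages back to disk.
It is a good place to learn how to initialize an event, wait for it to become signaled, and then continue, and to check what the original text says against the code.
First, the system thread MiModifiedPageWriter() (line 2943 of modwrite.c) calls KeInitializeEvent() to initialize the global event MmMappedPageWriterEvent. It is a notification event, initially non-signaled, which is why KeClearEvent() must be called after KeSetEvent() to reset it, as described below. The thread then creates the MiMappedPageWriter() system thread, the "second modified page writer thread" of the original text, which writes mapped-file pages (the first is MiModifiedPageWriter() itself). It also initializes the head of a global doubly linked list built from LIST_ENTRY structures.
In practice a LIST_ENTRY is usually embedded in a larger containing structure. Each containing structure is an entry in the list, and an entry reaches the previous and next entries through its LIST_ENTRY member.
MiMappedPageWriter() (line 4397 of modwrite.c) calls KeWaitForSingleObject() and blocks there, waiting for other functions to process the list.
After creating the MiMappedPageWriter() thread, MiModifiedPageWriter() calls MiModifiedPageWriterWorker(), which in turn calls MiGatherMappedPages(). Before returning, MiGatherMappedPages() calls InsertTailList() to insert an entry at the tail of the global list and then calls KeSetEvent() to signal MmMappedPageWriterEvent, which wakes the MiMappedPageWriter() thread (it returns from KeWaitForSingleObject()).
After MiGatherMappedPages() returns to its caller, MiModifiedPageWriterWorker() explicitly calls KeClearEvent() to clear the event; as noted above, a notification event must be cleared manually after it is set.
Once awakened, MiMappedPageWriter() calls IsListEmpty() to check whether the global list that MiGatherMappedPages() processed is empty and then acts accordingly.
To repeat: because of their positions, the authors of this book series cannot always put source code on the table when explaining system mechanisms and have to describe them in fairly technical prose, which in turn makes translation harder. When part of the original or the translation is hard to follow, searching the WRK source for the relevant implementation, as done here, or analyzing it dynamically with the kernel debugger helps a great deal in understanding the underlying principles and design. It also teaches how these kernel routines and data structures are used, which improves your driver programming.)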
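The event handshake described in the note (initialize a non-signaled notification event, block the writer thread on it, queue work, signal, then clear by hand) can be modelled in user mode. The sketch below is an analogy only, not WRK code: Python's `threading.Event` stays set until `clear()` is called, which is the manual-reset behaviour of a notification event. The class and method names are hypothetical. One deliberate simplification: here the consumer clears the event while holding the list lock, whereas the note says MiModifiedPageWriterWorker() clears it on the producer side.

```python
import threading
from collections import deque


class MappedWriterModel:
    """User-mode stand-in for the list-plus-notification-event handshake."""

    def __init__(self) -> None:
        self.work_list = deque()         # plays the role of the global LIST_ENTRY list
        self.lock = threading.Lock()     # protects the list and the event state
        self.event = threading.Event()   # manual-reset; starts non-signaled
        self.written = []

    def gather(self, pages) -> None:
        """Producer: queue entries at the tail, then signal (InsertTailList + KeSetEvent)."""
        with self.lock:
            self.work_list.extend(pages)
            self.event.set()

    def writer_loop(self, expected: int) -> None:
        """Consumer: block until signaled, then drain the list."""
        while len(self.written) < expected:
            self.event.wait()                    # like KeWaitForSingleObject
            with self.lock:
                while self.work_list:            # like the IsListEmpty check
                    self.written.append(self.work_list.popleft())
                self.event.clear()               # manual reset, like KeClearEvent


def run_demo():
    model = MappedWriterModel()
    worker = threading.Thread(target=model.writer_loop, args=(3,))
    worker.start()
    model.gather(["page-A"])
    model.gather(["page-B", "page-C"])
    worker.join(timeout=5)
    return model.written, model.event.is_set(), worker.is_alive()


if __name__ == "__main__":
    print(run_demo())
```

Setting and clearing under the same lock that guards the list keeps the model free of lost wakeups: every `set()` happens before the drain that clears it.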
5.&&The segment dereference thread (MiDereferenceSegmentThread, priority 18) is responsible for cache reduction as well as for page file growth and shrinkage. (For example, if there is no virtual address space for paged pool growth, this thread trims the page cache so that the paged pool used to anchor it can be freed for reuse.)
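5. The segment dereference thread (MiDereferenceSegmentThread, priority 18) is responsible for cache reduction as well as for page file growth and shrinkage. For example, if there is no virtual address space for paged pool growth, this thread trims the page cache so that the paged pool used to anchor it can be freed for reuse. (Translator's note: "Cache" here is not the hardware L1 to L3 cache inside the CPU. It most likely refers to the system cache of file data that Windows keeps in memory; the paged pool that "anchors" it is presumably the pool memory holding the structures that describe the cached pages, so trimming the cache lets that pool be reused.)
(Translator's note: Readers with access to the WRK 1.2 source can look at the definition of MiDereferenceSegmentThread(), which starts at line 805 of sectsup.c. In the same file, MiSectionInitialization() initializes the Section object type, MmSectionObjectType, and calls PsCreateSystemThread() to create the MiDereferenceSegmentThread thread in the System process. Because the WRK license limits each quotation to 50 lines of code, only the most relevant fragment is excerpted below.)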
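HANDLE ThreadHandle;

if (!NT_SUCCESS(PsCreateSystemThread (&ThreadHandle,
                                      THREAD_ALL_ACCESS,
                                      &ObjectAttributes,
                                      // 4th parameter (HANDLE ProcessHandle): the process in which to
                                      // create the system thread; NULL means the System process.
                                      NULL,   // value elided in the original excerpt; NULL assumed from the note above
                                      NULL,   // ClientId: elided in the original excerpt; NULL assumed
                                      MiDereferenceSegmentThread,
                                      NULL))) {   // StartContext: elided in the original excerpt; NULL assumed
    return FALSE;
}

ZwClose (ThreadHandle);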
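(In the fragment above, the local variable ThreadHandle declared at the start is at this point a handle to the MiDereferenceSegmentThread() thread. As the original text says, this is an important system thread: when necessary it frees (dereferences) kernel virtual address space of the "system cache" type so that other types, such as paged pool, have enough virtual address space. The thread must therefore keep running. Closing the handle here does not stop the thread; it only releases the creator's handle, so no other kernel component can use that handle to terminate the thread by mistake.)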
6.&&The zero page thread (MmZeroPageThread, base priority 0) zeroes out pages on the free list so that a cache of zero pages is available to satisfy future demand-zero page faults.
Unlike the other routines described here, this routine is not a top-level thread function but is called by the top-level thread routine
Phase1Initialization. MmZeroPageThread never returns to its caller, so in effect the Phase 1 Initialization thread becomes the zero page thread by calling this routine. Memory zeroing in some cases is done by a faster function called MiZeroInParallel. See the note in the section “Page List Dynamics” later in this chapter.
Each of these components is covered in more detail later in the chapter.
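6. The zero page thread (MmZeroPageThread, base priority 0) zeroes out pages on the free list so that a cache of zero pages is available to satisfy future demand-zero page faults. Unlike the other five routines described here, it is not a top-level thread function; it is called by the top-level thread routine Phase1Initialization.
(Translator's note: The zeroed pages stay in physical memory on the zero page list; they are not written out to the paging file. On priorities in general: the current priority of a variable-priority thread never drops below its base priority but can change dynamically. For example, the kernel boosts the current priority of a thread when the I/O it was waiting for completes, and lowers it again toward the base priority as the thread uses up its time slices.)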
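(Translator's note: System startup consists of a first phase (phase 0) of kernel and executive initialization, followed by a second phase (phase 1) of executive initialization. The call chain during the first-phase executive initialization is:
ExpInitializeExecutive() -> PsInitSystem() -> PspInitPhase0()
PspInitPhase0() first calls PsGetCurrentProcess() and turns the current context into the Idle process, then creates the System process and, inside it, the system thread Phase1Initialization. Finally a context switch hands control to the Phase1Initialization thread, which performs the second-phase executive initialization.
At the end, the Phase1Initialization thread turns itself into the zero page thread by calling MmZeroPageThread(). The original post included here a figure from the book 《Windows 内核设计思想》 that summarizes the system startup flow.)
The zero page thread never returns to its caller, so in effect the Phase 1 initialization thread becomes the zero page thread by calling this routine at the end. Memory zeroing is in some cases done by a faster function called MiZeroInParallel. See the note in the section "Page List Dynamics" later in this chapter.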
本章后面会涵盖上述 6 个内存管理器组件的更多细节。
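(Translator's note: To avoid missing the forest for the trees, the original post included here a figure showing where the memory manager sits in the overall Windows architecture. For brevity, that figure omits many user-mode subsystem DLLs, kernel-mode components, and the TCP/IP stack such as tcpip.sys, and concentrates on function call paths and dependencies between modules. For the complete architecture diagram, see the section "Key System Components" in Chapter 2 of Part 1.)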
Internal Synchronization
Like all other components of the Windows executive, the memory manager is fully reentrant and supports simultaneous execution on multiprocessor systems—that is, it allows two threads to acquire resources in such a way that they don't corrupt each other's data. To accomplish the goal of being fully reentrant, the memory manager uses several different internal synchronization mechanisms, such as spinlocks, to control access to its own internal data structures. (Synchronization objects are discussed in Chapter 3, “System Mechanisms,” in Part 1.)
Some of the systemwide resources to which the memory manager must synchronize access include:
■ Dynamically allocated portions of the system virtual address space
■ System working sets
■ Kernel memory pools
■ The list of loaded drivers
■ The list of paging files
■ Physical memory lists
■ Image base randomization (ASLR) structures
■ Each individual entry in the page frame number (PFN) database
Per-process memory management data structures that require synchronization include the working set lock (held while changes are being made to the working set list) and the address space lock (held whenever the address space is being changed). Both these locks are implemented using pushlocks.
正如windows执行体的所有其它组件一样,内存管理器是完全可重入的,并且支持在多处理器系统上同时执行——换句话说,以这样的方式能够允许2个线程在不损坏彼此数据的情况下获取资源。为了实现可完全重入这一目标,内存管理器使用了几种不同的内部同步机制,例如自旋锁,用于控制对系统自身内部数据结构的访问(同步对象在本书上册第3章“系统机制”中讨论)
内存管理器必须对其访问进行同步化的一些系统范围资源包括:
■ 系统虚拟地址空间的动态分配部分;
■ 系统工作集;
■ 内核内存池;
■ 已加载驱动程序的列表;
■ 分页文件列表;
■ 物理内存列表;
■ 地址空间布局随机化(ASLR)使用的相关结构;
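■ Each individual entry in the page frame number (PFN) database. (Translator's note: A page frame number is a physical page number. The PFN database is not the page table: page table entries (PTEs) map virtual pages to physical pages, whereas each PFN database entry describes the state of one physical page.)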
内存管理事务涉及的每进程数据结构中,需要同步的包括:工作集锁(当工作集列表正在变更时持有该锁),地址空间锁(每当地址空间正被改变时持有该锁)。这些锁都使用推锁(pushlocks)实现。
Examining Memory Usage
The Memory and Process performance counter objects provide access to most of the details about system and process memory utilization. Throughout the chapter, we'll include references to specific performance counters that contain information related to the component being described. We've included relevant examples and experiments throughout the chapter.
One word of caution, however:
different utilities use varying and sometimes inconsistent or confusing names when displaying memory information. The following experiment illustrates this point. (We'll explain the terms used in this example in subsequent sections.)
审查内存使用
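The Memory and Process performance counter objects give access to most of the details of system and process memory utilization. Throughout the chapter we refer to specific performance counters that hold information about the component being described, and we include relevant examples and experiments. One caution: different utilities use varying, sometimes inconsistent or confusing, names when displaying memory information. The following experiment illustrates this. (The terms used in this example are explained in later sections.)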
EXPERIMENT: Viewing System Memory Information
The Performance tab in the Windows Task Manager, shown in the following screen shot, displays basic system memory information. This information is a subset of the detailed memory information available through the performance counters. It includes data on both physical and virtual memory usage.
实验:查看系统内存信息
如下的屏幕截图所示,windows任务管理器中的性能标签,显示基本的系统内存信息,这个信息仅是性能计数器提供的详细内存信息的一组子集,它包含物理内存和虚拟内存使用率相关的数据。
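(The original uses a screenshot from an EN-US system. I replaced it with a ZH-CN screenshot from my own machine so that readers can match each term in the figure against the table below.)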
The following table shows the meaning of the memory-related values.
下表解释任务管理器中使用的内存相关术语的含义:
Memory bar histogram
内存的条柱形图
Bar/chart line height shows physical memory in use by Windows (not available as a performance counter). The remaining height of the graph is equal to the Available counter in the Physical Memory section,
described later in the table. The total height of the graph is equal to the Total counter in that section. This represents the total RAM usable by the operating system, and does
not include BIOS shadow pages, device memory, and so on.
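The height of the bar and of the chart line shows the physical memory in use by Windows (the bright green area; there is no corresponding performance counter). The remaining height of the graph (the dark green area) equals the Available counter in the Physical Memory (MB) section, described later in the table. The total height equals the Total counter in that section, which is the total RAM usable by the operating system. It does not include BIOS shadow pages (BIOS ROM of peripheral devices mapped into system memory), device memory (memory or buffers on peripheral devices mapped into system memory), and so on.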
Physical Memory (MB): Total
物理内存(以MB,百万字节为单位):总数
Physical memory usable by Windows
即windows可用的物理内存,如前所述,等于内存条形图的总高;
Physical Memory (MB): Cached
Sum of the following performance counters in the Memory object:
Cache Bytes, Modified Page List Bytes, Standby Cache Core Bytes,
Standby Cache Normal Priority Bytes, and Standby Cache Reserve Bytes (all in Memory object)
内存对象中的一些性能计数器总合,包括Cache Bytes,Modified Page List Bytes,Standby Cache Core Bytes,Standby Cache Normal Priority Bytes,以及 Standby Cache Reserve Bytes(译注:这里保持原文,避免翻译引起的语义准确性争议)
Physical Memory (MB):Available
Amount of memory that is immediately available for use by the operating system, processes, and drivers. Equal to the combined size of the standby, free, and zero page lists.
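Amount of physical memory that is immediately available for use by the operating system, processes, and drivers. It equals the combined size of the standby, free, and zero page lists, which should match the dark green (available physical memory) area of the memory bar described above. (Translator's note: Open Resource Monitor from Task Manager; a simple addition in the Physical Memory section confirms this. Note that the Simplified Chinese Windows 7 client contains a small translation error: the rightmost legend box should read "Free", not "Available".)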
Physical Memory (MB): Free
Free and zero page list bytes
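Total bytes on the free and zero page lists. (Translator's note: The system's own explanation reads roughly "memory that does not contain any valuable data (zero pages?) and that will be used first when processes, drivers, or the operating system need more memory.")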
Kernel Memory (MB): Paged
内核内存(以MB,百万字节为单位):分页数
Pool paged bytes. This is the total size of the pool, including both free and allocated regions
分页池的总字节,包含空闲和已分配区域;
Kernel Memory (MB): Nonpaged
Pool nonpaged bytes. This is the total size of the pool, including both free and allocated regions
不可分页池的总字节,包含空闲和已分配区域;
System: Commit (two numbers shown)
系统栏位中的“提交”(以GB,十亿字节为单位)
Equal to performance counters Committed Bytes and Commit Limit, respectively
前后显示2个数字,分别等于Committed Bytes和Commit Limit这2个性能计数器;
To see the specific usage of paged and nonpaged pool, use the Poolmon utility, described in the “Monitoring Pool Usage” section.
使用在“监控页面池使用率”小节中讨论的工具Poolmon,可以查看分页池和非分页池的具体使用情况。
The Process Explorer tool from Windows Sysinternals (/technet/sysinternals) can show considerably more data about physical and virtual memory. On its main screen, click View and then System Information, and then choose the Memory tab. Here is an example display from a 32-bit Windows system:
来自Windows Sysinternals (/technet/sysinternals) 的Process Explorer(进程浏览器或进程资源管理器)能够显示更多有关物理内存和虚拟内存的数据。在其主界面中,单击View菜单 -> System Information,在打开的界面中选择Memory选项卡即可查看。下面这个显示的例子来自一个32位的windows系统:
We will explain most of these additional counters in the relevant sections later in this chapter.
Two other Sysinternals tools show extended memory information:
■ VMMap shows the usage of virtual memory within a process to an extremely fine level of detail.
■ RAMMap shows detailed physical memory usage.
These tools will be featured in experiments found later in this chapter.
Finally, the !vm command in the kernel debugger shows the basic memory management information available through the memory-related performance counters. This command can be useful if you're looking at a crash dump or hung system. Here's an example of its output from a 4-GB Windows client system:
我们将在本章后续相关部分解释这些附加的计数器。
另外2个Sysinternals工具能够显示扩展的内存信息:
■ VMMap将一个进程内的虚拟内存使用情况显示到一个极端细致的水平;
■ RAMMap显示物理内存使用情况的细节;
本章后续将通过实验来展示这些工具的特色。
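Finally, the !vm command in the kernel debugger shows the basic memory management information that is available through the memory-related performance counters. It can be useful when you are looking at a crash dump or a hung system. The following example is its output on a Windows client system with 4 GB of physical memory: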
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 851757 ( 3407028 Kb)
Page File: \??\C:\pagefile.sys
Current: 3407028 Kb Free Space: 3407024 Kb
Minimum: 3407028 Kb Maximum: 4193280 Kb
Available Pages: 699186 ( 2796744 Kb)
ResAvail Pages: 757454 ( 3029816 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 370673 ( 1482692 Kb)
Modified Pages: 9799 ( 39196 Kb)
Modified PF Pages: 9798 ( 39192 Kb)
NonPagedPool Usage: 0 ( 0 Kb)
NonPagedPoolNx Usage: 8735 ( 34940 Kb)
NonPagedPool Max: 522368 ( 2089472 Kb)
PagedPool 0 Usage: 17573 ( 70292 Kb)
PagedPool 1 Usage: 2417 ( 9668 Kb)
PagedPool 2 Usage: 0 ( 0 Kb)
PagedPool 3 Usage: 0 ( 0 Kb)
PagedPool 4 Usage: 28 ( 112 Kb)
PagedPool Usage: 20018 ( 80072 Kb)
PagedPool Maximum: 523264 ( 2093056 Kb)
Session Commit: 6218 ( 24872 Kb)
Shared Commit: 18591 ( 74364 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 2151 ( 8604 Kb)
PagedPool Commit: 20031 ( 80124 Kb)
Driver Commit: 4531 ( 18124 Kb)
Committed pages: 179178 ( 716712 Kb)
Commit limit: 1702548 ( 6810192 Kb)
Total Private: 66073 ( 264292 Kb)
0a30 CCC.exe 11078 ( 44312 Kb)
0548 dwm.exe 6548 ( 26192 Kb)
091c MOM.exe 6103 ( 24412 Kb)
We will describe many of the details of the output of this command later in this chapter.
我们将在本章稍后描述该命令输出的众多细节。
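Most lines of the !vm output above pair a page count with a size in kilobytes, so the two columns can be cross-checked. The following sketch parses lines of the form `Name: pages ( kb Kb)` and verifies that each size is the page count times 4 KB, the small page size on x86 and x64. The regular expression, the function names, and the sample subset are assumptions for illustration; they are not part of any debugger tooling.

```python
import re

PAGE_SIZE_KB = 4  # small page size on x86/x64

# Matches lines such as "Physical Memory: 851757 ( 3407028 Kb)".
COUNTER_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<pages>\d+)\s+\(\s*(?P<kb>\d+)\s+Kb\)\s*$"
)

SAMPLE = """\
Physical Memory: 851757 ( 3407028 Kb)
Page File: \\??\\C:\\pagefile.sys
Current: 3407028 Kb Free Space: 3407024 Kb
Available Pages: 699186 ( 2796744 Kb)
Committed pages: 179178 ( 716712 Kb)
Commit limit: 1702548 ( 6810192 Kb)
0a30 CCC.exe 11078 ( 44312 Kb)
"""


def parse_vm_counters(text: str) -> dict:
    """Return {counter name: (pages, kilobytes)} for every matching line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match["name"].strip()] = (int(match["pages"]), int(match["kb"]))
    return counters


def inconsistent_counters(counters: dict) -> list:
    """Names whose kilobyte value is not pages * PAGE_SIZE_KB."""
    return [name for name, (pages, kb) in counters.items() if pages * PAGE_SIZE_KB != kb]


if __name__ == "__main__":
    parsed = parse_vm_counters(SAMPLE)
    print(parsed)
    print("inconsistent:", inconsistent_counters(parsed))
```

Lines without the `pages ( kb Kb)` shape, such as the page file path and the per-process lines, are skipped rather than guessed at.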
Services Provided by the Memory Manager
The memory manager provides a set of system services to allocate and free virtual memory, share memory between processes, map files into memory, flush virtual pages to disk, retrieve information about a range of virtual pages, change the protection of virtual pages, and lock the virtual pages into memory.
Like other Windows executive services, the memory management services allow their caller to supply a process handle indicating the particular process whose virtual memory is to be manipulated.
The caller can thus manipulate either its own memory or (with the proper permissions) the memory of another process. For example, if a process creates a child process, by default it has the right to manipulate the child process’s virtual memory. Thereafter, the parent process can allocate, deallocate, read, and write memory on behalf of the child process by calling virtual memory services and passing a handle to the child process as an argument. This feature is used by subsystems to manage the memory of their client processes. It is also essential for implementing debuggers because debuggers must be able to read and write to the memory of the process being debugged.
内存管理器提供的服务
内存管理器提供一组系统服务用于分配和释放虚拟内存,在进程间共享内存,将磁盘文件映射至内存,将虚拟页刷新到磁盘,检索一系列有关虚拟页的信息,更改虚拟页的保护属性,以及将虚拟页锁在内存中。与其它Windows执行体服务一样,内存管理服务允许它们的调用者提供一个进程句柄,用于指明要被操控虚拟内存的特定进程;调用者因而能够操控其自身内存,或者以适当的权限操纵其它进程的内存;例如,一个进程创建了一个子进程,默认情况下,父进程有权操控它的子进程的虚拟内存;随后,父进程可以通过调用虚拟内存服务并且传递一个该子进程的句柄作为参数,从而能够代表该子进程分配,释放,以及读写内存。这个特性被子系统用来管理它们“客户进程”的内存。这个特性对于实现调试器也是必需的,因为调试器必须能够读写被调试进程的内存。
Most of these services are exposed through the Windows API. The Windows API has three groups of functions for managing memory in applications: heap functions (Heapxxx and the older interfaces Localxxx and Globalxxx, which internally make use of the Heapxxx APIs), which may be used for allocations smaller than a page;
virtual memory functions, which operate with page granularity (Virtualxxx); and memory mapped file functions (CreateFileMapping, CreateFileMappingNuma, MapViewOfFile, MapViewOfFileEx, and MapViewOfFileExNuma). (We’ll describe the heap manager later in this chapter.)
The memory manager also provides a number of services (such as allocating and deallocating physical memory and locking pages in physical memory for direct memory access [DMA] transfers) to other kernel-mode components inside the executive as well as to device drivers. These functions begin with the prefix Mm. In addition, though not strictly part of the memory manager, some executive
support routines that begin with Ex are used to allocate and deallocate from the system heaps (paged and nonpaged pool) as well as to manipulate look-aside lists. We’ll touch on these topics later in this chapter in the section “Kernel-Mode Heaps (System Memory Pools).”
这些服务绝大多数通过Windows API对外暴露。Windows API中有3组函数用于管理应用程序内存:
堆函数(Heapxxx 以及旧版接口 Localxxx 与 Globalxxx,后2者在内部利用 Heapxxx APIs),它们可能被用来分配小于一页的内存;
virtual memory functions (Virtualxxx), which operate with page granularity (for page sizes, see the section "Large and Small Pages");
内存映射文件函数(CreateFileMapping,CreateFileMappingNuma,MapViewOfFile,MapViewOfFileEx,以及 MapViewOfFileExNuma)(本章稍后会介绍堆管理)
内存管理器也向执行体内的其它内核模式组件,以及设备驱动程序,提供了若干服务(例如分配和释放物理内存;锁住物理内存中的页面,用于直接存储器访问[DMA]的信号传输),这些函数名称以 Mm为前缀。此外,尽管不属于严格意义上的内存管理器一部分,一些以前缀 Ex 开头的执行体支持例程用于从系统堆(分页和非分页池)分配和释放内存,以及操控后备列表。
我们将在本章后面的“Kernel-Mode Heaps (System Memory Pools)”(内核模式堆[系统内存池])部分中,讨论这些主题。
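Of the three API groups, memory mapped files are the easiest to try from a scripting language. The sketch below uses Python's standard `mmap` module as a portable stand-in; it does not call CreateFileMapping or MapViewOfFile directly, and the function names are made up for illustration. It changes a file by writing through a mapped view instead of through write calls.

```python
import mmap
import os
import tempfile


def patch_file_through_mapping(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at offset by storing into a mapped view of the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:     # length 0 maps the whole file
            view[offset:offset + len(data)] = data  # plain memory store into the view
            view.flush()                            # push the dirty pages to the file


def demo() -> bytes:
    """Create a scratch file, patch it through a mapping, and return its contents."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello world")
        patch_file_through_mapping(path, 6, b"WORLD")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The slice assignment must keep the length unchanged, because a mapped view cannot grow the file it maps.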
Large and Small Pages
The virtual address space is divided into units called pages. That is because the hardware memory management unit translates virtual to physical addresses at the granularity of a page. Hence, a page is the smallest unit of protection at the hardware level. (The various page protection options are described in the section “Protecting Memory” later in the chapter.) The processors on which Windows runs support two page sizes, called small and large. The actual sizes vary based on the processor
architecture, and they are listed in Table 10-1.
大页面与小页面
虚拟地址空间被划分为叫做页面的单元。这是由于硬件内存管理单元(译注:即MMU)以页面为粒度将虚拟地址翻译成物理地址。于是,在硬件级别,一个页就是最小的保护单元。
(本章后续的“Protecting Memory”部分,将描述各种页面保护选项)。运行Windows的处理器支持2种页面尺寸,叫做小页和大页。页面的实际大小根据处理器体系结构而有所不同,表10-1列出了这些值:
TABLE 10-1 Page Sizes(页面大小)
Note IA64 processors support a variety of dynamically configurable page sizes, from 4 KB up to 256 MB. Windows on Itanium uses 8 KB and 16 MB for small and large pages, respectively, as a result of performance tests that confirmed these values as optimal. Additionally, recent x64 processors support a size of 1 GB for large pages, but Windows does not use this feature.
注意:IA64处理器支持各种动态可配置的页面大小(从4KB到最大256MB)。运行在Itanium处理器上的Windows使用8 KB小页和16 MB大页,这是由于一些性能测试确认了这些值是最优的。
同时,最近的x64处理器支持1GB的大页尺寸,但Windows没有使用该特性。
The primary advantage of large pages is speed of address translation for references to other data within the large page. This advantage exists because the first reference to any byte within a large page will cause the hardware’s translation look-aside buffer (TLB, described in a later section) to have in its cache the information necessary to translate references to any other byte within the large page.
If small pages are used, more TLB entries are needed for the same range of virtual addresses, thus increasing recycling of entries as new virtual addresses require translation. This, in turn, means having to go back to the page table structures when references are made to virtual addresses outside the scope of a small page whose translation has been cached. The TLB is a very small cache, and thus large pages make better use of this limited resource.
大页面的主要优势是加快引用页面内其它数据的地址翻译速度。存在这个优势是因为首次引用一个大页面内的任意字节,将导致CPU内部的TLB(转换后援缓冲,后面会讲到)硬件缓存必要的信息,用于翻译对该页面中其它字节的引用。
如果我们使用小页面,需要更多的TLB条目来缓存相同范围的虚拟地址空间翻译结果(译注:例如,对于一个4MB地址空间范围,需要1024个4KB小页面;只需要1个4MB大页面就能覆盖),由于TLB条目数量是固定的,这导致可用于缓存其它范围虚拟地址空间翻译结果的条目越少。换言之,当引用的虚拟地址不在任何TLB条目缓存的小页面负责的范围内时,不仅需要回到页表结构中查找(译注:这就需要更多的CPU时钟周期,因为页表通常在内存中,内存访问比TLB访问至少慢上2个数量级),而且需要回收更多旧的TLB条目用于缓存新的翻译结果。TLB是一种非常小的缓存,在其中缓存大页面能够更高效使用这个有限的资源。
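(Translator's note: As an illustration, assume a TLB that holds 64 PTE entries. With 4 KB small pages it can cover 64 * 4 KB = 256 KB of virtual address range; with 4 MB large pages it can cover 64 * 4 MB = 256 MB. The latter clearly lowers the chance of a TLB miss and of the page table lookup in memory that follows. For how the TLB caches address translations, see this video: https://youtu.be/95QpHJX55bM)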
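The arithmetic behind the TLB argument is short enough to write down. This sketch computes how much address range a TLB can cover and how many entries a given range needs; the 64-entry TLB is the assumption used in the translator's note, not a property of any particular processor, and the function names are illustrative.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address range covered when every TLB entry maps one page."""
    return entries * page_size


def entries_needed(range_bytes: int, page_size: int) -> int:
    """TLB entries needed to cover a range (ceiling division)."""
    return -(-range_bytes // page_size)


if __name__ == "__main__":
    print(tlb_reach(64, 4 * KB) // KB, "KB reach with 4 KB pages")
    print(tlb_reach(64, 4 * MB) // MB, "MB reach with 4 MB pages")
    print(entries_needed(4 * MB, 4 * KB), "entries to cover 4 MB with small pages")
    print(entries_needed(4 * MB, 4 * MB), "entry to cover 4 MB with one large page")
```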
To take advantage of large pages on systems with more than 2 GB of RAM, Windows maps with large pages the core operating system images (Ntoskrnl.exe and Hal.dll) as well as core operating system data (such as the initial part of nonpaged pool and the data structures that describe the state of each physical memory page). Windows also automatically maps I/O space requests (calls by device drivers to MmMapIoSpace) with large pages if the request is of satisfactory large page length and alignment. In addition, Windows allows applications to map their images, private memory, and page-file-backed sections with large pages. (See the MEM_LARGE_PAGES flag on the VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma functions.) You can also specify other device drivers to be mapped with large pages by adding a multistring registry value to HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers and specifying the names of the drivers as separately null-terminated strings.
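To take advantage of large pages on systems with more than 2 GB of physical memory, Windows uses large pages to map the core operating system images (Ntoskrnl.exe and Hal.dll) and core operating system data (for example, the initial part of nonpaged pool and the data structures that describe the state of each physical page).
If an I/O space request (a device driver's call to MmMapIoSpace) meets the large page length and alignment requirements, Windows also maps it with large pages automatically. In addition, Windows lets applications map their images, private memory, and page-file-backed sections with large pages, by passing the MEM_LARGE_PAGES flag to VirtualAlloc, VirtualAllocEx, or VirtualAllocExNuma.
You can also have other device drivers mapped with large pages: under the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management, add a multistring value named LargePageDrivers whose data lists the driver names, each as its own null-terminated (0x00) string.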
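"Separately null-terminated strings" describes the on-disk layout of a multistring (REG_MULTI_SZ) value: each string ends with a null character, and an extra null closes the list. A minimal sketch of that layout follows; the driver file names are hypothetical placeholders, and the helper names are my own, not a Windows API.

```python
def encode_multi_sz(names):
    """Encode strings as REG_MULTI_SZ data: each name null-terminated,
    followed by one more null that ends the list (UTF-16LE text)."""
    if any(not name or "\x00" in name for name in names):
        raise ValueError("names must be non-empty and contain no NUL characters")
    text = "".join(name + "\x00" for name in names) + "\x00"
    return text.encode("utf-16-le")


def decode_multi_sz(data):
    """Inverse of encode_multi_sz: split on nulls and drop the empty terminator."""
    return [s for s in data.decode("utf-16-le").split("\x00") if s]


if __name__ == "__main__":
    # Hypothetical driver names, for illustration only.
    drivers = ["example1.sys", "example2.sys"]
    raw = encode_multi_sz(drivers)
    print(len(raw), "bytes:", raw.hex())
    print(decode_multi_sz(raw))
```

Registry editors normally hide this layout and let you type one name per line; the sketch is only meant to show what "separately null-terminated" means in the stored data.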
Attempts to allocate large pages may fail after the operating system has been running for an extended period, because the physical memory for each large page must occupy a significant number (see Table 10-1) of physically contiguous small pages, and this extent of physical pages must furthermore begin on a large page boundary. (For example, physical pages 0 through 511 could be used as a large page on an x64 system, as could physical pages 512 through 1,023, but pages 10 through 521 could not.) Free physical memory does become fragmented as the system runs. This is not a problem for allocations using small pages but can cause large page allocations to fail.
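Attempts to allocate large pages may fail once the operating system has been running for a long time, because the physical memory behind each large page must consist of a large number of physically contiguous small pages (see Table 10-1), and that run of physical pages must also start on a large page boundary. (For example, on an x64 system physical pages 0 through 511 can serve as one large page, and pages 512 through 1,023 as another, but pages 10 through 521 cannot, because that run does not start on a large page boundary.)
Free physical memory does become fragmented as the system runs. That is harmless for small page allocations but can make large page allocations fail.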
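The two conditions above (the run must start on a large page boundary, and every small page in it must be free) can be expressed as a small check. This is a toy model rather than how the memory manager actually searches for pages; the function name and the set-of-free-pages representation are assumptions made for illustration, and 512 is the x64 figure implied by the 0 through 511 example.

```python
PAGES_PER_LARGE_PAGE_X64 = 512  # small pages per large page in the x64 example


def can_form_large_page(first_pfn, free_pfns, pages_per_large=PAGES_PER_LARGE_PAGE_X64):
    """True if the run starting at first_pfn is aligned to a large page
    boundary and every small page in the run is currently free."""
    if first_pfn % pages_per_large != 0:
        return False  # e.g. pages 10 through 521: right length, wrong alignment
    run = range(first_pfn, first_pfn + pages_per_large)
    return all(pfn in free_pfns for pfn in run)


if __name__ == "__main__":
    free = set(range(0, 1024))
    print(can_form_large_page(0, free))     # aligned and free
    print(can_form_large_page(10, free))    # misaligned
    free.discard(700)                       # one in-use page fragments the run
    print(can_form_large_page(512, free))
```

A single in-use small page inside an aligned run is enough to rule it out, which is why fragmentation hurts large page allocations but not small ones.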
It is not possible to specify anything but read/write access to large pages. The memory is also always nonpageable, because the page file system does not support large pages. And, because the memory is nonpageable, it is not considered part of the process working set (described later). Nor are large page allocations subject to job-wide limits on virtual memory usage.
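Large pages can only be given read/write access; no other protection can be specified. Large page memory is also always nonpageable, because the page file system does not support large pages. Being nonpageable, it is not considered part of the process working set (described later). Large page allocations are also not subject to job-wide limits on virtual memory usage. (Translator's note: I read "job-wide limits" as the virtual memory limits that a job imposes on the processes in it; a better rendering is welcome.)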
There is an unfortunate side effect of large pages. Each page (whether large or small) must be mapped with a single protection that applies to the entire page (because hardware memory protection is on a per-page basis). If a large page contains, for example, both read-only code and read/write data, the page must be marked as read/write, which means that the code will be writable. This means that device drivers or other kernel-mode code could, as a result of a bug, modify what is supposed to be read-only operating system or driver code without causing a memory access violation.
If small pages are used to map the operating system’s kernel-mode code, the read-only portions of Ntoskrnl.exe and Hal.dll can be mapped as read-only pages. Using small pages does reduce efficiency of address translation, but if a device driver (or other kernel-mode code) attempts to modify a read-only part of the operating system, the system will crash immediately with the exception information pointing at the offending instruction in the driver. If the write was allowed to occur, the system would likely crash later (in a harder-to-diagnose way) when some other component tried to use the corrupted data.
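Large pages have one unfortunate side effect. Every page, large or small, must be mapped with a single protection that applies to the whole page (because hardware memory protection works per page). If a large page contains both read-only code and read/write data, for example, the page has to be marked read/write, which makes the code writable. As a result, a bug in a device driver or other kernel-mode code could modify operating system or driver code that is supposed to be read-only, without triggering a memory access violation. If small pages map the operating system's kernel-mode code, the read-only portions of Ntoskrnl.exe and Hal.dll can be mapped as read-only pages. Small pages do make address translation less efficient, but if a device driver (or other kernel-mode code) tries to modify a read-only part of the operating system, the system crashes immediately, with exception information pointing at the offending instruction in the driver. If the write were allowed instead, the system would probably crash later, when some other kernel component tried to use the corrupted data, and in a way that is much harder to diagnose.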
If you suspect you are experiencing kernel code corruptions, enable Driver Verifier (described later in this chapter), which will disable the use of large pages.
If you suspect kernel code corruption, enable Driver Verifier (described later in this chapter); doing so disables the use of large pages.
Reserving and Committing Pages
Pages in a process virtual address space are free, reserved, committed, or shareable. Committed and shareable pages are pages that, when accessed, ultimately translate to valid pages in physical memory.
Committed pages are also referred to as private pages. This reflects the fact that committed pages cannot be shared with other processes, whereas shareable pages can be (but, of course, might be in use by only one process).
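Reserving and Committing Pages
A page in a process virtual address space is in one of these states: free, reserved, committed, or shareable.
Committed and shareable pages are the ones that, when accessed, ultimately translate to valid pages in physical memory. Committed pages are also called private pages, which reflects the fact that they cannot be shared with other processes, whereas shareable pages can be (though a shareable page might, of course, be in use by only one process).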
Private pages are allocated through the Windows VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma functions. These functions allow a thread to reserve address space and then commit portions of the reserved space. The intermediate “reserved” state allows the thread to set aside a range of contiguous virtual addresses for possible future use (such as an array), while consuming negligible system resources, and then commit portions of the reserved space as needed as the application runs. Or, if the size requirements are known in advance, a thread can reserve and commit in the same function call. In either case, the resulting committed pages can then be accessed by the thread. Attempting to access free or reserved memory results in an exception because the page isn’t mapped to any storage that can resolve the reference.
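Private pages are allocated through the Windows functions VirtualAlloc, VirtualAllocEx, and VirtualAllocExNuma. These functions let a thread reserve address space and then commit portions of the reserved space (for use as private pages). The intermediate "reserved" state lets the thread set aside a range of contiguous virtual addresses for possible future use (an array, for example) at negligible cost in system resources, and then commit parts of it as the running application needs them. Alternatively, if the size is known in advance, a thread can reserve and commit in a single function call. Either way, the resulting committed pages can then be accessed by the thread. Accessing free or reserved memory raises an exception, because those virtual pages are not mapped to any storage that could resolve the reference. (Translator's note: the last sentence was reworded slightly; the original reads "because the page isn't mapped to any storage that can resolve the reference.")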
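The reserve-then-commit sequence described above can be modeled with a toy address space. To be clear about what is assumed: this is a pure Python simulation, not a call into VirtualAlloc, and the class, method, and exception names are invented for this post. It only tracks the state of each page and raises when a page that is not committed is touched.

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"
    RESERVED = "reserved"
    COMMITTED = "committed"


class AccessViolation(Exception):
    """Raised by the toy model when a free or reserved page is accessed."""


class ToyAddressSpace:
    def __init__(self, page_count):
        self.pages = [PageState.FREE] * page_count

    def _span(self, start, count):
        if start < 0 or count <= 0 or start + count > len(self.pages):
            raise ValueError("range outside the address space")
        return range(start, start + count)

    def reserve(self, start, count):
        span = self._span(start, count)
        if any(self.pages[i] is not PageState.FREE for i in span):
            raise ValueError("range is not entirely free")
        for i in span:
            self.pages[i] = PageState.RESERVED

    def commit(self, start, count):
        span = self._span(start, count)
        if any(self.pages[i] is PageState.FREE for i in span):
            raise ValueError("reserve the range before committing it")
        for i in span:
            self.pages[i] = PageState.COMMITTED

    def reserve_and_commit(self, start, count):
        # Models the "reserve and commit in the same call" case.
        self.reserve(start, count)
        self.commit(start, count)

    def read(self, page):
        self._span(page, 1)
        if self.pages[page] is not PageState.COMMITTED:
            raise AccessViolation(f"page {page} is {self.pages[page].value}")
        return 0  # a never-touched committed page reads as zero in this model


if __name__ == "__main__":
    space = ToyAddressSpace(16)
    space.reserve(0, 8)       # set aside 8 pages, e.g. for a growing array
    space.commit(0, 2)        # commit only what is needed so far
    print(space.read(1))
    try:
        space.read(5)         # reserved but not committed
    except AccessViolation as exc:
        print("access violation:", exc)
```

The model commits whole pages only, which matches the text: reservation costs almost nothing, and only committed pages can be read or written.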
If committed (private) pages have never been accessed before, they are created at the time of first access as zero-initialized pages (or demand zero). Private committed pages may later be automatically written to the paging file by the operating system if required by demand for physical memory.
“Private” refers to the fact that these pages are normally inaccessible to any other process.
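If committed (private) pages have never been accessed before, they are created at the time of first access as zero-initialized pages (also called demand zero).
If physical memory is in demand, private committed pages may later be written to the paging file automatically by the operating system (freeing the physical memory). "Private" refers to the fact that these pages are normally inaccessible to any other process.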
Note There are functions, such as ReadProcessMemory and WriteProcessMemory, that apparently permit cross-process memory access, but these are implemented by running kernel-mode code in the context of the target process (this is referred to as attaching to the process). They also require that either the security descriptor of the target process grant the accessor the PROCESS_VM_READ or PROCESS_VM_WRITE right, respectively, or that the accessor holds SeDebugPrivilege, which is by default granted only to members of the Administrators group.
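Note: Functions such as ReadProcessMemory and WriteProcessMemory appear to permit cross-process memory access, but they are implemented by running kernel-mode code in the context of the target process (this is called "attaching to the process"). They also require either that the security descriptor of the target process grant the accessor the PROCESS_VM_READ or PROCESS_VM_WRITE right, respectively, or that the accessor hold SeDebugPrivilege, which by default is granted only to members of the Administrators group.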
Shared pages are usually mapped to a view of a section, which in turn is part or all of a file, but may instead represent a portion of page file space. All shared pages can potentially be shared with other processes. Sections are exposed in the Windows API as file mapping objects.
When a shared page is first accessed by any process, it will be read in from the associated mapped file (unless the section is associated with the paging file, in which case it is created as a zero-initialized page). Later, if it is still resident in physical memory, the second and subsequent processes accessing it can simply use the same page contents that are already in memory. Shared pages might also have been prefetched by the system.
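Shared pages are usually mapped to a view of a section, which in turn is part or all of a file on disk, although a section may instead represent a portion of page file space. All shared pages can potentially be shared with other processes. Sections are exposed in the Windows API as file mapping objects.
When a shared page is first accessed by any process, it is read into memory from the associated mapped file on disk (unless the section is associated with the paging file, in which case the page is created as a zero-initialized page).
Later, if the shared page is still resident in physical memory, the second and subsequent processes that access it simply use the page contents already in memory. A shared page may also have been prefetched into memory by the system, without waiting for a first access.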
Two upcoming sections of this chapter, “Shared Memory and Mapped Files” and “Section Objects,” go into much more detail about shared pages. Pages are written to disk through a mechanism called modified page writing. This occurs as pages are moved from a process’s working set to a systemwide list called the modified page list; from there, they are written to disk (or remote storage). (Working sets and the modified list are explained later in this chapter.) Mapped file pages can also be written back to their original files on disk as a result of an explicit call to FlushViewOfFile or by the mapped page writer as memory demands dictate.
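Two upcoming sections of this chapter, "Shared Memory and Mapped Files" and "Section Objects", go into much more detail about shared pages. Pages are written to disk through a mechanism called modified page writing. This happens as pages are moved from a process's working set to a systemwide list called the modified page list; from that list they are written to disk (or remote storage). (Working sets and the modified list are explained later in this chapter.) Pages of mapped files can also be written back to their original files on disk, in one of two ways: through an explicit call to FlushViewOfFile, or by the mapped page writer (one of the six top-level kernel-mode threads described earlier) as memory demands dictate.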
You can decommit private pages and/or release address space with the VirtualFree or VirtualFreeEx function. The difference between decommittal and release is similar to the difference between reservation and committal: decommitted memory is still reserved, but released memory has been freed; it is neither committed nor reserved.
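You can decommit private pages and/or release address space with the VirtualFree or VirtualFreeEx function. The difference between decommitting and releasing mirrors the difference between reserving and committing: decommitted memory is still reserved, whereas released memory has been freed and is neither committed nor reserved.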
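One way to keep decommit and release apart is to write the page states down as a transition table. The table below is a simplified sketch of the behavior described above, not an exhaustive model of VirtualAlloc and VirtualFree; the operation names and the `apply_ops` helper are made up for illustration.

```python
# (current state, operation) -> next state, for a single page in this toy model.
TRANSITIONS = {
    ("free", "reserve"): "reserved",
    ("reserved", "commit"): "committed",
    ("committed", "decommit"): "reserved",  # decommitted memory is still reserved
    ("reserved", "decommit"): "reserved",   # nothing to decommit, state unchanged
    ("reserved", "release"): "free",        # released memory is freed
    ("committed", "release"): "free",       # release also drops the commitment
}


def apply_ops(state, ops):
    """Apply a sequence of operations to one page and return its final state."""
    for op in ops:
        try:
            state = TRANSITIONS[(state, op)]
        except KeyError:
            raise ValueError(f"cannot {op} a {state} page in this model") from None
    return state


if __name__ == "__main__":
    print(apply_ops("free", ["reserve", "commit", "decommit"]))
    print(apply_ops("free", ["reserve", "commit", "release"]))
```

After a decommit the address range is still set aside for the process; only after a release can it be handed out again.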
Reserving memory is a relatively inexpensive operation because it consumes very little actual memory. All that needs to be updated or constructed is the relatively small internal data structures that represent the state of the process address space. (We’ll explain these data structures, called page tables and virtual address descriptors, or VADs, later in the chapter.)
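Reserving memory is a relatively cheap operation because it consumes very little actual memory. All that has to be updated or built is the fairly small internal data structures that represent the state of the process address space. (These data structures, called page tables and virtual address descriptors, or VADs, are explained later in the chapter.)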
One extremely common use for reserving a large space and committing portions of it as needed is the user-mode stack for each thread. When a thread is created, a stack is created by reserving a contiguous portion of the process address space. (1 MB is the default; you can override this size with the CreateThread and CreateRemoteThread function calls or change it on an imagewide basis by using the /STACK linker flag.) By default, the initial page in the stack is committed and the next page is marked as a guard page (which isn’t committed) that traps references beyond the end of the committed portion of the stack and expands it.
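A very common case of reserving a large range and committing parts of it on demand is the user-mode stack of each thread. When a thread is created, its stack is created by reserving a contiguous part of the process address space. (The default is 1 MB; you can override it in calls to CreateThread and CreateRemoteThread, or change it for the whole image with the /STACK linker flag.) By default the initial page of the stack is committed and the next page is marked as a guard page (not committed). A reference beyond the end of the committed part of the stack lands on the guard page, which traps the access and causes the committed part of the stack to be expanded.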
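The guard page mechanism can be pictured with a toy model of a stack that grows one page at a time. The sketch makes several simplifying assumptions that are not from the book: pages are numbered from the initial page (0) toward deeper addresses, the page size is taken as 4 KB, and growth fails once no page is left to act as the next guard. The class and exception names are invented.

```python
PAGE = 4096  # assumed small page size in bytes


class ToyStackOverflow(Exception):
    """No room left in the reserved range to place a new guard page."""


class ToyAccessViolation(Exception):
    """A page past the guard page was touched (reserved, never committed)."""


class ToyStack:
    def __init__(self, reserved_bytes=1024 * 1024):
        self.total_pages = reserved_bytes // PAGE
        self.committed = 1  # the initial page is committed at creation

    @property
    def guard_index(self):
        # The guard page is the first page past the committed portion.
        return self.committed

    def touch(self, page_index):
        if page_index < self.committed:
            return "ok"
        if page_index == self.guard_index:
            if self.committed + 1 >= self.total_pages:
                raise ToyStackOverflow("reserved stack range exhausted")
            self.committed += 1  # commit the old guard page; the guard moves on
            return "grown"
        raise ToyAccessViolation(f"page {page_index} is beyond the guard page")


if __name__ == "__main__":
    stack = ToyStack()
    print(stack.total_pages, "pages reserved,", stack.committed, "committed")
    print(stack.touch(1), "->", stack.committed, "pages committed")
```

Only the pages a thread actually reaches are ever committed, which is why a 1 MB reservation per thread is affordable.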
EXPERIMENT: Reserved vs. Committed Pages
The TestLimit utility (which you can download from the Windows Internals book webpage) can be used to allocate large amounts of either reserved or private committed virtual memory, and the difference can be observed via Process Explorer. First, open two Command Prompt windows.
Invoke TestLimit in one of them to create a large amount of reserved memory:
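Experiment: Reserved vs. Committed Pages
The TestLimit utility (which you can download from the book's webpage) can allocate large amounts of either reserved or private committed virtual memory, and the difference can be observed with Process Explorer. First, open two Command Prompt windows. In one of them, run TestLimit to create a large amount of reserved memory: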
C:\temp>testlimit -r 1 -c 800
Testlimit v5.2 - test Windows limits
Copyright (C) 2012 Mark Russinovich
Sysinternals -
Process ID: 1544
Reserving private bytes 1 MB at a time ...
Leaked 800 MB of reserved memory (800 MB total leaked). Lasterror: 0
The operation completed successfully.
In the other window, create a similar amount of committed memory:
In the other window, create a similar amount of committed memory:
C:\temp>testlimit -m 1 -c 800
Testlimit v5.2 - test Windows limits
Copyright (C) 2012 Mark Russinovich
Sysinternals -
Process ID: 2828
Leaking private bytes 1 KB at a time ...
Leaked 800 MB of private memory (800 MB total leaked). Lasterror: 0
The operation completed successfully.
Now run Task Manager, go to the Processes tab, and use the Select Columns command on the View menu to include Memory—Commit Size in the display. Find the two instances of TestLimit in the list. They should appear something like the following figure.
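Now open Windows Task Manager, switch to the Processes tab, and on the View menu choose Select Columns and tick Commit Size to display committed memory. Find the two TestLimit instances in the process list; they should look something like the following figure: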
Task Manager shows the committed size, but it has no counters that will reveal the reserved memory in the other TestLimit process.
Finally, invoke Process Explorer. Choose View, Select Columns, select the Process Memory tab, and enable the Private Bytes and Virtual Size counters. Find the two TestLimit processes in the main display:
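Task Manager shows the commit size but has no counter that reveals reserved memory. Finally, start Process Explorer; on the main menu choose View -> Select Columns, switch to the Process Memory tab, and tick the Private Bytes and Virtual Size counters. Find the two TestLimit processes in the main display: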
Notice that the virtual sizes of the two processes are identical, but only one shows a value for Private Bytes comparable to that for Virtual Size. The large difference in the other TestLimit process (process ID 1544) is due to the reserved memory.
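Note that the two TestLimit processes have identical Virtual Size values, but only one shows a Private Bytes value close to its Virtual Size. The large gap in the TestLimit process with process ID 1544 comes from reserved memory: it has not been committed, so the Private Bytes counter does not include it.
(Translator's note: as stated earlier, reserving memory is relatively cheap because it consumes very little actual memory, which is probably why Private Bytes here is only about 2.8 MB rather than 822 MB. The Virtual Size counter, on the other hand, counts reserved and committed memory alike, which is worth keeping in mind. Below are screenshots from the same experiment on my own machine; the results essentially match the description above:)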
The same comparison could be made in Performance Monitor by looking at the Process | Virtual Bytes and Process | Private Bytes counters.
The same comparison can be made in Performance Monitor by looking at the Process | Virtual Bytes and Process | Private Bytes counters.
Commit Limit
On Task Manager’s Performance tab, there are two numbers following the legend Commit. The memory manager keeps track of private committed memory usage on a global basis, termed commitment or commit charge; this is the first of the two numbers, which represents the total of all committed virtual memory in the system.
There is a systemwide limit, called the system commit limit or simply the commit limit. This limit corresponds to the current total size of all paging files, plus the amount of RAM that is usable by the operating system.
This is the second of the two numbers displayed as Commit on Task Manager’s Performance tab. The memory manager can increase the commit limit automatically by expanding one or more of the paging files, if they are not already at their configured maximum size.
Commit charge and the system commit limit will be explained in more detail in a later section.
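On Task Manager's Performance tab, the label Commit (GB) is followed by two numbers. The memory manager tracks private committed memory usage globally; this is called commitment or commit charge, and it is the first number, the total of all committed virtual memory in the system. There is also a systemwide limit, called the system commit limit or simply the commit limit. It corresponds to the current total size of all paging files plus the amount of physical memory usable by the operating system, and it is the second number. The memory manager can raise the commit limit automatically by expanding one or more paging files, provided they have not already reached their configured maximum size. Commit charge and the system commit limit are explained in detail later in the chapter.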
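(Translator's note: in practice, the virtual memory that can be committed for processes is bounded by the system commit limit, that is, the physical memory the system detects plus the size of the paging files. For example, on a 32-bit system with 3 GB of recognized physical memory and a paging file with an initial size of 2 GB, the system commit limit is 5 GB. Once the total committed by all running processes reaches that limit, new processes cannot start, and Windows reports that the paging file is too small to run the application. Say VMware needs at least 2 GB of free commit: with the 5 GB limit above, if more than 3 GB is already charged, VMware will not run, and even if it barely starts, the system becomes extremely sluggish. You can then raise the paging file's minimum size to enlarge the "swap area", so that more physical pages not currently in use can be paged out to disk, leaving more room for VMware. To configure it: right-click Computer -> Properties -> Advanced system settings -> Advanced tab -> Settings under Performance, then the Advanced tab, then Change under Virtual memory. The term "virtual memory" in that dialog can mislead; what it actually configures is the paging file size.
In Sysinternals Process Explorer, clicking the second rectangular graph at the right of the top toolbar shows the system commit statistics. One more point to note:
The 32-bit Windows kernel limits recognized physical memory to 4 GB, and part of that address range is reserved for hardware [system memory shared with the CPU's integrated graphics, device buffers of peripheral I/O devices mapped into the address space, and so on], so only about 3 GB of physical memory is actually usable. The figure below gives an example of the ranges reserved for addressing buses and bus controllers and for use as integrated graphics memory:
On such a system the system commit limit therefore cannot be raised by adding physical memory, only by enlarging the paging file; a commonly recommended paging file size is 1.5 to 2 times the recognized physical memory.
For 32-bit Windows with 3 GB recognized, that means a paging file of 4.5 to 6 GB and a system commit limit of 7.5 to 9 GB. By contrast, 64-bit Windows is not bound by the 4 GB ceiling [the x64 architecture currently implements 48-bit virtual addresses, 256 TB in theory, of which this generation of 64-bit Windows uses 44 bits, or 16 TB of virtual address space; how much physical memory is supported depends on the Windows edition], and 32-bit Windows with physical address extension [PAE] unlocked through various unofficial patches can use 36-bit physical addresses, enough to recognize 64 GB of physical memory. On 64-bit Windows and on 32-bit PAE Windows, adding physical memory therefore does raise the system commit limit, so more applications can run "at the same time". In general, adding physical memory is more effective than enlarging the paging file.)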
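The relationship between commit charge and commit limit comes down to one addition and one comparison. The sketch below uses the 3 GB RAM plus 2 GB paging file figures from the note as sample inputs; the function names are my own, and real accounting in Windows involves more than this (paging files can grow, for instance).

```python
GB = 1024 ** 3


def commit_limit(usable_ram, paging_file_sizes):
    """System commit limit: usable RAM plus the current size of every paging file."""
    return usable_ram + sum(paging_file_sizes)


def can_commit(request, commit_charge, limit):
    """True if committing `request` more bytes keeps the charge within the limit."""
    return commit_charge + request <= limit


if __name__ == "__main__":
    limit = commit_limit(3 * GB, [2 * GB])   # figures from the translator's note
    print(limit // GB, "GB commit limit")
    print(can_commit(2 * GB, 2 * GB, limit))             # 4 GB total fits in 5 GB
    print(can_commit(2 * GB, int(3.5 * GB), limit))      # 5.5 GB total does not
```

Enlarging a paging file raises the limit returned by `commit_limit`, which is exactly the workaround the note describes.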
Locking Memory
In general, it’s better to let the memory manager decide which pages remain in physical memory.
However, there might be special circumstances where it might be necessary for an application or device driver to lock pages in physical memory. Pages can be locked in memory in two ways:
■ Windows applications can call the VirtualLock function to lock pages in their process working set. Pages locked using this mechanism remain in memory until explicitly unlocked or until the process that locked them terminates. The number of pages a process can lock can’t exceed its minimum working set size minus eight pages. Therefore, if a process needs to lock more pages, it can increase its working set minimum with the SetProcessWorkingSetSizeEx function (referred to in the section “Working Set Management”).
■ Device drivers can call the kernel-mode functions MmProbeAndLockPages, MmLockPagableCodeSection, MmLockPagableDataSection, or MmLockPagableSectionByHandle. Pages locked using this mechanism remain in memory until explicitly unlocked. The last three of these APIs enforce no quota on the number of pages that can be locked in memory because the resident available page charge is obtained when the driver first loads; this ensures that it can never cause a system crash due to overlocking. For the first API, quota charges must be obtained or the API will return a failure status.
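In general, it is best to let the memory manager decide which pages stay in physical memory. There can be special cases, however, where an application or device driver needs to lock pages in physical memory. There are two ways to do so:
■ Windows applications can call VirtualLock to lock pages in their process working set. Pages locked this way stay in memory until explicitly unlocked or until the process that locked them terminates. A process cannot lock more pages than its minimum working set size minus eight pages, so a process that needs to lock more can raise its working set minimum with SetProcessWorkingSetSizeEx (mentioned in the "Working Set Management" section).
■ Device drivers can call the kernel-mode functions MmProbeAndLockPages, MmLockPagableCodeSection, MmLockPagableDataSection, or MmLockPagableSectionByHandle.
Pages locked this way stay in memory until explicitly unlocked. The last three APIs enforce no quota on the number of pages locked, because the resident available page charge is already obtained when the driver first loads; that guarantees they can never crash the system through overlocking. For the first API, the quota charge must be obtained, or the call returns a failure status. (Translator's note: this paragraph was hard to render; corrections are welcome.)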
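The VirtualLock rule in the first bullet (lockable pages cannot exceed the minimum working set size minus eight pages) is simple enough to compute directly. This is arithmetic only, not a call into the Windows API; the function names are hypothetical, and the 50-page minimum working set in the example is just a sample input.

```python
LOCK_OVERHEAD_PAGES = 8  # from the text: minimum working set minus eight pages


def max_lockable_pages(min_working_set_pages):
    """Upper bound on the pages a process can lock with its current minimum."""
    return max(0, min_working_set_pages - LOCK_OVERHEAD_PAGES)


def min_working_set_needed(pages_to_lock):
    """Smallest working set minimum that allows locking `pages_to_lock` pages."""
    return pages_to_lock + LOCK_OVERHEAD_PAGES


if __name__ == "__main__":
    current_minimum = 50  # sample value, in pages
    print(max_lockable_pages(current_minimum), "pages lockable")
    print(min_working_set_needed(100), "pages needed to lock 100")
```

A process that wants to lock 100 pages would first have to raise its working set minimum to at least the second figure, for example with SetProcessWorkingSetSizeEx as the text says.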
Allocation Granularity
Windows aligns each region of reserved process address space to begin on an integral boundary defined by the value of the syst
