About the Embedded Java Virtual Machine: CVM

Source: Internet | Published by: 索尼z3手机优化 | Edited by: 程序博客网 | Time: 2024/06/04 19:46

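The Root Data Structure
One of the design criteria for CVM is that it must be restartable, whether or not the host operating system supports processes. Without process support, restarting the VM requires that we be able to free all the memory that was malloc'ed while CVM was running. To make this easy (and because it is good practice anyway), we ensure that all data can be reached from the root of a single tree of data structures in memory. That root structure is CVMglobals, shown on the left of the diagram above. You can find the definition of CVMglobals in globals.h here (CVMGlobalState is defined in the same file). Take a close look at CVMglobals and you will see that it is the aggregate of all the system's global data structures. Keeping all the globals in one structure also makes it easy to reset them to their initial state, for example by memsetting all the bytes to 0 (after the subtrees have been properly cleaned up, of course).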
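The idea can be sketched in a few lines of C. This is only an illustration: DemoGlobalState, DemoGCState, demoBoot and demoShutdown are hypothetical names, not the actual contents of globals.h. It shows one malloc'ed subtree hanging off a single root, freed before the root is zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one subsystem's state; not a real CVM type. */
typedef struct {
    char  *heapStart;   /* malloc'ed subtree owned by the root */
    size_t heapSize;
} DemoGCState;

/* Hypothetical root structure, playing the role of CVMglobals. */
typedef struct {
    DemoGCState gc;
} DemoGlobalState;

/* The single root: all VM data is reachable from here. */
static DemoGlobalState demoGlobals;

/* Boot: allocate the subtrees and hang them off the root.
 * Returns 0 on success, -1 if the allocation fails. */
int demoBoot(size_t heapSize) {
    assert(demoGlobals.gc.heapStart == NULL);  /* must start from a clean root */
    demoGlobals.gc.heapStart = malloc(heapSize);
    if (demoGlobals.gc.heapStart == NULL) return -1;
    demoGlobals.gc.heapSize = heapSize;
    return 0;
}

/* Shutdown: free the subtrees first, then zero the whole root so the
 * next boot sees the initial state again. */
void demoShutdown(void) {
    free(demoGlobals.gc.heapStart);
    memset(&demoGlobals, 0, sizeof demoGlobals);
}
```

Calling demoBoot, then demoShutdown, then demoBoot again works without leaking, which is the property the restart requirement asks for. The sketch assumes a platform where an all-zero-bytes pointer is NULL, which holds on mainstream targets.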

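GC and the Java Heap
From the globals, you can find the data structure that holds the GC configuration and bookkeeping information (CVMglobals.gc). From there, you can eventually get to the Java heap.

CVM has a pluggable GC architecture. Pluggable here means pluggable at build time. This allows experimental GCs to be tried out and stabilized. Currently, the only product-quality GC is the generational GC (see here and here for the relevant GC implementation files).

All Java objects, i.e. everything that inherits from java.lang.Object, are allocated from the Java heap. The only exceptions are pre-linked (ROMized) Java objects, which reside in global data. The Java heap itself is allocated from the C heap. All other data structures are either global data (i.e. in the .bss section, the .data section, or similar) or allocated from the C heap.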

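JIT and Compiled Code
CVMglobals also holds the JIT configuration and bookkeeping record (CVMglobals.jit). Traversing the tree, you will eventually find the JIT code buffer (also called the code cache). The code cache is currently fixed in size (though the size is configurable at run time). This memory is allocated when the VM boots. Once it has been allocated, its size cannot be changed.

When a Java method is compiled by the JIT, the JIT generates the compiled bits (referred to as the compiled method) and stores them in the JIT code cache. The metadata for the compiled method (also generated by the JIT) is stored in the code cache as well. Hence, the size of the code cache determines how many methods can be compiled.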
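A minimal sketch of a fixed-size code cache, assuming a simple bump allocator. DemoCodeCache and its functions are hypothetical names, and the real CVM code cache is not claimed to work this way. It only illustrates why a fixed size bounds the number of compiled methods: each one consumes space for its metadata plus its code, and nothing fits once the buffer is used up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical code cache; illustrative only. */
typedef struct {
    unsigned char *start;  /* allocated once at VM boot */
    size_t         size;   /* fixed after boot */
    size_t         used;   /* bytes handed out so far */
} DemoCodeCache;

/* Boot-time allocation. Returns 0 on success, -1 on failure. */
int demoCodeCacheInit(DemoCodeCache *cc, size_t size) {
    cc->start = malloc(size);
    cc->size  = (cc->start != NULL) ? size : 0;
    cc->used  = 0;
    return (cc->start != NULL) ? 0 : -1;
}

/* Reserve room for one compiled method: metadata followed by code.
 * Returns NULL when the request does not fit in the remaining space. */
void *demoCodeCacheAlloc(DemoCodeCache *cc, size_t metaBytes, size_t codeBytes) {
    assert(cc->used <= cc->size);
    if (codeBytes > SIZE_MAX - metaBytes) return NULL;  /* size overflow */
    size_t need = metaBytes + codeBytes;
    if (need > cc->size - cc->used) return NULL;        /* cache is full */
    void *p = cc->start + cc->used;
    cc->used += need;
    return p;
}

/* Release the buffer at VM shutdown. */
void demoCodeCacheDestroy(DemoCodeCache *cc) {
    free(cc->start);
    cc->start = NULL;
    cc->size = cc->used = 0;
}
```

What the VM does when an allocation fails is outside the scope of this sketch.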

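Java Objects and Classes
When a class file is loaded into memory, its contents are parsed and organized into an optimized data structure, which is allocated from the C heap. This data structure is the CVMClassBlock, and it holds all the metadata for the class. The metadata includes the constant pool, class attributes, fields, method information, bytecodes, and so on. For every CVMClassBlock there is a corresponding java.lang.Class instance, which is allocated from the Java heap. Once a class has been properly loaded, the classblock and the Class instance always exist as a pair. The classblock holds a reference to the Class object, and vice versa. When the class is unloaded, both are freed.

Every object in CVM has a two-word header. The first word is normally a pointer to the classblock. However, this header is not visible to Java code. It is only used by CVM's C code. Note: because java.lang.Class inherits from java.lang.Object, instances of Class have this two-word header too.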
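The relationships described above can be pictured with the following sketch. All names (DemoObjectHeader, DemoClassBlock, DemoClassObject, demoLoadClass, demoUnloadClass) are hypothetical, and the real definitions in objects.h and classes.h will differ. The contents of the second header word are not described here, so it is left as an opaque field. For brevity both halves of the pair are malloc'ed, whereas in CVM the Class instance lives in the Java heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoClassBlock DemoClassBlock;

/* Two-word header at the start of every object. */
typedef struct {
    DemoClassBlock *clazz;    /* word 1: pointer to the classblock */
    uintptr_t       various;  /* word 2: opaque in this sketch */
} DemoObjectHeader;

/* Stand-in for a java.lang.Class instance: an object like any other. */
typedef struct {
    DemoObjectHeader hdr;
    DemoClassBlock  *classBlock;   /* back-pointer to the class metadata */
} DemoClassObject;

/* Stand-in for CVMClassBlock: the class metadata. */
struct DemoClassBlock {
    const char      *name;
    DemoClassObject *javaInstance; /* the paired Class object */
};

/* Load a class: create the classblock and its Class object as a pair.
 * classClassBlock is the classblock of java.lang.Class; pass NULL when
 * loading java.lang.Class itself. Returns NULL on allocation failure. */
DemoClassBlock *demoLoadClass(const char *name, DemoClassBlock *classClassBlock) {
    DemoClassBlock  *cb  = malloc(sizeof *cb);
    DemoClassObject *obj = malloc(sizeof *obj);
    if (cb == NULL || obj == NULL) { free(cb); free(obj); return NULL; }
    cb->name = name;
    cb->javaInstance = obj;
    obj->classBlock = cb;
    /* The Class object is itself an object, so its header points at the
     * classblock of java.lang.Class. */
    obj->hdr.clazz = (classClassBlock != NULL) ? classClassBlock : cb;
    obj->hdr.various = 0;
    return cb;
}

/* Unload a class: both halves of the pair go away together. */
void demoUnloadClass(DemoClassBlock *cb) {
    assert(cb->javaInstance->classBlock == cb);
    free(cb->javaInstance);
    free(cb);
}
```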

Key files to look at are objects.h and classes.h. See here for the files.

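Java Threads
To execute anything, the VM must support threads. Each thread is represented by a CVMExecEnv (also known as an ee). Within the VM, the ee is essentially the identity of the thread. All thread operations take the current thread's ee as an argument. See interpreter.h here and interpreter.c here.

There is a one-to-one mapping between an ee and a java.lang.Thread instance. Once a thread has been properly initialized, the two always exist as a pair.

Similarly, there is a one-to-one mapping between an ee and a JNIEnv. The JNIEnv is embedded as a member of the ee. Mapping between the address of an ee and the address of its JNIEnv is basically just an offset adjustment.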
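The offset adjustment can be sketched with the standard offsetof macro. DemoExecEnv, DemoJNIEnv and the two conversion functions are hypothetical names, and the real CVMExecEnv layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for JNIEnv. */
typedef struct {
    const void *functions;
} DemoJNIEnv;

/* Hypothetical stand-in for CVMExecEnv. */
typedef struct {
    int        threadID;
    DemoJNIEnv jniEnv;   /* embedded in the ee, not pointed to */
} DemoExecEnv;

/* ee -> JNIEnv: take the address of the embedded member. */
DemoJNIEnv *demoEE2JNIEnv(DemoExecEnv *ee) {
    assert(ee != NULL);
    return &ee->jniEnv;
}

/* JNIEnv -> ee: step back by the member's offset within the ee. Only
 * valid for a JNIEnv that really is embedded in a DemoExecEnv. */
DemoExecEnv *demoJNIEnv2EE(DemoJNIEnv *env) {
    assert(env != NULL);
    return (DemoExecEnv *)((char *)env - offsetof(DemoExecEnv, jniEnv));
}
```

Because the JNIEnv is embedded, neither direction needs a lookup table or a stored back-pointer.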

All the ees are linked together in a list whose head is CVMglobals.threadList. The ee of the main thread is a member of CVMglobals. All the others are malloc'ed.
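As a rough sketch of this arrangement, assuming a singly linked list with new threads pushed at the head (the ordering and the names DemoEE, DemoThreadGlobals and the demo* functions are assumptions made for illustration):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoEE {
    int            id;
    struct DemoEE *next;
} DemoEE;

typedef struct {
    DemoEE  mainEE;      /* the main thread's ee lives inside the globals */
    DemoEE *threadList;  /* head of the list of all ees */
} DemoThreadGlobals;

/* Set up the list with only the embedded main-thread ee on it. */
void demoInitMainThread(DemoThreadGlobals *g) {
    g->mainEE.id = 0;
    g->mainEE.next = NULL;
    g->threadList = &g->mainEE;
}

/* Every other ee is malloc'ed and pushed onto the front of the list. */
DemoEE *demoAttachThread(DemoThreadGlobals *g, int id) {
    DemoEE *ee = malloc(sizeof *ee);
    if (ee == NULL) return NULL;
    ee->id = id;
    ee->next = g->threadList;
    g->threadList = ee;
    return ee;
}

/* Unlink an ee. Only malloc'ed ees are freed, never the embedded one. */
void demoDetachThread(DemoThreadGlobals *g, DemoEE *ee) {
    assert(ee != NULL);
    DemoEE **link = &g->threadList;
    while (*link != NULL && *link != ee) link = &(*link)->next;
    if (*link == NULL) return;  /* not on the list */
    *link = ee->next;
    if (ee != &g->mainEE) free(ee);
}

/* Walk the list and count the ees on it. */
int demoThreadCount(const DemoThreadGlobals *g) {
    int n = 0;
    for (const DemoEE *ee = g->threadList; ee != NULL; ee = ee->next) n++;
    return n;
}
```

A real VM would hold a lock while modifying the list; that is left out here.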

System Mutexes
Manipulating threads in the VM requires synchronization, as do other subsystems and resources in the VM. Synchronization is usually done with a CVMSysMutex (see sync.h here and sync.c here). There are several sysMutexes allocated at VM boot time. These mutexes are neither visible to nor used by Java code; only VM C code uses them.

Each sysMutex has a dedicated purpose (e.g. the CVMglobals.threadLock is for synchronizing the thread operations), and is ranked. In order to prevent deadlock, sysMutexes can only be locked in increasing rank order. When CVM is built with assertions enabled, this rank order will be asserted.
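The rank rule can be sketched as follows. This models only the rank bookkeeping: the actual platform lock is elided, and the names (DemoSysMutex, DemoLockState, demoRankOK, and so on) are hypothetical rather than taken from sync.h. The sketch also assumes that mutexes are released in reverse order of acquisition, so that the most recently taken mutex always has the highest rank held.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_HELD 8

typedef struct {
    const char *name;
    int         rank;
} DemoSysMutex;

/* Per-thread record of the mutexes held, in acquisition order. */
typedef struct {
    const DemoSysMutex *held[DEMO_MAX_HELD];
    int                 count;
} DemoLockState;

/* True if taking m keeps the ranks strictly increasing for this thread. */
bool demoRankOK(const DemoLockState *st, const DemoSysMutex *m) {
    return st->count == 0 || st->held[st->count - 1]->rank < m->rank;
}

void demoSysMutexLock(DemoLockState *st, const DemoSysMutex *m) {
    assert(st->count < DEMO_MAX_HELD);
    assert(demoRankOK(st, m));  /* checked only when assertions are enabled */
    /* a real implementation would acquire the platform lock here */
    st->held[st->count++] = m;
}

void demoSysMutexUnlock(DemoLockState *st, const DemoSysMutex *m) {
    assert(st->count > 0 && st->held[st->count - 1] == m);  /* LIFO release */
    /* a real implementation would release the platform lock here */
    st->count--;
}
```

If every thread takes locks only in increasing rank order, no two threads can each hold a lock the other is waiting for, which is why the ordering rules out deadlock among these mutexes.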

Java Execution Stack
Any thread of execution must have an execution stack. In CVM, each Java thread has 2 physical stacks: a native stack, and a Java stack. The native stack is the one that is allocated by the OS, and is used for C code execution. It holds the activation records (i.e. stack frames) of native code, and VM code including the interpreter loop function. It also holds activation frames for JIT compiled code (with a twist).

The Java stack (also known as the interpreter stack) is used to hold the activation records of Java methods. For each Java method that is executed, a frame will be pushed on this stack. Stack and frame data structures are defined in stacks.h here and stacks.c here.

If you dump a trace of the native stack when executing several Java methods, you will see stack frames for C code and the interpreter loop. If you dump a trace of the Java stack, you will only see stack frames for the Java methods that have been invoked. If you have a native method in the invocation chain, you will see a stack frame in both the native and Java stack. This is because the native method is both a C function and a Java method at the same time.

GC Roots and Root Stacks
In GC terms, CVM is called an exact VM. This means that at the time of GC, we will be able to know definitely where all the object pointers are in the system. This is in contrast with conservative GC systems, which require you to guess whether some piece of memory contains an object pointer or just some random data that resembles an object pointer.

All reachable (and therefore live) objects in the VM can be found by tracing a tree (or trees) of object references called the GC root tree. Each tree starts from a root reference. These root references are essentially globals, and are usually stored in data structures called root stacks. An example of this is CVMglobals.globalRoots. Strictly speaking, these data structures need not be stacks. They are actually used as lists. However, our Java stack data structures have properties that fulfill the needs of GC root stacks nicely, and don't require us to write additional code (good for code efficiency). So, we just use the stacks.

If an object cannot be found by tracing the root trees, then that object is unreachable and therefore can be reclaimed by the GC.

Note that in traversing a tree, at any point in the traversal, a node can be the root of a new subtree. Hence, the term root or GC root is sometimes used to refer to object pointers / references that are found along the way in a root scan. GC roots can be found in the root stacks, in thread execution stacks, and in object and class fields.
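A minimal sketch of tracing from roots, assuming a mark bit per object and at most two reference fields per object. DemoObj, demoMark and demoScanRoots are hypothetical names, and this is a generic marking pass, not CVM's generational collector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_REFS 2

typedef struct DemoObj {
    bool            marked;
    struct DemoObj *refs[DEMO_MAX_REFS];  /* fields holding object references */
} DemoObj;

/* Mark everything reachable from one reference. Each object visited
 * acts as the root of its own subtree. Recursive for brevity. */
void demoMark(DemoObj *obj) {
    if (obj == NULL || obj->marked) return;  /* the mark bit also stops cycles */
    obj->marked = true;
    for (size_t i = 0; i < DEMO_MAX_REFS; i++) demoMark(obj->refs[i]);
}

/* Scan a root stack, used here as a plain list of root references. */
void demoScanRoots(DemoObj *const *roots, size_t nroots) {
    assert(roots != NULL || nroots == 0);
    for (size_t i = 0; i < nroots; i++) demoMark(roots[i]);
}
```

After the scan, any object left unmarked was not reachable from a root and is a candidate for reclamation.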

The End
That should be enough to give you an overall idea of how the major data structures are laid out in CVM. Note: most of the things I told you above are meant to give you a good conceptual model of the lay of the land. In practice, there will be exceptions in some cases for various reasons. Sometimes, these exceptions will break the rules. Other times, they are like extensions to the rules. To keep things simple, I left out the exceptions. I may get into those when I talk about each subsystem and/or data structure specifically.

In the above, I also left out many juicy details like ... why allocate a data structure from the C heap vs the Java heap. I'll leave that for subsequent discussions.

So, in the next few days (or weeks), I will zoom in on the CVM subsystems and/or data structures (one at a time), and talk about them in detail. This will include mechanical details as well as design philosophies for why things are the way they are (when relevant, of course). Again, feel free to ask questions or make requests for topics. I will try to accommodate as much as I can.

Have a nice day. :-)