java开发C编译器：把函数调用编译成字节码

来源：互联网发布：spss数据标准化公式编辑：程序博客网时间：2024/06/14 19:42

更详细的讲解和代码调试演示过程，请参看视频
用java开发C语言编译器

更详细的讲解和代码调试演示过程，请参看视频
如何进入google,算法面试技能全面提升指南

如果你对机器学习感兴趣，请参看一下链接：
机器学习：神经网络导论

更详细的讲解和代码调试演示过程，请参看视频
Linux kernel Hacker, 从零构建自己的内核

本节，我们研究如何把函数声明和函数调用转换成可执行的java 字节码，在完成本节代码后，我们的编译器能把下面代码编译成可被java 虚拟机执行的字节码，示例代码如下：

void f() {   printf("execute function f()");}void main() {   f();}

假设java 一个类含有如下方法：

public float computeSum(float a, float b) {    float c = a + b;    return c;}

那么上面的 java代码被编译成汇编语言时，可能会转换成下面这个样子：

.method public computeSum(FF)F.limit locals 3.limit stack 2float_1    ;把第一个参数从变量数组压入堆栈float_2    ;把第二个参数从变量数组压入堆栈fadd       ;把堆栈上头两个数值相加，把结果压入堆栈freturn    ;把堆栈顶部元素的数值返回.end method

上一节，我们讲过，java函数在执行时，java虚拟机会专门分配一个调用场景，在这个场景中，含有一个局部变量数组，用来存储输入参数和局部变量，另个一是操作堆栈，所以指令在执行时，都需要把指令要操作的对象或数值压入到操作堆栈上。

指令.limit locals 3 告诉虚拟机，在分配局部变量数组时，分配三个单位大小就可以了，一般来说，一个32位机器，一个单位大小就是4字节，因此.limit locals 3 就是要求虚拟机分配12字节大小的局部变量数组。

同理.limit stack 2 要求虚拟机分配两个单位大小，也就是8字节大小的操作堆栈即可。这两条指令，后面我们在分析虚拟机的代码合法性检测算法时，再详细研究，本节我们知道大概意思即可。

这节，我们需要重点研究的是，函数的声明。 .method 指令的作用是，告诉虚拟机，接下来的代码是函数的声明和实现。在该指令后面跟着的是函数名和参数列表，形式如下：
name(type1 type2 type 3….) type_return

type1 type2 … 描述的是函数的参数类型，type_return 描述的是函数返回值的类型。一个函数，它的调用范围可以是public , protected 以及private，如果他是public 的，那么转换成java汇编时，形式为.method public, 如果是protected，那么转换成汇编时就是.method protected,以此类推。

函数名，结合函数的输入参数类型，以及函数的返回值，这些信息结合在一起，就形成了函数的签名，例如：
public void main(String argv[]) 转换成汇编后，它对应的函数签名为：
.method public main([Ljava/lang/String;)V
其中，左括号[表示数组，Ljava/lang/String对应的是类对象String, 其中开头的L表示接下来的字符串对应的是一个类变量的声明，如果是基础类型，那么开头就不会出现L,例如int a[] 那么转换为汇编为: [I 其中左括号[表示数组，I表示数据类型为整形。

如果返回值数据类型为void,那么就在函数输入参数列表后面跟着void对应的类型，也就是大写的字母V.
下面是一些函数声明转换成java汇编后的样子：
java: public void static main(String argv[]) ->
汇编: .method public static main([Ljava/lang/String;)V

java: public int read(byte[] data, int offset, int len) ->
汇编: .method public read([BII)I

java: public void println() ->
汇编: .method public println()V

通过上面的分析，我们可以得知，我们要编译的函数void f() 应该转换成：
.method public static f()V

由于C语言中，没有类的概念，因此所有除了函数，我们都加上修饰符static.函数中使用的各种操作指令，例如float_1等，我们在后面的章节中在详细研究，这一节，我们知道函数如何声明，以及如何被调用就可以了。

前一节我们看到，调用一个类的成员函数时，需要的指令是invokevirtual,如果要调用的是类的静态函数，那么需要的指令就是invokestatic,具体使用格式如下：

invokestatic 类名/方法签名

有了上面的理论知识后，我们看看如何将C语言的函数调用转换成对应的java虚拟机代码。

我们解释器的一个特点是，读取到一条语句，我们就执行该语句，所以在把C编译成java字节码时，也是读取一条语句才顺便编译成一条语句。于是在本节开头的代码中，函数f 比mai声明的早，但我们的解释器遇到它时并没有立马就将它转换成java字节码，而是等到main函数中，对函数f进行调用时，解释器才会在执行f的过程中，把f编译成java字节码。

我们看看实现代码，生成java汇编语言的类分别是CodeGenerator,以及ProgramGenerator,他们的代码如下：

package backend.Compiler;import java.io.File;import java.io.FileNotFoundException;import java.io.PrintStream;import java.io.PrintWriter;import java.util.ArrayList;import java.util.HashMap;public class CodeGenerator {    private  PrintWriter assemblyFile;    private  int instructionCount = 0;    private  boolean buffered = false;    private  String bufferedContent = "";    protected static String programName = "CSourceToJava";    private HashMap<String, String>nameToDeclaration = new HashMap<String, String>();    public void setNameAndDeclaration(String name, String declaration) {        nameToDeclaration.put(name, declaration);    }    public String getDeclarationByName(String name) {        return nameToDeclaration.get(name);    }    public void setInstructionBuffered(boolean isBuffer) {       this.buffered = isBuffer;    }    public CodeGenerator() {        String assemblyFileName = programName + ".j";        try {            assemblyFile = new PrintWriter(new PrintStream(new                     File(assemblyFileName)));        } catch (FileNotFoundException e) {            // TODO Auto-generated catch block            e.printStackTrace();        }    }    public void emitString(String s) {        if (buffered) {            bufferedContent += s + "\n";            return;        }         assemblyFile.print(s);        assemblyFile.flush();    }    protected void emitBufferedContent() {        if (bufferedContent.isEmpty() == false) {            assemblyFile.println(bufferedContent);            assemblyFile.flush();        }    }    public void emitDirective(Directive directive) {        if (buffered) {            bufferedContent += directive.toString() + "\n";            return;        }         assemblyFile.println(directive.toString());        assemblyFile.flush();        ++instructionCount;    }    public void emitDirective(Directive directive, String operand) {        if (buffered) {            bufferedContent += directive.toString() + " " + operand + "\n";            return;        }         assemblyFile.println(directive.toString() + " " + operand);        assemblyFile.flush();        ++instructionCount;    }    public void emitDirective(Directive directive, int operand) {        if (buffered) {            bufferedContent += directive.toString() + " " + operand + "\n";            return;        }        assemblyFile.println(directive.toString() + " " + operand);        ++instructionCount;    }    public void emitDirective(Directive directive, String operand1, String operand2) {        if (buffered) {            bufferedContent += directive.toString() + " " + operand1 + " " + operand2 + "\n";            return;        }        assemblyFile.println(directive.toString() + " " + operand1 + " " + operand2);        ++instructionCount;    }    public void emitDirective(Directive directive, String operand1, String operand2, String operand3) {        if (buffered) {            bufferedContent += directive.toString() + " " + operand1 + " " + operand2 + " " + operand3 + "\n";            return;        }        assemblyFile.println(directive.toString() + " " + operand1 + " " + operand2 + " " + operand3);        ++instructionCount;    }    public void emit(Instruction opcode) {        if (buffered) {            bufferedContent += "\t" + opcode.toString() + "\n";            return;        }        assemblyFile.println("\t" + opcode.toString());        assemblyFile.flush();        ++instructionCount;    }    public void emit(Instruction opcode, String operand) {        if (buffered) {            bufferedContent += "\t" + opcode.toString() + "\t" + operand + "\n";            return;        }        assemblyFile.println("\t" + opcode.toString() + "\t" + operand);        assemblyFile.flush();        ++instructionCount;    }    public void emitBlankLine() {        if (buffered) {            bufferedContent += "\n";            return;        }        assemblyFile.println();        assemblyFile.flush();    }    public void finish() {        assemblyFile.close();    }}

相比于以前的变化是，这里添加了一个变量buffered,如果它是true,那么输出指令时，先把要输出的代码字符串缓存到变量bufferedContent中。在解释器的实现中，处理函数声明的类为FunctDeclExecutor, 在这个类中，我们根据函数名和参数类型，构造函数的签名，代码如下：

package backend;import java.util.ArrayList;import java.util.Collections;import backend.Compiler.Directive;import backend.Compiler.ProgramGenerator;import frontend.CGrammarInitializer;import frontend.Declarator;import frontend.Specifier;import frontend.Symbol;import frontend.TypeSystem;public class FunctDeclExecutor extends BaseExecutor {    private ArrayList<Object> argsList = null;    private ICodeNode currentNode;    ProgramGenerator generator = ProgramGenerator.getInstance();    @Override    public Object Execute(ICodeNode root) {        int production = (Integer)root.getAttribute(ICodeKey.PRODUCTION);        Symbol symbol = null;        currentNode = root;        switch (production) {        case CGrammarInitializer.NewName_LP_RP_TO_FunctDecl:            root.reverseChildren();            ICodeNode n = root.getChildren().get(0);            String name = (String)n.getAttribute(ICodeKey.TEXT);            if (name != null && name.equals("main") != true) {                String declaration = name+"()V";                generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);                generator.setNameAndDeclaration(name, declaration);            }            copyChild(root, root.getChildren().get(0));            break;        case  CGrammarInitializer.NewName_LP_VarList_RP_TO_FunctDecl:            n = root.getChildren().get(0);            name = (String)n.getAttribute(ICodeKey.TEXT);            if (name != null && name.equals("main") != true) {                String declaration = name + emitArgs();                generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);                generator.setNameAndDeclaration(name, declaration);            }            symbol = (Symbol)root.getAttribute(ICodeKey.SYMBOL);            //获得参数列表            Symbol args = symbol.getArgList();            initArgumentList(args);            if (args == null || argsList == null || argsList.isEmpty()) {                //如果参数为空，那就是解析错误                System.err.println("Execute function with arg list but arg list is null");                System.exit(1);            }            break;        }        return root;    }    private void initArgumentList(Symbol args) {        if (args == null) {            return;        }        argsList = FunctionArgumentList.getFunctionArgumentList().getFuncArgList(true);        Symbol eachSym = args;        int count = 0;        while (eachSym != null) {            IValueSetter setter = (IValueSetter)eachSym;            try {                /*                 * 将每个输入参数设置为对应值并加入符号表                 */                setter.setValue(argsList.get(count));                count++;            } catch (Exception e) {                // TODO Auto-generated catch block                e.printStackTrace();            }            eachSym = eachSym.getNextSymbol();        }    }    private String emitArgs() {        argsList = FunctionArgumentList.getFunctionArgumentList().getFuncArgList(true);        String args = "(";        for (int i = 0; i < argsList.size(); i++) {            Symbol symbol = (Symbol)argsList.get(i);            String arg = "";            if (symbol.getDeclarator(Declarator.ARRAY) != null) {                arg += "[";            }            if (symbol.hasType(Specifier.INT)) {                arg += "I";            }            args += arg;        }        args += ")V";        generator.emitString(args);        return args;    }}

其中，如果函数是没有输入参数的，那么下面的代码会被执行：

case CGrammarInitializer.NewName_LP_RP_TO_FunctDecl:....if (name != null && name.equals("main") != true) {                String declaration = name+"()V";                generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);                generator.setNameAndDeclaration(name, declaration);            }....

在if判断里面，我们根据被调用的函数名，后加一对括号，然后接着一个大写V,B表示返回值类型为void, 这里，我们暂时假设函数的返回值类型都是void。

如果调用函数是有参数的，则进入第二个case,在initArgument调用中，调用函数emitArgs,根据参数的类型输出参数列表，这里，我们暂时只处理数组和整形类型。

当函数f被调用时，解释器才开始对f进行编译，执行函数调用时，解释器才会开始把函数f编译成java汇编代码，执行函数调用的代码在UnaryNodeExecutor中，代码如下：

case CGrammarInitializer.Unary_LP_RP_TO_Unary:case CGrammarInitializer.Unary_LP_ARGS_RP_TO_Unary:....ICodeNode func = CodeTreeBuilder.getCodeTreeBuilder().getFunctionNodeByName(funcName);            if (func != null) {                Executor executor = ExecutorFactory.getExecutorFactory().getExecutor(func);                ProgramGenerator.getInstance().setInstructionBuffered(true);                executor.Execute(func);                ProgramGenerator.getInstance().emitDirective(Directive.END_METHOD);                ProgramGenerator.getInstance().setInstructionBuffered(false);                compileFunctionCall(funcName);                Object returnVal = func.getAttribute(ICodeKey.VALUE);                if (returnVal != null) {                    System.out.println("function call with name " + funcName + " has return value that is " + returnVal.toString());                    root.setAttribute(ICodeKey.VALUE, returnVal);                }            }....

在执行被调函数，也就是语句executor.Execute(func)之前，我们先通过setInstructionBuffered 将CodeGenerator的 buffered变量设置为true,这样，当生成任何代码时，都将会被缓存起来，而不是直接写入到.j文件。

当函数f执行时，解释器开始把f编译成java汇编，在函数f内部调用了printf函数，该函数如何转换成对应的java汇编指令，我们前一节已经解释过了，当函数f解释执行完毕后，程序回到上面代码executor.Execute(func)语句的下一句去执行，在程序中，由于函数内部的代码已经全部编译完毕，因此我们输出指令Directive.END_METHOD 用来表示当前是函数的结尾。接着通过通过调用setInstructionBuffered(false)告诉代码转换器CodeGenerator，接下来直接输出编译代码而不需要缓存。函数compileFunctionCall(funcName)的作用是把函数调用转换为java虚拟机中，对函数进行调用的指令，例如：
f()
会被转换成：
invokestatic CSourceToJava/f()V

接下来，ProgramGenerator也做一些微调，情况如下：

package backend.Compiler;public class ProgramGenerator extends CodeGenerator {    private static ProgramGenerator instance = null;    public static ProgramGenerator getInstance() {        if (instance == null) {            instance = new ProgramGenerator();        }        return instance;    }    private ProgramGenerator()  {    }    public String getProgramName() {        return programName;    }    public void generateHeader() {        emitDirective(Directive.CLASS_PUBLIC, programName);        emitDirective(Directive.SUPER, "java/lang/Object");        emitBlankLine();        emitDirective(Directive.METHOD_PUBBLIC_STATIC, "main([Ljava/lang/String;)V");    }    public void finish() {        emit(Instruction.RETURN);        emitDirective(Directive.END_METHOD);        emitBufferedContent();        emitDirective(Directive.END_CLASS);        super.finish();    }}

最后在解释器的入口函数处，也做一些改变：

package frontend;import backend.CodeTreeBuilder;import backend.Intepretor;import backend.Compiler.ProgramGenerator;public class BottomUpParser {    public static void main(String[] args) {        /*         * 把ProductionManager , FirstSetBuilder 两个类的初始化移到CGrammarInitializer         * 将SymbolDefine 修改成CTokenType, 确定表达式的first set集合运算正确         */        ProductionManager productionManager = ProductionManager.getProductionManager();        productionManager.initProductions();        productionManager.printAllProductions();        productionManager.runFirstSetAlgorithm();        GrammarStateManager stateManager = GrammarStateManager.getGrammarManager();        stateManager.buildTransitionStateMachine();        System.out.println("Input string for parsing:");        LRStateTableParser parser = new LRStateTableParser(new Lexer());        parser.parse();        ProgramGenerator generator = ProgramGenerator.getInstance();        generator.generateHeader();        //java assembly code will created when intepretor iterate every code node        CodeTreeBuilder treeBuilder = CodeTreeBuilder.getCodeTreeBuilder();        Intepretor intepretor = Intepretor.getIntepretor();        if (intepretor != null) {            intepretor.Execute(treeBuilder.getCodeTreeRoot());          }        generator.finish();    }}

当上面代码完成后，我们给定的C语言例子会编译成如下的java汇编代码：

.class public CSourceToJava.super java/lang/Object.method public static main([Ljava/lang/String;)V    invokestatic    CSourceToJava/f()V    return.end method.method public static f()V    getstatic   java/lang/System/out Ljava/io/PrintStream;    ldc "execute function f()"    invokevirtual   java/io/PrintStream/println(Ljava/lang/String;)V    return.end method.end class

有了上面代码后，通过Java汇编编译器，把上面代码转换成java字节码，命令如下：
java -cp oolong.jar COM.sootNsmoke.oolong.Oolong CSourceToJava.j

运行上面命令后，在本地目录下会产生一股CSourceToJava.class文件，接着在控制台上，通过java命令运行该字节码文件后，得到结果如下：

这里写图片描述

根据运行结果，可以检测，我们编译器编出的java汇编语言是正确的。更详实的代码解读和代码调试说明，请参看视频。

更多技术信息，包括操作系统，编译器，面试算法，机器学习，人工智能，请关照我的公众号：
这里写图片描述

阅读全文

0 0