How Do You Read a Large Codebase?

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 00:15

Casey asked me: “Do you have any targeted tips for reading large codebases, especially for novices?”

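As it happens, I think this is a very good question. I believe the ability to read a codebase and work out what is going on inside it is a big part of being a good developer. At some point in your career you will join an existing project midway and be expected to get up to speed quickly. Or, harder still, a project will be handed to you and you will have to figure it out entirely on your own.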

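The worst case is being brought onto a project to replace “those incontinent* bastards” who made it fail, and being expected to get it working again. A more common case is being asked to maintain a codebase written by someone who has left the company. And finally, of course, if you use any open source project, there is a good chance someone will ask you, “Can you extend it to do this as well?” Or maybe you are just curious.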

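Especially for novice programmers, I strongly recommend reading codebases. Read on to see how I go about it, but after that you need to actually go and read code.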

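When I approach a new codebase, I usually ignore the documentation and external details. The goal is to start without preconceptions about how it works. I try to work out the structure of the project from its layout on disk. That alone tells you a lot, and I usually try to infer the architecture from it. Is there a core part of the system? How is it split up? And so on.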

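Then I find the lowest-level code and start reading. I usually go in alphabetical order: pick a file, read it through, then move on to the next one. I try to keep notes on how things are connected (you can find examples of such notes on the blog), but mostly I am trying to get a feel for the code. A lot of code is simply part of the project’s style, such as precondition checks, logging and error handling. Learn to recognize it early, and after that you can skip it and read the interesting parts.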

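I usually do not read too deeply at this stage; I am trying to get a sense of the big picture. For example: this file is responsible for X and does it by calling Y and Z. Knowing every detail at this point does not really matter to me. Oh, and I take notes, lots of notes. Often they are not really notes but a list of questions, and the more I understand, the more questions I add and the more answers I fill in. After reading the lowest-level code I can find, I usually follow a vertical slice through the system. Again, this is mostly so I can work out how things are laid out and how they work. It means that the next time I go through this part, I will have a better idea of the code’s structure.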

Next, I head for the interesting parts: the parts of the system that make it interesting to me, rather than the off-the-shelf parts.

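That is pretty much it; there is not much more to it. I just go through the code to find its overall shape and structure first, and then I study the unique parts closely and work out how they were built.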

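Along the way, especially when things get difficult, I look for whatever documentation exists. By this point I should already know how the code is organized, so I can get through the docs much faster.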

* Author’s note: I started writing “incompetent”, but this is funnier.


Casey asked me:

Any tips on how to read large codebases - especially for more novice programmers?

As it happens, I think that this is a really great question. I think that part of what makes someone a good developer is the ability to go through a codebase and figure out what is going on. In your career you are going to come into an existing project and be expected to pick up what is going on there. Or, even more nefarious, you may have a project dumped in your lap and be expected to figure it out all on your own.

The worst scenario for that is when you are brought in to replace “those incontinent* bastards” that failed the project, and you are expected to somehow get things working. But other common scenarios include being asked to maintain a codebase written by a person who left the company. And finally, of course, if you are using any Open Source projects, there is a strong likelihood that you’ll be asked, “Can you extend this to also do this?”, or maybe you are just curious.

Especially for novice programmers, I would strongly recommend that you do just that. See the rest of the post for actual details on how I do that, but do go ahead and read code.

I usually approach new codebases with a blind eye toward documentation / external details. I want to start without preconceptions about how things are working. I try to figure out the structure of the project from the on-disk structure. That alone can tell you a lot. I usually try to figure out the architecture from that. Is there a core part of the system? How is it split, etc.
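One way to get that first look at the on-disk structure is a small script that counts files per directory and per extension. This is only a sketch of the idea, not a tool from the original post: the names `summarize_layout` and `print_layout`, the `max_depth` cut-off, and the choice to skip hidden directories are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def summarize_layout(root, max_depth=2):
    """Map each directory (truncated to max_depth levels) to a Counter of file extensions."""
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Assumption: hidden directories such as .git say little about the architecture.
        if any(part.startswith(".") for part in rel.parts[:-1]):
            continue
        # Bucket the file under its parent directories, keeping at most max_depth levels.
        bucket = "/".join(rel.parts[:-1][:max_depth]) or "."
        summary.setdefault(bucket, Counter())[path.suffix or "(none)"] += 1
    return summary


def print_layout(summary):
    """Print one line per directory, most common extensions first."""
    for directory in sorted(summary):
        counts = ", ".join(f"{ext}: {n}" for ext, n in summary[directory].most_common())
        print(f"{directory:<30} {counts}")
```

Calling `print_layout(summarize_layout("path/to/project"))` prints one line per directory. A directory with many source files that other directories seem to be named around is a reasonable first guess at the core of the system, but it is only a guess to be checked by reading.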

Then I find the lowest level part of the code and just start reading it. Usually in blind alphabetical order. Find a file, read it all the way, next file, etc. I try to keep notes (you can see some examples of those in the blog) about how things are hooked together, but mostly, I am trying to get a feel for the code. There is a lot of code that is usually part of the project style; it can be things like precondition checks, logging, error handling, etc. Those things you can learn to recognize early and then can usually just skip them to read the interesting bits.
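The “find a file, read it all the way, next file” routine can be turned into a checklist. The sketch below assumes you have already picked the directory that holds the lowest level code; `reading_queue`, `format_checklist` and the default `suffixes` filter are hypothetical names and choices, not anything from the post.

```python
from pathlib import Path


def reading_queue(directory, suffixes=(".py",)):
    """Return (relative path, line count) pairs in case-insensitive alphabetical order."""
    directory = Path(directory)
    files = [p for p in directory.rglob("*") if p.is_file() and p.suffix in suffixes]
    files.sort(key=lambda p: p.relative_to(directory).as_posix().lower())
    queue = []
    for path in files:
        # errors="replace" keeps the count going even if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            lines = sum(1 for _ in handle)
        queue.append((path.relative_to(directory).as_posix(), lines))
    return queue


def format_checklist(queue):
    """Render the queue as a plain-text checklist to tick off while reading."""
    return "\n".join(f"[ ] {name} (lines: {lines})" for name, lines in queue)
```

The line counts are there only to give a rough sense of how long each file will take; the order is deliberately blind, so nothing in the script tries to rank files by importance.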

I usually don’t try to read too deeply at this point; I am trying to get a feeling about the scope of things. This file is responsible for X and does so by calling Y & Z, but it isn’t really important to me to know every little detail at that point. Oh, and I keep notes, a lot of notes. Usually they aren’t really notes but more a list of questions, which I fill / answer as I understand more. After going through the lowest level I can find, I usually try to do a vertical slice. Again, this is mostly so I can figure out how things are laid out and working. That means that the next time that I am going to go through this, I’ll have a better idea about the structure / architecture of this.

Next, I’ll usually head to the interesting bits. The parts of the system that make it interesting to me, rather than something that is off the shelf.

That is pretty much it, there isn’t much to it. I am pretty much just going over the code and trying to first find the shape & structure, then I dive into the unique parts and figure out how they are made.

In the meantime, especially if this is hard, I’ll try to go over any documentation that exists. At this point, I should have a good enough idea about how things are set up that I’ll be able to go through the docs a lot more quickly.

* I started writing incompetent, but this is funnier.