How to Contribute to Open Source Projects


Reposted from: http://drdobbs.com/open-source/231000080

Brian Behlendorf, the founder of the Apache Web Server project and a lead developer on Subversion, discusses how to get started on an OSS project and what to expect.

Choosing the Right Project

DDJ: Let's say I'm a developer with some experience and I'm interested in contributing to one of the Apache projects. How do I get started?

BB: What's your motive?

DDJ: To get more experience, working with bright folks who are doing stuff in an area that interests me.

BB: That's not the most usual path. More often, it's a developer who wants help to solve a specific technical problem. After some Google searching, he's found some packages that claim to do x. And, if there's one that's free, that's the one that's going to get evaluated first. And he starts pulling it down and playing with it.

DDJ: Yes. And then what does he do?

BB: Let me talk about a sequence of events that's more likely to happen. The first step is generally to determine whether the software is any good.

When I look at a piece of open source software before I download code, I'm looking to see whether a lot of people are complaining about broken installations, or if there are questions that suggest poor programming practices. And are people getting answers quickly?

Every good open source project has a public discussion forum, email or forum-based, and has developers who have a stewardship mentality about it and care about happy customers, even if they're not paying customers. So, before even touching the code, I would evaluate the community — because there is an awful lot of code that has no community behind it, such as somebody open sourcing something they worked on at their last job, or an overnight hack, with no intention of making it usable.

Evaluate the community, look for activity, look for a release every couple of months, people who've used it and said good things or even bad things.
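As an editorial illustration, not part of the interview, the "release every couple of months" check can be sketched in Python. The function names, the 90-day threshold, and the sample dates are all assumptions made for this example; in practice the dates would come from the project's own release history.

```python
from datetime import date
from statistics import median


def release_gaps_in_days(release_dates):
    """Return the number of days between consecutive releases, oldest first."""
    ordered = sorted(release_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def looks_actively_released(release_dates, max_median_gap_days=90):
    """Heuristic: True if the median gap between releases is within the threshold.

    The 90-day default is an assumption chosen for illustration only.
    """
    gaps = release_gaps_in_days(release_dates)
    if not gaps:
        # Fewer than two releases: there is no cadence to measure.
        return False
    return median(gaps) <= max_median_gap_days


# Hypothetical release dates, invented for this sketch.
sample_releases = [
    date(2011, 1, 10),
    date(2011, 3, 14),
    date(2011, 5, 9),
    date(2011, 7, 18),
]

print(release_gaps_in_days(sample_releases))
print(looks_actively_released(sample_releases))
```

A number like this is only one signal. The quality of the answers on the mailing list, which Behlendorf stresses, cannot be measured this way.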

DDJ: So, if you determine the project fits your criteria, then what?

BB: You read some docs, you watch some people talking. Then you download the code, compile it, install it, and give it a dry run. If it's running and doing some cool things for you, you might show it to your boss. You then deploy it to your production — and you're running. Then you discover there's a bug.

Now, what do you do with it? You probably dive in to figure out what's really going on in there, and in the course of that, you go rooting through the discussion lists and the developer lists. Most projects have a differentiation between the users list and the developers list, and that's so that the developers can stay focused on building new stuff and those who want to help the community and the community themselves can support each other with a basic Q&A somewhere else.

DDJ: Yes.

Start By Contributing Defect Data

BB: In the course of doing this, you probably looked in the issue tracker to see if anyone else has reported a "foo-bar not-found" sort of thing. In the course of narrowing it down, you've either realized that it was a mistake on your part or that you've found a bug.

The bug [might be] a pre-existing, pre-known defect and maybe you can actually add some data to it so the next person can find it more easily than you did. In which case, you want to get an account on the issue tracker and post a comment on that issue, saying, "Hey, this is happening to me, too," and try to contribute, add to the conversation that's there.

This helps because bugs take a long time to get from first being noticed to being resolved. Developers often ignore data from users, trying to recreate the conditions under which things happen. The vast majority of the work in fixing a bug is something that even users who don't understand the code can actually help with. They can actually try to replicate the bug, write a test case, etc.

And that is all extremely valuable. A programmer who is familiar with the code may be able to dive in and fix it. But that's such tedious work and it's high value because it's tedious and no one really wants to do it. So, to start, try looking for the outstanding defects and see whether they need further triage.
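As an editorial illustration, not part of the interview, here is a minimal Python sketch of the kind of reproduction a user could attach to an issue. `split_pages` is a hypothetical stand-in for project code with a deliberately planted off-by-one bug. The useful part is `reproduce`, which records the smallest failing input alongside the expected and actual results.

```python
def split_pages(items, page_size):
    """Hypothetical stand-in for project code, with a deliberate off-by-one bug."""
    pages = []
    # Bug: the "- 1" ends the loop early, so the final page is lost
    # whenever it would hold exactly one item.
    for start in range(0, len(items) - 1, page_size):
        pages.append(items[start:start + page_size])
    return pages


def reproduce():
    """Record the smallest input that triggers the defect, with expected vs. actual."""
    items = [1, 2, 3]
    expected = [[1, 2], [3]]
    actual = split_pages(items, page_size=2)
    return {"input": items, "expected": expected, "actual": actual}


report = reproduce()
print(report)
```

A report in this shape lets a developer confirm the defect in seconds. The same input can later become a regression test once the fix lands.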

DDJ: OK.

BB: Karl Fogel talks about this in his book, Producing Open Source Software [which is without a doubt the best guide available for running an OSS project. -Ed.], namely, the benefit of marking certain issues as bite-sized tasks: things that developers could take on to understand the layout of the code, how different systems call each other, etc. There are often bugs that aren't big architectural defects but off-by-one errors or edge-case kinds of things that benefit from a lot of triage.

DDJ: An excellent idea.

BB: Throughout all this, there is the conversation on the users' or developers' list. These messages are the lifeblood of the community. It's the banter across the dinner table that drives the process. Join either the users' or developers' list. Let that simmer in the background. Don't pretend you have to understand every word, just get an ear for the music of the discussions. Eventually, you'll see comments that map to some of the situations that you see. Some of these lists have 100 messages a day. You can't read all of them but you can get a feel for the gestalt of the project.

Contributing to Documentation

DDJ: That's a great sequence to start with. I notice that on many projects that are trying to solicit participation, they recommend working on documentation, which always seems to be in short supply. How does that work?

BB: It's amazing to me how people think of documentation as easy or an afterthought, but there's a huge difference between documentation written by someone coming up the learning curve and documentation written by someone who really knows it. I'd say well-designed and well-engineered documentation is more important than well-designed and well-engineered source code, because that's the ladder people climb up to go from casual first-time user to core user and core developer. And that has to be a solid ladder. A lot of projects try to encourage the developers, when they commit a source code change, to concurrently commit documentation changes. That's a high bar though, because many developers do not speak English as a first language, or are not proficient writers.

I'd say the other caveat is I think having new users come in and contribute to training materials is more appropriate. I think the format of training (especially screen capture and video, because it's a form of performance art), really forces you to learn the material: "Here's why Drupal is a kick-ass CMS, and here's how to build your first site with it." There's a saying: People remember 10 percent of what they're told and 90 percent of what they teach.

Working Your Way Up Through the Meritocracy

DDJ: Developers who are contributing out of ambition rather than because they have a specific problem to be solved may believe that the meritocracy provides a certain type of reward. Being a contributor is a feather in the cap if it's an esteemed project. So what typically moves somebody in the community's eyes from being just an occasional contributor into one of the leads, or a formal position on a project?

BB: Some projects, like Apache, have more formal recognition of a developer as a committer, someone granted certain privileges on the repository. Even though commits can always be backed out, it is generally considered a mark of honor that other developers trust you enough to give you the keys. Other projects give out commit privileges like candy (GNOME, apparently), and the premise behind that is it should be easy for everybody to throw their patches into the pool and we'll filter and sort through them later.

That's partly a tool question; Git is easier [for] managing a lot of users who aren't core committers. But being a committer on Apache is a big deal. The decision is made by one committer proposing to other committers on an individual project, "This person has contributed lots of valuable patches in the past, and has been helpful to new users."

There's always some work on a project that goes beyond self-interest. There's talk about aggregated self-interest, but it's actually enlightened self-interest, in that you've got to write code that can be understood by others, and when somebody has a newbie question, helping them find the answer to that question will pay off tremendously. There's always going to be many more users than developers, so it's incumbent on developers to give a little user support, and help new developers over the hurdles in getting their environment running and understanding the code layout. Someone who shows that level of altruism — it doesn't have to be full time — but there are a lot of people.

At Apache, it's a recognition not just for a few good patches, but for a commitment, a communication style, and an understanding of this thing called the Apache Way, which is not clearly defined but generally is do unto others as you'd have them do unto you: Have high quality code, be clear in your communication, and have a team-oriented spirit. Those are the criteria on Apache to be awarded commit privileges. And just be human, be on the mailing list, be helpful, help get the bug queue down. No active open source project has no bugs open. There's always something to do there.

DDJ: Jeff Fredrick, who headed up the CruiseControl project for a long time, told me that one of the things that happens is that the people who should become committers generally stand out by the nature of commitment and contributions. There's not a lot of discussion, it's generally pretty clear. Would you agree?

BB: Hmm, I can think of frequent examples of significant private conversations among committers over whether someone should have commit privileges, although that's less controversial because committer privileges can always be revoked.

DDJ: What about not granting privileges?

BB: I think there are some projects that err too much on the side of not granting commit privileges. It can seed various conspiracy theories as to whether it's justified or not. Sun, for example, with OpenOffice, really never gave a lot of commit privileges outside the Sun developers, because their working style was focused on a small cluster of developers in one physical location, having worked together for 20 years, and they found it hard to trust other developers. So that's a case where they probably erred too much on the side of holding commit privileges too close to the vest.

DDJ: What about using branches and forks?


BB: I do think both Subversion and Git have made it easier for people to maintain branches and forks of code than it used to be, so there tend to be fewer fights over commit privileges. Instead, what you see is people just working. And they'll say, "You've got a good code base, I've got an extra patch, here's my tree, you can pull that patch from my tree, or someone else can build a derivative from that." In some ways, I think this has actually hurt the ability of communities to gel around a single code base. For instance, with the Linux kernels that ship with all the different distros out there, it's pretty much a different kernel per distro: different combinations of patches and settings. I think the Linux Foundation does a good job of driving the Linux Standard Base, and we have much more conformance than we might otherwise have. It's still tough.

Every Apache project has a single code base. It has development branches and current branches and stable branches and all that, but the pool of developers is still focused on building one thing and building it iteratively.

Mistakes New Contributors Make

DDJ: What are the mistakes that you've seen people joining projects frequently make?

BB: Well, a little more than 10 years ago, on the Apache main HTTP developers mailing list, there was a developer who showed up from a well-regarded UNIX vendor for graphics machines, let's say, and they had recently moved to 64 bit. This was before Apache really had a portable runtime that abstracted away a lot of system calls so that it was easier to port to other platforms.

This guy showed up and hadn't really met anybody, and he emailed me and said, "I have a bunch of stuff to contribute, can I do that?" I said, "Yeah, just show up on the list and start posting patches." And I may have given him too lightweight instructions because he, ah, came on the list and said, "Great news, I ported Apache HTTPD to our 64-bit chip and have gotten permission from my company to redistribute these patches; and here is the first one out of 10." The first patch was a couple hundred individual changes. Many of them changed #ifdefs without allowing for the old code to continue to compile. It wasn't really an abstraction so much as a modification.

He said, "Number two will come tomorrow, and number three the next day, and you've got to apply them all in sequence because they're deltas, the first to the last, and that's how I modified and check-pointed my code." So the first patch gets posted and immediately people start saying, "This doesn't look like the right kind of change because it breaks it on this other platform or this messy #ifdef just complicated the code and it would be nice to have a more elegant call that starts out so we don't have to repeat it a billion times in the code."

The next day, when he saw that response, he was flabbergasted. He said, "Wait, I can't deal with this. Guys, I ported this to 64 bit. You can't make me go back and redo all these changes. Besides, my second patch depends on everything in the first one going through, so I can't change anything. I have nine more of these." What he didn't understand is there is intense review of code that goes into an open source project and it's better to show up on the scene and say, "Here's what I'd like to do, it's a substantial change and I'm wondering about the right way to go about it," rather than to say, "Good news, I've ported this to the Commodore VIC-20. And here are the changes made. Please commit these in." There's a bunch of different problems with that one: the size of the patches and the dependencies. The other is the attitude: I've written this software, I am God.

[There's] a righteous attitude that some developers get that ends up being fairly self-defeating, because it ends up accomplishing the opposite of what you intend. Instead of building confidence in your solution, it causes people to question it.

DDJ: What other errors do new contributors make?

BB: The other is just being too lazy to search through the discussion archives or to RTFM. Or, as a member of the community, to give a snarky response to somebody and it just escalates, or to give no response, and they interpret that as meaning these developers are stuck up or ignorant or hate newbies or whatever. Communication differences: it's the kinds of things that happen when two people are miscommunicating at long distances, and if they were face to face, wouldn't happen, even if they started off on the wrong foot.

DDJ: In many cases, contributors aren't just communicating across long distances, they're communicating across cultural barriers, too.

BB: Yeah. And you could add that for some projects that have a healthy user participation, a lack of understanding of the need to save face. Many developers are very rigorous and scientific and absolutist: "Your code sucks and you need to go back to school." It can be humiliating and especially in Asian cultures, that's a death sentence — that's somebody who's never coming back to participate. Even more subtle passive-aggressive kinds of things can cause somebody to lose face.

Getting Committers To Respond

DDJ: I was looking yesterday at the Apache POI project, which is looking for some point men for some of the subprojects. It gives some elaborate directions on how to participate. In a nutshell, build it, find the place you want to make your patch, submit your patch, and then bug the developer mailing list until somebody does something with your patch. It seems to me that that last bit is one where trouble lurks. If you're a newcomer and you're bugging people to respond to your patches, you're likely to rankle people if you don't do it right.

BB: If you're making genuine contributions and no one is responding? If the project is dead, there's not going to be anyone around to resuscitate it. Someone has to stand up and say, "No, this project has to come back alive, get the heart-jumpers."

DDJ: Does that ever happen? Have you seen projects that were completely dead come back to life?

BB: There are projects where it really is one core developer who people turn to when there's a question. Maybe there's an area of the code that no one understands or no one wants to touch so that the question remains unresponded to, while other conversations continue apace. There are far more underappreciated projects than living projects. 


Patience, asking questions, always making sure that you're carrying a tone of appreciative inquiry, are the key. Such as saying, "I think this is the right solution, I'm really curious about what others think of it. If I'm the only one who cares, maybe the code should just be excised."

You also should rabble rouse a little bit and maybe go to the issue tracker and see who else has reported similar defects and maybe try to pull them back into the community. If you want to fire up a moribund project or portion of a project, then go out and speak on it. There are innumerable tech conferences out there these days and plenty of opportunities to speak, especially if you don't care to get paid. Telling people why a particular thing excites you is a great exercise, and a great way to make sure you really know it, too.

DDJ: Thank you. I think you've laid out a helpful roadmap, packed with useful observations and commentary that will help guide potential contributors and give them a sense of what to expect.


Long ago, Brian Behlendorf was the CTO at Wired Magazine. During his work there, he started patching the NCSA Web server. As he added more patches, a community of contributors emerged, which later forked and rewrote the server. This product became the Apache Web Server. He later founded the Apache Software Foundation. He also cofounded CollabNet, where he was a principal contributor to Subversion. He is currently the CTO for the World Economic Forum.