Distributed Systems Topologies


Distributed Systems Topologies: Part 1

from: http://openp2p.com/pub/a/p2p/2001/12/14/topologies_one.html

by Nelson Minar

12/14/2001

The peer-to-peer explosion has reminded people of the power of decentralized systems. The promise of robustness, open-endedness, and infinite scalability has made many people excited about decentralization. But in reality, most systems we build on the Internet are largely centralized.

This two-part article develops a framework for comparing distributed system designs. In this first part, I boil down the design of many systems to their essential topologies and describe how hybrid topologies can be made by combining parts. In the second part, I will introduce seven criteria for evaluating a system design and discuss the relative merits of distributed system designs.

Looking at topology

The peer-to-peer trend has renewed interest in decentralized systems. The Internet itself is the largest decentralized computer system in the world. But ironically, in the 1990s many systems built on the Internet were completely centralized. The growth of the Web meant most systems were single web servers running in fabulously expensive collocation facilities. Now with peer-to-peer, the pendulum has swung the other way to radically decentralized architectures such as Gnutella. In practice, extreme architectural choices in either direction are seldom the way to build a usable system.

With the Internet, we have 30 years of experience with distributed systems architectures. With all this experience, a high-level framework for understanding distributed systems is helpful for organizing what we have learned.

In this article, I focus on the topology of distributed systems -- how the different computers in the system fit together. Four basic topologies are in use on the Internet: centralized and decentralized, but also hierarchical and ring systems. These topologies can be used by themselves or combined into one system, creating hybrid systems.

Reader beware: This article takes a very high-level view, trying to encompass 30 years of Internet development into a short article. By necessity, the following analysis is in terms of generalities and may not be accurate for any specific case. My hope is that this approach will help the reader look at system design from the top down, and thereby better evaluate choices, such as whether to use a centralized or decentralized approach for a specific application.

Basic distributed systems topologies

The debate between centralized and decentralized systems is fundamentally about topology -- in other words, how the nodes in the system are connected. Topology can be considered at many different levels: physical, logical, connection, or organizational.

For this analysis, topology is considered in terms of the information flow. Nodes in the graph are individual computers or programs; links between nodes indicate that those nodes are sharing information regularly in the system. Typically, an edge implies that the two nodes are directly sharing bits across a network link. For simplicity, I do not consider the direction of information flow; edges are considered undirected.

Four common topologies will be explained here: centralized, ring, hierarchical, and decentralized. (A fifth distributed system pattern, group communication, is not considered in this article.)
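The graph view can be made concrete with a small sketch. The following Python is a minimal illustration, assuming nodes are plain strings and edges are undirected pairs; the function names, node labels, the complete-tree shape used for the hierarchy, and the fixed-degree mesh used for the decentralized case are all hypothetical choices made for the example.

```python
from collections import defaultdict


def make_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def centralized(n_clients):
    """One server; every client links only to it."""
    return make_graph(("server", f"client{i}") for i in range(n_clients))


def ring(n):
    """Each node links to its successor; the last wraps to the first."""
    return make_graph((f"node{i}", f"node{(i + 1) % n}") for i in range(n))


def hierarchical(depth, fanout):
    """Complete tree; node names are paths such as 'root/0/1'."""
    edges = []
    level = ["root"]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for k in range(fanout):
                child = f"{parent}/{k}"
                edges.append((parent, child))
                next_level.append(child)
        level = next_level
    return make_graph(edges)


def decentralized(n, links):
    """Symmetric mesh: peer i links to the next `links` peers by index."""
    return make_graph(
        (f"peer{i}", f"peer{(i + k) % n}")
        for i in range(n)
        for k in range(1, links + 1)
    )


def degrees(graph):
    """Sorted list of how many neighbors each node has."""
    return sorted(len(neighbors) for neighbors in graph.values())
```

Comparing `degrees(centralized(4))` with `degrees(decentralized(6, 2))` shows the difference in shape: the centralized graph has one node carrying every link, while the peers in the mesh all have the same number of neighbors.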

Centralized

Centralized systems are the most familiar form of topology, typically seen as the client/server pattern used by databases, web servers, and other simple distributed systems. All function and information is centralized into one server with many clients connecting directly to the server to send and receive information. Many applications called "peer-to-peer" also have a centralized component. SETI@Home is a fully centralized architecture with the job dispatcher as the server. And the original Napster's search architecture was centralized, although the file sharing was not.

Ring

A single centralized server cannot always handle high client load, so a common solution is to use a cluster of machines arranged in a ring to act as a distributed server. Communication between the nodes coordinates state-sharing, producing a group of nodes that provide identical function but have failover and load-balancing capabilities. Unlike the other topologies here, ring systems are generally built assuming the machines are all nearby on the network and owned by a single organization.
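The failover and load-balancing behavior of a ring can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular clustering product: the `Ring` class and its methods are hypothetical, requests are spread by hashing a key, and state-sharing between hosts is left out entirely.

```python
import zlib


class Ring:
    """Hosts in a fixed ring order; a request skips over hosts that are down."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.alive = set(self.hosts)

    def mark_down(self, host):
        self.alive.discard(host)

    def mark_up(self, host):
        if host in self.hosts:
            self.alive.add(host)

    def route(self, key):
        """Pick a host for `key`, walking the ring until a live host is found."""
        if not self.alive:
            raise RuntimeError("no live hosts in the ring")
        # Hashing the key spreads load across the hosts (load balancing).
        start = zlib.crc32(key.encode("utf-8")) % len(self.hosts)
        for step in range(len(self.hosts)):
            host = self.hosts[(start + step) % len(self.hosts)]
            if host in self.alive:
                return host  # the next live neighbor covers a failure
```

When the host that normally serves a key is marked down, `route` returns its successor in the ring, which is the simplest form of the failover described above.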

Hierarchical

Hierarchical systems have a long history on the Internet, but in practice are often overlooked as a distinct distributed systems topology. The best-known hierarchical system on the Internet is the Domain Name System (DNS), where authority flows from the root name servers to the server for the registered name and often down to third-level servers. NTP, the Network Time Protocol, creates another hierarchical system.

In NTP, there are root time servers that have authoritative clocks; other computers synchronize to root time servers in a self-organizing tree. NTP has over 175,000 hosts, with most hosts being two or three links away from a root time source. Usenet is another large hierarchical system, using a tree-like structure to copy articles between servers. It is particularly interesting in that the underlying protocols are symmetric but, in practice, articles propagate along tree-like paths with a relatively small set of hosts acting as the backbone.
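The "links away from a root" measure is simply depth in the tree. As a rough sketch (this is not the NTP protocol itself; the `sync_source` mapping and the host names are made up for the example), the distance can be found by walking from a host up through the server each one synchronizes to:

```python
def links_to_root(sync_source, host):
    """Count the hops from `host` up to a root time source.

    `sync_source` maps each host to the server it synchronizes to;
    a root maps to None (or is simply absent from the mapping).
    """
    hops = 0
    seen = {host}
    while sync_source.get(host) is not None:
        host = sync_source[host]
        if host in seen:
            raise ValueError("synchronization loop detected")
        seen.add(host)
        hops += 1
    return hops


# Hypothetical hosts, for illustration only.
example_tree = {
    "root-clock": None,
    "campus-server": "root-clock",
    "lab-server": "campus-server",
    "desktop": "lab-server",
}
```

Here `links_to_root(example_tree, "lab-server")` is 2 and `links_to_root(example_tree, "desktop")` is 3, the range the text gives for most hosts.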

Decentralized

The final topology we consider here is decentralized systems, where all peers communicate symmetrically and have equal roles. Gnutella is probably the most "pure" decentralized system used in practice today, with only a small centralized function to bootstrap a new host. Many other file-sharing systems are also designed to be decentralized, such as Freenet or OceanStore. Decentralized systems are not new; the Internet routing architecture itself is largely decentralized, with the Border Gateway Protocol used to coordinate the peering links between various autonomous systems.

Hybrid topologies

Distributed systems often have a more complex organization than any one simple topology. Real-world systems often combine several topologies into one system, making a hybrid topology. Nodes typically play multiple roles in such a system. For example, a node might have a centralized interaction with one part of the system, while being part of a hierarchy in another part.

Centralized + Ring

As mentioned above, serious web server applications often have a ring of servers for load balancing and failover. The server system itself is a ring, but the system as a whole (including the clients) is a hybrid: a centralized system where the server is itself a ring. The result is the simplicity of a centralized system (from the client's point of view) with the robustness of a ring.

Centralized + Centralized

The server in a centralized system is itself often a client of one or more other servers. Stacking multiple centralized systems is the core of n-tier application frameworks. For example, when a web browser contacts a server, the software on that server may just be formatting results into HTML for presentation and itself calling servers that host business logic or data. Web services intermediaries such as Grand Central Networks also create several layers of centralized system. Centralized systems are often stacked as a way to compose function.

Centralized + Decentralized

A new wave of peer-to-peer systems is advancing an architecture of centralized systems embedded in decentralized systems. This hybrid topology is realized with hundreds of thousands of peers in the FastTrack file-sharing system used in KaZaA and Morpheus. Most peers have a centralized relationship to a "supernode," forwarding all file queries to this server (much like a Napster client sends queries to the Napster server). But instead of supernodes being standalone servers, they band themselves together in a Gnutella-like decentralized network, propagating queries. Internet email also shows this kind of hybrid topology. Mail clients have a centralized relationship with a specific mail server, but mail servers themselves share email in a decentralized fashion.
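The supernode arrangement can be sketched as follows. This is only an illustration of the topology, not of the FastTrack protocol: the `Supernode` class, the `register`, `search`, and `link` names, and the hop limit (`ttl`) are assumptions made for the example.

```python
class Supernode:
    """Indexes the files of its own leaf peers and forwards queries to
    neighboring supernodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # filename -> set of leaf peers offering it
        self.neighbors = []  # other supernodes (the decentralized part)

    def register(self, peer, filenames):
        """Centralized part: a leaf peer reports its files to one supernode."""
        for filename in filenames:
            self.index.setdefault(filename, set()).add(peer)

    def search(self, filename, ttl=2, seen=None):
        """Answer from the local index, then ask neighbors until ttl runs out."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()  # this supernode already handled the query
        seen.add(self.name)
        hits = set(self.index.get(filename, ()))
        if ttl > 0:
            for neighbor in self.neighbors:
                hits |= neighbor.search(filename, ttl - 1, seen)
        return hits


def link(a, b):
    """Connect two supernodes symmetrically."""
    a.neighbors.append(b)
    b.neighbors.append(a)
```

A leaf peer only ever talks to its own supernode, so from the peer's point of view the system looks centralized; the propagation between supernodes is where the Gnutella-like behavior appears.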

Other topologies

There are limitless possibilities in combining various kinds of architectures. A centralized system could have a hierarchy of machines in the role of server. Decentralized systems could be built that span different rings or hierarchies. Systems could conceivably be built with three or more topologies combined, although the resulting complexity may be too difficult to manage. Topology is a useful simplifying tool in understanding the architecture of distributed systems.

Conclusion, and on to evaluation

In this article, I have introduced a way to think of distributed system design. By looking at systems in terms of their topology, it is possible to examine a wide spectrum of systems we have on the Internet today, from centralized to decentralized and with architectures in between. In the second part of this article, I will develop a framework for analyzing the strengths and weaknesses of distributed systems and apply it to the different topologies presented here.

Note: this article is based on a talk given by Nelson Minar at the O'Reilly Peer-to-Peer and Web Services Conference in November 2001. The slides from that talk are also online.

Nelson Minar was co-founder of Popular Power.


Distributed Systems Topologies: Part 2

from: http://openp2p.com/pub/a/p2p/2002/01/08/p2p_topologies_pt2.html

by Nelson Minar
01/08/2002

In the first part of this two-part series, I presented a 10,000-foot view of a framework for comparing distributed systems based on system topology. In this second part, I introduce seven criteria for evaluating a system design and discuss the relative merits of the different topologies. Systems with hybrid topologies often seem to demonstrate the advantages of their constituent designs.

Evaluating System Topologies

In the first part of this series, I described distributed systems in terms of their core topologies: centralized, decentralized, rings, hierarchies, and hybrids. Now I take advantage of this description by using it to evaluate system designs.

In this second part, I describe seven characteristics of distributed systems that are commonly used when talking about system design and then analyze each characteristic for each of the topologies. The same caution about the high-level nature of the analysis applies here as it did to the topology descriptions. The observations made here are generalizations and may not apply to any specific system. The intent is to develop a broad framework for considering system design that can then be applied to specific domains.

Seven Evaluation Properties

For this article, I boil down all possible ways to evaluate distributed systems into seven properties. While not exhaustive, this set is chosen because these properties are often used when talking about the advantages or disadvantages of decentralized systems. The resulting framework is a useful shorthand for thinking about system design.

Manageability
How hard is it to keep the system working? Complex systems require management: updating, repairing, and logging.
Information coherence
How authoritative is information in the system? If a bit of data is found in the system, is that data correct? Non-repudiation, auditability, and consistency are particular aspects of information coherence.
Extensibility
How easy is it to grow the system, to add new resources to it? The Web is the ultimate extensible system; anyone can create a new Web server or Web page and immediately have that contribution be part of the Web.
Fault Tolerance
How well can the system handle failures? Fault tolerance is a necessity in large distributed systems.
Security
How hard is it to subvert the system? Security covers a variety of topics, such as preventing people from taking over the system, injecting bad information, or using the system for a purpose other than the one its owners intend.
Resistance to lawsuits and politics
How hard is it for an authority to shut down the system? The designers of Gnutella or Freenet consider their resistance to lawsuits to be one of their best features. Other parties consider this property to be a danger.
Scalability
How large can the system grow? Scalability is often promoted as a key advantage of decentralized systems over centralized, although the reality is more complex.

Evaluating Simple Topologies

With these seven concepts in mind, we can look at each of the basic system topologies and evaluate their effectiveness.

Centralized

Manageable: Yes
Coherent: Yes
Extensible: No
Fault-Tolerant: No
Secure: Yes
Lawsuit-Proof: No
Scalable: ?

The primary advantage of centralized systems is their simplicity. Because all data is concentrated in one place, centralized systems are easily managed and have no questions of data consistency or coherence.

Centralized systems are also relatively easy to secure: there is only one host that needs to be protected. The drawback of centralization is that everything is in only one place. If the central server goes down, everything does. There is no fault tolerance, and the system is easy to shut down with a lawsuit. Centralized systems are also often hard to extend -- resources can only be added to the central system.

The scalability of centralized systems is subtle. Scale is clearly limited by the capacity of the server, and so centralized systems are often thought of as unscalable. But computers are very fast and a single computer can often support all the demands of its users.

For example, a modest computer running a Web server can easily handle hundreds of thousands of visitors a day. And unlike more complex topologies, the scalability of a centralized system is very easy to measure. So while, theoretically, centralized systems are not scalable, in practice they often suffice.
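The "easy to measure" point comes down to simple arithmetic: divide the daily load by the number of seconds in a day and compare the result with what one server can handle. The sketch below assumes hypothetical inputs (requests per visit and a peak-to-average factor are not figures from this article).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(visitors_per_day, requests_per_visit):
    """Average load if traffic were spread evenly over the day."""
    return visitors_per_day * requests_per_visit / SECONDS_PER_DAY


def peak_requests_per_second(visitors_per_day, requests_per_visit, peak_factor):
    """Average load scaled by an assumed peak-to-average ratio."""
    average = average_requests_per_second(visitors_per_day, requests_per_visit)
    return average * peak_factor
```

For instance, 300,000 visitors a day at an assumed 10 requests each averages roughly 35 requests per second, a figure that can be checked directly against the measured capacity of a single server.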

Ring

Manageable: Yes
Coherent: Yes
Extensible: No
Fault-Tolerant: Yes
Secure: Yes
Lawsuit-Proof: No
Scalable: Yes

Ring systems typically have a single owner. This concentration gives them many of the same advantages as centralized systems: they are manageable, coherent, and relatively secure from tampering.

The added complexity of the ring is mitigated by fairly simple rules for propagating state between the nodes in a ring. But the single-owner restriction means rings are also not extensible: a user still needs the owner's permission to add a resource like a music file or a Web page into the ring. Similarly, a lawsuit only needs to shut down the owner to shut down the whole ring.

The advantages of rings over centralized systems are fault tolerance and simple scalability. If a host goes down in a ring, failover logic makes it a simple matter to have another host cover the problem. And well-designed rings are scalable -- one can simply add more hosts to the ring and expand the capacity nearly linearly.

Hierarchical

Manageable: Partially
Coherent: Partially
Extensible: Partially
Fault-Tolerant: Partially
Secure: No
Lawsuit-Proof: No
Scalable: Yes

Hierarchical systems have a completely different set of advantages from that of rings. Hierarchical systems are somewhat manageable in that they have a clear chain of action. But because these systems have such a broad scope, it can be hard to correct a host with a problem. Coherence is usually achieved with a cache consistency type of strategy; effective, but not complete.

Hierarchical systems are extensible in that any host in the system can add data, but the rules of data management may limit what information can be added. (For example, the oreilly.com DNS server can add hosts for oreilly.com, but for no one else.)
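The oreilly.com rule can be sketched as a simple check on names. This is a minimal illustration, not how any real DNS server is implemented: the `Zone` class and its methods are hypothetical, the address comes from a documentation-only range, and delegation of sub-zones to other servers is ignored.

```python
class Zone:
    """A server that may only add hosts under the name it is authoritative for."""

    def __init__(self, origin):
        self.origin = self._normalize(origin)
        self.records = {}  # hostname -> address

    @staticmethod
    def _normalize(name):
        return name.lower().rstrip(".")

    def covers(self, hostname):
        """True if `hostname` is the zone origin or falls beneath it."""
        name = self._normalize(hostname)
        return name == self.origin or name.endswith("." + self.origin)

    def add_host(self, hostname, address):
        """Add a record, refusing names outside this zone's authority."""
        if not self.covers(hostname):
            raise PermissionError(
                f"{self.origin} is not authoritative for {hostname}"
            )
        self.records[self._normalize(hostname)] = address
```

Matching on `"." + origin` rather than on a bare suffix matters: a name such as `notoreilly.com` ends with `oreilly.com` as a string but is not part of that zone.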

Hierarchical systems are more fault-tolerant and lawsuit-proof than centralized systems, but the root is still a single point of failure. They tend to be harder to secure than centralized systems. If a node high in the hierarchy is subverted or spoofed, the whole system suffers. And it's not just the root that is a risk: if data travels up the branches to the root, then leaf nodes may be able to inject bad information into the system.

The primary advantage of hierarchical systems is their incredible scalability -- new nodes can be added at any level to absorb excess load. This scalability is best demonstrated in DNS, which has scaled over the last 15 years from a few thousand hosts to hundreds of millions. The relative simplicity and openness of hierarchical systems, in addition to their scalability, make them a desirable option for large Internet systems.

Decentralized

Manageable: No
Coherent: No
Extensible: Yes
Fault-Tolerant: Yes
Secure: No
Lawsuit-Proof: Yes
Scalable: Maybe

Decentralized systems such as Gnutella have almost exactly the opposite characteristics of centralized systems. The far-flung nature of these networks means the systems tend to be difficult to manage and that data in the system is never fully authoritative. They also tend to be insecure, in the sense that it is easy for a node to join the network and start putting bad data into the system.

A primary virtue of decentralized systems is their extensibility. For example, in Gnutella any node can join the network and instantly make new files available to the whole network. Decentralized systems also tend to be fault-tolerant and harder to sue. The failure or shutdown of any particular node does not impact the rest of the system.

The scalability of decentralized systems is hard to evaluate. In theory, the more hosts you add, the more capable a decentralized network becomes. In practice, the algorithms required to keep a decentralized system coherent often carry a lot of overhead. If that overhead grows with the size of the system, then the system may not scale well. The Gnutella network suffered this problem in the early stages, and it remains to be seen if Gnutella can ever scale to the millions of active users that more centralized architectures enjoy. Scalability of decentralized systems remains an active research topic.
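A back-of-the-envelope calculation shows how such overhead can grow. The sketch below estimates an upper bound on the messages produced by one flooded query, under idealized assumptions that are mine rather than measurements of Gnutella: every peer has the same number of neighbors, forwards the query to all of them except the sender, and no two paths ever reach the same peer. The function name and parameters are hypothetical.

```python
def flood_messages(neighbors, ttl):
    """Upper bound on messages generated by one flooded query.

    neighbors: links per peer (assumed identical for every peer)
    ttl:       maximum number of hops the query may travel
    """
    if neighbors < 1 or ttl < 1:
        return 0
    total = 0
    frontier = neighbors  # messages sent on the first hop
    for _ in range(ttl):
        total += frontier
        # Each receiver forwards to every neighbor except the sender.
        frontier *= neighbors - 1
    return total
```

With 4 neighbors per peer, a limit of 2 hops costs at most 16 messages, while a limit of 7 hops costs up to 4,372. Reaching more of the network with each query raises the cost geometrically, which is the kind of overhead that can keep a purely decentralized design from scaling.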

Evaluating Hybrid Topologies

System topologies become even more interesting when you combine them into hybrid architectures. Often, different topologies are chosen for different parts of a system to get the best of the strengths without the weaknesses.

Centralized + Ring

Manageable: Yes
Coherent: Yes
Extensible: No
Fault-Tolerant: Yes
Secure: Yes
Lawsuit-Proof: No
Scalable: Yes

Systems that have a ring as their central server often combine the simplicity of centralization with the redundancy of a ring. The hybrid system is still easily managed, coherent, and secure; the ring does not add much complexity over a purely centralized system.

This combination still has a single owner and therefore is not particularly extensible or lawsuit-proof. The key advantage is that using a ring as the server adds fault tolerance and scalability. The power and simplicity of the combination of rings and centralized systems explains why this architecture is so popular with serious server-based applications such as Web commerce and high-availability databases.

Centralized + Decentralized

Manageable: No
Coherent: Partially
Extensible: Yes
Fault-Tolerant: Yes
Secure: No
Lawsuit-Proof: Yes
Scalable: Apparently

A system combining centralized and decentralized systems enjoys some of the advantages of both. Decentralization contributes to the extensibility, fault tolerance, and lawsuit-proofing of the system. The partial centralization makes the system more coherent than a purely decentralized system, as there are relatively fewer hosts that are holding authoritative data. Manageability is about as difficult as in a decentralized system, and the system is no more secure than any other decentralized system.

The amazing story is the scalability of this hybrid. Internet email runs very well for hundreds of millions of users and has grown enormously since its initial design. FastTrack-based systems have grown very quickly with none of the slowdowns that plagued Napster or Gnutella in their growth. There is growing interest in this kind of hybrid topology as an excellent architecture for peer-to-peer systems.
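The six scorecards in this article can be collected into one small lookup table, which makes it easy to ask which topologies are rated a clear "Yes" on a given set of properties. The ratings below are copied from the scorecards above; the `scorecard` and `candidates` helpers are hypothetical names introduced for this sketch, and treating only "Yes" as a match is a deliberately strict simplification.

```python
PROPERTIES = (
    "Manageable", "Coherent", "Extensible", "Fault-Tolerant",
    "Secure", "Lawsuit-Proof", "Scalable",
)

# One rating per property, in the order listed in PROPERTIES.
RATINGS = {
    "Centralized": ("Yes", "Yes", "No", "No", "Yes", "No", "?"),
    "Ring": ("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"),
    "Hierarchical": ("Partially", "Partially", "Partially", "Partially",
                     "No", "No", "Yes"),
    "Decentralized": ("No", "No", "Yes", "Yes", "No", "Yes", "Maybe"),
    "Centralized + Ring": ("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"),
    "Centralized + Decentralized": ("No", "Partially", "Yes", "Yes",
                                    "No", "Yes", "Apparently"),
}


def scorecard(topology):
    """Return the ratings for one topology as a property -> rating dict."""
    return dict(zip(PROPERTIES, RATINGS[topology]))


def candidates(required):
    """Topologies rated a plain 'Yes' on every property in `required`."""
    return [
        name for name in RATINGS
        if all(scorecard(name)[prop] == "Yes" for prop in required)
    ]
```

For example, `candidates(["Extensible", "Lawsuit-Proof"])` returns the decentralized topology and the centralized + decentralized hybrid, while `candidates(["Manageable", "Fault-Tolerant"])` returns the ring and the centralized + ring hybrid.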

Conclusions

A decentralized system is not always better or worse than a centralized system. The choice depends entirely on the needs of the application. The simplicity of centralized systems makes them easier to manage and control, while decentralized systems grow better and are more resistant to failures or shutdowns.

As for scalability, the story is not clear. Centralized systems have limited scale, but that limit is easy to understand. In contrast, decentralized systems offer the possibility of massive scalability, but in practice that can be very hard to achieve.

The second conclusion is the power of creating hybrid topologies. In centralized+ring systems, the ring covers many of the drawbacks of a purely centralized approach, providing easy scalability and fault tolerance. And centralized+decentralized systems are showing powerful scalability and extensibility while retaining some of the coherence of centralized systems.

System designers have to evaluate the requirements for their particular area and pick a topology that matches their needs. We are not limited to a few simple topologies; topologies can be combined to make hybrids. And while centralized systems are doing a lot of the work on the Internet, there is a lot of exciting potential in decentralized systems. In particular, combining decentralized topologies with other simpler topologies is a powerful approach.

Nelson Minar was co-founder of Popular Power.