Some Views on Scalability and Performance
Source: Internet. Editor: 程序博客网. Time: 2024/04/29 10:23
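I have been studying scalability and performance lately and spending my spare time on InfoQ. Below are some points from the articles I read there that I largely agree with: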
Performance and Scalability
Performance is about the resources used to service a single request. Scalability is about how resource consumption grows when you have to service more (or larger) requests.
We define performance as how rapidly an operation (or set of operations) completes, e.g. response time, number of events processed per second, etc.; scalability is how well the application can be scaled up to handle greater usage demands (e.g. number of users, request rates, volume of data).
Decrease processing time
Collocation: reduce any overheads associated with fetching data required for a piece of work, by collocating the data and the code.
Caching: if the data and the code can't be collocated, cache the data to reduce the overhead of fetching it over and over again.
Pooling: reduce the overhead associated with using expensive resources by pooling them.
Parallelization: decrease the time taken to complete a unit of work by decomposing the problem and parallelizing the individual steps.
Partitioning: concentrate related processing as close together as possible, by partitioning the code and collocating related partitions.
Remoting: reduce the amount of time spent accessing remote services by, for example, making the interfaces more coarse-grained. It's also worth remembering that remote vs local is an explicit design decision, not a switch, and to consider the first law of distributed computing - do not distribute your objects.
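As a rough illustration of the Pooling and Parallelization items above, here is a minimal sketch. It assumes an I/O-bound step; `fetch_price` and its artificial delay are hypothetical stand-ins for a remote call and do not come from the quoted articles.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_price(item_id: int) -> float:
    """Hypothetical slow step standing in for a remote call or disk read."""
    time.sleep(0.05)  # simulated I/O latency
    return item_id * 1.5


def total_price_sequential(item_ids) -> float:
    # Baseline: the steps run one after another.
    return sum(fetch_price(i) for i in item_ids)


def total_price_parallel(item_ids, pool: ThreadPoolExecutor) -> float:
    # Parallelization: the unit of work is decomposed into independent
    # fetches that run concurrently; pool.map yields results in input order.
    return sum(pool.map(fetch_price, item_ids))


# Pooling: the worker threads are created once and reused across calls
# instead of being started and torn down for every request.
POOL = ThreadPoolExecutor(max_workers=8)
```

Calling `total_price_parallel(range(8), POOL)` returns the same total as the sequential version. With the simulated delay it should take roughly the time of a single fetch rather than eight, though the real gain depends on how independent the steps actually are.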
Requirements must be known
Target average and peak performance (i.e. response time, latency, etc).
Target average and peak load (i.e. concurrent users, message volumes, etc).
Acceptable limits for performance and scalability.
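Once such targets are written down, measurements can be checked against them mechanically. The sketch below is one possible way to do that. The target values are invented, and using the 99th percentile as a stand-in for "peak" is an assumption made for illustration.

```python
from statistics import mean

# Hypothetical targets; real values come from the requirements.
TARGETS = {"avg_ms": 200, "p99_ms": 800}


def percentile(samples, pct: int):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def meets_targets(latencies_ms, targets=TARGETS) -> dict:
    # Report each requirement separately so a failure points at the
    # specific limit that was exceeded.
    return {
        "avg_ok": mean(latencies_ms) <= targets["avg_ms"],
        "p99_ok": percentile(latencies_ms, 99) <= targets["p99_ms"],
    }
```

A run whose average looks healthy can still fail the peak target, which is why the two are tracked separately.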
Partition by Function
The more decoupled that unrelated functionality can be, the more flexibility you will have to scale them independently of one another.
At the database tier, we follow much the same approach. This approach allows us to scale the database infrastructure for each type of data independently of the others.
Split Horizontally
Different use cases use different schemes for partitioning the data: some are based on a simple modulo of a key (item ids ending in 1 go to one host, those ending in 2 go to the next, etc.), some on a range of ids (0-1M, 1-2M, etc.), some on a lookup table, some on a combination of these strategies. Regardless of the details of the partitioning scheme, though, the general idea is that an infrastructure which supports partitioning and repartitioning of data will be far more scalable than one which does not.
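The three schemes mentioned above (modulo, range, lookup table) can be sketched as follows. The host names, range boundaries and category table are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

HOSTS = ["db0", "db1", "db2", "db3"]  # hypothetical host names


def by_modulo(item_id: int) -> str:
    # Simple modulo of the key: ids are spread evenly across hosts.
    return HOSTS[item_id % len(HOSTS)]


# Lower bound of each id range; ids of 3M and above fall on the last host.
RANGE_STARTS = [0, 1_000_000, 2_000_000, 3_000_000]


def by_range(item_id: int) -> str:
    if item_id < 0:
        raise ValueError("item_id must be non-negative")
    # Find the last range whose lower bound is <= item_id.
    return HOSTS[bisect_right(RANGE_STARTS, item_id) - 1]


# Lookup table: an explicit mapping that can be edited entry by entry.
CATEGORY_TO_HOST = {"books": "db0", "music": "db1"}


def by_lookup(category: str) -> str:
    return CATEGORY_TO_HOST[category]
```

Modulo is the simplest scheme but reshuffles most keys when the host count changes. Range and lookup schemes make it easier to move one slice of the data at a time.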
Avoid Distributed Transactions
The pragmatic answer is to relax your transactional guarantees across unrelated systems. It turns out that you can't have everything. In particular, guaranteeing immediate consistency across multiple systems or partitions is typically neither required nor possible. The CAP theorem, postulated almost 10 years ago by Inktomi's Eric Brewer, states that of three highly desirable properties of distributed systems - consistency (C), availability (A), and partition-tolerance (P) - you can only choose two at any one time. For a high-traffic web site, we have to choose partition-tolerance, since it is fundamental to scaling. For a 24x7 web site, we typically choose availability. So immediate consistency has to give way. We do employ various techniques to help the system reach eventual consistency: careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. We choose the technique according to the consistency demands of the particular use case.
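To make the idea of a reconciliation batch more concrete, here is a minimal sketch. It assumes one store is treated as the source of truth and a second copy is periodically brought back in line. The quoted article does not describe how its batches work, so the function and its behavior are purely illustrative.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def reconcile(source_of_truth: dict, replica: dict) -> list:
    """Bring replica in line with source_of_truth; return the repaired keys."""
    repaired = []
    # Fix entries that are missing or stale in the replica.
    for key, value in source_of_truth.items():
        if replica.get(key, _MISSING) != value:
            replica[key] = value
            repaired.append(key)
    # Drop entries that no longer exist in the source of truth.
    for key in list(replica.keys() - source_of_truth.keys()):
        del replica[key]
        repaired.append(key)
    return sorted(repaired)
```

Between two runs of such a batch the copies may disagree. That window is the price paid for not wrapping both writes in a distributed transaction.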
Decouple Functions Asynchronously
The next key element to scaling is the aggressive use of asynchrony. If component A calls component B synchronously, A and B are tightly coupled, and that coupled system has a single scalability characteristic -- to scale A, you must also scale B. Equally problematic is its effect on availability. Going back to Logic 101, if A implies B, then not-B implies not-A. In other words, if B is down then A is down. By contrast, if A and B integrate asynchronously, whether through a queue, multicast messaging, a batch process, or some other means, each can be scaled independently of the other. Moreover, A and B now have independent availability characteristics - A can continue to move forward even if B is down or distressed. At every level, decomposing the processing into stages or phases, and connecting them up asynchronously, is critical to scaling.
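The queue-based integration described above can be sketched in-process with the standard library. The names `component_a` and `component_b_worker` are hypothetical. A real system would more likely use a durable message broker, so treat this only as an illustration of the decoupling.

```python
import queue
import threading

events = queue.Queue()  # buffer that decouples A from B
processed = []          # what B has handled so far


def component_a(order_id: int) -> str:
    """Accept the request and return at once; B is notified asynchronously."""
    events.put(order_id)
    return f"accepted {order_id}"


def component_b_worker() -> None:
    """Drain the queue until a None sentinel arrives."""
    while True:
        order_id = events.get()
        try:
            if order_id is None:  # sentinel: shut the worker down
                return
            processed.append(order_id)
        finally:
            events.task_done()
```

If no worker is running, `component_a` still succeeds and the orders simply wait in the queue. Starting `threading.Thread(target=component_b_worker)` later drains the backlog in order. This is the sense in which A can keep moving forward while B is down.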
Virtualize At All Levels
Virtualization and abstraction are everywhere, following the old computer science aphorism that the solution to every problem is another level of indirection. The operating system abstracts the hardware. The virtual machine in many modern languages abstracts the operating system. Object-relational mapping layers abstract the database. Load-balancers and virtual IPs abstract network endpoints. As we scale our infrastructure through partitioning by function and data, an additional level of virtualization of those partitions becomes critical. The motivation here is not only programmer convenience, but also operational flexibility. Hardware and software systems fail, and requests need to be re-routed. Components, machines, and partitions are added, moved, and removed. With judicious use of virtualization, higher levels of your infrastructure are blissfully unaware of these changes, and you are therefore free to make them. Virtualization makes scaling the infrastructure possible because it makes scaling manageable.
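One way to picture this extra level of indirection is a router that maps keys to logical partitions and logical partitions to physical hosts. The class below is a hypothetical sketch, not something taken from the article. Callers only ever ask `host_for(key)`.

```python
class PartitionRouter:
    """Maps a fixed set of logical partitions onto physical hosts."""

    def __init__(self, num_partitions: int, hosts: list):
        self.num_partitions = num_partitions
        # Initial round-robin assignment of logical partitions to hosts.
        self.assignment = {
            p: hosts[p % len(hosts)] for p in range(num_partitions)
        }

    def partition_for(self, key: int) -> int:
        # The key-to-partition mapping never changes.
        return key % self.num_partitions

    def host_for(self, key: int) -> str:
        # Only this lookup changes when a partition is moved.
        return self.assignment[self.partition_for(key)]

    def move(self, partition: int, new_host: str) -> None:
        # Operational change (failover, rebalancing) hidden from callers.
        self.assignment[partition] = new_host
```

After `router.move(2, "h2")`, every key in logical partition 2 resolves to the new host and no caller has to change. The sketch leaves out the data migration itself.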
Cache Appropriately
The most obvious opportunities for caching come with slow-changing, read-mostly data - metadata, configuration, and static data. More challenging is rapidly-changing, read-write data. For the most part, we intentionally sidestep these challenges at eBay.
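For the slow-changing, read-mostly case, a time-to-live cache is often enough. The sketch below is a hypothetical single-threaded illustration. The `loader` callback stands in for whatever fetches the metadata or configuration, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time


class TTLCache:
    """Caches loader(key) results and reloads them after ttl_seconds."""

    def __init__(self, loader, ttl_seconds: float, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: no call to the loader
        # Miss or expired entry: reload and remember the new expiry.
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

The TTL bounds how stale a value can get. That bound is acceptable for configuration or static data and much harder to justify for rapidly-changing read-write data.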
Scalability Worst Practices
The Golden Hammer
Forcing a particular technology to work in ways it was not intended to is sometimes counter-productive.
Resource Abuse
Dependencies
Dependencies are a necessary evil in most systems, and failure to manage dependencies and their versions diligently can inhibit agility and scalability.
Dependency management for code has different flavors:
1) Compile the entire codebase together
2) Pick and choose components and services based on known versions
3) Publish models and services comprised of only backwards-compatible changes
Forgetting to check the time
To properly scale a system it is imperative to manage the time allotted for requests to be handled.
Runtime
The ability to easily deploy and operate the system in a production environment must be held in equal regard. There are a number of worst practices which jeopardize the scalability of a system.
1) Hero Pattern
2) Not Automating
3) Not Monitoring
NOSQL (Not Only SQL) Alternatives
Partition the Data
By partitioning the data, we minimize the impact of a failure, and we distribute the load for both write and read operations. If only one node fails, the data belonging to that node is impacted, but not the entire data store.
Keep Multiple Replicas of the Same Data
Most of the NOSQL implementations rely on hot-backup copies of the data, to ensure continuous high availability. The most common configuration with GigaSpaces is synchronous replication to the backup, and asynchronous replication to the backend storage.
Dynamic Scaling
In order to handle the continuous growth of data, most NOSQL alternatives provide a way of growing your data cluster without bringing the cluster down or forcing a complete re-partitioning. One algorithm notifies the neighbors of a certain partition that a node joined or failed. Only those neighbor nodes are impacted by that change, not the entire cluster. Another (and significantly simpler) algorithm uses logical partitions. With logical partitions, the number of partitions is fixed, but the distribution of partitions between machines is dynamic.
Use Map/Reduce to Handle Aggregation
Map/Reduce is a model that is often used to perform complex analytics and is often associated with Hadoop. Having said that, it is important to note that map/reduce is often referred to as a pattern for parallel aggregated queries.
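Seen as a pattern for parallel aggregated queries, map/reduce can be sketched in a few lines: each partition computes a partial aggregate (map) and the partials are merged (reduce). The in-memory `PARTITIONS` data and the function names are hypothetical. In a real data grid or Hadoop job the map step would run on the nodes that hold the data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: (category, quantity) rows spread over three partitions.
PARTITIONS = [
    [("books", 2), ("music", 1)],
    [("books", 5)],
    [("music", 4), ("toys", 3)],
]


def map_partition(rows) -> Counter:
    """Map step: one partition computes its own partial totals."""
    partial = Counter()
    for category, quantity in rows:
        partial[category] += quantity
    return partial


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial results (quantities are positive)."""
    return left + right


def total_by_category(partitions) -> dict:
    # Run the map step on all partitions concurrently, then fold the
    # partial results into a single aggregate.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return dict(reduce(merge, partials, Counter()))
```

Because the merge is associative, partial results can be combined in any grouping. That property is what lets the aggregation scale out with the number of partitions.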
Using Processing Units for Scaling
Washing Your Car The Tier-Based Way:
Washing the Car Using Space-Based Architecture:
Scaling Easily with Self Contained Processing Units: