[Reading notes] Lessons from scaling Internet application services

Source: Internet. Published by: 软件测试书籍推荐. Editor: 程序博客网. Time: 2024/06/05 19:36

Lessons from scaling Internet application services

http://blog.rebill.info/archives/wangdi-internet-service.html

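FreeWheel: an Internet MRM platform for serving and publishing video ads. B2B: content owners -> content distributors

Ad application server: matching (matching user requests against the available ads)

Log processor: map-reduce (Hadoop), ETL -> OLAP (data warehouse)

Pusher: caches the OLTP DB in memory; pulls from the DB, then preprocesses and prepares the data. mmap structured memory dump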

Summary:

The business model and the business itself determine the design
High service reliability: 99.99% uptime
Peak ratio: 5:1 peak to mean.

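1. Scaling the application service
  1.1 Stateless application servers:
   Encode all required state into the URL, removing any dependency on network communication between servers
   Application services can be restarted at any time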
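
As a minimal sketch of carrying state in the URL (the host name, path, and the `session` and `slot` fields are assumptions for illustration, not taken from the talk), the state can be written into the query string and read back by whichever server receives the next request:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_stateful_url(base_url, state):
    """Append the request state to base_url as a query string."""
    return base_url + "?" + urlencode(sorted(state.items()))

def read_state(url):
    """Recover the state dict from the URL; no server-side session is needed."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}

# Hypothetical ad request: any server in the pool can handle the follow-up call.
url = build_stateful_url("http://ads.example.com/next",
                         {"session": "abc123", "slot": "2"})
state = read_state(url)
```

Because nothing lives in server memory between requests, a server can be restarted or replaced without losing anything the next request needs.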

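  1.2 Replication and multi-level caching.

   Master->slave read/write splitting: keeps "long" write locks on the master from blocking read locks on the slaves
   Cache expire times need careful thought

   Dot server: a server with no logic, there so that the external service does not fail when every machine in the ad application server cluster is down.
      The dot server answers each request with a cached, standard no-ad output.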
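
The dot server idea might be sketched as follows. The response strings and the handler functions are hypothetical; the only point is that the fallback path contains no logic and always returns the same cached output:

```python
NO_AD_RESPONSE = "<ads></ads>"  # assumed placeholder for the cached no-ad output

def dot_server_response():
    """The dot server has no logic: it always returns the cached no-ad output."""
    return NO_AD_RESPONSE

def serve(request, ad_servers):
    """Try each ad server; fall back to the dot server if all of them fail."""
    for handler in ad_servers:
        try:
            return handler(request)
        except Exception:
            continue  # this server is down, try the next one
    return dot_server_response()

def broken_server(request):
    # Simulates an ad application server that is down.
    raise ConnectionError("ad server unavailable")

def healthy_server(request):
    # Simulates an ad application server that returns a matched ad.
    return "<ads><ad id='1'/></ads>"
```

In a real deployment the fallback would more likely sit in the load balancer than in application code; the sketch only shows the behavior.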

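  Log processing:
    Use Google Protocol Buffers to avoid defining a custom format and writing a parser; binary logs also reduce log volume, and fields are easy to extend
    A small company should avoid reinventing the wheel wherever it can.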

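2. Scaling the data warehouse:
  De-normalization: allow redundancy; the SQL logic stays simple, queries perform well, and modeling with standard BI tools is easy
  Pivot: merge multiple rows with the same key into one row, improving ...
  Long tail roll up (roll the long tail up into a single item)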
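
The pivot and long tail roll-up steps can be illustrated in a few lines. This is a sketch over in-memory rows, assuming a `(key, metric, value)` row shape and an `"other"` bucket name, neither of which comes from the talk; in the warehouse itself this would be done in SQL or ETL jobs:

```python
from collections import defaultdict

def pivot(rows):
    """Merge (key, metric, value) rows into one dict per key."""
    merged = defaultdict(dict)
    for key, metric, value in rows:
        merged[key][metric] = value
    return dict(merged)

def roll_up_long_tail(totals, keep, other_label="other"):
    """Keep the `keep` largest items and sum the rest into a single item."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = dict(ranked[:keep])
    tail = sum(value for _, value in ranked[keep:])
    if tail:
        result[other_label] = tail
    return result

rows = [("ad1", "impressions", 100), ("ad1", "clicks", 3), ("ad2", "impressions", 50)]
wide = pivot(rows)
rolled = roll_up_long_tail({"a": 100, "b": 50, "c": 3, "d": 2}, keep=2)
```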

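Benchmarking:
  Extract averaged measurements from the MySQL slow query log across multiple runs; each month, pick the top slow queries to optimize
  Set the InnoDB buffer pool to 70% of machine memory
  Do not optimize for its own sake; consider these only when needed: table partitioning (splitting tables, vertical) and sharding (splitting databases by customer, horizontal)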
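
The monthly "top slow queries" selection could look roughly like this. The sketch assumes the slow query log has already been parsed into `(query, seconds)` pairs; the parsing itself and the sample queries are not from the source:

```python
from collections import defaultdict

def top_slow_queries(samples, limit):
    """Average the measured times per query and return the slowest ones."""
    times = defaultdict(list)
    for query, seconds in samples:
        times[query].append(seconds)
    averages = {query: sum(values) / len(values) for query, values in times.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:limit]

# Hypothetical measurements, already extracted from the log.
samples = [("SELECT a", 2.0), ("SELECT b", 5.0), ("SELECT a", 4.0), ("SELECT c", 1.0)]
worst = top_slow_queries(samples, limit=2)
```

Averaging over several measurements keeps a single unlucky run from pushing a query to the top of the list.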

3. Operating principles
  Capacity planning: reserve 50% of capacity for peaks; an average system load above 50% is the signal to expand.
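
The expansion signal is simple arithmetic. A minimal sketch, where the load and capacity units (for example requests per second) are an assumption:

```python
def needs_more_capacity(average_load, capacity, threshold=0.5):
    """Return True when average load exceeds the threshold share of capacity."""
    return average_load / capacity > threshold

# Hypothetical numbers: a cluster rated for 1000 requests per second.
expand_now = needs_more_capacity(average_load=600, capacity=1000)
expand_later = needs_more_capacity(average_load=400, capacity=1000)
```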

  N+1 data centers: data centers spread across different geographic locations, with backup ISPs and CDNs


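  Monitoring:
   1. Application liveness checks
   2. Service anomaly alerts: errors, latency, etc.
   3. Database master-slave replication
   4. Daily slow query report
   5. Daily report on the day's business operations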

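  Multi-stage deployment: build a Lab that is a proportionally scaled-down copy of production with the same topology, and run integration tests there with real production data.
   Deploy in stages, upgrading in batches at different times


  Testing: DEV vs QA: 1:1

   Built around automated regression testing.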


Netlog: What we learned about scalability & high availability
http://www.slideshare.net/folke/netlog-what-we-learned-about-scalability-high-availability-430211
Apache+PHP+eAccelerator+Keepalived(for HA)
Nginx+Lighttpd+CDN: static files (css/js/image/photo/video)
Search: Sphinx; MySQL full-text search is very slow.

DB partitioning (sharding): divide data on the primary key
  How: MySQL partitioning, available since 5.1

Memcached for session/query result/processed data/generated html
   Cache with TTL/Cache forever with invalidate/Cache forever with update
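
The three caching strategies can be compared side by side. The class below is an in-memory stand-in written for this sketch, not a real memcached client, and the keys are hypothetical; `expire=0` is taken to mean "cache forever":

```python
import time

class SimpleCache:
    """In-memory stand-in for memcached; expire=0 means cache forever."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.items = {}  # key -> (value, deadline or None)

    def set(self, key, value, expire=0):
        deadline = self.clock() + expire if expire else None
        self.items[key] = (value, deadline)

    def get(self, key):
        value, deadline = self.items.get(key, (None, None))
        if deadline is not None and self.clock() >= deadline:
            del self.items[key]  # entry has outlived its TTL
            return None
        return value

    def delete(self, key):
        self.items.pop(key, None)

cache = SimpleCache()
cache.set("query:top10", ["a", "b"], expire=60)  # cache with TTL
cache.set("profile:42", {"name": "old"})         # cache forever ...
cache.delete("profile:42")                       # ... and invalidate on change
cache.set("html:home", "<p>v1</p>")              # cache forever ...
cache.set("html:home", "<p>v2</p>")              # ... and update in place
```

Invalidation leaves the next reader to rebuild the entry, while update pays the rebuild cost at write time so readers never miss.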

Global locking: use memcache as locking mechanism
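
One common way to build such a lock relies on memcached's `add` command, which stores a key only if it does not already exist. The slides do not say how Netlog implemented it, so the following is a sketch under that assumption, using a tiny in-process stand-in instead of a real client:

```python
class FakeMemcache:
    """Tiny stand-in for a memcached client (hypothetical, single process)."""

    def __init__(self):
        self.items = {}

    def add(self, key, value):
        """Store value only if key is absent, like memcached's `add` command."""
        if key in self.items:
            return False
        self.items[key] = value
        return True

    def delete(self, key):
        self.items.pop(key, None)

def acquire_lock(client, name):
    """Return True if this caller now holds the lock."""
    return client.add("lock:" + name, "1")

def release_lock(client, name):
    """Give the lock up so another caller can take it."""
    client.delete("lock:" + name)
```

With a real memcached server the lock key would normally also carry an expiry, so that a crashed holder does not block everyone else forever.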

Flooding detection using memcache [a very general and efficient way to detect flooding]
  User can only redo action A after a timeout
   a guestbook message can only be posted once every 2 minutes
  User can not do action A more than X times in T minutes
   only 12 failed login attempts per hour are allowed
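
Both rules reduce to "at most N actions per window", with N = 1 for the timeout case. The sketch below keeps the counters in a Python dict in place of memcached keys with an expiry; the class, the key names, and the fixed-window counting are assumptions for illustration:

```python
import time

class FloodGuard:
    """In-memory stand-in for memcached counters with expiry (hypothetical)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.entries = {}  # key -> (count, expires_at)

    def allow(self, key, max_count, window_seconds):
        """Allow at most max_count actions per window_seconds for this key."""
        now = self.clock()
        count, expires_at = self.entries.get(key, (0, 0))
        if now >= expires_at:
            # The previous window has ended (or never existed): start a new one.
            count, expires_at = 0, now + window_seconds
        if count >= max_count:
            return False
        self.entries[key] = (count + 1, expires_at)
        return True

guard = FloodGuard()
can_post = guard.allow("guestbook:user42", max_count=1, window_seconds=120)
can_retry = guard.allow("failed_login:user42", max_count=12, window_seconds=3600)
```

For the login rule, `allow` would be called once per failed attempt; a `False` result means the account should be locked out until the window expires.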

Scalability, Availability & Stability Patterns
http://www.slideshare.net/jboner/scalability-availability-stability-patterns

Scalable Web Architectures: Common Patterns and Approaches
http://www.slideshare.net/techdude/scalable-web-architectures-common-patterns-and-approaches
Three goals of application architecture design: Scale, HA, Performance.

What is scalability for?
1. Traffic growth
2. Dataset growth
3. Maintainability

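Two kinds of scalability:
1. Vertical (get bigger): sometimes adding hardware (memory) costs less than redesigning the software or partitioning the data.
For example, when MySQL is not fast enough, try adding memory first.
2. Horizontal (get more)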

Shared-nothing servers are easy to scale.

Queuing: with a queue, it is easy to parallelize work asynchronously
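
A minimal sketch of the queuing pattern, using Python's standard `queue` and `threading` modules within one process. A production setup would more likely use a separate message queue service; the function name and worker count are assumptions, and the handler is assumed not to raise:

```python
import queue
import threading

def process_in_background(jobs, handler, workers=4):
    """Run handler on every job using a queue and a pool of worker threads."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            outcome = handler(job)
            with lock:
                results.append(outcome)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results

squares = sorted(process_in_background([1, 2, 3, 4, 5], lambda n: n * n))
```

The producer only enqueues work and never waits on an individual job, which is what makes it easy to add or remove workers independently.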

The database is the toughest part to scale. A dual Intel64 system with 16GB+ of RAM can get you a long way.

MySQL: Master-Master + multi-slave (as hot/hot) is good for HA.
  Design the schema/access to avoid collisions (hashing users to servers)
  No auto-inc columns for hot/hot
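
"Hashing users to servers" can be sketched as below. The server names are hypothetical and SHA-256 is just one stable hash choice; the slides do not specify either:

```python
import hashlib

MASTERS = ["master-a", "master-b"]  # hypothetical names for the hot/hot pair

def master_for_user(user_id, masters=MASTERS):
    """Pick the master that takes all writes for this user."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return masters[int(digest, 16) % len(masters)]

# The same user always maps to the same master.
first = master_for_user(42)
second = master_for_user(42)
```

Because all of a given user's writes land on one master, the two masters should not produce conflicting writes to the same user's rows.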


Data Federation:
  Simple things first: vertical partitioning + sharding + central lookup
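
A central lookup can be as small as a table from user to shard. The sketch below keeps that table in a Python dict; the class, shard names, and the least-loaded assignment rule are all assumptions for illustration:

```python
class ShardDirectory:
    """Central lookup table mapping each user to a shard (names are hypothetical)."""

    def __init__(self, shards):
        self.shards = list(shards)
        self.assignments = {}

    def shard_for(self, user_id):
        """Return the user's shard, assigning the least-loaded one on first use."""
        if user_id not in self.assignments:
            load = {shard: 0 for shard in self.shards}
            for shard in self.assignments.values():
                load[shard] += 1
            self.assignments[user_id] = min(self.shards, key=lambda s: load[s])
        return self.assignments[user_id]

    def move(self, user_id, shard):
        """Rebalance by updating the lookup entry after the data has been copied."""
        self.assignments[user_id] = shard

directory = ShardDirectory(["db1", "db2"])
alice_shard = directory.shard_for("alice")
bob_shard = directory.shard_for("bob")
```

Compared with pure hashing, a lookup table costs one extra read per request but lets individual users be moved between shards without rehashing everyone.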

Multi-site HA:

GSLB: global server load balancing; the easiest approach is DNS