Generator Fan Failure Triggered AWS Outage
Last week’s outage at Amazon Web Services was triggered by a series of failures in the power infrastructure in a northern Virginia data center, including the failure of a generator cooling fan while the facility was on emergency power. The downtime affected AWS customers Heroku, Pinterest, Quora and HootSuite, along with a host of smaller sites.
The incident began at 8:44 p.m. Pacific time on June 14, when the Amazon data center lost utility power. The facility switched to generator power, as designed. But nine minutes later, a defective cooling fan caused one of the backup generators to overheat and shut itself down.
“At this point, the EC2 instances and EBS volumes supported by this generator failed over to their secondary back-up power (which is provided by a completely separate power distribution circuit complete with additional generator capacity),” Amazon wrote in its incident report at the AWS Service Health Dashboard.
Breaker Misconfiguration Compounds Issue
“Unfortunately, one of the breakers on this particular back-up power distribution circuit was incorrectly configured to open at too low a power threshold and opened when the load transferred to this circuit. After this circuit breaker opened at 8:57PM PDT, the affected instances and volumes were left without primary, back-up, or secondary back-up power.”
The generator fan was fixed and the generator was restarted at 10:19 p.m. Pacific time. As is often the case, once power was restored it took some time for customers to fully restore databases and applications. Amazon said a primary datastore for its Elastic Block Store (EBS) lost power during the incident and “did not fail cleanly,” resulting in some additional disruption.
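For readers who want the durations spelled out, the reported timestamps can be turned into a short timeline calculation. This is only a sketch: the year 2012 is taken from the article's dateline, the helper name `minutes_between` is made up for illustration, and the generator restart is treated as the earliest point at which power could have returned, not as the moment customer services recovered.

```python
from datetime import datetime

# Timestamps as reported in the article (Pacific time, June 14).
# The year is an assumption taken from the article's dateline.
DAY = "2012-06-14 "
FMT = "%Y-%m-%d %H:%M"

utility_lost = datetime.strptime(DAY + "20:44", FMT)         # 8:44 p.m.
breaker_opened = datetime.strptime(DAY + "20:57", FMT)       # 8:57 p.m.
generator_restarted = datetime.strptime(DAY + "22:19", FMT)  # 10:19 p.m.


def minutes_between(start, end):
    """Whole minutes elapsed between two datetimes (hypothetical helper)."""
    return int((end - start).total_seconds() // 60)


# Time from loss of utility power until the last power source dropped out.
until_total_loss = minutes_between(utility_lost, breaker_opened)

# Lower bound on the time affected hardware sat without any power.
without_power = minutes_between(breaker_opened, generator_restarted)

print(f"Utility loss to breaker trip: {until_total_loss} minutes")
print(f"Breaker trip to generator restart: {without_power} minutes")
```

On these figures, the affected instances and volumes lost all power 13 minutes after the utility failure and stayed dark for at least 82 minutes.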
Once the event was resolved, Amazon conducted an audit of its back-up power distribution circuits. “We found one additional breaker that needed corrective action,” AWS reported. “We’ve now validated that all breakers worldwide are properly configured, and are incorporating these configuration checks into our regular testing and audit processes.”
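Amazon has not described how its breaker configuration checks work. As a rough illustration of the idea only, an audit of this kind could compare each breaker's trip threshold against the load it would carry after a failover. Everything in the sketch below is hypothetical: the `Breaker` fields, the `find_misconfigured` function, the safety margin, and the sample values are invented for the example and do not come from the incident report.

```python
from dataclasses import dataclass


@dataclass
class Breaker:
    """Hypothetical record for one breaker on a back-up circuit."""
    name: str
    trip_threshold_kw: float          # load at which the breaker opens
    expected_failover_load_kw: float  # load it must carry after a failover


def find_misconfigured(breakers, safety_margin=1.1):
    """Return names of breakers that would open under their failover load.

    A breaker is flagged when its trip threshold is below the expected
    failover load multiplied by the safety margin (an assumed policy).
    """
    return [
        b.name
        for b in breakers
        if b.trip_threshold_kw < b.expected_failover_load_kw * safety_margin
    ]


# Illustrative values only, not real facility data.
sample = [
    Breaker("backup-circuit-a", trip_threshold_kw=800.0, expected_failover_load_kw=600.0),
    Breaker("backup-circuit-b", trip_threshold_kw=500.0, expected_failover_load_kw=600.0),
]

flagged = find_misconfigured(sample)
print(flagged)
```

In this made-up sample, only `backup-circuit-b` is flagged, since it would open as soon as the transferred load arrived, which is the failure mode the incident report describes.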
The outage was the third significant downtime in the last 14 months for the US-East-1 availability zone, which is Amazon’s oldest availability zone and resides in a data center in Ashburn, Virginia. The US-East-1 zone had a major outage in April 2011 and another less serious incident in March. Amazon’s U.S. East region also was hit by a series of four outages in a single week in 2010.
June 21st, 2012