
Leap year coding errors used to be one of my trigger points. Anyone can make a mistake, but botching leap years is amateurish incompetence. When I was a development team leader I made sure a contract programmer didn't get a renewal because I caught him hard coding his leap year checks, and he hadn't even bothered to include the next leap year in his very short list of leap years. It would almost certainly not have shown up in testing, and in live it would just have calculated wrong results, rather than failing. It was only revealed because I did a spot check of his code.
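
For reference, the rule doesn't need a hard-coded list at all. Here is a minimal Python sketch of the Gregorian rule: divisible by 4, except century years, unless also divisible by 400. It is cross-checked against the standard library's `calendar.isleap`. The function name `is_leap_year` is just illustrative.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Sanity check against the standard library over a wide range of years.
for y in range(1, 3001):
    assert is_leap_year(y) == calendar.isleap(y)

print(is_leap_year(2012), is_leap_year(1900), is_leap_year(2000))  # True False True
```

The century cases (1900, 2100 not leap; 2000 leap) are exactly the ones a hand-maintained list or a bare `year % 4 == 0` check tends to get wrong.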

Microsoft apologises for the service disruption caused by a leap year bug that impacted services over the last couple of days.

http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azu...

===

I lead the engineering organization responsible for the Windows Azure service and I want to update you on the service disruption we had over the past day. First let me apologize for any inconvenience this disruption has caused our customers. Our focus over the past day has been to resolve the Windows Azure Compute service disruption. As always we communicate the status of incidents through the Windows Azure Service Dashboard and update that status on an hourly basis or as the situation changes.

Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations became aware of an issue impacting the compute service in a number of regions. The issue was quickly triaged and it was determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year. Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue. The fix was successfully deployed to most of the Windows Azure sub-regions and we restored Windows Azure service availability to the majority of our customers and services by 2:57AM PST, Feb 29th.

However, some sub-regions and customers are still experiencing issues and as a result of these issues they may be experiencing a loss of application functionality. We are actively working to address these remaining issues. Customers should refer to the Windows Azure Service Dashboard for latest status. Windows Azure Storage was not impacted by this issue.

We will post an update on this situation, including details on the root cause analysis at the end of this incident. However, our current priority is to restore functionality for all of our customers, sub-regions and services.

We sincerely apologize for any inconvenience this has caused.

Bill Laing, Corporate VP Server and Cloud

===
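
The post doesn't say which calculation went wrong (root cause analysis was still in progress), so what follows is only a generic illustration of one classic leap day pitfall, not Azure's actual code. The pitfall is computing "the same date next year" by bumping the year field. In Python, `date.replace` raises `ValueError` when asked for 29 February in a non-leap year. The helper names and the choice to clamp to 28 February are assumptions made for this sketch.

```python
import datetime


def naive_add_one_year(d: datetime.date) -> datetime.date:
    # Keeps month and day and bumps the year: raises ValueError for 29 Feb.
    return d.replace(year=d.year + 1)


def add_one_year(d: datetime.date) -> datetime.date:
    """Same month/day next year; 29 Feb is clamped to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only 29 February can fail here, so fall back to the 28th.
        return d.replace(year=d.year + 1, day=28)


leap_day = datetime.date(2012, 2, 29)
try:
    naive_add_one_year(leap_day)
except ValueError as exc:
    print("naive version failed:", exc)

print(add_one_year(leap_day))  # 2013-02-28
```

Clamping to 28 February is one policy. Rolling forward to 1 March is another, and which is right depends on what the date means (an expiry, an anniversary, a billing cycle).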

Oops... sorry, didn't notice Phil had posted it already!

Nope, I didn't find anything :(

The Azure meltdown was apparently sparked by a Leap Day error in software, according to a Wednesday afternoon blog post by Bill Laing, corporate VP of Server and Cloud for Microsoft.

http://gigaom.com/cloud/microsoft-azure-falls-down-goes-boom/ & http://www.wired.com/wiredenterprise/2012/03/azure-leap-year-bug/

Ministry of Testing

© 2014   Created by Rosie Sherry.