Dear Astaro partners and customers,
I profoundly apologize for the events of today. We issued two malfunctioning pattern updates for our flagship product, Astaro Security Gateway, which caused many of you significant trouble. I, and everyone at Astaro, understand that our most important asset is the trust of our partners and customers, and today was a bad day for us.
Please allow me to share the results of our preliminary investigation into what went wrong.
Today at 6:07 CEST our Up2Date servers began distributing Intrusion Prevention System (IPS) patterns (version 12404), which included two rules (numbers 15851 and 16576) that were incompatible with the version of the IPS engine used in ASG 7.5. The IPS stopped working following the update and, on systems with IPS enabled, all traffic was blocked due to the IPS fail-closed policy.
At 9:30 CEST our Up2Date servers began distributing IPS pattern version 12405, which corrected the error, and we alerted all of our partners via email about the situation.
Systems with IPS activated could not download the new patterns because all network connectivity was blocked. As usual, systems with IPS deactivated do not download new patterns either. Our support team immediately began distributing instructions to our partners on how to resolve this problem. After further testing of this solution, we sent the final instructions to all of our partners via email at 11:44 CEST.
Then, incredibly, at 12:25 CEST our Up2Date servers began distributing Anti Virus (AV) pattern version 12407, which included a signature incompatible with the AV engine our systems are running. On systems with Dual AV Scanning enabled, the Web proxy was not working and the Mail proxy was not forwarding email.
Our Up2Date servers began distributing the corrected patterns at 13:20 CEST, and the issue resolved itself automatically once the systems loaded version 12410 of the patterns. At 13:15 CEST we notified our partners via the Up2Date Blog about the incident.
How could this happen?
So, after only one such issue in the last two years, how could we have two within a few hours? Especially since, after the last incident, we made significant investments in our team, infrastructure, and processes to ensure that we test every update on every platform before we release it?
In this case we tested the patterns both manually and automatically, but not on all of the versions of the IPS and AV engines used in the field. Our automated testing framework is currently being migrated to get it ready for V8, and we lost sight of maintaining full coverage during this period.
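To make the coverage gap concrete, here is a minimal sketch of a release gate that refuses to publish a pattern unless it was tested on every engine version deployed in the field. It is an illustration only, not Astaro's actual test framework; the function names and the engine labels are hypothetical.

```python
def untested_engines(field_engines, tested_engines):
    """Return the engine versions in the field that the pattern was not tested on."""
    return sorted(set(field_engines) - set(tested_engines))


def may_release(field_engines, tested_engines):
    """Allow a release only when no deployed engine version is left untested."""
    return not untested_engines(field_engines, tested_engines)


# Hypothetical labels: two engine versions are deployed, only one was tested.
field = ["ips-engine-old", "ips-engine-new"]
tested = ["ips-engine-new"]

print(untested_engines(field, tested))  # the versions still missing coverage
print(may_release(field, tested))       # False: the release is held back
```

The point of the design is that the list of deployed engine versions, not the list of versions the test framework happens to support, decides whether an update may ship.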
What did we learn?
- We relied on a testing framework that was under maintenance and not ready for production use. This is inexcusable and we will never do this again.
- No error of any system in the field should prevent it from reaching our Up2Date servers so fixes can be deployed without manual intervention. We will fix this.
- We need to define better escalation procedures and inform our partners and customers more quickly using multiple, redundant communication channels. We will implement this.
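The second lesson above can be pictured as a fail-closed policy with one narrow exception for the update channel. The sketch below is only an assumption about how such a rule might look, not a description of the ASG implementation; the host name, the function, and its parameters are hypothetical.

```python
# Hypothetical update host; the real Up2Date server names are not given here.
UPDATE_HOSTS = frozenset({"up2date.example.invalid"})


def allow_traffic(dest_host, ips_healthy, ips_verdict_ok=True):
    """Decide whether a connection may pass.

    The policy stays fail-closed, except that the update channel remains
    reachable so that a corrected pattern can still be downloaded.
    """
    if dest_host in UPDATE_HOSTS:
        return True            # update channel is exempt from the block
    if not ips_healthy:
        return False           # IPS engine is down: fail closed
    return ips_verdict_ok      # normal case: follow the IPS verdict


print(allow_traffic("up2date.example.invalid", ips_healthy=False))  # True
print(allow_traffic("www.example.com", ips_healthy=False))          # False
```

With a rule of this kind, a broken IPS engine would still block ordinary traffic, but the system could fetch the fix without manual intervention.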
Once again, I offer my sincere apologies. I hope that you will accept them and allow us to prove ourselves as a trusted partner to your business going forward. If you have any questions, please do not hesitate to contact me by responding to this email.
Jan Hichert
Astaro CEO