Business Continuity and Disaster Recovery Plan
SSB BART Group (SSB) provides for storage of copies of all system back-up data in multiple locations, synchronized on a continuous basis. System storage is fault-tolerant and self-healing. Disk failures are repaired in the background without loss of database availability. Database crashes are automatically detected and databases restarted without the need for crash recovery activities. If a database instance fails, the system automatically fails over to one of multiple read replicas.
SSB utilizes Amazon Web Services (AWS) hosting facilities to provide hosted AMP instances. SSB’s database layer is provided by the Amazon Aurora service under AWS. AWS provides multiple data centers, commonly referred to as Availability Zones (AZs). SSB builds its infrastructure across multiple AZs to ensure redundant physical systems.
Aurora provides automatic failover to one of fifteen live read replicas maintained across multiple AZs. Application-layer servers do not provide automatic failover; spinning up alternative servers is a manual process. Generally, AMP’s business use does not meet the threshold of mission criticality that would warrant automatic failover, and manual failover support has been deemed acceptable by all clients. In addition, the risk of specific application servers failing is relatively limited.
SSB divides disaster incidents into two categories: those that impact system data and those that impact the physical hardware and hosting. For incidents that impact system data, recovery can be accomplished in near real-time and occurs automatically via Amazon Aurora. Aurora databases are continuously monitored for health. On database failure, Amazon will automatically restart the database and associated processes. Amazon Aurora does not require crash recovery replay of database redo logs, greatly reducing restart times.
For occurrences that impact physical hardware and hosting, SSB provides manual failover to redundant systems hosted either in the same AZ as the primary server or in an alternative AZ, as the situation dictates. The specific failover approach can be specified and configured on a per-customer basis. In general, however, it is a push-button deployment of an image and is accomplished in a few minutes.
Finally, if AWS experiences a systemic outage, an alternate hosting provider – VPSLink – is available. The time required for this failover varies but is generally on the order of a day.
Fault Tolerance and Healing
Each chunk of the database volume is replicated six ways, across three AZs. This provides for highly fault-tolerant storage while transparently handling the loss of multiple copies without affecting database availability. In addition, storage is self-healing. Data chunks and disks are continuously scanned for errors and replaced automatically.
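The replication layout described above can be illustrated with a little arithmetic. The sketch below is not part of SSB's tooling. It assumes the six copies are spread evenly, two per AZ, and it borrows the quorum sizes that AWS publishes for Aurora storage (four of six copies to write, three of six to read); neither assumption is stated in this plan, and the function names are hypothetical.

```python
# Illustrative only: models the "six copies across three AZs" layout.
# Assumptions (not stated in this plan): copies are spread two per AZ, and
# Aurora's published quorum sizes apply (4 of 6 to write, 3 of 6 to read).
COPIES_PER_AZ = 2
AZ_COUNT = 3
WRITE_QUORUM = 4
READ_QUORUM = 3


def surviving_copies(failed_azs: int = 0, extra_failed_copies: int = 0) -> int:
    """Return how many copies remain after whole-AZ and single-copy failures."""
    total = COPIES_PER_AZ * AZ_COUNT
    lost = failed_azs * COPIES_PER_AZ + extra_failed_copies
    return max(total - lost, 0)


def availability(failed_azs: int = 0, extra_failed_copies: int = 0) -> dict:
    """Report whether reads and writes still reach quorum."""
    alive = surviving_copies(failed_azs, extra_failed_copies)
    return {
        "copies_alive": alive,
        "can_write": alive >= WRITE_QUORUM,
        "can_read": alive >= READ_QUORUM,
    }


if __name__ == "__main__":
    # Losing one full AZ leaves four copies.
    print(availability(failed_azs=1))
    # Losing one AZ plus one more copy leaves three copies.
    print(availability(failed_azs=1, extra_failed_copies=1))
```

Under these assumptions, the loss of an entire AZ still leaves enough copies for both reads and writes, while the loss of an AZ plus one further copy leaves enough for reads only until the storage layer repairs itself.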
For immediate recoveries, back-up data and retention are configured to allow restoration of the database state to any point in time during the SSB retention period. The SSB retention period is configured to thirty days. These automated backups are stored in Amazon S3, which is designed to provide 99.999999999% durability of the backups. Because the backups occur on an ongoing, incremental basis, they have no impact on database performance.
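As a rough illustration of the point-in-time window, the sketch below checks whether a requested restore time falls inside the thirty-day retention period. It is a minimal example rather than SSB's actual procedure; the function name `is_restorable` and the use of timezone-aware UTC timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 30  # retention period stated in this plan


def is_restorable(restore_to: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if restore_to lies inside the point-in-time recovery window.

    Both timestamps are assumed to be timezone-aware (hypothetical convention).
    """
    now = now or datetime.now(timezone.utc)
    earliest = now - timedelta(days=RETENTION_DAYS)
    return earliest <= restore_to <= now


if __name__ == "__main__":
    now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
    print(is_restorable(datetime(2024, 6, 15, 8, 30, tzinfo=timezone.utc), now))
    print(is_restorable(datetime(2024, 5, 1, 0, 0, tzinfo=timezone.utc), now))
```

A request outside the window would instead have to be served from one of the longer-lived full snapshots.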
For long-term recoveries, SSB takes full database snapshots on a monthly basis and retains each snapshot for two years. These full database snapshots are stored in Amazon S3 with the related high durability targets.
Backup data is periodically checked for consistency to ensure that it can be used to build a new environment. Database backups are installed on a test server, and regression tests are run against the nightly backup.
SSB executes application upgrades on a quarterly basis in a fashion that is separate from the application data. The upgrade process includes full backups of the system prior to upgrade and validation that the upgraded system functions as expected.
Platform upgrades occur on a continuous basis for critical patches and on a quarterly basis for non-critical patches. SSB maintains a strict separation between the technology base and the data that allows upgrades of the platform with no disruption of access.
AMP’s database backups utilize industry-standard encoding and storage schemas. SSB does not utilize any proprietary encoding, storage media, or database structures.
SSB’s storage retention policy provides for the retention of ongoing backups of all critical data for thirty days. Weekly full system backups are retained for twelve weeks, and a monthly backup is retained for two years.
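The three retention tiers above can be expressed as a small lookup table. The sketch below is illustrative only: the names `BackupKind`, `RETENTION`, and `should_retain` are hypothetical, and treating "two years" as 730 days is an assumption made to keep the arithmetic simple.

```python
from datetime import date, timedelta
from enum import Enum


class BackupKind(Enum):
    ONGOING = "ongoing"          # continuous backups of critical data
    WEEKLY_FULL = "weekly_full"  # weekly full system backups
    MONTHLY = "monthly"          # monthly backups


# Retention periods taken from the policy text.
RETENTION = {
    BackupKind.ONGOING: timedelta(days=30),
    BackupKind.WEEKLY_FULL: timedelta(weeks=12),
    BackupKind.MONTHLY: timedelta(days=730),  # assumption: two years = 730 days
}


def should_retain(kind: BackupKind, taken_on: date, today: date) -> bool:
    """Return True while a backup is still inside its retention period."""
    return today - taken_on <= RETENTION[kind]


if __name__ == "__main__":
    today = date(2024, 3, 26)
    # A weekly full backup taken 85 days earlier has aged out.
    print(should_retain(BackupKind.WEEKLY_FULL, date(2024, 1, 1), today))
    # A monthly backup from the same date is still retained.
    print(should_retain(BackupKind.MONTHLY, date(2024, 1, 1), today))
```

Keeping the periods in a single table means a change to the policy touches one place rather than several conditionals.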
AMP is not dependent on external data feeds or interfaces to function, and any failures – outside of core network routing failures – are isolated to the specific hosting environment for AMP.
Management Redundancy Planning
SSB maintains a succession and redundancy plan for key management and support personnel for all Services. This covers senior management responsible for systems as well as support, operations, and development personnel. The plan is reviewed and updated periodically to ensure that it remains robust.
Segregation of Duties
SSB defines and segregates operational and development duties in a fashion that ensures that no single individual is solely responsible for any critical function. This segregation is intended to ensure that any errors or malicious activities that could occur (i) are detected in a timely manner and (ii) can be readily addressed and mitigated in the normal course of business.
SSB’s business continuity plan is provided in English on request. Business continuity plans can be developed on a per user basis at the request of the client.