Saturday, April 22, 2023

Disaster Recovery Plan

A disaster recovery (DR) plan provides a step-by-step procedure for responding to unplanned incidents such as power outages, natural disasters, cyberattacks, and other disruptive events. This DR plan is intended to minimize the impact of a disaster on the primary data center by defining how the system continues to operate. The plan includes a procedure for quickly returning the production environment to an operational state.

A disruption to the operational state of the production system can lead to lost revenue, financial penalties, brand damage, and dissatisfied customers. The longer the recovery takes, the greater the adverse business impact of the disaster. A good disaster recovery plan is intended to enable rapid recovery from a disruption, regardless of the cause.

This DR plan defines four basic elements:

  1. Response - A step-by-step procedure for failing over to a secondary site in the event of a disaster that severely impacts the primary data center hosting the system.
  2. Secondary Site (Backup) - A secondary, backup instance of the system (DR site) in support of business continuity in the event of a disaster.
  3. Data Replication - The mechanism that keeps the secondary site in sync with the primary site.
  4. Recovery - An approach to reconstitute the primary data center hosts after an assessment of the damage.

Disaster Recovery defines two primary objectives, Recovery Point Objective (RPO) and Recovery Time Objective (RTO):


Recovery Point Objective (RPO) - The maximum targeted period of time in which data or transactions might be lost from an IT service due to a major incident. For example, if data is replicated at a fixed interval, up to one interval's worth of data can be lost, so the RPO corresponds to the length of the replication interval.

Recovery Time Objective (RTO) - The targeted duration of time and a service level within which a business process must be restored after a disaster or disruption in order to avoid unacceptable consequences associated with a break in business continuity. For example, 24 hours to restore 95% of the service. 
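The two objectives can be expressed as simple checks against timestamps. The following is a minimal sketch, not a definitive implementation: the class and function names are hypothetical, and the 15-minute replication interval is an assumed value chosen for illustration. Only the 24-hour and 95% figures come from the RTO example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the two DR objectives."""
    rpo: timedelta          # maximum tolerated window of lost data
    rto: timedelta          # maximum tolerated time to restore service
    service_level: float    # fraction of service that must be restored within the RTO


def data_loss_window(last_replication: datetime, disaster: datetime) -> timedelta:
    """Data written after the last successful replication is at risk of loss."""
    return disaster - last_replication


def meets_rpo(obj: RecoveryObjectives, last_replication: datetime,
              disaster: datetime) -> bool:
    """True if the data at risk falls within the RPO."""
    return data_loss_window(last_replication, disaster) <= obj.rpo


def meets_rto(obj: RecoveryObjectives, disaster: datetime,
              restored_at: datetime, restored_fraction: float) -> bool:
    """True if enough of the service was restored within the RTO."""
    within_time = (restored_at - disaster) <= obj.rto
    within_level = restored_fraction >= obj.service_level
    return within_time and within_level


# Assumed values: 15-minute RPO (illustrative), 24 hours / 95% from the RTO example.
objectives = RecoveryObjectives(
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=24),
    service_level=0.95,
)
```

With these assumed values, a disaster that strikes 10 minutes after the last replication satisfies the RPO, while one that strikes 30 minutes after it does not.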

Recovery

After damage to the primary site is assessed, a procedure to reconstitute the site to an operational state can be followed. This procedure is expected to be completed within 24 hours of the disaster. During the recovery period, the DR site is expected to provide business continuity. In some cases it operates in read-only mode, with user operations queued up but not yet committed.
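The read-only behavior described above can be sketched as a site that serves reads immediately and holds writes in a queue until they can be committed. This is an illustrative sketch under stated assumptions: the `DrSite` class and its methods are hypothetical, and an in-memory dictionary stands in for the replicated data store.

```python
from collections import deque


class DrSite:
    """Hypothetical DR site that serves reads and queues writes while read-only."""

    def __init__(self, replicated_data):
        self.data = dict(replicated_data)  # copy of the replicated state
        self.read_only = True              # assumed starting mode after failover
        self.pending = deque()             # queued, not yet committed operations

    def read(self, key):
        """Reads are always served from the replicated copy."""
        return self.data.get(key)

    def write(self, key, value):
        """Queue the write while read-only; apply it directly otherwise.

        Returns True if the write was committed, False if it was queued.
        """
        if self.read_only:
            self.pending.append((key, value))
            return False
        self.data[key] = value
        return True

    def commit_pending(self):
        """Leave read-only mode and apply queued writes in arrival order.

        Returns the number of operations committed.
        """
        self.read_only = False
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value
            applied += 1
        return applied
```

Applying the queue in arrival order preserves the sequence in which users submitted their operations. A real system would also need to persist the queue and resolve conflicts, which this sketch leaves out.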
