1. Introduction

This document outlines the disaster recovery framework for the timely recovery of project information and supporting systems following unplanned system failures or service disruptions. Our priority is to minimise the duration of any disruption and to keep the window of data loss as small as possible.

2. Objective

The main objective of this Disaster Recovery (DR) framework is to detail the procedures needed to recover critical IT systems and data in the shortest possible timeframe while minimising data loss.

3. Disaster Recovery Team

The Disaster Recovery Team is responsible for executing the Disaster Recovery Plan. The team consists of members from senior management and IT.

4. Incident Response

Any disaster or disruption will be reported immediately to the Disaster Recovery Team and to management. An initial assessment will then be performed to determine the severity of the incident.

5. Disaster Recovery Strategies

Different strategies will be employed based on the severity of the incident. The strategies include data backup and recovery, redundant system deployment, and alternative site deployment.

6. Recovery Procedures

6.1. Data Backup and Recovery

We maintain a rigorous data backup policy where data is backed up daily to a remote, secure location. In the event of a data loss incident, the following steps will be taken:

  1. Immediate Incident Response (0-2 hours): Immediate steps will be taken to ensure the integrity of unaffected data and systems. The DR team will ascertain the scale and nature of the loss.
  2. Restoration (2-4 hours): Backups will be used to restore the data to a replacement server, or to the original server if it is operable.
  3. Validation (4-6 hours): Once restoration is completed, validation procedures will be carried out to ensure data and system integrity.

With these procedures in place, the Recovery Time Objective (RTO) is 6 hours, which is the end of the validation window, and the Recovery Point Objective (RPO) is 24 hours. Because backups are taken daily, up to 24 hours of data could be lost.
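
The timeline and objectives above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python and is not part of the plan's tooling. The names PHASES, phases_fit_rto, data_loss_hours and meets_rpo are hypothetical. The phase windows, the 6-hour RTO and the 24-hour RPO are the only values taken from this document.

```python
from datetime import datetime

# Phase windows in hours from the start of the incident, as listed in 6.1.
PHASES = [
    ("Immediate Incident Response", 0, 2),
    ("Restoration", 2, 4),
    ("Validation", 4, 6),
]
RTO_HOURS = 6   # Recovery Time Objective stated in 6.1
RPO_HOURS = 24  # Recovery Point Objective stated in 6.1


def phases_fit_rto(phases, rto_hours):
    """Return True if the phases run back to back from hour 0 and end within the RTO."""
    expected_start = 0
    for _name, start, end in phases:
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return expected_start <= rto_hours


def data_loss_hours(last_backup, incident):
    """Hours of data written between the last successful backup and the incident."""
    return (incident - last_backup).total_seconds() / 3600


def meets_rpo(last_backup, incident, rpo_hours=RPO_HOURS):
    """Return True if the data at risk is within the RPO."""
    return data_loss_hours(last_backup, incident) <= rpo_hours


if __name__ == "__main__":
    # Illustrative timestamps only: backup at 02:00, incident at 23:00 the same day.
    last_backup = datetime(2024, 1, 1, 2, 0)
    incident = datetime(2024, 1, 1, 23, 0)
    print(phases_fit_rto(PHASES, RTO_HOURS))       # True
    print(data_loss_hours(last_backup, incident))  # 21.0
    print(meets_rpo(last_backup, incident))        # True
```

The check also shows why the RPO depends on backups succeeding every day: if one daily backup is missed, the data at risk can exceed 24 hours and meets_rpo returns False.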

6.2. Redundant System Deployment

For mission-critical applications, redundant systems are deployed to minimise downtime. If a primary system fails, the workload will be switched to the redundant system as soon as the failure is identified.

  1. Immediate Incident Response (0-1 hour): Identify the incident and initiate failover to the redundant system.