Disaster recovery plan

Disaster recovery plan definition

A disaster recovery plan (DRP), disaster recovery implementation plan, or IT disaster recovery plan is a recorded policy and/or process designed to help an organization execute recovery processes in response to a disaster, protecting business IT infrastructure and, more generally, promoting recovery. The purpose of a disaster recovery plan is to explain comprehensively the consistent actions that must be taken before, during, and after a natural or man-made disaster so that the entire team can take those actions. A disaster recovery plan should address man-made disasters that are intentional, such as fallout from terrorism or hacking, as well as those that are accidental, such as an equipment failure.

What is a disaster recovery plan?

Organizations of all sizes generate and manage massive amounts of data, much of it mission critical. The impact of corruption or data loss from human error, hardware failure, malware, or hacking can be substantial. Therefore, it is essential to create a disaster recovery plan for the restoration of business data from a data backup image.

It is most effective to develop an information technology (IT) disaster recovery plan in conjunction with the business continuity plan (BCP). A business continuity plan is a complete organizational plan that consists of five components:

1. Business resumption plan
2. Occupant emergency plan
3. Continuity of operations plan
4. Incident management plan (IMP)
5. Disaster recovery plan

Generally, components one through three do not touch upon IT infrastructure at all. The incident management plan typically establishes procedures and a structure to address cyber attacks against IT systems during normal times, so it does not deal with the IT infrastructure during disaster recovery. For this reason, the disaster recovery plan is the component of the BCP of most direct interest to IT.

Among the first steps in developing such a strategy is business impact analysis, during which the team should develop IT priorities and recovery time objectives. The team should then design technology recovery strategies that restore applications, hardware, and data in time to meet business recovery needs.

Every situation is unique and there is no single correct way to develop a disaster recovery plan. However, there are three principal goals of disaster recovery that form the core of most DRPs:

  • prevention, including proper backups, generators, and surge protectors
  • detection of new potential threats, a natural byproduct of routine inspections
  • correction, which might include holding a “lessons learned” brainstorming session and securing proper insurance policies

What should a disaster recovery plan include?

Although specific disaster recovery plan formats may vary, the structure of a disaster recovery plan should include several features:

Goals
A statement of goals will outline what the organization wants to achieve during or after a disaster, including the recovery time objective (RTO) and the recovery point objective (RPO). The recovery point objective refers to how much data (in terms of the most recent changes) the company is willing to lose after a disaster occurs. For example, an RPO might be to lose no more than one hour of data, which means data backups must occur at least every hour to meet this objective.

Recovery time objective or RTO refers to the acceptable downtime after an outage before business processes and systems must be restored to operation. For example, the business must be able to return to operations within 4 hours in order to avoid unacceptable impacts to business continuity.
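
As a rough illustration of how these two objectives translate into checks, the sketch below tests a backup schedule against an RPO and an estimated restore time against an RTO. The function names and sample figures are assumptions made for this example, and it uses the simplification that worst-case data loss equals the time between backups.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is taken to be the time between backups,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """The time needed to restore service must not exceed the RTO."""
    return estimated_restore_time <= rto


# Hypothetical targets matching the examples in the text:
# a one-hour RPO and a four-hour RTO.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

print(meets_rpo(timedelta(minutes=30), rpo))  # backups every 30 minutes
print(meets_rpo(timedelta(hours=6), rpo))     # backups every 6 hours
print(meets_rto(timedelta(hours=3), rto))     # restore estimated at 3 hours
```

A schedule that fails the first check needs more frequent backups or replication; a restore estimate that fails the second needs a faster recovery strategy or a revised objective.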

Personnel
Every disaster recovery plan must detail the personnel who are responsible for the execution of the DR plan, and make provisions for individual people becoming unavailable.

IT inventory
An updated IT inventory must list the details about all hardware and software assets, as well as any cloud services necessary for the company’s operation, including whether or not they are business critical, and whether they are owned, leased, or used as a service.
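
One way to keep such an inventory machine-readable is sketched below. The field names, the ownership categories, and the sample assets are assumptions for illustration; a real inventory would normally be drawn from an asset management system.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    OWNED = "owned"
    LEASED = "leased"
    SERVICE = "used as a service"


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str                # "hardware", "software", or "cloud service"
    business_critical: bool
    ownership: Ownership


# Hypothetical entries, for illustration only.
inventory = [
    Asset("db-server-01", "hardware", True, Ownership.OWNED),
    Asset("office-printer", "hardware", False, Ownership.LEASED),
    Asset("payroll-app", "software", True, Ownership.OWNED),
    Asset("email-hosting", "cloud service", True, Ownership.SERVICE),
]


def critical_assets(assets: list[Asset]) -> list[Asset]:
    """Return only the assets flagged as business critical."""
    return [a for a in assets if a.business_critical]


for asset in critical_assets(inventory):
    print(f"{asset.name}: {asset.kind}, {asset.ownership.value}")
```

Recording criticality and ownership alongside each asset makes it straightforward to list what must be recovered first and which vendors or lessors are involved.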

Backup procedures
The DRP must set forth how each data resource is backed up – exactly where, on which devices and in which folders, and how the team should recover each resource from backup.
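
The details described here can be captured in a simple backup catalog. The following is a minimal sketch; the entry fields, device names, and paths are hypothetical and stand in for whatever the organization actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupEntry:
    resource: str                    # the data resource being protected
    device: str                      # device that holds the backup
    folder: str                      # folder on that device
    restore_steps: tuple[str, ...]   # ordered recovery instructions


# Hypothetical catalog entries, for illustration only.
catalog = [
    BackupEntry(
        "customer-db", "nas-01", "/backups/customer-db",
        ("Provision a replacement database host",
         "Restore the latest backup from nas-01",
         "Verify the restored data before reopening access"),
    ),
    BackupEntry("file-share", "nas-02", "/backups/file-share", ()),
]


def undocumented_restores(entries: list[BackupEntry]) -> list[str]:
    """Return resources whose recovery procedure has not been written down."""
    return [e.resource for e in entries if not e.restore_steps]


print(undocumented_restores(catalog))
```

A check like this highlights resources that are backed up but have no documented way back, which is the gap this part of the plan is meant to close.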

Disaster recovery procedures
These specific procedures, distinct from backup procedures, should detail all emergency responses, including last-minute backups, mitigation procedures, limitation of damages, and eradication of cybersecurity threats.

Disaster recovery sites
Any robust disaster recovery plan should designate a hot disaster recovery site: a remotely located alternative data center that holds all critical systems and to which all data can be frequently backed up or replicated. This way, when disaster strikes, operations can be switched over to the hot site with minimal delay.

Restoration procedures
Finally, follow best practices to ensure a disaster recovery plan includes detailed restoration procedures for recovering from a loss of full systems operations. In other words, every detail to get each aspect of the business back online should be in the plan, even if you start with a disaster recovery plan template. Here are some procedures to consider at each step.

Include not just objectives such as the results of risk analysis and RPOs, RTOs, and SLAs, but also a structured approach for meeting these goals. The DRP must provide a step-by-step plan for each type of downtime and disaster, including data loss, flooding, natural disasters, power outages, ransomware, server failure, site-wide outages, and other issues. Be sure to enrich any IT disaster recovery plan template with these critical details.

Create a list of IT staff including contact information, roles, and responsibilities. Ensure each team member is familiar with the company disaster recovery plan before it is needed, and that individual team members have the necessary access levels and passwords to meet their responsibilities. Always designate alternates for any emergency, even if you think your team can’t be affected.
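
The staff list and its alternates can be checked mechanically. The sketch below assumes a simple roster structure; the roles and names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoleAssignment:
    role: str
    primary: str
    alternates: list[str] = field(default_factory=list)


def roles_without_alternates(roster: list[RoleAssignment]) -> list[str]:
    """Return roles that would go unfilled if the primary were unavailable."""
    return [r.role for r in roster if not r.alternates]


def first_available(assignment: RoleAssignment,
                    unavailable: set[str]) -> Optional[str]:
    """Return the primary if reachable, otherwise the first reachable alternate."""
    for person in [assignment.primary, *assignment.alternates]:
        if person not in unavailable:
            return person
    return None


# Hypothetical roster, for illustration only.
roster = [
    RoleAssignment("DR coordinator", "Alice", ["Bob"]),
    RoleAssignment("Backup administrator", "Carol"),
    RoleAssignment("Network lead", "Dan", ["Erin", "Frank"]),
]

print(roles_without_alternates(roster))
print(first_available(roster[0], unavailable={"Alice"}))
```

Running the first check during plan reviews flags roles with a single point of failure before a disaster exposes them.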

Address business continuity planning and disaster recovery by providing details about mission-critical applications in your DRP. Include accountable parties for both troubleshooting any issues and ensuring operations are running smoothly. If your organization will use cloud backup services or disaster recovery services, the plan should include the vendor name and contact information and a list of authorized employees who can request support during a disaster; ideally, the vendor and organizational contacts should know of each other.

Media communication best practices are also part of a robust disaster recovery and business continuity plan. A designated public relations contact and media plan are particularly useful to high profile organizations, enterprises, and users who need 24/7 availability, such as government agencies or healthcare providers. Look for disaster recovery plan examples in your industry or vertical for specific best practices and language.

Benefits of a disaster recovery plan

A disaster recovery plan details scenarios for reducing interruptions and resuming operations rapidly in the aftermath of a disaster. It is a central piece of the business continuity plan and should be designed to prevent data loss and enable sufficient IT recovery.

Beyond the clear benefit of improved business continuity under any circumstances, having a company disaster recovery plan can help an organization in several other important ways.

Cost-efficiency
Disaster recovery plans include various components that improve cost-efficiency. The most important elements include prevention, detection, and correction, as discussed above. Preventative measures reduce the risks from man-made disasters. Detection measures are designed to quickly identify problems when they do happen, and corrective measures restore lost data and enable a rapid resumption of operations.

Achieving cost-efficiency goals demands regular maintenance of IT systems in their optimal condition, high-level analysis of potential threats, and implementation of innovative cybersecurity solutions. Keeping software updated and systems optimally maintained saves time and is more cost-effective. Adopting cloud-based data management as a part of disaster recovery planning can further reduce the costs of backups and maintenance.

Increased productivity
Designating specific roles, responsibilities, and accountability, as a disaster recovery plan demands, increases your team’s effectiveness and productivity. It also ensures redundancy in personnel for key tasks, which maintains productivity when someone is out sick and reduces the costs of turnover.

Improved customer retention
Customers do not easily forgive failures or downtime, especially if they result in loss of sensitive data. Disaster recovery planning helps organizations meet and maintain a higher quality of service in every situation. Reducing the risks your customers face from data loss and downtime ensures they receive better service from you during and after a disaster, shoring up their loyalty.

Compliance
Enterprise business users, financial markets, healthcare patients, and government entities all rely on the availability, uptime, and disaster recovery plans of important organizations. These organizations in turn rely on their DRPs to stay compliant with industry regulations such as HIPAA and FINRA rules.

Scalability
Planning disaster recovery allows businesses to identify innovative solutions to reduce the costs of archive maintenance, backups, and recovery. Cloud-based data storage and related technologies enhance and simplify the process and add flexibility and scalability.

The disaster recovery planning process can reduce the risk of human error, eliminate superfluous hardware, and streamline the entire IT process. In this way, the planning process itself becomes one of the advantages of disaster recovery planning, streamlining the business and rendering it more profitable and resilient before anything ever goes wrong.

Ways to develop a disaster recovery plan

There are several steps in the development of a disaster recovery plan. Although these may vary somewhat based on the organization, here are the basic disaster recovery plan steps:

Risk assessment
First, perform a risk assessment and business impact analysis (BIA) that addresses many potential disasters. Analyze each functional area of the organization to determine possible consequences, from middle-of-the-road scenarios to “worst-case” situations such as total loss of the main building. Robust disaster recovery plans set goals by evaluating risks up front, as part of the larger business continuity plan, to allow critical business operations to continue for customers and users as IT addresses the event and its fallout.

Consider infrastructure and geographical risk factors in your risk analysis. For example, the ability of employees to access the data center in case of a natural disaster, whether or not you use cloud backup, and whether you have a single site or multiple sites are all relevant here. Be sure to include this information, even if you’re working from a sample disaster recovery plan.

Evaluate critical needs
Next, establish priorities for operations and processing by evaluating the critical needs of each department. Prepare written agreements for selected alternatives, and include details specifying:

  • all special security procedures
  • availability
  • cost
  • duration
  • guarantee of compatibility
  • hours of operation
  • what constitutes an emergency
  • non-mainframe resource requirements
  • system testing
  • termination conditions
  • a procedure notifying users of system changes
  • personnel requirements
  • specs on required processing hardware and other equipment
  • a service extension negotiation process
  • other contractual issues

Set disaster recovery plan objectives
Create a list of mission-critical operations to plan for business continuity, and then determine which data, applications, equipment, or user access are necessary to support those functions. Based on the cost of downtime, determine each function’s recovery time objective (RTO). This is the target amount of time, in hours, minutes, or seconds, that an operation or application can be offline without an unacceptable business impact.

Determine the recovery point objective (RPO), or the point in time back to which you must recover the application. This is essentially the amount of data the organization can afford to lose.

Assess any service level agreements (SLAs) that your organization has promised to users, executives, or other stakeholders.
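
Once each function has an RTO and RPO, the objectives suggest a recovery order. The sketch below is one possible approach, assuming that functions with the shortest RTO should be restored first; the functions and targets listed are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objective:
    function: str
    rto: timedelta   # maximum tolerable downtime
    rpo: timedelta   # maximum tolerable data loss, measured in time


# Hypothetical business functions and targets, for illustration only.
objectives = [
    Objective("Internal wiki", timedelta(hours=48), timedelta(hours=24)),
    Objective("Order processing", timedelta(hours=1), timedelta(minutes=15)),
    Objective("Email", timedelta(hours=4), timedelta(hours=1)),
]


def recovery_order(items: list[Objective]) -> list[Objective]:
    """Sort so that functions with the shortest RTO come first;
    ties are broken by the shorter RPO."""
    return sorted(items, key=lambda o: (o.rto, o.rpo))


for position, item in enumerate(recovery_order(objectives), start=1):
    print(position, item.function, item.rto)
```

Sorting by RTO is only a starting point; dependencies between systems or contractual SLAs may justify a different order.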

Collect data and create the written document
Collect data for your plan using pre-formatted forms as needed. Data to collect in this stage may include:

  • lists (critical contact information list, backup employee position listing, master vendor list, master call list, notification checklist)
  • inventories (communications equipment, data center computer hardware, documentation, forms, insurance policies, microcomputer hardware and software, office equipment, off-site storage location equipment, workgroup hardware, etc.)
  • schedules for software and data files backup/retention
  • procedures for system restore/recovery
  • temporary disaster recovery locations
  • other documentation, inventories, lists, and materials

Organize and use the collected data in your written, documented plan.

Test and revise
Next, develop criteria and procedures for testing the plan. This is essential to ensure the organization has adopted compatible, feasible backup procedures and facilities, and to identify areas that should be modified. It also allows the team to be trained, and proves the value of the DRP and ability of the organization to withstand disasters.

Finally, test the plan based on the criteria and procedures. Conduct an initial dry run or structured walk-through test, ideally outside normal operational hours, and correct any problems. Types of business disaster recovery plan tests include checklist tests, full interruption tests, parallel tests, and simulation tests.

RPO vs RTO

The recovery point objective, or RPO, refers to how much data (in terms of the most recent changes) the company is willing to lose after a disaster occurs. For example, an RPO might be to lose no more than one hour of data, which means data backups must occur at least every hour to meet this objective.

The RPO answers this question: “How much data could be lost without significantly impacting the business?”

Example: If the RPO for a business is 20 hours and the last available good copy of data after an outage is 18 hours old, we are still within the RPO’s parameters.

Recovery time objective or RTO refers to the acceptable downtime after an outage before business processes and systems must be restored to operation. For example, the business must be able to return to operations within 4 hours in order to avoid unacceptable impacts to business continuity.

In other words, the RTO answers the question: “How much time after notification of business process disruption should it take to recover?”

To compare RPO and RTO, consider that RPO means a variable amount of data that would need to be re-entered after a loss or would be lost altogether during network downtime. In contrast, RTO refers to how much real time can elapse before the disruption unacceptably impedes normal business operations.

It is important to expose the gap between actuals and the objectives set forth in the disaster recovery plan. Only business disruption and disaster rehearsals can expose actuals, specifically the Recovery Point Actual (RPA) and Recovery Time Actual (RTA). Closing the gaps between these actuals and their objectives brings the plan up to speed.
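
The comparison between actuals and objectives is simple arithmetic. The sketch below shows one way to compute the gap; the function name and the rehearsal figures are assumptions for illustration, not measurements.

```python
from datetime import timedelta


def objective_gap(actual: timedelta, objective: timedelta) -> timedelta:
    """Return how far the measured value exceeds the objective.
    A zero result means the objective was met."""
    return max(actual - objective, timedelta(0))


# Hypothetical results from a disaster rehearsal.
rpo, rpa = timedelta(hours=1), timedelta(hours=3)
rto, rta = timedelta(hours=4), timedelta(hours=3, minutes=30)

rpo_gap = objective_gap(rpa, rpo)
rto_gap = objective_gap(rta, rto)

print(f"RPO gap: {rpo_gap}")
print(f"RTO gap: {rto_gap}")
```

In this made-up rehearsal the recovery time met its objective, while the recovery point missed by two hours, which would point to more frequent backups or replication.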

Strategies and tools for a disaster recovery plan

The right strategies and tools help implement a disaster recovery plan.

Traditional on-premises recovery strategies
The IT team should develop disaster recovery strategies for IT applications, systems, and data. This includes desktops, data, networks, connectivity, servers, wireless devices, and laptops. Identify the IT resources that support time-sensitive business processes and functions so that the recovery times for those resources match the recovery needs of the processes they support.

Information technology systems require connectivity, data, hardware, and software. The entire system may fail due to a single component, so recovery strategies should anticipate the loss of one or more of these system components:

  • Secure, climate-controlled computer room environment with backup power supply
  • Connectivity to a service provider
  • Hardware such as desktop and laptop computers, networks, wireless devices and peripherals, and servers
  • Software applications such as electronic mail, electronic data interchange, enterprise resource management, and office productivity
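
The point that a single failed component can take down the entire system can be expressed as a small check. This is a minimal sketch; the component names follow the four requirements named in the text, and the status values are hypothetical.

```python
# The four things the text says an IT system requires.
REQUIRED_COMPONENTS = ("connectivity", "data", "hardware", "software")


def failed_components(status: dict[str, bool]) -> list[str]:
    """List required components that are missing or marked unavailable."""
    return [c for c in REQUIRED_COMPONENTS if not status.get(c, False)]


def system_available(status: dict[str, bool]) -> bool:
    """The system is usable only if every required component is available."""
    return not failed_components(status)


# Hypothetical status after an incident: only the hardware has failed.
status = {"connectivity": True, "data": True, "hardware": False, "software": True}

print(system_available(status))
print(failed_components(status))
```

Because availability is an all-or-nothing condition here, a recovery strategy needs a documented answer for the loss of each component, not just the most likely one.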

Data and restoration
For business applications that cannot tolerate downtime, actual parallel computing, data mirroring, or multiple data center synchronization is possible yet costly. Other solutions for mission critical business applications and sensitive data include cloud backup and cloud-native disaster recovery, which reduce the need for expensive hardware and IT infrastructure.

Internal recovery strategies
Some enterprises store data at multiple facilities and configure hardware to run similar applications from data center to data center when needed. Assuming off-site data backup or data mirroring is taking place, processing can continue and data can be restored at an alternate site under these circumstances. However, this is a costly approach, and it depends on the internal solution itself being highly reliable.

Cloud-based disaster recovery strategies
Cloud-based vendors offer disaster recovery as a service (DRaaS), which essentially provides “hot sites” for IT disaster recovery hosted in the cloud. DRaaS leverages the cloud to provide fully configured recovery sites that mirror the applications in the local data center. This gives users a more immediate response: critical applications are kept ready in the cloud and can be recovered there at the time of a disaster.

Vendors can host and manage applications, data security services, and data streams, enabling access to information via web browser at the primary business site or other sites. These vendors can typically enhance cybersecurity because their ongoing monitoring for outages also provides data filtering and detection of malware threats. If the vendor detects an outage at the client site, it holds all client data automatically until the system is restored. In this sense, the cloud is essential to security planning and disaster recovery.

Does Druva offer a cloud disaster recovery plan?

With Druva’s cloud-native disaster recovery plan, workloads on-premises or in the cloud back up directly to the Druva Cloud Platform, built on AWS. This eliminates recovery complexities by enabling automated runbook execution and one-click disaster recovery. Druva’s cloud-native disaster recovery includes failover and failback, either back to on-premises systems or to any AWS region or account, without hardware, a managed DR site, or excessive administration.

Learn more here: https://www.druva.com/cloud-disaster-recovery/
