DEV Community

Zippy Wachira

Navigating Disaster Recovery in the Digital Age: Choosing the Right Approach – Part 4

Over the last three blogs, we have established the groundwork for creating a strong backup and disaster recovery (DR) solution. In Part 1, we explored the fundamentals of disaster recovery in today’s world and introduced six key factors to consider when developing a Backup/DR solution. In Part 2, we examined the differences between backup and disaster recovery, as well as the advantages and disadvantages of third-party versus AWS-native solutions. Then, in Part 3, we took a closer look at other crucial considerations, including scheduling, automation, RTO/RPO, and how the client’s physical or virtual environment influences the choice of tools.

Now, it’s time to shift gears and get into the heart of this series: the case study that inspired it all.

This blog will revisit the real-world client scenario that posed this exciting challenge. We'll use the elements covered in the previous sections to examine the client's needs, constraints, and objectives. How do the considerations we’ve explored shape the final solution? What trade-offs were necessary, and how were they balanced?

By the end of this post, you’ll have a clear picture of how theory meets practice when designing a customized DR/backup solution—and why no two solutions are ever quite the same.

Let’s dive into the case study and start piecing it all together!

Recap of the Case Study

Our client sought to conduct a Proof of Concept (PoC) for a disaster recovery solution on AWS for three critical on-premises systems, each with unique characteristics and requirements:

  • ERP Application: The crown jewel of their operations, hosted in a virtualized environment. This system was mission-critical, demanding a stringent Recovery Time Objective (RTO) of 1–2 hours.
  • Information System: A physical server housing essential data and workflows.
  • Library System: Another physical server, supporting key business functions but with more flexible recovery requirements.

Challenges with the Existing Solution
The client’s existing backup approach relied on native backup software to perform daily full backups. These backups were retained for only 24 hours before being discarded. Unfortunately, this setup introduced significant limitations:

  • Limited Retention: The 24-hour backup retention window left the systems vulnerable to data loss if issues went undetected for longer periods.
  • Unreliable Recovery: The manual restore process was cumbersome and prone to failures, undermining their ability to recover effectively when needed.
  • Critical ERP Recovery Needs: The ERP system required an RTO of 1–2 hours, a demand far beyond what the current setup could reliably support.
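
To make the retention limitation concrete, here is a minimal Python sketch. The backup schedule (one full backup at 02:00 each day), the dates, and the function name `has_clean_restore_point` are illustrative assumptions, not details from the client's environment. It simply checks whether any backup still held within the retention window was taken before an issue began.

```python
from datetime import datetime, timedelta


def has_clean_restore_point(backup_times, retention, issue_started, now):
    """Return True if at least one retained backup was taken before the issue began."""
    oldest_kept = now - retention
    # Backups older than the retention window have already been discarded.
    retained = [t for t in backup_times if oldest_kept <= t <= now]
    return any(t < issue_started for t in retained)


# Hypothetical schedule: one full backup at 02:00 on each of the last 7 days.
now = datetime(2024, 1, 10, 12, 0)
backups = [datetime(2024, 1, day, 2, 0) for day in range(4, 11)]

# Hypothetical incident: corruption that began 36 hours ago and was only noticed now.
issue_started = now - timedelta(hours=36)

with_24h_retention = has_clean_restore_point(backups, timedelta(hours=24), issue_started, now)
with_7d_retention = has_clean_restore_point(backups, timedelta(days=7), issue_started, now)
```

Under these assumptions, the 24-hour window leaves no backup from before the issue started, while a 7-day window still holds one.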

Requirements for the New Solution
The client’s objectives were clear: they needed a comprehensive and reliable disaster recovery solution that could include:

  • A robust backup system to ensure reliable and complete backups.
  • Efficient and dependable restoration processes to minimize downtime and avoid failed restores.
  • A solution for the ERP system that would support a stringent RTO of 1–2 hours and both on-premises and cloud-based recovery options.
  • Flexible RPO/RTO metrics for the other systems.

In addition to presenting a technical challenge, this case study offered a chance to create a solution that satisfied a variety of operational requirements while striking a balance between cost, complexity, and dependability.

Solution Analysis

For this case study, we will primarily look at two possible solutions: AWS Elastic Disaster Recovery (DRS) and Veeam Backup and Replication. For each factor, we will evaluate how well each solution aligns with the client’s requirements, gradually narrowing down the options to determine the most suitable final solution.

1. Backup vs Disaster Recovery
Take a moment and think about it. Based on the details of the client’s requirements and existing setup, would you classify their need as a Backup solution or a Disaster Recovery (DR) solution?

The client initially requested a DR solution, but as we’ve discussed in previous blogs, clients often use “backup” and “disaster recovery” interchangeably. So, let’s dig deeper.

For starters, we know that in their on-premises environment, the client seemed to be operating a simple backup and restore system:

  • They performed manual daily backups using native software.
  • These backups were used to restore data to their systems in the event of a failure.

However, certain aspects of their request strongly indicated a need for disaster recovery rather than a simple backup solution:

  • Stringent RTO for the ERP System: The client emphasized a Recovery Time Objective (RTO) of 1–2 hours for their critical ERP system. This requirement goes beyond what a simple backup can deliver and aligns with DR strategies designed to minimize downtime.
  • Efficient Recovery Options: The customer wanted better recovery methods, especially for the ERP system. In earlier blogs, we noted that DR involves restoring not just data but also the entire system and application infrastructure to operational status—a significant distinction from backups.
  • Cloud Recovery Considerations: In the event of a recovery, the client made it clear that they were open to running the ERP system on the cloud. This is indicative of a DR strategy as it entails restoring and operating crucial workloads in a different location.
  • Focus on Business Continuity: The ERP’s critical nature indicates that the client is likely prioritizing seamless business continuity, a hallmark of DR solutions. With its manual procedures and extended restoration times, a backup system by itself would not suffice.
  • Tailored Solutions for Other Systems: While the ERP has stringent recovery metrics, the client’s more flexible RTO/RPO requirements for other systems suggest they are looking for a solution that balances DR for critical workloads with backup solutions for less critical ones.
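
The split between DR for critical workloads and backup for the rest can be sketched in a few lines of Python. This is only an illustration: the `System` class, the `recovery_tier` function, and the 4-hour cut-off are assumptions made for the example, and the two physical servers are given no RTO because the client did not state strict figures for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    name: str
    environment: str            # "virtual" or "physical", as described in the case study
    rto_hours: Optional[float]  # None means no strict RTO was stated


def recovery_tier(system: System, dr_threshold_hours: float = 4.0) -> str:
    """Classify a system as needing DR or backup, using an assumed RTO cut-off."""
    if system.rto_hours is not None and system.rto_hours <= dr_threshold_hours:
        return "disaster recovery"
    return "backup and restore"


systems = [
    System("ERP Application", "virtual", rto_hours=2.0),  # upper bound of the 1-2 hour RTO
    System("Information System", "physical", rto_hours=None),
    System("Library System", "physical", rto_hours=None),
]

tiers = {s.name: recovery_tier(s) for s in systems}
```

With these inputs, only the ERP application lands in the disaster recovery tier, which matches the reasoning in the list above.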

Now, let’s compare AWS Elastic Disaster Recovery (DRS) and Veeam Backup and Replication to assess their suitability for meeting the client’s DR requirement.

[Image: comparison of AWS DRS and Veeam against the Backup vs Disaster Recovery factor]

From the above, both Veeam and DRS align well with the client’s needs, each excelling in different areas. DRS offers fast and efficient failover capabilities, making it an excellent choice for the ERP system’s stringent recovery requirements. Meanwhile, Veeam delivers a robust backup solution, ensuring reliable and comprehensive backups for all three systems. Veeam also supports recovery, which adds to its versatility.

2. Scheduling and Automation
Considering the client’s request for a comprehensive backup process and enhanced recovery mechanisms, what level of scheduling and automation do you think would best suit their needs?
In their on-premises setup, the client relies on manual daily backups, with the following characteristics:

  • Backups are initiated and managed manually.
  • The restore process is also manual and prone to errors, with instances of incomplete backups and unsuccessful recovery attempts.

This manual approach introduces inefficiencies and increases the likelihood of human error, both of which are particularly problematic for critical systems like their ERP application.

Several factors in the client’s requirements strongly indicate the need for automation in their backup and recovery processes:

  • Comprehensive Backup Requirement: The client’s stated desire for a more comprehensive solution implies they need a system that goes beyond basic backups, incorporating robust policies for retention, versioning, and automated execution.
  • Focus on RTO/RPO Metrics: Automation directly supports achieving the low RTO for their ERP system by streamlining recovery steps and reducing delays caused by manual processes.
  • Desire for Enhanced Recovery: The issues faced in their manual restore process (incomplete backups, failed restores) further highlight the necessity of an automated recovery system that removes guesswork and error.
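
As a rough illustration of what an automated retention policy might look like, the Python sketch below keeps every backup from the last few days plus the newest backup from each of the last few weeks. The function name `backups_to_keep`, the 7-daily/4-weekly defaults, and the sample dates are assumptions chosen for the example. They are not settings taken from the client, DRS, or Veeam.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, keep_daily=7, keep_weekly=4):
    """Return the set of backup dates a simple daily/weekly policy would retain."""
    keep = set()
    ordered = sorted(backup_dates, reverse=True)  # newest first

    # Keep every backup taken within the last `keep_daily` days.
    for d in ordered:
        if (today - d).days < keep_daily:
            keep.add(d)

    # Keep the newest backup from each of the most recent `keep_weekly` ISO weeks.
    weeks_seen = []
    for d in ordered:
        week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        if week not in weeks_seen:
            weeks_seen.append(week)
            if len(weeks_seen) <= keep_weekly:
                keep.add(d)
    return keep


# Hypothetical history: one backup per day from 2 to 31 January 2024.
today = date(2024, 1, 31)
history = [date(2024, 1, 2) + timedelta(days=i) for i in range(30)]
kept = backups_to_keep(history, today)
```

A scheduler would run a function like this after each backup job and delete whatever falls outside the returned set, removing the manual step that made the existing process error-prone.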

Now, let’s compare AWS Elastic Disaster Recovery (DRS) and Veeam Backup and Replication to assess their suitability for meeting the client’s scheduling and automation requirements.

[Image: comparison of AWS DRS and Veeam against the Scheduling and Automation factor]

From the above, we see that while DRS does provide automation for data replication and failover, it has limited flexibility due to its ‘always-on’ replication model. Veeam, on the other hand, is more customizable and offers high flexibility in tailoring backup processes as well as automating backup and recovery.
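
One way to reason about the scheduling trade-off is to estimate the worst-case data loss a given schedule allows. The Python sketch below is a simplified model, not a description of how either product behaves: it assumes each backup captures the system state at its start time and only becomes usable once it finishes. The function name and the durations are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Upper bound on data lost when restoring from the last completed scheduled backup.

    Assumes each backup captures the state at its start time and is usable
    only once it has finished, so a failure just before the next backup
    completes loses one full interval plus the backup's running time.
    """
    return backup_interval + backup_duration


# Hypothetical schedules and durations, for comparison only.
daily = worst_case_data_loss(timedelta(hours=24), timedelta(hours=2))
hourly = worst_case_data_loss(timedelta(hours=1), timedelta(minutes=10))
```

With continuous replication, the equivalent figure is the replication lag rather than a schedule interval, which is why an always-on model trades scheduling flexibility for a tighter recovery point.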

In this part of our blog series, we’ve taken a deep dive into the Backup vs Disaster Recovery factor and explored the importance of Scheduling and Automation when evaluating solutions. By applying these factors to the case study, we’ve not only clarified the client’s requirements but also laid the groundwork for comparing two potential solutions: AWS Elastic Disaster Recovery (DRS) and Veeam Backup and Replication.

The analysis so far shows us that while DRS excels in automation and seamless disaster recovery, Veeam shines in customizable scheduling and robust backup capabilities. But the decision-making process doesn’t end here.

In the following blogs, we’ll continue to dissect the remaining factors.
So, stay tuned as we work toward identifying the most effective solution for the client’s needs.

Which solution do you think is pulling ahead so far – DRS or Veeam? Take your pick and don’t forget to check back for the next installment!
