When devising a disaster recovery (DR) plan, the first step is deciding where you will restore operations if disaster hits your primary data center. In my experience assisting companies with cyber recovery plans, they generally have three options for a DR site: build a secondary data center, partner with a professional DR service provider, or use the public cloud for recovery. Let’s examine the benefits, drawbacks, costs, and risks of each.
The first alternative is to build your own secondary DR data center in a location separate from your primary site. Many large corporations choose this route, constructing DR infrastructure that mirrors their production systems so that it can, in theory, take over immediately.
The advantage of this method is control. Since you own and manage the hardware, you determine compatibility, capacity, security measures, and everything else; you’re not dependent on a third party. The disadvantage is expense. Maintaining idle redundant infrastructure is costly: you must buy, install, and service a second set of servers, storage, network equipment, and more, and every basic data center cost (real estate, electricity, cooling) is duplicated.
Maintaining a DR site is like the upkeep of a swimming pool nobody swims in: it demands constant attention yet delivers little immediate value. Every alteration or addition to the production environment must be matched at the DR site. Without consistent diligence, the DR site drifts behind production, accumulating untracked changes and configuration discrepancies, and when disaster strikes you discover that recovery is far less seamless than anticipated. Regular DR testing and site synchronization are essential to avoid that scenario.
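To make that concrete, here is a minimal sketch of a drift check in Python: it compares inventory snapshots of the production and DR sites and flags mismatches. The JSON inventory format and file names are illustrative assumptions, not the output of any particular tool.

```python
import json

def load_inventory(path):
    """Load a site inventory snapshot: {hostname: {"cpu": ..., "ram_gb": ..., "os": ...}}.
    The JSON format here is hypothetical, for illustration only."""
    with open(path) as f:
        return json.load(f)

def diff_sites(prod, dr):
    """Report hosts and settings that exist in production but are missing
    or mismatched at the DR site."""
    issues = []
    for host, spec in prod.items():
        if host not in dr:
            issues.append(f"{host}: no DR counterpart")
            continue
        for key, value in spec.items():
            if dr[host].get(key) != value:
                issues.append(f"{host}: {key} is {dr[host].get(key)!r} at DR, {value!r} in prod")
    return issues

if __name__ == "__main__":
    # File names are placeholders; snapshots would come from your CMDB or discovery tooling.
    prod = load_inventory("prod_inventory.json")
    dr = load_inventory("dr_inventory.json")
    for issue in diff_sites(prod, dr):
        print("DRIFT:", issue)
```

Run on a schedule, even a simple check like this surfaces the quiet divergence between sites long before a real invocation would.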
Another method is to engage an external service provider that builds and manages your recovery site. Businesses such as SunGard thrived on this model, which effectively transfers the responsibility to them: instead of standing up your own infrastructure, you reserve DR data center capacity with the provider.
SunGard and similar companies operate large recovery sites purpose-built to host their clients’ infrastructure and data. When disaster strikes, clients rely on the provider to restore systems and resume operations. Costs are generally lower than running a self-managed DR facility because the provider amortizes resources across a shared client base.
The catch is the “shared” nature of the model. If a significant event affects a wide area, multiple clients may compete for the same DR resources at once, and recovery service can degrade if the provider underestimated demand or oversubscribed its capacity.
Furthermore, while shared hardware may be perfectly adequate for testing, genuine recovery may require configurations that closely match your production environment. Many providers advertise flexibility, but your options can narrow quickly if your needs are highly specific or your environment is unusual.
The third option is to house your DR infrastructure in the public cloud. Market leaders such as AWS and Azure offer virtually unlimited capacity that can scale to meet even enormous demand when disaster strikes. Just as in normal operations, the compute and storage you need during recovery are available on demand.
This inherent scalability protects against the “oversubscription risk” of traditional DR providers. Barring a massive region-wide outage, which is infrequent, cloud providers hold ample reserve capacity to absorb sudden spikes in client resource requests.
One key benefit of this approach is cost flexibility: you pay only for the cloud infrastructure actually used during tests or real recovery events. In “DR idle” periods, when only backup replication is running, expenses drop significantly. Unlike conventional solutions, cloud capacity can be scaled up or down to match your requirements exactly, eliminating the need to speculate about future demand.
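To make the cost comparison concrete, here is a back-of-the-envelope calculation. Every figure in it is a hypothetical placeholder; substitute rates from your own data center budget and your cloud provider’s price list.

```python
# Back-of-the-envelope DR cost comparison. All figures are hypothetical
# placeholders; plug in your own rates before drawing conclusions.

# Option A: always-on secondary data center (flat monthly cost).
secondary_dc_monthly = 40_000  # hardware amortization, power, cooling, staff (assumed)

# Option B: cloud DR that only runs compute during tests and real recoveries.
storage_gb = 20_000            # replicated backups kept in cloud storage (assumed)
storage_rate = 0.023           # $/GB-month, object-storage ballpark (assumed)
test_hours = 16                # compute hours spent on DR tests this month (assumed)
fleet_hourly = 95.0            # $/hour to run the whole recovery fleet (assumed)

cloud_monthly = storage_gb * storage_rate + test_hours * fleet_hourly
print(f"Secondary DC: ${secondary_dc_monthly:,.0f}/month")
print(f"Cloud DR:     ${cloud_monthly:,.0f}/month (idle storage + test compute)")
```

Under these assumed numbers the cloud option’s steady-state bill is dominated by storage, not compute, which is exactly the “DR idle” effect described above.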
This flexibility also encourages more thorough and frequent DR testing. Because spinning servers up and down in the cloud is fast and cheap, tests once deemed too expensive or resource-intensive become practical: you can quickly validate that current backups actually restore without much expenditure. Testing frequency rises, and DR readiness improves with it.
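As a sketch of how lightweight such a test can be, the script below (assuming AWS, boto3, and a hypothetical “dr-replica-” image-naming convention) boots a throwaway server from the most recently replicated machine image, waits for it to pass health checks, and tears it down.

```python
import boto3

# Assumed region and image-name prefix; adjust to your own replication setup.
REGION = "us-east-1"
IMAGE_PREFIX = "dr-replica-"

ec2 = boto3.client("ec2", region_name=REGION)

# Find the newest AMI produced by the (hypothetical) replication job.
images = ec2.describe_images(
    Owners=["self"],
    Filters=[{"Name": "name", "Values": [IMAGE_PREFIX + "*"]}],
)["Images"]
latest = max(images, key=lambda img: img["CreationDate"])

# Boot a throwaway instance from it -- this is the actual test.
resp = ec2.run_instances(
    ImageId=latest["ImageId"],
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# Wait until EC2 reports the instance healthy, then clean up.
ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])
print(f"DR test passed: {latest['ImageId']} boots cleanly as {instance_id}")
ec2.terminate_instances(InstanceIds=[instance_id])
```

A real test would also probe the application itself, but even this bare boot check, run weekly from a scheduler, catches broken images months before they matter.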
The cloud also lends itself to automation and scripting of infrastructure. With infrastructure-as-code tools, server configurations, resource deployment rules, network layouts, and more can be defined ahead of time. During a DR invocation, the cloud environment can then be constructed almost automatically from your templates, with little manual intervention.
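For illustration, here is a minimal sketch of that pattern using AWS CloudFormation driven from Python. The stack name, region, template resources, and CIDR ranges are all assumptions standing in for a real recovery template.

```python
import boto3

# A deliberately tiny CloudFormation template: one recovery network's worth of
# infrastructure. A real DR template would declare the full network layout,
# server fleet, and security rules. Names and CIDR ranges are illustrative.
TEMPLATE = """
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  RecoveryVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.50.0.0/16
  RecoverySubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref RecoveryVPC
      CidrBlock: 10.50.1.0/24
"""

def invoke_dr(stack_name="dr-environment"):
    """Stand up the predefined recovery environment on demand."""
    cf = boto3.client("cloudformation", region_name="us-east-1")
    cf.create_stack(StackName=stack_name, TemplateBody=TEMPLATE)
    # Block until every declared resource actually exists.
    cf.get_waiter("stack_create_complete").wait(StackName=stack_name)
    print(f"Recovery environment '{stack_name}' is up")

if __name__ == "__main__":
    invoke_dr()
```

The point is the shape of the workflow: the environment lives in version control as a template, and invocation is a single, repeatable call rather than a night of manual rack-and-cable work.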
However, using the public cloud for recovery is not without challenges. The most glaring disadvantage compared to the alternatives is the hard dependence on a network connection to the cloud provider’s data center. If the disaster also takes out local internet connectivity, access to the cloud goes with it.
Internet redundancy may already be in place; if not, teams may need alternatives such as satellite links to maintain cloud connectivity during emergencies, which adds cost and complexity to the setup.
Another crucial factor is arranging DR data replication into your cloud account ahead of time. To launch production servers on demand, backup data and VM images must stream continuously into cloud storage, and the costs of replication software, network bandwidth, and cloud storage are not negligible.
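Here is a minimal sketch of the replication half of that equation, assuming an S3 bucket as the landing zone; the bucket name and backup directory are placeholders.

```python
from pathlib import Path

import boto3
from botocore.exceptions import ClientError

# Placeholders: point these at your real backup directory and bucket.
BACKUP_DIR = Path("/var/backups/nightly")
BUCKET = "example-dr-replica-bucket"

s3 = boto3.client("s3")

def already_replicated(key, size):
    """Skip uploads for objects that already exist with the same size."""
    try:
        head = s3.head_object(Bucket=BUCKET, Key=key)
        return head["ContentLength"] == size
    except ClientError:
        return False

def replicate():
    """Push any new or changed backup files into cloud storage.
    Run from cron or a systemd timer so replication stays continuous."""
    for path in BACKUP_DIR.rglob("*"):
        if not path.is_file():
            continue
        key = str(path.relative_to(BACKUP_DIR))
        if not already_replicated(key, path.stat().st_size):
            s3.upload_file(str(path), BUCKET, key)
            print(f"replicated {key}")

if __name__ == "__main__":
    replicate()
```

Dedicated replication products do this with change-block tracking and far better efficiency, but the sketch shows where the recurring bandwidth and storage charges come from.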
Lastly, some applications and databases that run flawlessly on-premises can hit problems in a cloud environment. Allocate ample testing time for these systems to uncover design issues that could undermine DR readiness at the worst possible moment.
Developing a thorough disaster recovery plan takes considerable time, resources, and ongoing budget, but neglecting it raises business risk by far more than the immediate expense. According to the U.S. Federal Emergency Management Agency, approximately 40% of small businesses shut down permanently after a disaster because they were unprepared to recover.
My advice after assisting countless companies in designing resilient cyber recovery programs boils down to this: comprehensive readiness isn’t a project with a completion date; it’s an ingrained business philosophy. Appoint owners for DR planning, rehearse failure scenarios, define policies clearly, and document procedures. Make readiness checks and tests routine. Treat disaster preparation as a standard cost of doing business in an age of risk and uncertainty. A culture of resilience in the face of disaster or disruption is key to survival and success.