Business Continuity Plans (BCPs) and Disaster Recovery (DR) plans are frequently the most important, yet least attended, parts of implementing & maintaining SAP. The focus of an implementation is mapping the business requirements into the system. After go-live, the focus shifts entirely to ‘hyper-care’: ensuring that the implementation works as expected and that any previously unforeseen or unplanned scenarios are addressed rapidly. Until one day there is a disaster, and no users can log into the SAP system. That is a situation that can, and should, be avoided.
Business continuity requires organizations to take adequate preventive measures so that the business keeps functioning, uninterrupted, even when a disaster strikes.
Minimizing the effects of such a disruption and recovering any lost assets is the essence of a BCP. The Disaster Recovery strategy is a key component of the Business Continuity Plan.
Need for a Disaster Recovery Strategy
Before we contemplate a DR plan for an SAP-based business, let us understand why it is so important.
- The foremost reason is the direct cost to the company from lost operations. The financial cost of an SAP application failure to a Fortune 1000 company is estimated at around $1 million per hour. By that measure, a prolonged outage after a large-scale disaster could cost billions. For smaller companies the numbers are lower, but they are losses nonetheless.
- SAP environment downtime can bring entire organizations to a halt, causing huge losses in man-hours both within and outside the organization. The ripple effects are manifold.
- Without a BCP, an organization’s productivity suffers, which in turn strains supplier and customer relationships and damages its reputation.
Disaster Recovery for SAP
So, how does one go about preparing for a disaster? Set up a robust plan, of course! Here are eight things to consider when creating a BCP for SAP:
1. Planning the Basic Infrastructure
Power, connectivity, hardware and software make up your base infrastructure. The key point to remember is: redundancy, redundancy and redundancy.
- One of the first things an effective DR plan for an SAP-based business must ensure is an uninterrupted power supply. Plan for redundancy: multiple power providers, plus generators with enough diesel to keep them running for at least 48 hours.
- Internet connectivity needs redundancy too. A single connection is fraught with risk; connectivity from at least two separate providers mitigates the risk of downtime. The network hardware must also be set up to switch between the connections automatically.
- Next, ensure that all your hardware has built-in redundancy. The last thing you want is to be left with data backups and nowhere to restore them. Redundant hardware is crucial to getting you back on your feet without delay after a disaster. Regular audits to ensure that these systems function as expected are the first and most important aspect of any DR plan.
- Lastly, ensure that all software on all devices (operating systems, switches, security patches, application patches) is up to date, and back it with adequate technical support from the relevant vendors. This protects you against bugs, vulnerabilities & ransomware attacks.
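To put the 48-hour diesel target into numbers, here is a minimal Python sketch of the sizing arithmetic. The burn rate, generator count and 20 % reserve margin are hypothetical inputs, not recommendations; take real consumption figures from your generator vendor's datasheet at your expected load.

```python
def diesel_needed_litres(burn_rate_lph: float, generators: int,
                         hours: float = 48.0, margin: float = 0.2) -> float:
    """Estimate the diesel stock needed to run all generators for `hours`.

    burn_rate_lph: hypothetical consumption per generator at expected load (litres/hour).
    margin: extra fraction held in reserve (0.2 = 20 %), an assumed planning buffer.
    """
    if burn_rate_lph < 0 or generators < 0 or hours < 0 or margin < 0:
        raise ValueError("inputs must be non-negative")
    return burn_rate_lph * generators * hours * (1 + margin)


# Illustrative only: two generators burning 50 L/h each, 48 hours, 20 % reserve.
print(f"{diesel_needed_litres(50, 2):.0f} L")
```

With these illustrative inputs the stock works out to 50 × 2 × 48 × 1.2 = 5,760 litres.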
2. RTO and RPO – the two pillars of DR management
Two key parameters form the crux of DR planning for SAP.
RTO stands for Recovery Time Objective, and RPO stands for Recovery Point Objective. Practical, pre-defined & pre-approved RTO and RPO values are essential to charting your DR plan.
- RTO refers to the maximum period for which the applications or systems may remain unavailable. How will you be alerted to a disaster? Once the team is alerted, how long will it take to switch to the SAP Disaster Recovery site? Can the switch-over be scripted and automated? Cloud providers such as AWS offer services to automate DR. Without such automation, expect an RTO of at least a couple of hours for SAP.
Ask these questions:
- How long can businesses wait before the SAP system is available to use again?
- What is the impact on the business during the downtime, and how can this window be shortened?
- Realistically, how long will it take before the DR team can enable the Disaster Recovery setup & inform the business users?
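The RTO questions above can be turned into a rough estimate by timing each step of the switch-over. The sketch below assumes the steps run one after another, and every duration (as well as the 180-minute target) is hypothetical; replace them with figures measured during your DR drills. It also applies the roughly $1 million per hour figure quoted earlier to show what the estimated RTO could cost.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPhase:
    name: str
    minutes: float  # hypothetical duration; measure yours during DR drills


def estimated_rto_minutes(phases: list[RecoveryPhase]) -> float:
    """Sum the phases, assuming they run strictly one after another."""
    return sum(p.minutes for p in phases)


def downtime_cost(rto_minutes: float, cost_per_hour: float) -> float:
    """Rough cost of an outage lasting the full RTO."""
    return rto_minutes / 60 * cost_per_hour


# Illustrative, manually executed switch-over without automation scripts.
phases = [
    RecoveryPhase("detect and alert", 15),
    RecoveryPhase("decide to fail over", 30),
    RecoveryPhase("start SAP at DR site", 60),
    RecoveryPhase("validate and inform users", 15),
]
rto = estimated_rto_minutes(phases)
target_rto = 180  # hypothetical business-approved RTO in minutes
print(f"Estimated RTO: {rto:.0f} min (target {target_rto} min) -> "
      f"{'OK' if rto <= target_rto else 'exceeds target'}")
print(f"Cost at ~$1M/hour: ${downtime_cost(rto, 1_000_000):,.0f}")
```

Here the four steps add up to 120 minutes, within the hypothetical target, and an outage of that length would cost about $2 million at the quoted rate. The per-step breakdown also shows where automating the switch-over would pay off most.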
- RPO refers to the amount of data loss that is acceptable in the event of a disaster, usually expressed as a window of time. When disaster strikes, whether as a loss of connectivity or something catastrophic such as a fire or flood, the SAP system goes down instantly. The only way to recover is to switch to the Disaster Recovery setup, which is a physically separate setup altogether. The achievable RPO is therefore determined by the most recent backup or replicated data available at the DR site. RPOs can range from seconds to hours, depending on your DR strategy.
Ask yourself these questions:
- How much data can business users afford to lose?
- Which key transaction milestones must be captured?
- Realistically, what Recovery Point can your DR strategy offer?
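The worst-case data loss of a periodic copy strategy can be reasoned out simply: if a copy is taken every interval I and takes time T to reach the DR site, a disaster just before a copy lands loses everything written since the previous copy was taken, i.e. I + T. The sketch below applies that reasoning, plus the kind of freshness check a daily script might run against the newest data at the DR site. The strategies, intervals, transfer times and the 15-minute RPO target are all hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(copy_interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Periodic copies: a disaster just before the next copy lands at the DR site
    loses everything written since the previous copy was taken."""
    return copy_interval + transfer_time


def meets_rpo(newest_data_at_dr: datetime, now: datetime, rpo: timedelta) -> bool:
    """Daily-check style test: is the newest data at the DR site recent enough?"""
    return now - newest_data_at_dr <= rpo


rpo_target = timedelta(minutes=15)  # hypothetical business-approved RPO

# Hypothetical strategies: (copy interval, time for a copy to reach the DR site).
strategies = {
    "weekly off-site tape rotation": (timedelta(days=7), timedelta(hours=4)),
    "copies every 10 minutes": (timedelta(minutes=10), timedelta(minutes=2)),
}
for name, (interval, transfer) in strategies.items():
    loss = worst_case_data_loss(interval, transfer)
    verdict = "within RPO" if loss <= rpo_target else "misses RPO"
    print(f"{name}: worst-case loss {loss} -> {verdict}")

# Freshness check with hypothetical timestamps; a real script would use datetime.now()
# and read the newest timestamp from the DR site.
print(meets_rpo(datetime(2024, 1, 1, 11, 50), datetime(2024, 1, 1, 12, 0), rpo_target))
```

Under these assumptions the weekly tape rotation can lose more than a week of data, while copies every 10 minutes stay within a 15-minute RPO even in the worst case (12 minutes).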
3. SAP Disaster Recovery Plans and the technologies used
SAP DR Plans begin by planning backups of the database that stores all the information managed by the SAP application. The technologies used for this purpose can be traditional or advanced.
i. Traditional Technologies
Traditionally, data backups were copied to tape or to Network Attached Storage (NAS) devices. Tapes would then be stored in fire-proof safes and rotated to an off-site location on a weekly basis, while NAS backups from the main center were sent to the DR center over a secure, trusted network connection. This is a time-proven strategy, sturdy and reliable, but it takes much longer to get the system operational again. Advanced technologies reduce recovery time and can also be less expensive.
ii. VMware SRM for SAP DR
VMware Site Recovery Manager (SRM) is a technology in which the SAP logs and dynamic data are kept both on the original server and on a virtual machine at the DR site. This method of DR works only if the primary server runs on VMware: the SAP data is replicated to VMware farms at the DR site, and the two VMware environments are kept in sync. This technology is ideal when the emphasis is on achieving very low RTO and RPO levels.
However, the downside is reliance on VMware technology and vendor lock-in: it can be difficult to move to any other form of DR later. It also comes with the added drawbacks of reduced flexibility and higher cost.
iii. Cloud Disaster Recovery
Discus recommends a Cloud-based DR strategy. This is, of course, the de-facto option for SAP on Cloud, and it is also ideal for co-located or on-premises SAP users. A Cloud DR strategy offers the perfect balance of flexibility & price: with DR on AWS, you pay only for the servers required for DR, not for an entire parallel infrastructure.
iv. HANA-specific DR
With SAP HANA (High-performance Analytic Appliance) databases, SAP provides the capability to process huge amounts of real-time data in a very short time, using in-memory computing. DR for HANA-based applications requires a different approach: replicating HANA from the main data center requires a HANA appliance of equal capacity at the DR center. DR on AWS is also recommended for SAP HANA appliances and offers maximum flexibility & control.
v. DR as a Service (DRaaS)
DRaaS is a third-party service that helps its customers replicate and recover vital data from their applications and systems. It helps build a robust IT setup that streamlines network usage and mitigates the damage caused by outages. The production environment on the primary infrastructure is replicated to a secondary DR infrastructure, built and maintained by the service provider. Scalability and flexibility are the main features of this approach.
Note, however, that with many DRaaS providers you will need to clarify exactly what the service covers.
4. Geographical Location of DR Site or DR center
After choosing the DR method for your SAP landscape, the next most important decision is choosing a DR center that provides a safe house for your data.
The DR center should be in a different seismic zone. A rule of thumb is to keep at least 300 km (~200 miles) between the two data centers. This reduces the risk that a single seismic event affects both centers.
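To check a candidate site pair against the 300 km rule of thumb, the straight-line (great-circle) distance between the two sites can be computed with the haversine formula. A minimal sketch, assuming you know each site's latitude and longitude; the coordinates in the usage lines are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(primary: tuple[float, float], dr: tuple[float, float],
               min_km: float = 300.0) -> bool:
    """Apply the 300 km rule of thumb to a (lat, lon) primary/DR site pair."""
    return great_circle_km(*primary, *dr) >= min_km


primary_site = (19.0, 73.0)  # hypothetical coordinates
dr_site = (17.4, 78.5)       # hypothetical coordinates
print(f"{great_circle_km(*primary_site, *dr_site):.0f} km, "
      f"far enough: {far_enough(primary_site, dr_site)}")
```

Distance is only a proxy: two sites 300 km apart can still share a seismic zone or flood plain, so the relevant hazard maps are worth checking as well.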
The primary physical risks a data center faces include events with a large footprint, such as floods, riots, earthquakes and blackouts. Then there are localized disasters: fire, connectivity drops, power outages and infrastructure failure. The distance also shields against political instability in a region.
Larger Cloud providers such as AWS and Microsoft Azure offer data centers across the globe, often with multiple locations within the same country. Some countries’ laws mandate that financial data must reside in the company’s country of registration. Such laws, along with regulations such as the General Data Protection Regulation (GDPR), also need to be on your checklist when planning the DR location.
5. Regular validation of the DR plan & DR drills
The success of a DR plan lies in its ability to function consistently. Establishing a review & testing routine for your DR plan is a must to ensure that it does not fail you in your hour of need.
This validation has three parts:
- On a daily basis, set up scripts and/or manual checks to ensure that data is being replicated/backed up to your DR site. If this data does not get replicated, all other efforts will be in vain.
- Validate the plan periodically - ask questions such as:
- Are the RPO & RTO still valid for the business? Can these be improved?
- Are we still in tune with the current legal framework?
- Do new technologies or frameworks now exist that should be incorporated?
- Perform DR drills. These are similar to fire drills, and a quarterly drill is recommended. Simulate disaster scenarios, for example one where the primary data center (DC) becomes inaccessible and all systems need to point to the DR site. Further:
- Test network redundancy
- Test rerouting of user traffic to the DR site
- Perform a login & transaction/report run to ensure the DR site is, indeed, in good health
The test results must align with the agreed parameters, such as the RTO and RPO. Any deviations must be addressed immediately, and the corrections documented.
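Comparing drill results against the agreed parameters is easy to make repeatable. The sketch below, in which the field names, check names and target values are all hypothetical, lists every deviation that needs to be addressed and documented after a drill.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DrillResult:
    measured_rto: timedelta        # time from simulated outage to users working at the DR site
    measured_data_loss: timedelta  # age of the newest data found at the DR site
    checks: dict[str, bool] = field(default_factory=dict)  # check name -> passed?


def deviations(result: DrillResult, rto: timedelta, rpo: timedelta) -> list[str]:
    """Return the deviations that must be addressed and documented."""
    issues = []
    if result.measured_rto > rto:
        issues.append(f"RTO exceeded: {result.measured_rto} > {rto}")
    if result.measured_data_loss > rpo:
        issues.append(f"RPO exceeded: {result.measured_data_loss} > {rpo}")
    issues += [f"check failed: {name}" for name, ok in result.checks.items() if not ok]
    return issues


drill = DrillResult(
    measured_rto=timedelta(hours=3, minutes=10),
    measured_data_loss=timedelta(minutes=12),
    checks={"network redundancy": True,
            "user traffic rerouted to DR site": True,
            "login and test transaction/report": False},
)
for issue in deviations(drill, rto=timedelta(hours=3), rpo=timedelta(minutes=15)):
    print(issue)
```

For this sample drill the sketch reports the RTO overrun (3:10:00 against 3:00:00) and the failed login/transaction check; the 12-minute data loss is within the 15-minute RPO.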
Staff who are part of the DR plan should also take part in the testing process. Define their roles and responsibilities and integrate them into the process to avoid pitfalls. Training should include simulations of disaster scenarios, regular drills and performance reports to keep personnel alert and ready for any eventuality.
6. Map & document your dependencies
Your SAP production environment should be scanned and audited for its IT assets; along with hardware & software, include human resources as well. Map each process to the personnel involved, your Core Team Members (CTM). This creates a flowchart of dependencies that also helps with training and cross-training. Identify the various business processes and the applications each of them depends on. This helps you identify the RTO of each application and, in effect, that of the entire business.
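Once dependencies are mapped, the effective recovery time of a business process can be derived from them: a component cannot be usable before everything it depends on is back, so the process's recovery time is its slowest dependency chain (the critical path). A minimal sketch, assuming a cycle-free map, that independent dependencies are recovered in parallel, and hypothetical component names and hours:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dependency map: component -> components that must be running first.
DEPENDS_ON: Dict[str, List[str]] = {
    "Order-to-Cash": ["SAP ERP", "Printing"],  # a business process
    "SAP ERP": ["HANA DB", "Network"],
    "Printing": ["Network"],
    "HANA DB": ["Storage"],
    "Network": [],
    "Storage": [],
}
# Hypothetical hands-on recovery time in hours, once a component's dependencies are up.
OWN_RECOVERY_H: Dict[str, float] = {
    "Order-to-Cash": 0.5, "SAP ERP": 1.0, "Printing": 0.5,
    "HANA DB": 2.0, "Network": 0.5, "Storage": 1.0,
}


def ready_after(component: str, memo: Optional[Dict[str, float]] = None,
                path: Tuple[str, ...] = ()) -> float:
    """Hours until `component` is usable: its own recovery time plus the slowest
    dependency chain, assuming independent dependencies recover in parallel."""
    if memo is None:
        memo = {}
    if component in path:
        raise ValueError("dependency cycle: " + " -> ".join(path + (component,)))
    if component not in memo:
        deps = DEPENDS_ON.get(component, [])
        slowest = max((ready_after(d, memo, path + (component,)) for d in deps), default=0.0)
        memo[component] = OWN_RECOVERY_H[component] + slowest
    return memo[component]


print(ready_after("Order-to-Cash"))
```

Under these assumptions the Order-to-Cash process is usable only after 4.5 hours, driven by the Storage, HANA DB, SAP ERP chain, even though no single component takes longer than 2 hours. If that exceeds the business's RTO for the process, that chain is where to invest.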
7. Integrating Information Security with your DR Plan
Securing your IT assets against virus attacks is a must. Your data center and DR center are also IT assets and must be protected with the best antivirus solutions available. However, security recovery and disaster recovery are very different. While security recovery is about protecting your information from attacks, DR deals with business continuity.
Consider adding deep scanning solutions (companies like Trend Micro offer on-premises & AWS-friendly solutions) for proactive monitoring of potential threats. It is better to eliminate a threat as soon as it is identified than after it has infected your setup.
Note that while *nix-based systems are less vulnerable to such threats, attacks and vulnerabilities can still be introduced at the application layer.
8. Define & document your response procedures
An effective DR plan is one where the response criteria and procedures are well defined and documented. This should include severity definitions, escalation rules, guidelines for handling sensitive information, a response matrix, and step-by-step instructions for achieving complete recovery and restoring normal operations.
This matrix also has to be validated from time to time: from ensuring that the people identified for certain roles are still available for the task, to checking whether any IP addresses or passwords have changed since the last DR drill.
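Parts of this validation can be scripted. A minimal sketch, assuming the response matrix and a list of active staff are available as simple data structures; all names, roles, hosts and addresses are hypothetical. The snapshot records only when a password was last rotated, never the password itself.

```python
# Hypothetical response matrix: severity -> {role: assigned person}
RESPONSE_MATRIX = {
    "Sev 1 (SAP down)": {"incident lead": "asha", "Basis admin": "ravi", "network": "li"},
    "Sev 2 (degraded)": {"incident lead": "asha", "Basis admin": "maria"},
}
ACTIVE_STAFF = {"asha", "ravi", "maria"}  # hypothetical; "li" has since left


def stale_assignments(matrix: dict[str, dict[str, str]],
                      active: set[str]) -> list[tuple[str, str, str]]:
    """(severity, role, person) entries whose assignee is no longer available."""
    return [(sev, role, person)
            for sev, roles in matrix.items()
            for role, person in roles.items()
            if person not in active]


def changed_since_drill(snapshot: dict[str, str], current: dict[str, str]) -> list[str]:
    """Keys whose value differs from, or is missing in, the last drill's snapshot."""
    return sorted(k for k in snapshot.keys() | current.keys()
                  if snapshot.get(k) != current.get(k))


last_drill = {"dr-app01 ip": "10.20.0.11", "dr-db01 ip": "10.20.0.21",
              "DR admin password last rotated": "2024-01-10"}
today = {"dr-app01 ip": "10.20.0.11", "dr-db01 ip": "10.20.0.25",
         "DR admin password last rotated": "2024-03-02"}

print(stale_assignments(RESPONSE_MATRIX, ACTIVE_STAFF))
print(changed_since_drill(last_drill, today))
```

Each flagged entry is a line item to update in the response matrix or the DR runbook before the next drill.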
At Discus, we specialize in setting up & maintaining DR infrastructure on AWS and other Cloud or co-location providers. Additionally, we offer proactive monitoring of your infrastructure as part of the SAP Proactive Basis Support - please get in touch with us to learn more.