This article explains the special challenges of protecting large ERP and CRM application data residing on mainframe database servers. It examines why more and more Fortune 500 companies have chosen DB2 OS/390 database servers for implementing CRM and ERP applications, and why in doing so IT staff face new hurdles in safeguarding and leveraging their considerable IT investment. It then contrasts the various strategies for managing backup and recovery at the subsystem level.
Why Companies Choose DB2 OS/390 Servers for CRM and ERP
As IBM says, many companies are finding that DB2 for OS/390 and z/OS is the "super-server of choice for the twenty-first century providing enterprise-wide data management for e-business, business intelligence, CRM, and ERP applications such as Siebel, Vantive, Baan, PeopleSoft, and SAP R/3." DB2 for OS/390 and z/OS delivers enterprise values -- high availability, scalability, performance, and capacity -- and provides extensive connectivity, which makes it easy for client-server and Web-oriented applications to take advantage of the high availability, capacity, and security that z/OS provides. DB2 supports transactions arriving from Web servers, from CICS and IMS transaction management, from MVS batch jobs, and via distributed connections from remote clients on numerous platforms, making it easier to integrate legacy data with new CRM and ERP applications.
DB2 data sharing takes advantage of the Parallel Sysplex architecture which is the most scalable and robust clustering architecture that exists. Data sharing improves price for performance, improves the availability of DB2, and extends the processing capacity of the system.
Many companies are finding that the high availability, scalability, performance, security, and manageability of DB2 on the mainframe can provide better service to their customers and at the same time lower their total cost of ownership.
Challenges in protecting ERP and CRM systems on the mainframe
Large number of objects
Many ERP and CRM systems have thousands of DB2 objects. For example, Version 4.5 of the SAP R/3 applications can have more than 7,000 table spaces and more than 19,000 indexes. This magnitude of objects presents a huge challenge for backup and recovery analysis and for sequential processing at the object level. In later versions of SAP R/3, the number of objects rises dramatically:
* SAP R/3 4.6: about 23,000 indexes and 20,000 tables
* SAP R/3 4.7: about 37,000 indexes and 45,000 tables
System level recovery
ERP and CRM applications typically maintain complex referential integrity relationships between objects. These relationships are frequently not defined to the database or documented to the customer, so it is difficult to split the objects into groups that can be recovered independently. Therefore, to maintain integrity, any recovery must include all of the application data in the DB2 subsystem.
Difficult to find a recovery point
Recovering data usually brings to mind various catastrophic disaster scenarios, whether natural or machine-related. It goes without saying that the ability to execute a reliable disaster recovery is critical to every organization. However, the need to execute a fast and accurate point-in-time recovery is in some ways even more critical, because the need to recover from a human error, a failed application, or a failed batch job arises far more frequently and actually poses the greater threat to data integrity and system availability.
When performing a point-in-time recovery, you cannot simply recover to an arbitrary point in time, because transactions may be in progress at that point and the database may be in an inconsistent state. The first step in performing a point-in-time recovery is locating a point in time at which all the data in the system is consistent, referred to as a quiet point. Ideally, you want to select a quiet point just prior to the point in time at which the error occurred; this is the desired recovery point.
However, finding a quiet point within a large, highly dynamic ERP or CRM environment is nearly impossible. In the past, the traditional method for obtaining a point of data consistency was to use the QUIESCE utility. However, the QUIESCE utility has a limit of 1165 table spaces that can be specified in a single command. Because many ERP and CRM systems have more than 1165 table spaces, the QUIESCE utility cannot be used to establish a single quiet point. Furthermore, attempting to execute a QUIESCE on a busy system frequently causes transactions to fail with timeout errors.
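As a concrete illustration, a quiet point for a small group of table spaces can be established with a QUIESCE control statement along these lines (the database and table space names here are hypothetical):

```
QUIESCE TABLESPACE SAPDB01.TSACCT01
        TABLESPACE SAPDB01.TSACCT02
        WRITE YES
```

The utility records the resulting quiesce point in SYSIBM.SYSCOPY, where it can later serve as a recovery point. With tens of thousands of table spaces, however, the limit on objects per statement means many separate QUIESCE statements would be required, and each would establish a different point -- not the single, system-wide quiet point that an ERP or CRM recovery needs.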
Long time to recover
DB2 was originally designed to handle a relatively small number of large objects. Many improvements have been made over the years, but it remains true that using the DB2 utilities in a conventional way to recover tens of thousands of objects at the object level can take many hours.
Dynamically created objects
In many ERP and CRM systems, objects are created and dropped dynamically without the knowledge of the database administrator, which makes it extremely difficult to determine manually which objects need to be recovered.
Expert DB2 skills required
Managing a DB2 system with thousands of objects using conventional techniques requires expert level DB2 skills and is very labor intensive.
Strategies for managing backup and recovery on a subsystem level
Conventional backup and recovery
In theory, you can use the conventional DB2 COPY and RECOVER utilities to protect an ERP or CRM database on the mainframe. These tools have been available for more than a decade, and when used properly, they are very reliable. However, due to the large number of objects, it is almost impossible to verify manually that all of the objects have been backed up. If a point-in-time recovery is required, it is almost impossible to find a recovery point at which the database is consistent, so it is frequently necessary to perform a conditional restart recovery. Performing a conditional restart is a complex manual process that involves many steps (about 100) and requires a high level of DB2 skill. After the conditional restart, many table and index spaces (possibly hundreds or thousands) may be in an inconsistent state and must be recovered manually. Without tools, it is very difficult to determine which table spaces have been left inconsistent, so the DBA may have to recover all of the spaces in the system, which can take many hours. The complexity of the conditional restart process makes it very error-prone and greatly increases the risk of prolonged downtime.
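To make the scale of the manual effort concrete: at the object level, a conventional backup and a later point-in-time recovery are driven one space at a time by control statements like the following sketch (the database, table space, and RBA values are hypothetical):

```
COPY TABLESPACE SAPDB01.TSACCT01
     COPYDDN(SYSCOPY)
     FULL YES
     SHRLEVEL REFERENCE

RECOVER TABLESPACE SAPDB01.TSACCT01
        TORBA X'00A3C1B20000'
```

Multiply statements like these across 20,000 or more spaces -- each needing its own control statement, data set allocation, and job monitoring -- and the cost of doing it all by hand becomes clear.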
For disaster recovery, you have to manually generate jobs to recover thousands of table and index spaces, manually submit these jobs, and manually monitor their execution. A disaster recovery on a system with thousands of objects can take many hours.
In summary, using conventional DB2 backup and recovery techniques for ERP and CRM systems is very expensive in terms of system availability, system resources, and DBA resources. Many companies have decided that it is not practical to use conventional backup and recovery techniques for ERP and CRM systems.
Recovery using intelligent hardware
Another strategy for backup and recovery of ERP and CRM systems is to use intelligent hardware. Using intelligent hardware together with the DB2 SET LOG SUSPEND command, you can make an I/O-consistent backup of your DB2 system by splitting mirrors or making volume snapshots. This backup technique is very efficient in that it requires very few host resources (CPU or I/O). One disadvantage of this approach is that while the DB2 log is suspended, no application can perform updates, and some applications may time out.
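In outline, the sequence looks like this from the console or a batch command interface (the hyphen is the typical DB2 command prefix; the snapshot step is whatever mechanism your storage hardware provides):

```
-SET LOG SUSPEND      log writes are halted; update activity is blocked

   (split the mirrors or take the volume snapshots here)

-SET LOG RESUME       log writes and update activity continue
```

The window between the two commands should be kept as short as possible, since it is precisely this interval during which applications cannot update and may time out.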
If you need to recover to the time that the backup was made, you can use the intelligent hardware to instantly (or at least almost instantly) restore the primary disk volumes from the mirrors. You can then do a DB2 restart, and no recovery is required. This recovery strategy is very effective for recovering to the time when the backup was made.
If you need to recover to a point in time between the time the backup was made and the current time, you can use the hardware to restore the primary disk volumes from the mirrors and then manually submit log-apply-only RECOVER jobs (using the RECOVER utility's LOGONLY option) to apply the changes made from the time the backup was taken up to the recovery point. You may have to manually recover hundreds or thousands of objects.
For disaster recovery, after you make backups using split mirrors or volume snapshots, you can copy those volumes to tape and send the tapes to your disaster recovery site. If you need to perform a disaster recovery, you copy the tapes to volumes at the recovery site and restart DB2. If you need to recover to as close to the current time as possible, you can manually submit log-apply-only RECOVER jobs (using the LOGONLY option) to apply the changes made from the time the backups were taken until the end of the last available log. You may have to manually recover hundreds or thousands of objects.
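In both the point-in-time and disaster cases, the log-apply jobs use the RECOVER utility's LOGONLY option, which skips the image-copy restore phase (the hardware has already put the data sets back) and applies only log records. A sketch, with hypothetical names and log point:

```
RECOVER TABLESPACE SAPDB01.TSACCT01
        LOGONLY
        TOLOGPOINT X'00A3C1B20000'
```

For recovery to as close to the current time as possible, the TOLOGPOINT clause is omitted and log apply runs to the end of the last available log. The statement still must be generated and run for every affected space, which is why hundreds or thousands of jobs may be involved.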
In summary, if you can afford to lose all of the updates made after the backup, if you can tolerate application timeouts caused by a DB2 log suspend, and if you have the required intelligent hardware, then using intelligent hardware for backup and recovery may be a good solution. If you need to recover to an arbitrary point in time, or you cannot afford to lose much data during a disaster recovery, using intelligent hardware without other tools may not be practical.
Recovery automation tools
A third strategy for backup and recovery of ERP and CRM systems is to use a recovery automation tool. Tools are available which can completely automate backup and recovery of your DB2 system.
For backups, these tools can detect which spaces have changed and back up only the changed spaces. On a typical system, this can cut the backup time from many hours to less than an hour. They can also automatically generate, submit, and monitor the required jobs, greatly reducing both the risk that some objects are missed and the DBA resources required.
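DB2's own LISTDEF and TEMPLATE facilities (introduced in Version 7) give a flavor of this style of automation: a wildcarded list of objects drives the utility, and data set names for the image copies are generated from variables rather than coded by hand. A sketch, with hypothetical database and data set names:

```
LISTDEF SAPLIST INCLUDE TABLESPACE SAPDB01.*

TEMPLATE COPYTMP DSN(HLQ.IMCOPY.&DB..&TS..D&DATE.)

COPY LIST SAPLIST
     COPYDDN(COPYTMP)
     SHRLEVEL CHANGE
```

Full-function recovery automation tools go considerably further than this -- tracking changed spaces, choosing recovery strategies, and monitoring job streams -- but the principle of list-driven rather than object-by-object processing is the same.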
If a point-in-time recovery is required, the tool can automatically find a recovery point at which the database is consistent, and it can recover just the spaces that have changed. Some tools can also choose the optimum recovery strategy (for example, backing out changes versus performing a forward recovery). Typically, this can reduce the recovery time from many hours to less than an hour. The tool can also automatically generate, submit, and monitor the required jobs, which speeds up the process and ensures that all of the data is recovered to a consistent point. Furthermore, these tools support recovery to an arbitrary point in time.
For disaster recovery, tools are available that can completely automate the disaster recovery process. They can automatically generate, submit, and monitor the required jobs. Multiple parallel job streams are supported to reduce the recovery time. All of the data up to the end of the last available log can be recovered, so that no data is lost.
Many companies have found that recovery automation tools quickly pay for themselves through the savings due to increased availability, reduction in system resources required for daily processing, and reduction in DBA time required.