Governing data and measuring and monitoring its quality have always been important to companies, which is why they spend so much time and money on data quality and governance processes. Data quality in the source system becomes more important than ever when a new technology like SAP HANA enters the picture.
In SAP HANA, it is possible to report directly against source data rather than staging data through multiple layers of extract, transform and load (ETL) processing. This is both powerful and transformative for a business, because it enables real-time reporting directly from a system of record. However, if data quality has traditionally been remediated in latent ETL processing en route to a data warehouse or business intelligence (BI) system, that remediation is not occurring in real time in the source, so reports that read the source directly see the uncorrected data.
As in any scenario, reporting with any degree of confidence requires quality data. This is especially true when Business Suite running on HANA is the primary system of record. Now that you can report on Suite data directly from HANA, the data in the source must already be of high quality.
In the past, customers have often relied solely on the data redundancy demanded by performance limitations to also police data quality issues back in the source. Data quality issues corrected during the exchange of data to redundant staging layers and BI constructs often make a full round trip back to the source. This helps to correct the issues in the source data, but it comes with a latency penalty. SAP HANA eliminates the need for redundant data in staging database layers and BI constructs, and performance improves both because that processing goes away and because of the SAP HANA in-memory platform. However, without attention to data quality and proper governance and monitoring of source data, SAP HANA will only deliver bad or incorrect data quickly.
Luckily, organizations already have the designs they need to deal with data quality issues, in the form of these same latency-driven ETL processes within their BI systems. What must occur is an adaptation of these designs from an "after the fact" latency-driven process into a parallel real-time "effort at the point of entry" process. These mechanisms must be accounted for, or at a minimum properly understood, in this new "real-time enterprise" before SAP HANA can fill all of these performance gaps and be a transformative force for the enterprise. Real-time point-of-entry (POE) data quality processing was once a nice-to-have, or something to tackle in the future, but in a post-SAP HANA landscape it is a necessity.
How is real-time governance different? Most obviously, real-time governance happens in real time. This introduces several challenges. Many master data management activities are currently reactive processes for organizations. Operations like matching and cleansing are often based on transactional latency: data must first be collected and processed before it can be placed into cleansed containers of quality, deduplicated data. That is not to say that the need for this process goes away when real-time processing with SAP HANA enters the picture. But the new data quality gaps that real-time processing introduces must be thought through and covered by other means of remediation, so that quality data is available for reporting with SAP HANA.
Organizations must pay attention to this aspect. You have spent years putting these corrective measures in place in a reactive mode that requires a natural latency. This shift to a real-time enterprise will not happen overnight. So, don't expect processing speed alone to deliver more than your source data can support. Even speed that allows Google-style data introspection will not yield proper answers if the data is simply wrong.
SAP Data Services data quality management offers some strong capabilities to counter the challenge of dealing with data quality in real time. These data quality tools exist in SAP Data Services and include matching (deduplication) and data cleansing, as well as various types of address cleansing, in either batch processing or real time.
SAP Data Services provides real-time services that can be called as data is created in SAP Business Suite on HANA. These real-time data quality services are not the only answer, but they may provide the supplement needed to fill a temporary data quality gap that the real-time enterprise introduces. This way, the immediate issues can be handled up front by SAP Data Services as data is created in the SAP Business Suite, and complex matching and new data from multiple sources can then be handled with batch processing after the fact.
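The split described above, with cheap checks at the moment of creation and heavier matching deferred to batch, can be sketched in plain Python. This is only an illustration of the pattern, not the SAP Data Services API: the `Customer` record, the `check_at_entry` rules and the in-memory `batch_queue` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Hypothetical record shape; real Business Suite master data has many more fields.
    name: str
    postal_code: str
    country: str


@dataclass
class EntryResult:
    accepted: bool
    issues: list = field(default_factory=list)


def check_at_entry(record: Customer) -> EntryResult:
    """Cheap, synchronous checks that can run while the record is being created."""
    issues = []
    if not record.name.strip():
        issues.append("name is empty")
    if len(record.country.strip()) != 2:
        issues.append("country is not a two-letter code")
    if not record.postal_code.strip():
        issues.append("postal code is empty")
    return EntryResult(accepted=not issues, issues=issues)


# Stand-in for the batch layer: accepted records wait here for the
# heavier cross-source matching that still runs after the fact.
batch_queue = []


def create_customer(record: Customer) -> EntryResult:
    """Validate at the point of entry, then hand accepted records to batch."""
    result = check_at_entry(record)
    if result.accepted:
        batch_queue.append(record)
    return result
```

A record that fails the entry checks is rejected immediately with a list of issues, so the source never stores it; a record that passes is visible to reporting right away and is still picked up later by the batch matching step.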
Real-time data quality solutions
As discussed earlier, SAP HANA introduces new real-time challenges and disrupts some existing reactive data quality solutions. The obvious choice is to fix all of the data at the point of entry. While this is a lofty goal, most organizations are not mature enough in their data quality or data governance cycle to achieve it, even if the source of data is centralized, as in SAP Business Suite. Sometimes, using some of the built-in real-time data quality capabilities of SAP Data Services is a means to move toward that ultimate goal of active governance with correction at the point of entry. The matching, data cleansing and address cleansing discussed earlier cover many core SAP data quality functions:
- Address cleansing for 240 countries
- Address suggestions
- Person and firm data cleansing with country-specific data quality cleansing package ready for use out of the box
- Side of street confirmation of an address
- Complex matching transforms to provide a vehicle to deduplicate data in a completely customizable manner
These data quality functions can be called from the SAP Business Suite so that any reporting application can benefit from these enrichments in real time. They do not replace your MDM/data hub solution. Instead, think of them as a real-time "MDM-lite": a supplement to the organization's existing data quality and data management functions that enhances a process that will always have inherent latency, so that you get better data faster.
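To make the idea of matching (deduplication) concrete, here is a minimal sketch using only Python's standard library. It is not how SAP Data Services match transforms work internally; the normalization rules, the `0.85` threshold and the function names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace before comparing."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as duplicates when their normalized similarity is high."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def deduplicate(names, threshold: float = 0.85):
    """Keep the first occurrence of each group of probable duplicates."""
    kept = []
    for name in names:
        if not any(is_probable_duplicate(name, k, threshold) for k in kept):
            kept.append(name)
    return kept
```

For example, `deduplicate(["Acme Corp.", "ACME  Corp", "Globex Inc"])` keeps `"Acme Corp."` and `"Globex Inc"`, because the first two names are identical once normalized. A production matching tool would add country-specific rules, weighting across several fields and review of borderline matches, which is why the simple pairwise loop here is only a sketch.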
These advancements open up many new possibilities, including much better-quality operational reporting. So, while SAP HANA is disruptive and prompts a rethink of data quality in the enterprise, there are still solutions available to mitigate data quality challenges in the near term.
About the author:
Don Loden is a principal consultant with full lifecycle data warehouse and information governance experience in multiple verticals. He is an SAP-certified application associate on SAP Data Services, and he is very active in the SAP community, speaking globally at numerous conferences and events. He has more than 14 years of information technology experience in the following areas: ETL architecture, development and tuning; logical and physical data modeling; and mentoring on data warehouse, data quality, information governance and ETL concepts. You can contact Don by email at email@example.com and find Don on Twitter @donloden.