Building a foundation with SAP Data Services

In this SAP Press book chapter excerpt, learn how to build a strong data foundation using the functionality in the SAP Data Services package.

The SAP Data Services tool can help companies create sturdy foundations for their data. This SAP Press book chapter excerpt introduces SAP Data Services software, explains how to integrate it with SAP and non-SAP systems, and offers tips for SAP data cleansing and data validation.

SAP’s Solutions for Enterprise Information Management

This excerpt from Enterprise Information Management with SAP by Ginger Gatling, Corrie Brague, Ryan Champlin, Helmut Stefani, Niels Weigel, George Bryce, Srikant Dharwad, Andreas Engel, Will Gardella, Simer Grewal, Ina Felsheim, Stéphane Haelterman, Eric Hamer, Rob Jackson, Mike Keilen, Markus Kuppe, Terry McFadden, Louann Seguin, Akshay Sinha, Eric Stridinger, and Anthony Waite is reprinted here with permission from SAP Press, copyright 2012.

SAP’s solutions for EIM have both great breadth and depth, spanning the full capabilities for managing information from its acquisition, through active use, until retirement and destruction. This chapter discusses all of the currently available products and solutions SAP offers for EIM. The portfolio will continue to grow as SAP continues to make investments in this area.

Chapter 1 discussed EIM in SAP’s overall portfolio; Figure 4.1 shows the same portfolio view but also lists the EIM products, which are explored further in this chapter.

Figure 4.1 SAP’s Solutions for EIM

Figure 4.1 shows SAP’s solutions for EIM in schematic form. At the top of the graphic are applications that depend on the data that the SAP solutions manage; at the bottom are the data sources and the types of data, which both feed the information management products and are managed by them. The middle of the graphic is dominated by the set of enterprise information management solutions offered by SAP, all beneath a box labeled “Information Governance.” (Information governance is not a product, but a discipline that is supported by multiple products. It is discussed in more detail in Chapter 3.) The following products are included in SAP’s solutions for EIM:

  • SAP Data Services
    Hereafter referred to as Data Services, this product combines SAP Data Integrator and SAP Data Quality Management, although the two can still be delivered independently if needed. Because most customers require both, this book focuses on Data Services and refers to Data Integrator as the data integration capabilities delivered with Data Services, and to SAP Data Quality Management as the data quality capabilities delivered with Data Services. A third major capability built into Data Services is text data processing.
  • SAP Information Steward
    Hereafter referred to as Information Steward.
  • SAP NetWeaver Master Data Management
    Hereafter referred to as SAP NetWeaver MDM.
  • SAP Master Data Governance
    Hereafter referred to as SAP MDG.
  • SAP NetWeaver Information Lifecycle Management
    Hereafter referred to as SAP NetWeaver ILM.
  • Enterprise Content Management (ECM) by OpenText
    ECM by OpenText is a group of related products. ECM products included in SAP’s solutions for EIM are:
      • SAP Archiving by OpenText
      • SAP Document Access by OpenText (includes SAP Archiving by OpenText)
      • SAP Extended Enterprise Content Management by OpenText (hereafter referred to as SAP Extended ECM). SAP Extended ECM includes both SAP Document Access by OpenText and SAP Archiving by OpenText.
      • SAP Invoice Management by OpenText
      • SAP Document Presentment by OpenText

While there are many OpenText products included with SAP’s solutions for EIM, this book will focus on SAP Extended ECM and its components: SAP Document Access by OpenText and SAP Archiving by OpenText.

  • Sybase Replication Server (hereafter referred to as Replication Server)
    Replication Server supports the enterprise data management environment by replicating and synchronizing database transactions across databases such as Sybase Adaptive Server Enterprise (ASE), Oracle, Microsoft SQL Server, and IBM DB2 Universal Database (UDB).
  • Sybase PowerDesigner (hereafter referred to as PowerDesigner)
    PowerDesigner is a modeling tool that offers a model-driven approach to improve business intelligence and information architecture. Enterprise architects use it to bring together many of the moving parts that make up the enterprise: systems, applications, business processes, and requirements.

In addition to the software products mentioned above, there is a solution called data migration that is built on top of existing EIM products. It is not listed in Figure 4.1 because it is not standalone software, but a solution built on Data Services.

This chapter will provide a brief introduction to SAP’s capabilities for EIM, whether they be software products (such as Data Services, Information Steward, SAP MDG, etc.), solutions built on software products (such as data migration), or disciplines (such as information governance). Part II of the book will provide greater details on some of the capabilities; however, due to the extent of the capabilities, not all of them will be covered in depth.


We do not focus on Sybase solutions in this book. For more information on Replication Server and PowerDesigner, go to

Chapter 1 discussed the use of EIM for on-boarding, active use, and off-boarding of information. The following is a list of SAP’s EIM capabilities for each area, all of which will be discussed in more detail in this chapter:

  • On-boarding
    Includes Data Services, Information Steward, SAP NetWeaver ILM, SAP NetWeaver MDM, and SAP MDG as well as the use of data migration and the on-boarding of content with SAP Extended ECM.
  • Active use
    Includes Data Services, Information Steward, SAP NetWeaver MDM, SAP MDG, SAP Document Access by OpenText, and SAP Extended ECM.
  • Off-boarding
    Includes Data Services, SAP Document Access by OpenText, SAP Extended ECM, and SAP NetWeaver ILM.

4.1 SAP Data Services as a Data Foundation

Data Services is the primary tool to extract, transform, and load data from one or more source systems into one or more target systems; Data Services lets you improve, integrate, transform, and deliver trusted data to critical business processes across the enterprise for both SAP and non-SAP systems. Data Services can be used in almost any scenario that requires you to move, enrich, transform, or cleanse data, and in this regard functions as the technology foundation for a coherent enterprise information management strategy. The following sections provide an overview of the capabilities delivered with Data Services.

4.1.1 Basics of SAP Data Services

Data Services is used for data migration, systems migration, data synchronization, application data cleansing, loading data warehouses and data marts, and for query, reporting, analysis, and dashboard data provisioning. The core of Data Services is the data services engine (see Figure 4.2, which shows Data Services’ capabilities and typical uses). As mentioned previously, Data Services combines two products: Data Integrator for extraction, transformation, and loading (ETL) and SAP Data Quality Management for data validation and data cleansing. Together with text data processing, these make up the three major capabilities of Data Services: data integration, data quality, and text data processing.

The left side of Figure 4.2 shows the data sources supported by Data Services. Data Services can access data from a wide variety of applications and file sources and can consume almost every type of data—structured, semi-structured, and unstructured—from those sources. Data Services can be called in batch mode (for example, to extract and deliver data for reporting in the data warehouses) or via client applications such as SAP ERP, SAP Customer Relationship Management (CRM), or custom applications to perform data transformation and cleansing in real time.

Notice in the figure that Data Services shares a common technology layer with the SAP BusinessObjects Business Intelligence (BI) platform. This allows for common user provisioning; advanced user management; password and security policies; use of external authentication mechanisms such as Active Directory, lightweight directory access protocol (LDAP), or SAP NetWeaver Identity Management; and granular access control.

Figure 4.2 Data Services Architecture

You can see in Figure 4.2 how Data Services can access many types of systems and applications and can work with many kinds of data and how that data can be profiled. The connectivity options and profiling capabilities of Data Services are described below.

Connectivity Options for SAP Data Services

Data Services supports many connectivity options for structured and unstructured data. These include SAP applications, databases, other vendor applications, and pure files (Excel, mainframe, etc.). Details of the connectivity options are discussed in Chapter 7.

Data Profiling in SAP Data Services

Data profiling is the practice of determining the overall quality of data and finding data anomalies. While Information Steward is the primary tool for data profiling, technical data profiling can be done directly in Data Services; you can go beyond simple viewing of the data to conduct an analysis of your data. You can build better jobs in Data Services by understanding the following types of information:

  • Frequency distribution
  • Distinct values
  • Null values
  • Minimum/maximum values
  • Data patterns (e.g., Xxx Xxxx99, 99-Xxx)
  • Comparison of values between data sets
  • Drill-down to view specific records

The profiling in Data Services allows you to quickly assess the source data to discover problems and anomalies, such as:

  • Twenty-one percent of the employees in the Human Resources source system do not have a country associated with their ID.
  • There are four genders entered: 60% are male, 30% are female, and the other 10% are either “unknown” or have a question mark in the gender field.
  • Benefits are available for all new employees after one month of service, but 35% of three-month employees still have no benefit ID assigned to their HR records.
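Findings like these come down to simple rates and cross-field rules. The following Python sketch is not Data Services code; it only illustrates the kind of check involved. The sample records, the field names, and the 30-day approximation of "one month of service" are all assumptions made for the example.

```python
from datetime import date

# Hypothetical sample records; field names are illustrative only.
employees = [
    {"id": 1, "country": "US", "hire_date": date(2012, 1, 10), "benefit_id": "B-100"},
    {"id": 2, "country": None, "hire_date": date(2012, 1, 15), "benefit_id": None},
    {"id": 3, "country": "DE", "hire_date": date(2012, 3, 20), "benefit_id": None},
    {"id": 4, "country": "US", "hire_date": date(2012, 2, 1), "benefit_id": "B-101"},
]


def missing_rate(records, field):
    """Return the percentage of records whose field is empty or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not r.get(field))
    return 100.0 * missing / len(records)


def overdue_benefits(records, as_of, min_days=30):
    """Return IDs of employees with at least min_days of service and no benefit ID.

    min_days=30 is an assumed stand-in for "one month of service".
    """
    return [
        r["id"]
        for r in records
        if (as_of - r["hire_date"]).days >= min_days and not r["benefit_id"]
    ]


country_gap = missing_rate(employees, "country")
late = overdue_benefits(employees, as_of=date(2012, 4, 15))
print(f"{country_gap:.0f}% of employees have no country; overdue benefit IDs: {late}")
```

The first function measures completeness of a single field; the second is a rule that spans two fields, which is the kind of anomaly a single-column profile cannot reveal.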

You can also quickly detect patterns, distinct values, and null values for zip codes, product codes, sales items, and other key data fields to better understand your data.
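To make the profiling measures concrete, here is a minimal Python sketch of a single-column profile: null count, distinct values, minimum and maximum, frequency distribution, and data patterns in the Xxx Xxxx99 notation used above. It is an illustration under stated assumptions, not the Data Services profiler; the function names and the sample zip codes are hypothetical.

```python
from collections import Counter


def value_pattern(value):
    """Map a value to its shape: uppercase -> X, lowercase -> x, digit -> 9."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)  # keep spaces and punctuation unchanged
    return "".join(out)


def profile_column(values):
    """Return basic profiling statistics for one column (None means null)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "frequency": Counter(present),
        "patterns": Counter(value_pattern(str(v)) for v in present),
    }


# Hypothetical zip code column with one null and one malformed value.
zip_codes = ["10002", "10002", None, "1000A", "60601"]
stats = profile_column(zip_codes)
print(stats["nulls"], stats["distinct"], dict(stats["patterns"]))
```

In this sample, the pattern counts separate the one malformed value (shape 9999X) from the well-formed five-digit codes (shape 99999), which is how pattern profiling surfaces outliers without inspecting every record.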

4.1.2 SAP Data Services Integration with SAP Applications

Data Services provides seamless integration with SAP applications as an integration and data quality tool. Specific examples include SAP CRM, SAP ERP, SAP NetWeaver MDM, SAP MDG, SAP NetWeaver Business Warehouse (BW), and SAP HANA. Data Services is used as a service to the applications, called only to perform a specific function when needed. It can also be used to load data into the applications. Next we discuss a few examples in a bit more detail.

SAP Data Services for SAP Data Quality Management with SAP ERP, SAP Customer Relationship Management, and SAP NetWeaver Master Data Management

Figure 4.3 shows common uses of Data Services with the SAP Business Suite. One common use of Data Services with the SAP Business Suite is the deep integration with Business Address Services (BAS). BAS is an SAP NetWeaver capability embedded in the ABAP application server. BAS provides flexible dialog integration for standard functions such as creating, changing, displaying, and finding addresses. It is a reusable component across the SAP Business Suite that is used heavily in SAP ERP (such as SAP ERP Central Component [ECC] 6.0) and SAP CRM. With Data Services integration to BAS, as addresses are updated, the data quality capabilities in Data Services are used to correct the addresses and check for duplicates. Figure 4.3 shows Data Services integration with BAS.

Figure 4.3 Data Services with SAP Applications

In Figure 4.3, notice that Data Services is used for data migration to the SAP Business Suite. SAP provides a robust data migration solution for mapping and validating source data against the SAP target system using Data Services. Data Services is used to migrate each application object (for example, materials, sales orders, cost centers, etc.). For each object, the data is cleansed, validated against the required configuration in the SAP target system, and loaded into the SAP system. Reports are provided using SAP BusinessObjects Web Intelligence so users can monitor, remediate, and govern the data migration project.
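The idea of validating each object against the target configuration before loading can be sketched in a few lines of Python. This is only an illustration of the principle, not the SAP data migration content: the field names, the allowed-value sets, and the sample material records are assumptions chosen for the example.

```python
# Hypothetical target configuration: allowed values per field.
TARGET_CONFIG = {
    "material_type": {"FERT", "ROH", "HALB"},
    "base_unit": {"EA", "KG", "L"},
}
REQUIRED_FIELDS = ("material_number", "material_type", "base_unit")


def validate_record(record):
    """Return a list of validation errors; an empty list means loadable."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"{field}: missing required value")
    for field, allowed in TARGET_CONFIG.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(f"{field}: '{value}' is not configured in the target")
    return errors


def split_for_load(records):
    """Separate records that pass validation from those needing remediation."""
    loadable, rejected = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            rejected.append((rec, errs))
        else:
            loadable.append(rec)
    return loadable, rejected


# Hypothetical source records for one application object (materials).
records = [
    {"material_number": "M-1", "material_type": "FERT", "base_unit": "EA"},
    {"material_number": "M-2", "material_type": "XXXX", "base_unit": "EA"},
    {"material_number": "", "material_type": "ROH", "base_unit": "KG"},
]
loadable, rejected = split_for_load(records)
for rec, errs in rejected:
    print(rec["material_number"] or "<no number>", errs)
```

Keeping the rejected records together with their error messages is what makes monitoring and remediation reports possible: each failure can be traced to a field and a reason.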

Additional integration with Data Services for SAP applications includes integration with SAP NetWeaver MDM and SAP MDG. SAP NetWeaver MDM uses Data Services for loading data as well as cleansing names and addresses, de-duplication and automatic consolidation, and data validation. Data Services provides real-time cleansing, matching, and consolidation activities for SAP MDG, providing reusability between SAP NetWeaver MDM and SAP MDG. Figure 4.4 shows Data Services integration with SAP MDG.

Figure 4.4 Data Services Integration with SAP MDG, SAP ERP, and SAP CRM

Next we will discuss how Data Services is used with SAP HANA, SAP NetWeaver BW, and the SAP BusinessObjects BI platform.

SAP Data Services for SAP HANA, SAP NetWeaver Business Warehouse, and the SAP BusinessObjects Business Intelligence Platform

A key strength of Data Services is its easy integration with data warehouses and databases for analytics. This is true for Data Services integration with SAP NetWeaver BW, SAP HANA, and the entire SAP BusinessObjects BI platform.

Data Services is the preferred tool for loading non-SAP data into SAP HANA. In fact, the data integration capabilities of Data Services are included with SAP HANA. Additionally, SAP is making major improvements for a seamless user interface between Data Services and SAP HANA. This is explained in more detail in Chapter 3 and Chapter 7. Figure 4.5 shows the integration of Data Services with SAP NetWeaver BW and SAP HANA.

Figure 4.5 Data Services Integration with SAP NetWeaver BW and SAP HANA

Data Services can also be used for loading data into SAP NetWeaver BW. Existing SAP NetWeaver BW customers can now easily apply data quality transformations when loading data into SAP NetWeaver BW. They can use one tool to define all extraction, validation, and cleansing rules to load all data (SAP and non-SAP) into SAP NetWeaver BW. This matters because everyone faces data quality issues, and you don’t want to re-implement your ETL jobs to add data quality. Customers who do not use SAP NetWeaver BW can get native access to the SAP Business Suite, including delta changes, without going through SAP NetWeaver BW.

In addition to integration with databases and data warehouses, Data Services also works natively with the SAP BusinessObjects BI platform. Data Services is used for the provisioning of data for reports, dashboards, ad hoc queries, OLAP (online analytical processing) analyses, and data exploration. Data Services provides access and integration of disparate data from virtually any data source, structured or unstructured. This data can be built up in a data warehouse or data mart to provide historical trending and analysis for more accurate decision making. Data Services also enables the understanding of information context (data lineage) to help you make more confident decisions. The integration with the SAP BusinessObjects BI platform is standard functionality that comes out-of-the-box.

4.1.3 SAP Data Services Integration with Non-SAP Applications

Data Services has a much longer history of integration with non-SAP applications than it does with SAP applications. It started as a pure ETL tool that moved data from any source to any target. It has grown into a full-scale data foundation that includes extraction, transformation, loading, data quality, and text data processing, with deep integration into SAP applications while maintaining the ability to integrate into any application. In your enterprise there could be few or many non-SAP systems where integration, data quality, and data validation are required. Data Services is well equipped for the requirement of moving and transforming data between diverse non-SAP systems.

Figure 4.2 showed the connectivity options for Data Services, ranging from Microsoft Excel spreadsheets to Oracle applications to mainframe connectivity. Figure 4.3 also referred to data migration, data synchronization, and data loading. In this book, discussions of data migration focus on migration to SAP target applications. When migrating to SAP applications, the target canonical data formats, field requirements, and so on are delivered by SAP. Data Services can just as easily be used to migrate to a new home-grown application or some other niche application; in that case, however, you need to define the target structure yourself.

Figure 4.3 also showed that external systems and human input are linked to the data quality capabilities in Data Services. This is very common for Data Services integration with non-SAP applications. One very specific example is the SAP Data Quality Management SDK (software development kit). SAP has many software partners who develop their own software but use part of SAP’s solution within their solution. The SAP Data Quality Management SDK gives developers a lightweight way to integrate the data cleansing and validation capabilities directly into their own custom applications.

Non-SAP integration with Data Services includes loading third-party data in SAP HANA and SAP NetWeaver BW, as well as extracting data to go in other data marts, data warehouses (such as Sybase IQ), and applications.

Data Services’ native capabilities for dealing with both SAP and non-SAP data provide great flexibility so that Data Services can be embedded in SAP and non-SAP applications and used across the SAP family of solutions where data cleansing, validation, and integration are critical for the application.

4.1.4 Data Cleansing and Data Validation with SAP Data Services

SAP Data Quality Management is a key capability in Data Services and will be covered in detail in Chapter 8. Data quality capabilities include address cleansing, data standardization, data validation, data correction, data enrichment, and matching. Figure 4.6 shows an example of the entire data quality process.

Figure 4.6 Example of the Data Quality Process

To begin, the input record is parsed into its component parts and standardized. Figure 4.6 shows the following example:

  • First Name: Bob
  • Last Name: oldstead
  • Address1: 175 Riivington Ave
  • Address2: Suite 2
  • City: Manhatten
  • State: new yourk
  • Zip Code: 10002

Errors in the record are corrected; for instance, “Oldstead” is capitalized, and the street name, city, and state are edited. After this step, the record looks like this:

  • First Name: Bob
  • Last Name: Oldstead
  • Address1: 175 Rivington Ave
  • Address2: Suite 2
  • City: Manhattan
  • State: New York
  • Zip Code: 10002
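The correction step above can be approximated with fuzzy matching against reference lists. The Python sketch below uses the standard library's difflib for this; it is a simplified stand-in, not the Data Services cleansing engine, which relies on address directories and far richer rules. The reference lists, field names, and the 0.8 similarity cutoff are assumptions for illustration, and street-name correction is left out because it would need a street directory.

```python
import difflib

# Hypothetical reference data; a real system would use postal directories.
CITIES = ["Manhattan", "Brooklyn", "Queens"]
STATES = ["New York", "New Jersey", "Connecticut"]


def correct(value, reference, cutoff=0.8):
    """Return the closest reference entry, or the value unchanged if none is close."""
    lookup = {ref.lower(): ref for ref in reference}
    matches = difflib.get_close_matches(
        value.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else value


def cleanse(record):
    """Standardize name casing and correct city and state spellings."""
    cleaned = dict(record)
    # Naive casing rule; it would mishandle names such as "McFadden".
    cleaned["last_name"] = record["last_name"].strip().capitalize()
    cleaned["city"] = correct(record["city"], CITIES)
    cleaned["state"] = correct(record["state"], STATES)
    return cleaned


raw = {
    "first_name": "Bob",
    "last_name": "oldstead",
    "city": "Manhatten",
    "state": "new yourk",
    "zip": "10002",
}
cleaned = cleanse(raw)
print(cleaned)
```

Returning the original value when nothing in the reference list is close enough is a deliberate choice: a cleansing step should flag or leave doubtful values rather than silently force them onto the nearest entry.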

Data Services then searches for a matching record in the target application and finds the following match:

  • Name: Robert E. Oldstead
  • City, State: Manhattan, NY
  • Zip Code: 10002
  • Email: [email protected]
  • Phone: 847 442-5555

The two records are consolidated:

  • First Name: Robert
  • Last Name: Oldstead
  • Address1: 175 Rivington Ave
  • Address2: Suite 2
  • City: Manhattan
  • State: New York
  • Zip Code: 10002
  • Phone: (847) 442-5555
  • Email: [email protected]
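Matching and consolidation can be sketched in the same spirit. The Python example below treats two records as the same person when a canonical first name, the last name, and the zip code agree, then merges them with simple survivorship rules. The nickname table, the match rule, and the survivorship rules are assumptions for illustration and are much cruder than the weighted matching a data quality engine applies. The email field and middle initial are omitted to keep the example short.

```python
import re

# Hypothetical nickname table used to compare first names.
NICKNAMES = {"bob": "robert", "bill": "william"}


def canonical_first(name):
    """Lowercase a first name and expand a known nickname."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def is_match(a, b):
    """Simplistic match rule: same canonical first name, last name, and zip."""
    return (
        canonical_first(a["first_name"]) == canonical_first(b["first_name"])
        and a["last_name"].lower() == b["last_name"].lower()
        and a["zip"] == b["zip"]
    )


def format_phone(raw):
    """Format a 10-digit phone number as (NNN) NNN-NNNN; leave others unchanged."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def consolidate(incoming, existing):
    """Merge two matched records: cleansed incoming values win, gaps are filled."""
    merged = dict(existing)
    merged.update({k: v for k, v in incoming.items() if v})
    # Assumed survivorship rule: keep the longer (more formal) first name.
    merged["first_name"] = max(
        incoming["first_name"], existing["first_name"], key=len
    )
    if merged.get("phone"):
        merged["phone"] = format_phone(merged["phone"])
    return merged


incoming = {"first_name": "Bob", "last_name": "Oldstead",
            "address1": "175 Rivington Ave", "address2": "Suite 2",
            "city": "Manhattan", "state": "New York", "zip": "10002"}
existing = {"first_name": "Robert", "last_name": "Oldstead",
            "city": "Manhattan", "state": "NY", "zip": "10002",
            "phone": "847 442-5555"}

if is_match(incoming, existing):
    merged = consolidate(incoming, existing)
    print(merged)
```

The survivorship rules decide which source wins for each field; in practice these are configured per field and per source system rather than hard-coded as they are here.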

Finally, the record is enhanced with a normalized zip code and geographical coordinates to provide a complete record:

  • First Name: Robert
  • Last Name: Oldstead
  • Address1: 175 Rivington Ave
  • Address2: Suite 2
  • City: Manhattan
  • State: New York
  • Zip Code: 10002-2517
  • Latitude: 40.7325525
  • Longitude: -74.004970
  • Phone: (847) 442-5555
  • Email: [email protected]

While this example focused on cleansing a customer record, the data quality process can be applied to business partners, material products, services, and many other types of data that need parsing, standardization, and data cleansing.

