SAP boosts data integration with SAP Data Hub and Vora

SAP Data Hub and Vora are both data integration tools, but Data Hub has a broad mission to manage data from different sources, while Vora focuses specifically on big data lakes.

SAP wants to cover as many data integration bases as possible with the recent release of the new SAP Data Hub and updates to SAP Vora.

SAP Data Hub and Vora both attempt to solve similar data integration challenges and provide businesses a means to extract value out of the reams of data they are collecting, but the specific goals for each product are quite different, according to Ken Tsai, SAP's global vice president and head of cloud platform and data management product marketing.

The recently released Data Hub has the much broader mission, because it is intended to help organizations manage complex data landscapes by building pipelines between a variety of data sources, Tsai said. SAP Vora, which has been on the scene for two years, provides a way to get at data stored in Hadoop data lakes via the Apache Spark framework. SAP Data Hub uses Vora under the covers, but the products are not the same.

Similar products, different aims

"SAP Data Hub has a much bigger purpose in terms of building up data flow in order to ensure a more efficient data operation, rather than just doing computing, which SAP Vora aims to do," Tsai said. "Both are very complementary, and we're seeing good results so far from SAP Vora. And the idea for SAP Data Hub wouldn't have come about if we hadn't been investing in building computing solutions in the Hadoop big data space and seeing what customers needed beyond the vast computing engine directly into Hadoop."

SAP Data Hub is important now because storing all data in one central location is no longer feasible, Tsai said. Instead, Data Hub centralizes data management while leaving the data in its source repositories.

"We're targeting not only the developers, but also the enterprise architects, data scientists [and] business analysts," he said. "The IT department today is evolving into multiple zones -- the application zone, the data warehouse zone, the data lake zone -- and each one of them needs to participate in a kind of data flow, as data has to flow from one zone to the other."

Pharmaceutical manufacturer McKesson Corp. is one SAP customer that has deployed SAP Data Hub to consolidate data across multiple systems and derive a single source of truth from that data.

"Our work is about helping our customers improve patient care and driving efficiencies across the healthcare value chain," Adam Fecadu, chief information architect at McKesson, based in St. Paul, Minn., said in a press release. "It starts with relentless focus on helping our customers and partners solve their toughest challenges. With numerous data sources, types and IT landscapes, we need a unified data solution across departments and business units to produce actionable insights and continuous innovation. SAP Data Hub is aligned with this vision."

SAP Vora dives into the big data lake

Vora is designed to allow businesses to process big data from Hadoop data lakes and derive business value from the data, Tsai said. SAP Vora 2.0 has been rearchitected using Kubernetes containers to improve scalability and reduce deployment complexity.

CenterPoint Energy, an electricity and natural gas utility based in Houston, is using SAP Vora along with SAP HANA to manage and analyze data that it gets from smart meters. Its application uses HANA to track and analyze the health of its infrastructure and grid in real time and moves older data into Hadoop. Vora is used to process and analyze the historical data in Hadoop to determine usage patterns and trends, and this data can be combined with the current HANA data, allowing insights that result in more proactive energy delivery and pricing, Tsai said.
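The hot-and-cold pattern described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not CenterPoint's application or any SAP API: in-memory lists stand in for the HANA (recent) and Hadoop (historical) stores, and every name here (`Reading`, `split_by_age`, `historical_baseline`, `flag_unusual`, the sample values and the 1.5 threshold) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One smart-meter reading (hypothetical schema)."""
    meter_id: str
    day: date
    kwh: float


def split_by_age(readings, cutoff):
    """Keep readings on or after `cutoff` in the hot tier; the rest go to cold."""
    hot = [r for r in readings if r.day >= cutoff]
    cold = [r for r in readings if r.day < cutoff]
    return hot, cold


def historical_baseline(cold):
    """Average kWh per meter, computed from the cold (historical) tier only."""
    by_meter = {}
    for r in cold:
        by_meter.setdefault(r.meter_id, []).append(r.kwh)
    return {meter: mean(values) for meter, values in by_meter.items()}


def flag_unusual(hot, baseline, ratio=1.5):
    """Return hot readings that exceed `ratio` times the meter's baseline."""
    return [
        r for r in hot
        if r.meter_id in baseline and r.kwh > ratio * baseline[r.meter_id]
    ]


# Made-up sample data: older readings play the role of the data lake,
# recent ones the role of the in-memory store.
readings = [
    Reading("m1", date(2017, 1, 1), 10.0),
    Reading("m1", date(2017, 1, 2), 12.0),
    Reading("m1", date(2017, 10, 1), 20.0),
    Reading("m2", date(2017, 1, 1), 8.0),
    Reading("m2", date(2017, 10, 1), 9.0),
]

hot, cold = split_by_age(readings, cutoff=date(2017, 9, 1))
baseline = historical_baseline(cold)
unusual = flag_unusual(hot, baseline)
```

The point of the sketch is the division of labor: the trend is derived from the historical tier alone, and only the small result (a per-meter baseline) is joined with current readings, rather than moving the full history into the real-time store.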

Processing data where it lives

Data Hub is a good direction for SAP, because it allows users to work on data where it lives without having to move it, according to Stewart Bond, research director of data integration software at IDC.

"It's kind of a departure from where SAP's been in the past, where you have to pull the data into the SAP environment to be able to work with it. But it's also similar to what we're seeing in the rest of the market," Bond said. "Data is getting too big to move around anymore, and people don't want to move the data around. And the data that is getting moved is a subset. Organizations that use the big data repositories like Hadoop preprocess data before it ends up going into an enterprise data warehouse. And in the preprocessing, things get filtered out, things get cleansed, things get put into smaller shapes -- data sets that are smaller than what we have in big data."

SAP Vora is similar, but tries to solve a different problem, Bond explained.

"Vora has been about plugging into the Hadoop big data ecosystem, whereas Data Hub is more of a broader data play, with more of a variety of data sources that they want to connect to and capabilities for working with data in motion or data pipelining," Bond said. "They're leveraging the investment that they have in Vora by making that technology and those capabilities available in Data Hub for those times when Data Hub solutions need to tap into a Hadoop ecosystem to do something with the data. But I think they are slightly different problem spaces, and they might be going after a slightly different audience."

These tools are important because they are creating more opportunities for businesses to solve problems with technology, said Ezra Gottheil, senior analyst with Technology Business Research, a research and analysis firm based in Hampton, N.H. This is a confluence of the core technologies becoming more manageable, the convenience of the cloud as a platform and the next-generation technologies like big data and internet of things.

"There are more different-shaped Lego blocks that are available for those who are creating applications to build with, so everybody is extremely eager to get those tools in the hands of not just developers, but business people," Gottheil said. "SAP makes applications, and they make some pretty specialized ones as well, but they can only begin to address all the applications that are needed. So, they'll have to come from customers and third parties, too, but putting out the tools and promoting them is the way they get at that market."

Need to keep up with the competition

SAP faces steep competition in the market, Bond said, and needs to have a fully developed product to keep up. Oracle, for example, has the Data Integration Platform Cloud, which brings data integration, data quality and data governance to a cloud platform.

"Data Hub is doing something very similar, so the challenge that they're up against is that they're talking about going to market with data governance, data pipeline and data integration, but there are still parts of that three-pronged story that need to be developed," Bond said. "The data governance piece was demonstrated in the launch, but what was demonstrated was more technical-level data governance and really wasn't that business metadata. So, it's going to be critical that when they go to market that they have a truly competitive product, because their competitors are going to be there as quickly as they are."
