
SAP HANA big data strategy leans heavily on open source Hadoop tools

Apache Spark analytics framework underpins HANA Vora, the front-end query tool SAP users will need to link HANA technology to data in the Hadoop distributed file system.

The amount of digital data created annually is exploding. To process the massive volume of data generated by enterprise applications, as well as the information flowing in from a variety of external sources, organizations need a broad range of analytical capabilities. Some are now turning to the HANA big data tools offered by SAP.

"Big data is only going to get bigger and richer, as well as originate and flow from an increasing number of sources, both internal and external," according to a report from Forrester Research titled "Ultra-Fast Data Access Is The Key To Unleashing Full Big Data Potential."

Enterprises need a "modern data analytics strategy that provides a ubiquitous, real-time data access layer to all relevant data from all different sources," the report noted.

To meet the needs of these enterprises, SAP is continuing to invest in providing business users with access to advanced analytics tools that use its HANA in-memory, column-oriented, relational database management system, said Anne Moxie, senior analyst at Boston-based Nucleus Research.

Werner Hopf, CEO of Dolphin Enterprise Solutions Corp., agreed with Moxie's assessment. Dolphin is an SAP partner based in Morgan Hill, Calif.

"SAP invested a ton of development over the past two or three years to extend HANA capabilities, so it can also be used as the underlying database for transaction processing systems," he said.

For example, last September, SAP announced HANA Vora, a new in-memory query engine for Hadoop that addresses the challenges companies face as they manage distributed big data, Moxie said.

HANA on its own, however, is not well suited to very large data volumes because it's not cost-effective to put large amounts of information in memory, said John Appleby, U.S. general manager at London-based global consultancy and SAP partner Bluefin Solutions, a Mindtree company. "We're pleased that SAP has embraced Hadoop."

'First-class citizen' HANA Vora is main HANA big data tool

HANA Vora, which was made generally available in March, allows companies to analyze data stored in Hadoop, enterprise systems and other distributed data sources, according to SAP. HANA Vora makes use of and extends the Apache Spark execution framework to provide enriched interactive analytics on enterprise and Hadoop data, helping companies in various industries glean more insight from their big data.


"To make HANA more capable for big-data projects, SAP allows for a close relationship with Hadoop data," Moxie said. "Connecting different data sources and combining the data that your organization has with Hadoop data … allows for a more complete view. That enables data scientists to have access to all of the data to draw the correct assumptions."

CenterPoint Energy, an electric and natural-gas utility based in Houston, is one of the first SAP users to implement the HANA big data platform and HANA Vora to bring together its highly distributed enterprise data framework.

Hadoop will enable CenterPoint Energy to cut the information technology costs associated with increasing big-data storage requirements, while HANA Vora will allow for more informed business decisions through analytics, SAP said.

CenterPoint Energy, which delivers power to more than 2.3 million consumers in six states, collects electronic meter data every 15 minutes for energy use reporting -- and that means hefty data-storage costs.

In six weeks, SAP and CenterPoint Energy built a testing environment that processed over 5 billion data records with Hadoop, HANA and HANA Vora, according to SAP. After that successful test deployment, CenterPoint Energy opted to implement and standardize on the HANA big data platform.
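To put those figures in perspective, here is a back-of-envelope sketch in Python. It assumes one meter per consumer and one stored record per 15-minute reading; neither assumption comes from SAP or CenterPoint Energy, and the function names are illustrative only.

```python
# Rough estimate of smart-meter record volume.
# ASSUMPTIONS (not stated by SAP or CenterPoint Energy):
#   - one meter per consumer
#   - one stored record per 15-minute reading

CONSUMERS = 2_300_000           # "more than 2.3 million consumers"
READ_INTERVAL_MINUTES = 15      # meter data collected every 15 minutes
TEST_RECORDS = 5_000_000_000    # "over 5 billion data records"


def readings_per_day(interval_minutes: int) -> int:
    """Number of readings one meter produces in 24 hours."""
    return (24 * 60) // interval_minutes


def records_per_day(consumers: int, interval_minutes: int) -> int:
    """Total records generated per day across all meters."""
    return consumers * readings_per_day(interval_minutes)


def days_to_reach(total_records: int, per_day: int) -> float:
    """Days of readings needed to accumulate total_records."""
    return total_records / per_day


daily = records_per_day(CONSUMERS, READ_INTERVAL_MINUTES)
print(f"Readings per meter per day: {readings_per_day(READ_INTERVAL_MINUTES)}")
print(f"Records per day: {daily:,}")
print(f"Days to reach test volume: {days_to_reach(TEST_RECORDS, daily):.1f}")
```

Under these assumptions, the utility would generate roughly 220 million records a day, so 5 billion records would correspond to a little over three weeks of readings.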

"Our initial analysis proved that SAP HANA paired with SAP HANA Vora is the right solution for us moving forward operationally," said Gary Hayes, CIO and senior vice president of CenterPoint Energy, in a statement.

HANA Vora has a strong ability to handle structured as well as transactional data running in the HANA enterprise computing platform, said Irfan Khan, CTO of SAP's global customer operations.

"But by deploying Vora on a cluster of machines that are running the Spark foundation, and, at the same time, sitting on top of the storage foundation of, say, Hadoop, we can push a variety of different types of work directly from HANA as a computing platform," Khan said. The result is "much more of a business-coherent view of what's going on."

HANA Vora sits as a "first-class citizen" inside of the Spark foundation, allowing SAP to either push down very specific types of analytical workloads into Spark storage, or bring back contextual information into the transactional core to provide much more meaningful insight to customers, Khan said.

From a big-data analytics perspective, the main challenge with in-memory systems such as HANA is the cost-to-value ratio, said Dolphin's Hopf. Main memory is expensive, and enterprises quickly reach a data volume where the cost simply outweighs the benefits of the analyses they can do for various scenarios.

That's why adding Hadoop support was essential in making HANA big data practical, according to Hopf. Incorporating some of the HANA database technology in the HANA Vora analytics front end and having it sit on top of Hadoop and Spark "allows customers to run high-performance analytics on subsets of data that can be stored in fairly massive Hadoop data lakes," he said.

Hadoop and Spark to play essential role in HANA big data projects

According to Moxie, combining HANA Vora with Hadoop and Spark is a big step in giving businesses full access to all of their data. As the internet of things (IoT) grows, Spark will be effective for the distributed processing and extraction of the data sets needed for that sort of work.

"Spark is really critical for IoT applications and analyzing in that sense, but with HANA Vora you're able to facilitate a lot of those new IoT initiatives because companies can analyze their data a little bit more easily," she said.

For example, an SAP customer in the agriculture industry is using sensor data in the field as well as satellite images to predict sugar cane yields. Both IoT sensor data and satellite images are stored in Hadoop and analyzed by HANA Vora in conjunction with the HANA platform, which is used for predictive analytics to optimize water and fertilizer and achieve better yields.

Appleby said Bluefin's customers are keenly interested in using HANA Vora for information lifecycle management. In such scenarios, companies running SAP ERP or a similar system want to move read-only information that they must retain for legal or business reasons -- but that is no longer performance-sensitive -- into "cold" storage. He said he expects SAP to clarify its roadmap for information lifecycle management around September.
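The hot/cold split Appleby describes can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Document` type, the `split_by_age` helper and the two-year threshold are invented for illustration and do not represent SAP's information lifecycle management tooling or any HANA or HANA Vora API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    """Hypothetical business record; fields are illustrative only."""
    doc_id: str
    posted_on: date
    closed: bool  # True once the record is read-only


def split_by_age(docs, today, hot_days=730):
    """Partition records into hot (kept in memory) and cold (archived).

    A record goes cold only if it is closed AND older than the
    cutoff; the 730-day default is an arbitrary assumption.
    """
    cutoff = today - timedelta(days=hot_days)
    hot, cold = [], []
    for doc in docs:
        if doc.closed and doc.posted_on < cutoff:
            cold.append(doc)   # candidate for cheap storage such as Hadoop
        else:
            hot.append(doc)    # stays in the in-memory database
    return hot, cold


sample = [
    Document("INV-001", date(2012, 3, 1), closed=True),
    Document("INV-002", date(2016, 1, 15), closed=True),
    Document("INV-003", date(2011, 6, 30), closed=False),
]
hot, cold = split_by_age(sample, today=date(2016, 6, 1))
print("hot:", [d.doc_id for d in hot])
print("cold:", [d.doc_id for d in cold])
```

In this sketch, only the old, closed record moves to cold storage. The open record stays hot regardless of age, reflecting the idea that only read-only data is a candidate for archiving.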

According to Khan, SAP's HANA big data strategy will focus on becoming much more integrated with the open source movement, using a first-class computing capability in HANA and HANA Vora to bring more business capabilities to Spark. "This in itself is a very meaningful sort of activity for our customers because no customer wants to operate on islands of data -- they want to have that coherent view, and this is exactly what our focus has been," he said.
