
HANA Vora user friendliness calls for in-house Hadoop, Spark expertise

Experts discuss such considerations as HANA deployment options and technical differences and offer other tips and best practices for a successful HANA big data project.

Big data is becoming increasingly important as companies look to glean insights from all their structured and unstructured data, make better decisions and stay competitive.

To help them analyze both internal and external data and gain real-time insights, some SAP users are turning to HANA, the vendor's in-memory platform for processing high volumes of data in real time, or to HANA Vora, a connector between HANA and Hadoop, the open source, Java-based programming framework for large data sets. Note that neither product requires the other: the HANA platform runs without HANA Vora, and vice versa.

HANA Vora is an in-memory query engine designed to make big data from Hadoop more accessible and usable for enterprises. It plugs into the Apache Spark framework, enabling companies to run analytics on data that's stored in Hadoop.

Deploying and running HANA or HANA Vora, though, can be challenging. In that context, experts have offered some best practices to help businesses prepare to implement and use these tools for big data projects.

Start with the basic deployment decision

To start, SAP users have the option of deploying HANA in the cloud or on premises, said Hyoun Park, chief research officer at Boston-based Blue Hill Research.

Which implementation path is right for a given company depends on how it currently treats its enterprise data.

"If an SAP client believed that it needed to keep its data secure, or could not move it out of the country for any reason, or did not have strong cloud security and governance capabilities, then they would typically keep their HANA instance and their SAP data on premises," Park said.

However, many companies are much more comfortable with the cloud today and have already put the data and applications necessary to run their businesses in the cloud. Therefore, the choice of deploying HANA on premises or in the cloud is more of a preference based on IT standards and governance, as well as risk management and compliance, according to Park.

Additionally, it's essential for organizations to know how they will be using HANA before attempting to choose whether to deploy in the cloud or on premises, as different use cases may require -- or be optimized on -- different platforms, said Kiersten Williams, a business analyst at Panorama Consulting Solutions based in Denver.

If companies are transitioning from older SAP systems to SAP HANA, they also have to understand that not all data is going to move from their traditional platforms onto HANA, said Noel Yuhanna, an analyst at Forrester Research, which is based in Cambridge, Mass.

"HANA is a very optimized platform, especially for memory, so you can't just move everything from what you have and bring the kitchen sink into the HANA platform," Yuhanna said.

Because HANA is a very expensive platform, it doesn't make sense for enterprises to bring over data that they're not going to use, according to Yuhanna. Therefore, organizations have to plan and understand the data that's going to sit on the HANA platform.

"If you don't know your data in HANA, you may be wasting resources, and it may be very costly and maybe not even worth running on HANA," Yuhanna said. "You also have to know what kinds of queries you anticipate asking."

Working with HANA is like working with a typical database, and the necessary skills are based on what database administrators have been working with for a while, Park said.

Selecting a partner to help with the implementation is also crucial, Williams said. Companies should choose their partners based on fit with their industries, the intended focuses of their projects and recommendations from previous customers.

Depending on a company's use case, its partner and its position on capital expense, the deployment options will be different, Williams said.

It's also important for businesses to dedicate the right internal teams to the project and to determine their technical and organizational readiness for transitioning to HANA, assessing those factors at several points throughout their projects, she added.

Another key consideration is memory sizing: planning for the memory required to support applications, which in turn provides the basis for hardware recommendations.

"SAP has built the Quick Sizer tool to aid in this and it provides a much better estimate than the rule-of-thumb calculation," Williams said. "Also, don't just size the current system, size for future needs as well."

Training time and cost should also be built into the original project budget to ensure success, she added.

HANA Vora calls for training, realignment of data and analytics teams

When it comes to deploying HANA Vora, organizations need to clearly define their project requirements, because hardware recommendations, system landscapes and configurations all depend on those requirements, and getting them right directly affects the success of an implementation, according to Williams.

Yuhanna agreed with the need for organizations to define their user requirements, adding that they shouldn't integrate HANA with Hadoop and Spark just because a connector is available.

"I would recommend that the data architects who are building [on the] HANA Vora platform look at the use cases of what they're going to be achieving with the HANA Vora implementation," Yuhanna said. "Is it going to be internet of things, are they doing fraud detection, customer intelligence, customer analytics?"

Having an understanding of their data is also critically important for organizations looking to implement HANA Vora, said Shawn Brodersen, associate vice president for application services at HCL Technologies Ltd., a systems integrator based in Noida, India.

It's not so much the technology that becomes a barrier to success, but an organization's understanding of its own data and how it's going to manage it, he said.

"When you look at the HANA Vora-Hadoop landscape and you're trying to implement on that, you're still dealing with the concept of data tiering, hot data versus cold data and how you access that," Brodersen said. "There needs to be a good, integrated data lifecycle management strategy."

In addition, implementing HANA Vora requires that companies unite their big data teams with the traditional analytics and application teams that have been part of their IT departments for decades, according to Park.

"Vora will require working with your big data team, often working with either a data scientist or someone who is conversant in Hadoop because Hadoop is a different skill than working with a traditional database," Park said.

With its recent $125 million acquisition of Altiscale, a provider of big data as a service based in Palo Alto, Calif., SAP will now have a preferred solution for providing Hadoop to its users, he said.

The deal will help SAP users who want to start using big data, since they can now work with a combination of Altiscale and HANA Vora to connect large amounts of streaming data to their existing historical data for greater insights, Park added.

"If companies are planning to use Vora, this is a good opportunity for database administrators to start learning how Apache Spark works because they will have to start learning how Hadoop-based data gets processed and analyzed," Park said.

Next Steps

Understand the SAP HANA big data options

Learn about SAP third-party analytics

Read about HANA Cloud Integration
