SAP announced today that it is offering new packages that integrate Hadoop software with its analytics and database technologies, including its much-touted HANA in-memory platform and Sybase IQ server, a columnar database and enterprise data warehouse.
SAP also announced packages integrating Hadoop with SAP Data Integrator and with the SAP BusinessObjects business intelligence (BI) suite.
The new packages, being offered through partners including Cloudera, Hitachi Data Systems, Hortonworks, HP and IBM, give customers a comprehensive data warehousing solution for real-time analysis on large data sets coming from various sources, according to SAP.
Hadoop is a free, Java-based programming framework for processing large data sets across a distributed computing environment spanning multiple machines, according to SAP. It is an open source project of the Apache Software Foundation.
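The programming model behind Hadoop, MapReduce, can be illustrated without a cluster. The following is a toy Python sketch of the map, shuffle and reduce phases applied to a word count; it is an illustration of the model only, not Hadoop code, and the input lines are invented:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) pairs from each input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # would before handing them to reducers.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

# Invented sample input standing in for data spread across nodes.
lines = ["Hadoop processes large data sets",
         "Hadoop distributes the processing"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In a real Hadoop deployment the map and reduce functions run in parallel across machines, with HDFS supplying the input splits; the three-phase structure is the same.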
The packages enable SAP customers to integrate Hadoop into their existing BI and data warehousing environments in a number of different ways, SAP said in a statement.
Organizations can use Data Integrator to read data from the Hadoop Distributed File System (HDFS) or Hive databases and load relevant data into HANA or Sybase IQ, according to SAP. That also means BI users can continue to use their existing reporting and analytics tools, SAP said.
Customers can also run queries that span Sybase IQ and Hadoop environments, or run Hadoop MapReduce jobs within Sybase IQ environments. The company added that BusinessObjects BI users can query Hive databases, which gives them the ability to explore Hadoop environments directly.
SAP is supporting the integration because Hadoop and technologies like HANA are complementary, according to David Yonker, who heads SAP's "big data" strategy.
For one, Hadoop is effective at scanning or processing large amounts of data at once, Yonker said. "If you want to scan an entire petabyte of data, it does it incredibly well. Does it do it in real time? No, but it does it incredibly well."
In comparison, an in-memory database like HANA is used to read not all of the data, but only the subset relevant to a specific query, according to Yonker.
"And so, there's a use for both of them. They fit together very nicely," Yonker said.
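The distinction Yonker draws, batch scans over an entire data set versus targeted queries against a relevant subset, can be sketched in a few lines of Python. This is a toy contrast, not SAP code; the record layout and region values are invented:

```python
# Invented sample data: one flat list standing in for a large data set.
records = [{"id": i, "region": "EU" if i % 2 else "US", "sales": i * 10}
           for i in range(100_000)]

# Full scan (Hadoop-style batch processing): every record is touched
# to compute a global aggregate.
total_sales = sum(r["sales"] for r in records)

# In-memory index (HANA-style targeted query): build a keyed structure
# once, then answer a query by reading only the relevant subset.
by_region = {}
for r in records:
    by_region.setdefault(r["region"], []).append(r)

eu_sales = sum(r["sales"] for r in by_region["EU"])
```

The full scan is the right tool when the question genuinely requires all the data, as in Yonker's petabyte example; the indexed lookup wins when the query touches only a slice of it.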
Yonker cited Tokyo-based Mitsui Knowledge Industry, a bioinformatics company running a proof-of-concept project that combines Hadoop and SAP HANA to help determine cancer treatment methods, since patients respond differently to treatments depending on their DNA.
The process first involves using Hadoop to analyze the patient's data and then comparing it to a normal strand of data, a step that can take up to two days because of the amount of data involved.
"You've got to read all the [DNA] data [for both the healthy and ill patient]. You can't just read a subset of the data." With Hadoop, the company has been able to cut processing time down to 20 minutes and expects to reduce it to half that, he said.
The next step is using HANA to compare those differences to other patients' DNA samples. "In this step you're comparing those anomalies to 10-20 million DNA samples. It's highly iterative. You're essentially running 10-20 million queries against the system."
This is only SAP's most recent move toward Hadoop. In July, when it announced the general availability of BusinessObjects 4.0 Feature Pack 3, SAP also announced that the software supported HiveQL, a simple SQL-like query language used with Hadoop.
“Now, within that same exact information design tool you’ve been using all along, we support HiveQL, so you can start to bring in, mash up, information from Hadoop to your BI environment,” Jason Rose, vice president of business intelligence marketing for SAP, said at the time.
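HiveQL's appeal is that BI users can apply familiar SQL skills to data stored in Hadoop. The aggregation below is written so the same statement is valid in both HiveQL and standard SQL; since a live Hadoop cluster is out of scope here, it runs against Python's built-in SQLite as a stand-in for a Hive connection, and the weblog table and its contents are invented:

```python
import sqlite3

# SQLite stands in for a Hive connection; the GROUP BY statement
# itself is the kind of query HiveQL supports against Hadoop data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblogs (page TEXT, visits INTEGER)")
conn.executemany("INSERT INTO weblogs VALUES (?, ?)",
                 [("home", 120), ("products", 80), ("home", 40)])

rows = conn.execute(
    "SELECT page, SUM(visits) FROM weblogs GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('home', 160), ('products', 80)]
```

In the scenario Rose describes, the result of such a query could then be mashed up with data from other sources inside the same BusinessObjects information design tool.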