The massive amounts of corporate data that companies can collect today, plus business mandates to get faster and less expensive answers to database mining queries, have SAP looking into the possibility of offering its customers a column-based, in-memory database.
Neither columnar databases nor in-memory applications are new. Column-based databases have been around since 1972. In-memory applications date back to the 1960s. But SAP is blending these two old technologies and coming up with something new and intriguing.
Columnar databases have several advantages over row-based relational databases. For one, users can extract information much faster from a columnar database than from a traditional row-based DBMS. Because columnar databases can store a given amount of raw data in much less space, the area that has to be accessed is much smaller than in a database with row-based storage.
Column-based DBMSs are also self-indexed, eliminating the overhead for indexing, which can double the size of an RDBMS. In addition, column-based DBMSs are simple to update, because column-based database updates are done as insert-only operations, i.e., by simply adding fields to strings.
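To make the storage difference concrete, here is a minimal sketch in Python. It is an illustration only, not how any product named in this article is implemented: the sample table, the function names and the use of run-length encoding as the compression scheme are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount).
rows = [
    (1, "EMEA", 100),
    (2, "EMEA", 250),
    (3, "APAC", 75),
    (4, "APAC", 300),
    (5, "APAC", 50),
]

# Row-based layout: all fields of one record are stored together.
row_store = list(rows)

# Column-based layout: all values of one attribute are stored together.
column_store = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(store):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in store)


def total_amount_column_store(store):
    # Reads only the "amount" column and never touches the others.
    return sum(store["amount"])


def run_length_encode(values):
    # Collapses runs of repeated values into (value, run_length) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]


# Repeated values sit next to each other in a column, so they compress well.
encoded_region = run_length_encode(column_store["region"])

print(total_amount_row_store(row_store))
print(total_amount_column_store(column_store))
print(encoded_region)
```

Both layouts return the same total, but the column layout answers the query from a single list, and the five region values shrink to two (value, count) pairs.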
Despite these advantages, column-based databases have essentially been niche products, according to Jacob Gsoedl. Gsoedl is director of business systems at Cavium Networks, a Mountain View, Calif., provider of semiconductor products that enable intelligent processing, but he also writes about databases and storage. He says that companies such as Sybase Corp. and Vertica Systems have sold a few thousand units of their column-oriented DBMSs, Sybase IQ and Vertica Analytic, respectively.
One reason market share has eluded column-based databases is that "they're not a general-purpose database," said Nigel Pendse, a business intelligence and OLAP analyst and the editor of The OLAP Report. On the other hand, he said, "a relational database does a large number of things, although it doesn't do any of them very well. It's mediocre at all of them."
In-memory databases eliminate disk access
In-memory technology is increasingly being applied to BI, data warehouses and DBMSs. In-memory databases, also known as main memory databases, have one big advantage over traditional "on-disk" DBMSs: speed. In-memory databases eliminate disk access by storing data in main memory, so queries run against it directly without a trip to disk. Traditional DBMSs cache frequently requested data in memory for faster access, but automatically write data updates, insertions and deletions through to the hard disk.
Three recent developments are giving in-memory databases a boost: 64-bit computing, multi-core servers and the plunging cost of memory chips. Although 64-bit computing allows access to far more RAM than the 2GB or 4GB available on 32-bit systems, the real breakthrough is the dropping price of RAM.
"Cheap memory and cheap hardware has made it orders of magnitude less expensive – and much quicker -- to mine these new in-memory, column-based databases," said Josh Greenbaum, principal with Enterprise Applications Consulting, which focuses on how enterprise software intersects with business requirements. "Before cheap RAM, it cost $1 million to run 1TB of data in a relational database. Today, that same 1TB running in a columnar database costs $10,000."
One of the early products in the current wave of in-memory software offerings was QlikView from BI vendor QlikTech. Microsoft Excel has an in-memory structure. MicroStrategy 9 includes standard in-memory ROLAP, which operates directly on the main server, minimizing disk-access delays and improving query-response times.
SAP's NetWeaver Business Warehouse Accelerator is an in-memory BI product. Oracle offers an in-memory database called TimesTen for Oracle DBMS customers who want accelerated data analysis. And in-memory capability is expected to be part of Microsoft's upcoming Project Gemini release.
Despite all their potential advantages, column-based, in-memory databases have heretofore been considered appropriate only for analytic applications, as a replacement for a data warehouse.
But during his keynote speech at SAP's 2009 Sapphire conference, Hasso Plattner, chairman of the supervisory board and co-founder of SAP, demonstrated a $6,000 rack-based blade server system with 144 GB of RAM running SAP's Explorer analytical interface powered by an in-memory, column-based database. In a recent column, Greenbaum wrote, "That's enough RAM to load up a column-based database that would require two terabytes as a relational database."
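Greenbaum's comparison implies a rough size reduction that can be worked out from the two figures quoted above. The short calculation below assumes decimal units (1 TB = 1,000 GB) and that all 144 GB of RAM is available for data, neither of which the article specifies.

```python
# Figures quoted in the article; decimal units are an assumption.
relational_size_gb = 2 * 1000  # "two terabytes as a relational database"
server_ram_gb = 144            # RAM in the demo blade server system

# Minimum average reduction needed for the data to fit entirely in RAM.
required_reduction = relational_size_gb / server_ram_gb

print(f"Required size reduction: about {required_reduction:.1f}x")
```

Under those assumptions, the columnar copy would need to be roughly one-fourteenth the size of the relational original.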
During the Sapphire demo, Explorer was able to process 280 million records in less than a second. Plattner suggested that this same combination of technology could be used to power the core transactional system at the heart of SAP's Business Suite, and vowed that SAP "can do the same thing to run the transactions of the back office." But can it?
"It's a much bigger challenge to use [a columnar database] as a transactional datastore for OLTP applications, although companies like Vertica have added mechanisms such as a separate read write store to their products in an attempt to close the write performance gap to that of contemporary relational databases," Gsoedl said.
Pendse pointed out that Oracle's TimesTen is "about ten times quicker than a normal DBMS, and no one's using that for transactions."
The volatility issue of in-memory DBMSs
Part of the problem with using in-memory databases for transactional applications is that, unlike disk-based DBMSs, in-memory databases are volatile. It's one thing to lose an analytic query, but another to lose a transaction. "To run a transactional system in a serious commercial business, every transaction must be committed to disk, or at least non-volatile memory," Pendse said. "If you pull the plug on a computer with an in-memory database, you've lost everything."
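One common way to reconcile in-memory speed with Pendse's requirement is to append every transaction to a log on disk before acknowledging it, then rebuild memory from that log after a crash. The Python sketch below illustrates that general idea only. The class name, the JSON log format and the file handling are assumptions for the example, and nothing here describes how SAP, Oracle or any other vendor mentioned in this article implements durability.

```python
import json
import os


class DurableInMemoryStore:
    """Hypothetical key-value store: data lives in RAM, commits go to a log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild memory from whatever survived on disk
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    # A partially written last line means the crash hit
                    # mid-write; that transaction was never acknowledged.
                    break
                self.data[entry["key"]] = entry["value"]

    def commit(self, key, value):
        # Write to disk first, and force it out of the OS cache ...
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # ... and only then apply the change to the in-memory copy.
        self.data[key] = value

    def close(self):
        self._log.close()
```

Reads are served entirely from `self.data`, so queries stay at memory speed, while each commit pays for one disk write. That per-commit cost is the kind of write-performance gap Gsoedl describes for transactional workloads.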
"And you will have a crash," he added. "Even the most secure data center computer goes down. And it's not just the hardware you have to worry about. If you have a device driver that's faulty, your computer will crash."
If SAP were to develop Plattner's concept as a product, "that would fundamentally change the economics and the politics of the SAP market," Greenbaum said. But user acceptance might be the stumbling block. "This would require users to throw out their Oracle databases," he said. "There are a lot of people who have set their careers on Oracle. It's going to be hard to get companies to back off Oracle."
Perhaps nearly impossible. "There's a reason why most sites run software that's years behind the latest release, particularly things like transaction systems," Pendse said. "If it's working well, they don't update the system. People are running on DBMSs that are years out of date, that are buggy, that are not patched. The bottom line is that if they have a version of the software that works, they have almost no motivation to update."
Hard for SAP users to digest all their data
Plattner noted that the average SAP customer has seven to 10 years' worth of data on disk. But, he noted, "how we digest that data is slow, and it's getting slower because of the increased sizes of databases." What he wants is the ability to extract information from databases instantaneously. "For a company the size of SAP, with $15 billion in revenue, I want to answer every question you have with Excel or [SAP] Business Explorer in less than one second."
Since Sapphire, SAP has been mum about when an in-memory columnar database for transactional processes might actually be released to customers. "SAP isn't ready to release details about its future plans," Jacob Klein, SAP vice president, solution management, data and analytic engines, wrote in an email to SearchSAP.com.
However, Klein did say that SAP's current in-memory column-based database is designed to serve as an "acceleration layer for [BI] scenarios running on top of an existing data warehouse." He said that in time SAP will extend these capabilities so that the in-memory data layer can serve as a primary persistence layer, and that this would involve "building new capabilities into the current TREX C++-based kernel."
According to Klein, at that point, transferring data from older SAP ERP or database applications, as well as migration of the actual database tables, to the new column-based DBMS would be seamless. "The next-generation data layer," he said, "will leverage existing data access APIs, so ERP applications, and their extensions, can be deployed on top of the new layer without disruption to the application."
About the author: Rich Friedman has covered various aspects of IT for more than 25 years. He can be reached at firstname.lastname@example.org.
Additional reporting for this story was done by Peter Bochner, site editor, SearchSAP.com.