Data mining is headed for the masses.
Yes, the struggle continues to move data from the back end, where the hardcore techies figure algorithms, to the front end, where most math skills are limited to factoring sales commissions. Some major vendors are responding, though, with efforts to push predictive analytics into the operational side of the house.
The biggest indicator that data mining tools are going mainstream is the interest of RDBMS vendors in incorporating data mining algorithms directly into the database management system, according to Andrew Braunberg, data mining analyst at Current Analysis, based in Sterling, VA. "The database vendors want to enable some level of model building directly against the database, saving the IO requirements," Braunberg said. "They will not put pure-play analytics players out of business anytime soon," he added. "But it is an important trend to watch." The most recent releases of SQL Server, Teradata and Oracle have all included data mining functions. Direct database access is a departure from the traditional route, where data mining engines pull the data and build models on a dedicated server. SAS, for example, extracts data from a company's database and essentially re-hosts the information so it can be mined.
At least one vendor, PolyVista, is showing up on some radars as being far ahead of the pack when it comes to direct database mining and improved user interaction. One way to explain what PolyVista does is to consider its integrated knowledge discovery tools in three parts: OLAP (the slice-and-dice tool); data mining (extraction and analysis); and visualization. According to industry specialists, PolyVista currently holds data mining intellectual property that is unique to the company.
"What PolyVista does that no one else does is allow the asking of a global question to all levels of a cube at one time," explained Barry Grushkin, CTO of Machine Intelligence Company, a provider of analytic solutions, and an Intelligent Enterprise columnist. (A cube is an OLAP term that refers to the model created by the intersection of numerous dimensions.)
"The point is that optimization, or extreme value questions, can be asked of all possible levels at the same time and a search will be done," Grushkin said. Taking advantage of XML, PolyVista offers up to 60 dimensions (such as geographic area, product line and demographics) and hundreds of combinations.
What's most important, said Grushkin, is that "PolyVista allows the search for the right aggregates to solve a problem. It will tell you which regions deserve investment and which do not, given a certain cost benefit, or risk return. It functions as a way to draw your attention to significant differences that deserve further investigation." Grushkin did note, though, that other systems, such as Oracle and Teradata, have "far more industrial strength" and are designed for mega data applications.
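To make the idea of asking an extreme value question of every level of a cube more concrete, here is a minimal Python sketch. It is not PolyVista's method, which the vendor treats as proprietary; it is a brute-force illustration under stated assumptions. The rows, the three dimensions, the `profit` measure and the `min_rows` support threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; none of these values come from the article.
ROWS = [
    {"region": "East", "product": "Laptop", "segment": "Consumer", "profit": 120},
    {"region": "East", "product": "Server", "segment": "Business", "profit": -40},
    {"region": "West", "product": "Laptop", "segment": "Business", "profit": 75},
    {"region": "West", "product": "Server", "segment": "Business", "profit": 210},
    {"region": "West", "product": "Laptop", "segment": "Consumer", "profit": -15},
]
DIMENSIONS = ("region", "product", "segment")


def average_all_levels(rows, dimensions, measure, min_rows=2):
    """Average `measure` for every cell at every aggregation level.

    A level is any subset of the dimensions (including the empty subset,
    which is the grand total). Cells backed by fewer than `min_rows` rows
    are dropped so that a single outlier row cannot win the search.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                cell = tuple((d, row[d]) for d in dims)
                sums[cell] += row[measure]
                counts[cell] += 1
    return {
        cell: sums[cell] / counts[cell]
        for cell in sums
        if counts[cell] >= min_rows
    }


def extreme_cells(rows, dimensions, measure, min_rows=2):
    """Return the (cell, average) pairs with the highest and lowest average."""
    averages = average_all_levels(rows, dimensions, measure, min_rows)
    best = max(averages, key=averages.get)
    worst = min(averages, key=averages.get)
    return (best, averages[best]), (worst, averages[worst])


if __name__ == "__main__":
    best, worst = extreme_cells(ROWS, DIMENSIONS, "profit")
    print("highest average profit:", best)
    print("lowest average profit:", worst)
```

With n dimensions there are 2^n levels to scan, so a loop like this is only practical for a handful of dimensions. A cube with dozens of dimensions would need a far more selective search strategy.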
A key customer for PolyVista is Compaq, where technical fellow John Landry has become enamored with the small company. Compaq runs PolyVista on SQL, and is an SAP customer. Before choosing PolyVista, Landry said he went comparison shopping. At first, he considered the obvious: SAS. "That turned out to be way too difficult," Landry said. "It was going to require too many software-type programmers. SAS was not going to relieve us from the burden of technical resources, and the turnaround time was slower (than with PolyVista)," he said. "And it wasn't interactive enough for us."
PolyVista Technical Services vice president Bob Ford said a big difference between SAS and PolyVista is their target audiences. "We are targeting the broader base of knowledge workers and business analysts with our OLAP/Discovery mining techniques, whereas SAS primarily targets the super-user analysts and degreed statisticians."
Now, Landry said, Compaq is turning early warning signals into action items in its supply chain. "We didn't even dream of that before. We have figured out how to set up OLAP to do data quality, and then we put PolyVista on top of it," he said. "One of our biggest lessons was that the best way to use OLAP and PolyVista is to build yourself a cube and just go at it."