Lawrence Livermore National Laboratory is not just a test bed for advanced research into everything from nuclear weapons to climate tracking; it's also the testing ground for a state-of-the-art storage system. The Livermore, Calif., lab is currently beta testing a radical new approach to data storage and access, one it needs in order to conduct its research. If the test succeeds, it will not only introduce new file system storage innovations but may also confirm that Linux is a viable platform choice for supercomputing.
Livermore has teamed up with San Jose, Calif.-based BlueArc Corp. to tackle a persistent challenge that, until now, has had no visible solution. NFS and CIFS are ubiquitous file access protocols that suit most mainstream enterprise needs, but they are inadequate for the demands of a research organization. Because the lab builds simulation software to test everything from the effectiveness of a nuclear missile to patterns of climate change, its computationally intensive applications must store gigantic volumes of data and retrieve them quickly and easily.
To help solve these problems, the two are turning to a project dubbed "Lustre." Lustre involves deploying a collection of 700 Linux nodes, each equipped with dual Intel Pentium 4 processors. The clusters also include 64 of BlueArc's network-attached storage (NAS) devices, each carrying 24 disk drives.
The key is an access driver, built by BlueArc, called OST (Object Storage Target). The idea is to assign a numbered tag to discrete collections of data so that they can be retrieved regardless of where in the environment the data is stored or how it is organized.
Dr. Geoff Barrall, CTO of BlueArc, uses the analogy of a patron checking a hat or coat. When the patron comes back for his belongings, the attendant returns the items with no concern for what they actually are. In the same way, OST tags information that needs to be stored and assigns it a unique identifier. The information could be text or images. "Machines always remember everything. So we give the machine a number (the ticket). We can have a very efficient way of storing data that doesn't require directories," says Barrall, adding that the ticket can be viewed as a data structure that defines metadata. To retrieve the data, which may be distributed across a collection of file servers, the user turns in the ticket.
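To make the coat-check idea concrete, here is a minimal Python sketch of a ticket-based object store, assuming a handful of hypothetical in-memory servers. It is not BlueArc's OST driver: the class names, the counter-based tickets and the modulo placement rule are all illustrative assumptions. The store never inspects the bytes it holds, and a single table mapping tickets to servers stands in for the metadata Barrall describes, so no directory tree is needed.

```python
import itertools
from typing import Dict, List


class StorageServer:
    """A hypothetical file server that holds opaque objects keyed by ticket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[int, bytes] = {}


class TicketStore:
    """Coat-check style store: hand in bytes, get back a numeric ticket.

    There are no paths or directories; the ticket alone locates the data.
    """

    def __init__(self, servers: List[StorageServer]) -> None:
        self.servers = servers
        self._tickets = itertools.count(1)  # tickets are 1, 2, 3, ...
        # ticket -> index of the server holding the object (the "metadata")
        self.locations: Dict[int, int] = {}

    def check_in(self, data: bytes) -> int:
        ticket = next(self._tickets)
        index = ticket % len(self.servers)  # illustrative placement rule
        self.servers[index].objects[ticket] = data  # contents are never inspected
        self.locations[ticket] = index
        return ticket

    def retrieve(self, ticket: int) -> bytes:
        index = self.locations[ticket]
        return self.servers[index].objects[ticket]


servers = [StorageServer(f"nas{i}") for i in range(4)]
store = TicketStore(servers)
t1 = store.check_in(b"simulation output, timestep 1")
t2 = store.check_in(b"\x89PNG image bytes")  # text or image: the store doesn't care
print(servers[store.locations[t1]].name)  # nas1
print(store.retrieve(t1))  # b'simulation output, timestep 1'
```

Because the caller holds only the number, an object could be moved to another server as long as the ticket table is updated; that indirection is what lets the data live anywhere in the environment.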
Given the scope and intensity of data creation, access and storage at the lab, officials there needed to ensure that an ill-equipped infrastructure would not compromise scientists' productivity. "Typically, in the past, every time we brought in a new cluster it had associated with it its own disk resources, and the way we communicated between the computational engine and the visualization engine where we have long-term storage of data was through FTP," says Mark Seager, a manager in the advanced simulation and computing effort at Livermore. "That has proven cumbersome and very slow. We ended up making multiple copies of the data."
The lab is moving quickly from gigabyte-scale data sets to terabyte-scale data sets, and petabyte data sets are planned for 2004. Seager saw immediately that the lab needed a new paradigm for file access.
"The network really is the computer, and the file system lives on the network and the computing resources plug into that file system in a scalable way. That's very difficult to do today without a scalable file system," says Seager. The lab wants to scale both the amount of storage and the amount of bandwidth available to an application. With OST-based access delivering an effective throughput of 128G bit/sec, it becomes clear that the network itself could end up being the new choke point. The technology also allows for better bandwidth management. "Lustre allows us to scale across the enterprise parallel bandwidth to a single application," he says.
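Lustre delivers parallel bandwidth to a single application by striping each file's data across several object storage targets, so one large read is served by many devices at once. The Python sketch below shows a simple round-robin (RAID-0 style) layout. The function names are hypothetical, and although `stripe_size` and `stripe_count` echo Lustre's own striping terms, this is a simplified illustration, not Lustre's code.

```python
from typing import List, Tuple


def stripe_map(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (target index, offset within that target's object).

    Round-robin layout: stripe 0 on target 0, stripe 1 on target 1, and so on,
    wrapping back to target 0 after stripe_count stripes.
    """
    stripe_index = offset // stripe_size
    target = stripe_index % stripe_count
    # Full stripes this target already holds before the current one
    stripes_before = stripe_index // stripe_count
    obj_offset = stripes_before * stripe_size + offset % stripe_size
    return target, obj_offset


def split_read(offset: int, length: int, stripe_size: int,
               stripe_count: int) -> List[Tuple[int, int, int]]:
    """Split one application read into per-target pieces that can run in parallel.

    Returns a list of (target, object offset, byte count).
    """
    pieces = []
    end = offset + length
    while offset < end:
        target, obj_offset = stripe_map(offset, stripe_size, stripe_count)
        # Never cross a stripe boundary within a single piece
        count = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((target, obj_offset, count))
        offset += count
    return pieces


# A 4 MiB read of a file striped in 1 MiB stripes over 4 targets
for target, obj_offset, count in split_read(0, 4 * 1024 * 1024, 1024 * 1024, 4):
    print(target, obj_offset, count)
# 0 0 1048576
# 1 0 1048576
# 2 0 1048576
# 3 0 1048576
```

Each piece can be fetched from its target concurrently, so aggregate bandwidth grows roughly with the number of targets until the network saturates, which is Seager's choke-point concern. As rough arithmetic only: if the 128G bit/sec figure were spread evenly over the 64 NAS devices, each would contribute 2G bit/sec.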
While Livermore's circumstances might be unique, BlueArc insists there is a larger market for its file access and storage technology in supercomputers built from islands of Linux boxes. Barrall says that any industry involving a lot of mathematical computation is ripe for a more efficient file access system. Such industries include life sciences, oil and gas exploration, and movie post-production.
"What's great for the enterprise is if they can have a global file system so that all the servers look the same as each other," says Barrall. "When the user accesses the data they don't care which server they go to. They just get the fast access to the data."
This would be good news for companies considering a big commitment to Linux as an alternative to Windows or proprietary Unix platforms.