A Boeing jet engine generates 10 terabytes of data for every 30 minutes of operation. That volume is roughly the digital equivalent of the entire print collection of the US Library of Congress. Multiply that by the number of jet engines that propel more than 87,000 flights over the United States every day, and you’ve got a lot of data.
While some speculate that big data won’t prove to be an analytics cure-all, it’s still inextricably linked to the Internet of Things (IoT). If the predictions of companies like Cisco are correct, by 2020 we’ll have about 50 billion internet-enabled “things” constantly communicating, and while they may not all be generating data at a jet engine’s clip, together they will definitely constitute a data tsunami.
So, what are we going to do with all this data? The harvested information has to have value in some way or it would not have been harvested in the first place. On the other hand, not all data is created equal. The vast majority of collected data may be archived and forgotten until it is occasionally needed for a report, accessed once or twice, and then forgotten again. According to research by Enterprise Strategy Group, this “infrequently accessed” information (also known as Tier 3 or “cold” data) accounts for 80 percent of recorded data. And, while the average cost of storage has dropped dramatically over the years, at the scale of big data, even a few cents per gigabyte can add up quickly.
The obvious answer to this information overload is cold data storage alternatives, which are cheaper and offer more capacity than the storage used for regularly accessed data. As a result, companies typically opt for one of two solutions: legacy tape libraries or, more recently, the cloud.
Tape libraries have been in use for decades and are ideal for storing large amounts of data at a fraction of the cost of primary storage. They can also be considered “green” because the tape drive only spins when in use (which saves power), and being located on-premises allows relatively quick access to cold data. However, tape libraries also have some disadvantages, including high up-front costs for medium to large storage systems, difficult remote access, the potential for tape degradation, and the vulnerability of maintaining archives at a single on-site location (instead of “data tsunami,” think “data” and “tsunami”).
Cloud storage offerings make up for some of the tape library’s shortcomings by providing virtually unlimited storage space, low cost, and remote hosting that protects against theft, natural disasters, and more. However, the main drawback of cloud solutions is that retrieving data is often time-consuming and can become expensive, depending on the amount of data retrieved. For example, a service like Amazon Glacier takes 3 to 5 hours to retrieve a dataset (which is then available for download for 24 hours), and charges per gigabyte if more than 5 percent of the stored data is retrieved in a given month.
A middle ground between the two looks set to emerge, combining hardware and software elements that optimize access while keeping the cost per GB of storage as low as possible.
Cold Storage: Big Data on Ice
Software-defined storage (SDS) is a new term, but from a technical perspective, it is similar to software-defined networking (SDN) in that the hardware logic is abstracted into a software layer that manages the storage infrastructure. Essentially, this means that storage functions or services such as deduplication, replication, snapshots, and thin provisioning can be virtualized, enabling converged storage architectures that run on commodity hardware. As a result, cost-effective storage strategies can be implemented that combine the accessibility and efficiency of tape libraries with the scalability and remote capabilities of the cloud.
For example, RGS Cold Storage, powered by Storiant, is an on-premises storage solution for Tier 3 data based on off-the-shelf hardware from RGS, a business unit of Avnet, Inc. The rack-level appliance is fully integrated with 60 HDD bays providing petabyte-scale capacity and leverages OpenZFS-based Storiant software (formerly SageCloud) to interface with the private cloud. Storiant data management software also improves access performance, reducing the retrieval time of cold data to 30 seconds, while allowing HDDs to spin down when not in use to significantly reduce power consumption. The scalable RGS cold storage architecture costs $0.01 per GB of storage per month and is cost-optimized for most big data deployments.
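To put the per-gigabyte figure in perspective, the short Python sketch below estimates monthly storage cost for one petabyte. The $0.01 per GB per month cold price and the 80 percent cold-data share come from the figures cited in this article. The $0.05 per GB per month Tier 1 price and the 1 PB data set are hypothetical placeholders chosen only for illustration, and the function names are likewise made up for this example.

```python
# Rough monthly cost comparison: everything on Tier 1 vs. a hot/cold split.
# Decimal units are assumed here: 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

COLD_PRICE = 0.01     # USD per GB per month, the cold storage figure cited above
COLD_FRACTION = 0.80  # share of data that is Tier 3 ("cold"), per the ESG figure
HOT_PRICE = 0.05      # USD per GB per month, HYPOTHETICAL Tier 1 price


def monthly_cost_usd(capacity_gb: float, price_per_gb_month: float) -> float:
    """Monthly cost of storing capacity_gb at a flat per-GB price."""
    return capacity_gb * price_per_gb_month


def tiered_monthly_cost(total_gb: float, cold_fraction: float,
                        hot_price: float, cold_price: float) -> float:
    """Monthly cost when the cold share of the data moves to the cheaper tier."""
    cold_gb = total_gb * cold_fraction
    hot_gb = total_gb - cold_gb
    return (monthly_cost_usd(hot_gb, hot_price)
            + monthly_cost_usd(cold_gb, cold_price))


total_gb = 1 * GB_PER_PB  # hypothetical 1 PB data set
all_hot = monthly_cost_usd(total_gb, HOT_PRICE)
tiered = tiered_monthly_cost(total_gb, COLD_FRACTION, HOT_PRICE, COLD_PRICE)

print(f"All on Tier 1: ${all_hot:,.0f} per month")
print(f"80% moved to cold storage: ${tiered:,.0f} per month")
```

Under these assumed prices, keeping the full petabyte on Tier 1 would cost about $50,000 per month, while moving the cold 80 percent to the cheaper tier would bring the total to about $18,000. The actual savings depend entirely on real Tier 1 pricing, which this article does not specify.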
Storage management technologies such as SDS help lay the foundation for valuable business analysis, and they also ensure that financial and computing resources remain available for regularly accessed “Tier 1” data. In an environment where too much information can actually be a bad thing, it’s important to keep some of that information in a deep freeze.
Reviewing Editor: Guo Ting