Big data requires new approaches to storage and archiving

Many enterprises are betting on data analytics projects to improve decision-making, learn more about customers and meet changing business requirements. Unsurprisingly, the volume of information passing through data centers is rising rapidly as companies upgrade their systems to support these new initiatives.

The most recent Cisco Global Cloud Index predicted that global data center IP traffic would reach nearly 8 zettabytes by the end of 2017. If this prediction pans out, traffic will grow at a 25 percent compound annual rate through 2017, creating additional challenges for managing and interpreting data.
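
To make the compound-growth arithmetic concrete, here is a minimal Python sketch. The 8 zettabyte endpoint and the 25 percent rate come from the forecast cited above; the five-year horizon and the function names are assumptions chosen for illustration, not figures from the report.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplier after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Starting value that reaches `end_value` at the given compound rate."""
    return end_value / growth_factor(cagr, years)


def cagr_between(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end`."""
    return (end / start) ** (1.0 / years) - 1.0


# Assumed five-year horizon (illustrative only).
factor = growth_factor(0.25, 5)          # about 3.05x
start_zb = implied_start(8.0, 0.25, 5)   # about 2.62 ZB
print(f"{factor:.2f}x growth, implied start of {start_zb:.2f} ZB")
```

Under those assumptions, a 25 percent annual rate roughly triples traffic in five years, which is why flat capacity planning falls behind so quickly.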

Accordingly, there will be many business opportunities for storage and networking vendors to help companies manage growing data quantities. IDC recently projected that the overall market for big data will grow six times faster than IT as a whole in 2014, topping $16 billion in value.

The impact of big data on storage infrastructure
As data volumes surge, operations teams, IT departments and developers aren't the only ones affected. Cloud infrastructure itself is being pushed to its limits as organizations support demanding applications such as virtual desktop infrastructure, database indexing and online gaming, while also overseeing proper warehousing and archiving of digital assets over the long term.

In verticals such as cable television and broadband Internet media, it's easy to see how the cloud and increasingly sophisticated networks are changing the storage game. Operators not only need enormous capacity in order to store files and objects, but they also have high requirements for compliance and security. There's also the overarching incentive to locate and extract what's valuable and store everything else at the lowest possible cost, if not discard it entirely.

"[W]ith dynamic IP addressing more traffic is generated, with records that need to be stored and maintained," stated UPC Nederland IT business analyst Jan Pince van de Aa, according to ComputerWeekly. "And when you add roaming and hotspots you have a shorter timeframe and lots of traffic, so the question is how to best store it for analysis and compliance purposes."

UPC's database currently holds 5TB, and that volume is expected to double by the end of 2014. Despite this expectation, it's difficult for many organizations to know how much to invest in cloud storage solutions, since data can be amassed through rapid, unpredictable processes, ultimately requiring more space and money.

In this context, managers look for products and services that are cost-effective and flexible. Rather than procure proprietary appliances that are difficult to modify and don't interact well with other infrastructure, they're turning toward software-defined solutions that intelligently partition data and facilitate a vendor-agnostic environment.

Software-defined storage as a way to control big data sprawl
Software-defined storage (SDS) creates a virtual software layer through which local and cloud-based data requests are distributed to a pool of storage resources. This layer controls data flow to a heterogeneous infrastructure that may include many different types of media, usually SSDs for performance and HDDs and/or tape for capacity.

Such an arrangement makes for better utilization of storage infrastructure, while enabling data to be pushed to the ideal repository. Organizations achieve granular control over performance and information exchange, and because the virtual layer sits between applications and the physical hardware, the prospect of vendor lock-in is reduced. Data can be migrated from one array to another with minimal reliance on the underlying equipment.
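
The idea of a virtual layer in front of a pool of mixed media can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's API: the `Backend` and `StoragePool` classes, the array names and the "most free space wins" placement rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    media: str                # "ssd", "hdd" or "tape"
    capacity_gb: float
    objects: dict = field(default_factory=dict)  # object id -> size in GB

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - sum(self.objects.values())


class StoragePool:
    """Hypothetical virtual layer: callers name an object, not a device."""

    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}
        self.location = {}    # object id -> backend name

    def write(self, obj_id: str, size_gb: float, media: str) -> str:
        # Consider only backends of the requested media type with enough room.
        candidates = [b for b in self.backends.values()
                      if b.media == media and b.free_gb >= size_gb]
        if not candidates:
            raise RuntimeError(f"no {media} backend has {size_gb} GB free")
        target = max(candidates, key=lambda b: b.free_gb)
        target.objects[obj_id] = size_gb
        self.location[obj_id] = target.name
        return target.name

    def migrate(self, obj_id: str, dest_name: str) -> None:
        # Move an object between arrays; callers keep using the same id.
        src = self.backends[self.location[obj_id]]
        dest = self.backends[dest_name]
        if dest.free_gb < src.objects[obj_id]:
            raise RuntimeError(f"{dest_name} lacks space for {obj_id}")
        dest.objects[obj_id] = src.objects.pop(obj_id)
        self.location[obj_id] = dest_name


pool = StoragePool([
    Backend("array-a", "ssd", 100),
    Backend("array-b", "hdd", 1000),
    Backend("array-c", "hdd", 2000),
])
pool.write("index-1", 50, "ssd")       # lands on array-a, the only SSD array
pool.write("logs-2013", 400, "hdd")    # lands on array-c, which has more free space
pool.migrate("logs-2013", "array-b")   # relocated without changing the object id
```

Because applications address `logs-2013` rather than a specific array, the migration in the last line is invisible to them, which is the property that loosens vendor lock-in.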

What does all of this mean for handling big data projects? For one, SDS enables enterprises to get more from economical, industry-standard cloud hardware, and the diversification and high utilization of storage assets mean that they're not always fighting the uphill battle of just buying more disks to keep up with increasing volumes. Simply adding disks is both costly and counterproductive, since it encourages over-collection of data rather than smart management and tiering of workflows.

"Storage sprawl was actually becoming a bit of an issue," observed Bill Kleyman for Data Center Knowledge. "Just how many more shelves or disks will you need to help support your next-generation data center? The answer doesn't always revolve around the physical. Software-defined storage allows organizations to better manage storage arrays, disks and repositories. This means understanding where IO-intensive data should reside and how to best optimize information delivery."

With SDS, data sets can be distributed according to their respective I/O requirements. For example, valuable SSD read/write cycles aren't wasted on data that could be easily handled by slower media. Moreover, cloud infrastructure can leverage the intelligence and flexibility of software to integrate with other data centers, increasing scalability and giving organizations the ability to adjust economically to rising traffic and get more out of related analytics projects.
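
A placement policy of this kind might look like the following Python sketch. The thresholds, field names and tier labels are illustrative assumptions; a real SDS product would expose its own policy settings.

```python
from dataclasses import dataclass

# Assumed policy thresholds (hypothetical values for illustration).
HOT_IOPS = 1000          # at or above this, the data earns SSD
COLD_AFTER_DAYS = 180    # untouched this long, the data goes to archive media


@dataclass
class DataSet:
    name: str
    iops_needed: int
    days_since_last_access: int


def choose_tier(ds: DataSet) -> str:
    """Pick the cheapest tier that still meets the data set's I/O needs."""
    if ds.iops_needed >= HOT_IOPS:
        return "ssd"
    if ds.days_since_last_access >= COLD_AFTER_DAYS:
        return "tape"
    return "hdd"


def plan(datasets) -> dict:
    """Group data set names by the tier the policy assigns them."""
    tiers = {"ssd": [], "hdd": [], "tape": []}
    for ds in datasets:
        tiers[choose_tier(ds)].append(ds.name)
    return tiers


print(plan([
    DataSet("db-index", iops_needed=5000, days_since_last_access=0),
    DataSet("weekly-reports", iops_needed=50, days_since_last_access=12),
    DataSet("2012-call-records", iops_needed=5, days_since_last_access=400),
]))
```

The I/O check runs first so that a busy data set is never demoted to slower media just because of an age rule, while everything else defaults to the cheaper tiers.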

Mobile and archiving: Two sides of the big data coin
Building cost-effective and flexible infrastructure is vital because of how much data now passes through the cloud, as well as the number of different devices from which it originates. The latest Cisco Visual Networking Index projected that monthly mobile data traffic would surpass 15 exabytes by 2018, and that the number of mobile devices would exceed the world's population sometime this year.

Keeping up with mobile traffic is just one part of the modern enterprise's challenges in orchestrating beneficial analytics projects. To stay ahead of the curve, many companies are turning to archiving solutions that leverage inexpensive media for cold storage. Tape has long been a popular option in this space, although disks are gaining ground. Some vendors are working on optical platters that have a shelf life of up to 50 years, easily surpassing that of current-generation linear tape solutions.

This search for the optimal archiving technology underscores how even basic questions about which storage medium to use are still relevant to enterprises. Getting a handle on big data, and on the increase in network traffic at large, is a matter of discovering efficient processes and being smart about hardware procurement. There's no silver-bullet solution for creating infrastructure that is well adjusted to the new data landscape, but SDS is a start, since it gives stakeholders options for finding what's right for their particular operations.